Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive



Reminder -- RAID 5 is not your friend



On Mon, Mar 15, 2010 at 3:41 PM, Daniel Feenberg <feenberg-fCu/yNAGv6M at public.gmane.org> wrote:
>
> On Mon, 15 Mar 2010, Kent Borg wrote:
>
>> Richard Pieri wrote:
>>> And neither is RAID 1.  Except when you get lucky.
>>>
>>> I had a failure over the weekend.  Two mirrored pairs, A1/A2 B1/B2 configuration.  A2 and B1 failed simultaneously.
>>
>> Sounds like it is *disks* that are not your friend. And, that they hate
>> you enough that your use of RAID isn't enough to save you.
>>
>> My conclusions:
>>
>> 1. don't run matched disks from the same manufacturer and lot
>> 2. watch disk temperature
>> 3. watch smartmon for indications of aging
>> 4. replace disks before they die
>> 5. use your replacements as an opportunity to get your pairs staggered
>> 6. have backups that at minimum are ping-ponged, current, and physically
>> offline
>> 7. goto #1...
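Items 2 through 4 amount to a replacement policy driven by SMART data. The following is a minimal sketch of such a policy, assuming the raw attribute values (named as `smartctl -A` reports them: Temperature_Celsius, Reallocated_Sector_Ct, Current_Pending_Sector) have already been collected. The thresholds, the `DiskHealth` class, and `replacement_reasons` are hypothetical illustrations, not recommendations from the thread.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; tune them for your own drives and vendor guidance.
MAX_TEMP_C = 50
MAX_REALLOCATED = 0
MAX_PENDING = 0


@dataclass
class DiskHealth:
    """Raw SMART values for one disk, e.g. as read from `smartctl -A`."""
    device: str
    temperature_c: int        # Temperature_Celsius
    reallocated_sectors: int  # Reallocated_Sector_Ct
    pending_sectors: int      # Current_Pending_Sector


def replacement_reasons(disk: DiskHealth) -> List[str]:
    """Return the reasons (if any) to schedule this disk for replacement."""
    reasons = []
    if disk.temperature_c > MAX_TEMP_C:
        reasons.append(f"running hot ({disk.temperature_c} C)")
    if disk.reallocated_sectors > MAX_REALLOCATED:
        reasons.append(f"{disk.reallocated_sectors} reallocated sectors")
    if disk.pending_sectors > MAX_PENDING:
        reasons.append(f"{disk.pending_sectors} sectors pending reallocation")
    return reasons


if __name__ == "__main__":
    # Illustrative values only; substitute readings from your own disks.
    for d in (DiskHealth("/dev/sda", 38, 0, 0), DiskHealth("/dev/sdb", 41, 12, 1)):
        print(d.device, replacement_reasons(d) or "ok")
```

Replacing a disk as soon as any reason appears, rather than waiting for the array to drop it, is what makes item 5 (staggering the pairs) possible.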
>
> In most cases this is not a case of simultaneous failure due to common
> disk wear or defects, or power supply events, or controller problems. In
> most cases of apparent simultaneous failure Disk 2 has a bad sector that
> has never been written to. Such a sector can remain undisturbed for the
> life of the disk, or until the RAID software attempts to sync with another
> disk. When Disk 1 fails (and is noticed by the RAID software) and is
> replaced the sync starts copying Disk 2 to the new Disk 1 and runs until
> the bad sector on Disk 2 is encountered, at which point it announces the
> fact that Disk 2 has failed. But it didn't fail during the sync - it was
> probably bad from day 1, and if written to would have been remapped
> transparently to the user and the RAID software.
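The scenario Daniel describes can be reproduced with a toy model. This is a hypothetical sketch, not how any real drive or RAID driver is implemented: a `Disk` remaps a bad sector whenever it is written and raises an error when an unrewritten bad sector is read. Disk 2 has a sector that was bad from day 1 and never written; the rebuild after Disk 1 dies is the first thing to read it.

```python
class LatentSectorError(Exception):
    """Raised when a sector that was bad from the start is finally read."""


class Disk:
    """Toy disk model: a list of sectors plus a set of physically bad ones.

    Writing a bad sector remaps it to a spare, as drives do transparently;
    reading a bad sector that was never rewritten raises LatentSectorError.
    """

    def __init__(self, name, n_sectors, bad=()):
        self.name = name
        self.data = [None] * n_sectors
        self.bad = set(bad)

    def write(self, i, value):
        self.bad.discard(i)  # the drive remaps the sector on write
        self.data[i] = value

    def read(self, i):
        if i in self.bad:
            raise LatentSectorError(f"{self.name}: unreadable sector {i}")
        return self.data[i]


def rebuild_mirror(source, target):
    """Copy every sector of the surviving disk onto the replacement."""
    for i in range(len(source.data)):
        target.write(i, source.read(i))


def rebuild_after_failure(n_sectors=8, sectors_written=6, bad_sector=6):
    """Disk 1 dies; rebuild from Disk 2. Returns the error text, or None."""
    disk1 = Disk("disk1", n_sectors)
    disk2 = Disk("disk2", n_sectors, bad={bad_sector})  # bad from day 1
    for i in range(sectors_written):  # only these sectors ever hold data
        disk1.write(i, f"block{i}")
        disk2.write(i, f"block{i}")
    new_disk1 = Disk("new_disk1", n_sectors)
    try:
        rebuild_mirror(disk2, new_disk1)
    except LatentSectorError as err:
        return str(err)  # disk2 now appears to "fail" during the sync
    return None
```

With the defaults, `rebuild_after_failure()` reports `disk2: unreadable sector 6`, while `rebuild_after_failure(sectors_written=8)` returns `None`: once sector 6 has been written, the model remaps it and the rebuild completes.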

I don't understand what you are suggesting here.  When Disk 2 was made
a fully functioning member of the RAID subsystem, why wasn't every
block relevant to using it for recovery written at least once to
initialize it?   Isn't that what a RAID build/rebuild guarantees?  I
could see in the case of mirrored drives that the block was only
written once and never read again (any reads fulfilled by a different
drive), but I don't see how the block gets away with never being
written at all.
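Bill's two cases can be separated in a similar toy model. This hypothetical sketch assumes a two-way `Mirror` whose build writes every sector of both members and whose reads are always served by the first member. The day-1 bad sector is remapped by the build, but a sector that goes bad after its last write on the second member is never read until a rebuild needs it.

```python
class UnreadableSector(Exception):
    pass


class Disk:
    """Toy disk: writing a bad sector remaps it; reading one raises."""

    def __init__(self, name, n_sectors, bad=()):
        self.name = name
        self.data = [None] * n_sectors
        self.bad = set(bad)

    def write(self, i, value):
        self.bad.discard(i)  # transparent remap on write
        self.data[i] = value

    def read(self, i):
        if i in self.bad:
            raise UnreadableSector(f"{self.name}: sector {i}")
        return self.data[i]


class Mirror:
    """Two-way mirror that writes both members but reads only the first."""

    def __init__(self, a, b):
        self.members = [a, b]

    def initial_sync(self):
        # A full build writes every sector on every member, which also
        # remaps any sector that was bad from the factory.
        for i in range(len(self.members[0].data)):
            for d in self.members:
                d.write(i, 0)

    def write(self, i, value):
        for d in self.members:
            d.write(i, value)

    def read(self, i):
        return self.members[0].read(i)  # second member is never read


def mirror_lifecycle(n=8):
    a = Disk("A", n)
    b = Disk("B", n, bad={3})  # bad from day 1
    m = Mirror(a, b)
    m.initial_sync()
    day_one_bad_survived = bool(b.bad)  # False: the build remapped sector 3
    for i in range(n):
        m.write(i, f"block{i}")
    b.bad.add(5)  # hypothetical: sector 5 decays after its last write
    for i in range(n):
        m.read(i)  # all succeed; B's sector 5 is never touched
    new_a = Disk("newA", n)  # A fails; rebuild a new A from B
    try:
        for i in range(n):
            new_a.write(i, b.read(i))
    except UnreadableSector as err:
        return day_one_bad_survived, str(err)
    return day_one_bad_survived, None
```

In this model `mirror_lifecycle()` returns `(False, "B: sector 5")`: a full build leaves no never-written sector, so only the "written once and never read again" case survives to the rebuild.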

Bill Bogstad







BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org