
BLU Discuss list archive



[Discuss] file system checksums



On Jun 5, 2012, at 6:11 PM, Tom Metro wrote:
> 
> Huh? The checksums permit detecting that the two devices are
> inconsistent, and the duplication lets you repair the copy with the
> failing checksum.

RAID doesn't use checksums.  RAID-1 writes every block to every device in the set.  RAID-5 stripes data across the devices and writes one parity block per stripe, computed as the XOR of that stripe's data blocks.  A given RAID controller may use checksum algorithms internally, but there typically are no checksums on the RAID storage.
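To make the parity point concrete, here is a minimal sketch using one-byte "blocks" and shell arithmetic. The values and variable names are made up for illustration. It shows that XOR parity can rebuild a block when you already know which device is gone, but a silently flipped bit only makes the stripe inconsistent without saying which member is wrong.

```shell
#!/bin/sh
# Two one-byte data "blocks" and their RAID-5-style parity (illustrative values).
a=90
b=195
parity=$((a ^ b))

# Known device failure: the device holding "a" is gone.
# XOR of the survivors gives it back.
rebuilt_a=$((parity ^ b))
echo "rebuilt a: $rebuilt_a"

# Silent corruption: one bit flips in "b" and no device reports an error.
b_flipped=$((b ^ 4))

# XOR across the whole stripe is nonzero, so the stripe is inconsistent,
# but a, b_flipped and parity are all equally plausible culprits.
stripe_check=$((a ^ b_flipped ^ parity))
echo "stripe check: $stripe_check (nonzero means inconsistent, culprit unknown)"
```

Parity is redundancy for a failure the array has been told about. It is not a checksum that identifies the bad copy.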

> Whether this happens during normal array I/O, or only during a resync
> operation is another matter, but the infrastructure is present to
> perform a consistency check.

Nope.  RAID offers no consistency checking.  Look up "write hole" for plenty of essays about the subject.  Some high-end RAID controllers have systems to avoid write hole data loss.  Not all do, and even the best aren't completely impervious to bit flip corruption.

RAID is about surviving disk faults.  It's not about data integrity or consistency.  Never has been and never will be.  Here's a really simple demonstration:

* Take your resume.
* Create a RAID-1 set just slightly larger than the file using dm-raid and loopback devices.  My resume is a 140K file, so 150K works as an example.  Put an ext3 file system on the metadevice.
* Put your resume on that file system.
* Delete all other copies of your resume.  This step is entirely optional, but consider it done for the demonstration.
* Assuming loop0 and loop1, run these two commands to simulate on-disk corruption, assuming that offsets of 2 and 4 blocks fall within your loopback devices.  (/dev/urandom is used because /dev/random can block or return short reads.)
  dd if=/dev/urandom of=/dev/loop0 bs=512 count=1 seek=2
  dd if=/dev/urandom of=/dev/loop1 bs=512 count=1 seek=4
* Try to edit your resume.
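If you don't want to touch real loop devices, the corruption step can be approximated with two ordinary files standing in for the mirror halves. This is only a sketch: the file names are hypothetical, there is no RAID layer or file system involved, and `conv=notrunc` is added because the targets are regular files rather than block devices.

```shell
#!/bin/sh
# Two plain files play the roles of loop0 and loop1 (no root needed).
workdir=$(mktemp -d)
m0="$workdir/mirror0.img"
m1="$workdir/mirror1.img"

# "Mirror" 150K of data: both halves start out identical.
dd if=/dev/urandom of="$m0" bs=1024 count=150 2>/dev/null
cp "$m0" "$m1"

# Corrupt sector 2 on one half and sector 4 on the other.
# conv=notrunc stops dd from truncating the file after the written sector.
dd if=/dev/urandom of="$m0" bs=512 count=1 seek=2 conv=notrunc 2>/dev/null
dd if=/dev/urandom of="$m1" bs=512 count=1 seek=4 conv=notrunc 2>/dev/null

# List the 512-byte sectors where the halves disagree.
# cmp -l prints 1-based byte offsets; convert them to sector numbers.
bad_sectors=$(cmp -l "$m0" "$m1" | awk '{ print int(($1 - 1) / 512) }' | sort -un | paste -s -d ' ' -)
echo "halves disagree at sectors: $bad_sectors"

rm -rf "$workdir"
```

The comparison tells you where the halves differ. Nothing in either file says which half holds the good copy of each sector.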

If you're lucky then your first read will be okay as you get block 2 on loop1 and block 4 on loop0.  Next time may not be so fortuitous.  There are four different ways that the blocks can be read:

  block 2 from loop1, block 4 from loop0: no damage
  block 2 from loop0, block 4 from loop0: one damaged block
  block 2 from loop1, block 4 from loop1: one damaged block
  block 2 from loop0, block 4 from loop1: two damaged blocks
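The four outcomes can be enumerated mechanically. A small sketch, assuming (as in the dd commands) that loop0 holds the bad copy of block 2 and loop1 holds the bad copy of block 4; the variable names are just for illustration:

```shell
#!/bin/sh
# Which mirror half holds the corrupted copy of each block.
bad_for_block2=loop0
bad_for_block4=loop1

summary=""
for src2 in loop0 loop1; do
    for src4 in loop0 loop1; do
        damaged=0
        # Count a damaged block whenever the read lands on the corrupted half.
        [ "$src2" = "$bad_for_block2" ] && damaged=$((damaged + 1))
        [ "$src4" = "$bad_for_block4" ] && damaged=$((damaged + 1))
        echo "block 2 from $src2, block 4 from $src4: $damaged damaged"
        summary="$summary$damaged"
    done
done
```

Only one of the four combinations returns a clean file, and the mirror has no basis for preferring it.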

The RAID system will not detect anything wrong, or if it does it will force a rebuild.  And then you can kiss goodbye any chance of easily recovering your data because either loop0's bad block 2 is copied over to loop1 or loop1's bad block 4 is copied over to loop0.

I suggest repeating the demonstration with a mirrored ZFS pool or Btrfs to see what a checksumming file system gets you.
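For a rough idea of what the checksums buy you, here is a toy sketch with plain files. It is not how ZFS or Btrfs are implemented: the per-sector checksum files, the `sector_sum` helper, and the use of `cksum` are all stand-ins chosen for illustration. The point is only that a checksum recorded at write time lets the read path tell the good copy from the bad one and repair the bad one.

```shell
#!/bin/sh
workdir=$(mktemp -d)
good="$workdir/original.img"
m0="$workdir/mirror0.img"
m1="$workdir/mirror1.img"
sectors=16

# Write the same data to both halves of the toy mirror.
dd if=/dev/urandom of="$good" bs=512 count="$sectors" 2>/dev/null
cp "$good" "$m0"
cp "$good" "$m1"

# Hypothetical helper: checksum of one 512-byte sector of a file.
sector_sum() { dd if="$1" bs=512 skip="$2" count=1 2>/dev/null | cksum; }

# Record a checksum per sector at write time.
n=0
while [ "$n" -lt "$sectors" ]; do
    sector_sum "$good" "$n" > "$workdir/sum.$n"
    n=$((n + 1))
done

# Same corruption as the demonstration: sector 2 on one half, 4 on the other.
dd if=/dev/urandom of="$m0" bs=512 count=1 seek=2 conv=notrunc 2>/dev/null
dd if=/dev/urandom of="$m1" bs=512 count=1 seek=4 conv=notrunc 2>/dev/null

# Read path: verify each copy, and overwrite a failing copy from the passing one.
repaired=0
n=0
while [ "$n" -lt "$sectors" ]; do
    want=$(cat "$workdir/sum.$n")
    got0=$(sector_sum "$m0" "$n")
    got1=$(sector_sum "$m1" "$n")
    if [ "$got0" != "$want" ] && [ "$got1" = "$want" ]; then
        dd if="$m1" of="$m0" bs=512 skip="$n" seek="$n" count=1 conv=notrunc 2>/dev/null
        repaired=$((repaired + 1))
    elif [ "$got1" != "$want" ] && [ "$got0" = "$want" ]; then
        dd if="$m0" of="$m1" bs=512 skip="$n" seek="$n" count=1 conv=notrunc 2>/dev/null
        repaired=$((repaired + 1))
    fi
    n=$((n + 1))
done

cmp -s "$m0" "$good" && cmp -s "$m1" "$good" && status=healed || status=damaged
echo "repaired $repaired sectors, mirror is $status"
rm -rf "$workdir"
```

Both halves end up identical to the original because each bad sector was identified by its checksum and rewritten from the other half, which is exactly the decision a plain mirror cannot make.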

--Rich P.





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org