
BLU Discuss list archive



ZFS and block deduplication



> From: Mark Woodward [mailto:markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org]
> Sent: Monday, April 25, 2011 9:23 AM
> 
> This is one of those things that make my brain hurt. If I am
> representing more data with a fixed-size number, i.e. a 4K block vs a
> 16K block, that does, in fact, increase the probability of collision 4X,

Nope.  Remember: if you calculate 256-bit, ideally distributed hashes of
any two different input streams that are each 256 bits or larger, then the
probability of collision is 2^-256 regardless of the input block size.
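
One way to see why the block size drops out: the digest is always 256 bits
no matter how large the input is, so any two different blocks each land on
a point in the same 2^256-size space.  A quick illustration (my own sketch
in Python, assuming SHA-256, which is what ZFS dedup uses by default):

    import hashlib

    # A 4K block and a 16K block both hash down to the same fixed
    # 256-bit space, so for any two *different* inputs the chance
    # that their digests happen to match is ~2^-256 either way.
    print(len(hashlib.sha256(b"\x00" * 4096).digest()) * 8)    # -> 256
    print(len(hashlib.sha256(b"\x00" * 16384).digest()) * 8)   # -> 256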

When you create a 256-bit hash of any input >= 256 bits, you are essentially
picking a random (but repeatable) number from 0 to 2^256 - 1.  So the
probability of collision depends only on how many such numbers you draw,
i.e. how many blocks you hash, and not on the size of each block.
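
Back-of-the-envelope version (my own sketch, not from Mark's numbers): the
standard birthday bound says that hashing n distinct blocks with an ideal
b-bit hash gives a collision probability of roughly n^2 / 2^(b+1).  Block
size only matters insofar as it changes n for a given amount of data:

    from math import log2

    def collision_bound(n_blocks, hash_bits=256):
        # Birthday approximation: P(>= 1 collision) ~ n^2 / 2^(bits+1)
        return n_blocks ** 2 / 2 ** (hash_bits + 1)

    # 1 PiB of unique data, stored as 4K blocks vs. 16K blocks.
    for block_size in (4 * 1024, 16 * 1024):
        n = (1 << 50) // block_size        # number of blocks in 1 PiB
        p = collision_bound(n)
        print("%2dK blocks: n = 2^%d, P(collision) ~ 2^%d"
              % (block_size // 1024, log2(n), log2(p)))

For a fixed amount of data, going from 4K to 16K blocks actually gives you
fewer hashes and therefore a slightly *smaller* collision probability, not
a 4X larger one.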





