
BLU Discuss list archive



ZFS and block deduplication



> From: Mark Woodward [mailto:markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org]
> Sent: Monday, April 25, 2011 9:23 AM
> 
> This is one of those things that make my brain hurt. If I am
> representing more data with a fixed-size number, i.e. a 4K block vs a
> 16K block, that does, in fact, increase the probability of collision 4X,

Nope.  Remember: if you calculate 256-bit, ideally distributed hashes of
any two different input streams that are each 256 bits or larger, then the
probability of collision is 2^-256 regardless of the input block size.
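
One way to see why the block size drops out: the digest is always 256 bits
no matter how large the input is, so any two different blocks each land on
a point in the same 2^256-size space.  A quick illustration (my own sketch
in Python, assuming SHA-256, which is what ZFS dedup uses by default):

    import hashlib

    # A 4K block and a 16K block both hash down to the same fixed
    # 256-bit space, so for any two *different* inputs the chance
    # that their digests happen to match is ~2^-256 either way.
    print(len(hashlib.sha256(b"\x00" * 4096).digest()) * 8)    # -> 256
    print(len(hashlib.sha256(b"\x00" * 16384).digest()) * 8)   # -> 256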

When you create a 256-bit hash of any input >= 256 bits, you are essentially
picking a random (but repeatable) number from 0 to 2^256 - 1.  So the
probability of collision depends only on how many such numbers you draw,
i.e. how many blocks you hash, and not on the size of each block.
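
Back-of-the-envelope version (my own sketch, not from Mark's numbers): the
standard birthday bound says that hashing n distinct blocks with an ideal
b-bit hash gives a collision probability of roughly n^2 / 2^(b+1).  Block
size only matters insofar as it changes n for a given amount of data:

    from math import log2

    def collision_bound(n_blocks, hash_bits=256):
        # Birthday approximation: P(>= 1 collision) ~ n^2 / 2^(bits+1)
        return n_blocks ** 2 / 2 ** (hash_bits + 1)

    # 1 PiB of unique data, stored as 4K blocks vs. 16K blocks.
    for block_size in (4 * 1024, 16 * 1024):
        n = (1 << 50) // block_size        # number of blocks in 1 PiB
        p = collision_bound(n)
        print("%2dK blocks: n = 2^%d, P(collision) ~ 2^%d"
              % (block_size // 1024, log2(n), log2(p)))

For a fixed amount of data, going from 4K to 16K blocks actually gives you
fewer hashes and therefore a slightly *smaller* collision probability, not
a 4X larger one.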





