
BLU Discuss list archive



Backing up sparse files ... VMs and TrueCrypt ... etc



> Edward Ned Harvey wrote:
> > - Never use --sparse when creating an archive that is
> > compressed.  It's pointless, and doubles the time to create the archive.
> >
> > - Yes, use --sparse during extraction, if the contents contain a
> > lot of serial 0's and you want the files restored to a sparse state.
> >
> > The man page saying "using '--sparse' is not needed on extraction" is
> > misleading.  It's technically true - you don't need it - but it's
> > misleading - yes, you need it if you want the files to be extracted
> > sparsely.
> 
> Have you confirmed that through code inspection or experimentation?

I'll test it right now...

I have a 400 MB sparse file, junk.tc, occupying 1.06 MB on disk.

$ time tar cf - junk.tc | gzip --fast > junk.tc.tar.gz
real    0m3.688s
(junk.tc.tar.gz is 2.79 MB)

$ time tar cf - --sparse junk.tc | gzip --fast > junk.tc.sparse.tar.gz
real    0m33.130s
(junk.tc.sparse.tar.gz is 1.04 MB)

If I extract the non-sparse tar.gz without --sparse ... I get a non-sparse
result.  As expected.

If I extract the non-sparse tar.gz with --sparse ... I get a non-sparse
result.  Bah.

If I extract the sparse tar.gz, without the --sparse switch ... I get a
sparse file.

Apparently I was wrong, and you have no choice about it: if you want to back
up a sparse file with tar, you have to use the --sparse option during archive
creation, and eat the extra time it takes.
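
For the record, here's roughly how I check whether a file came out sparse
(apparent size vs. blocks actually allocated), plus how you could make a
throwaway sparse test file if you don't have a TrueCrypt volume handy.
Just a sketch in Python; the filename and size are made up:

import os

SIZE = 400 * 1024 * 1024   # apparent size; name and size are illustrative

# truncate() extends the file without allocating any blocks, so this makes
# a 400 MB file that occupies essentially nothing on disk.
with open("sparse_test.bin", "wb") as f:
    f.truncate(SIZE)

def report(path):
    st = os.stat(path)
    print("%s: %d bytes apparent, %d bytes on disk"
          % (path, st.st_size, st.st_blocks * 512))

report("sparse_test.bin")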


> Also consider that the code to detect strings of zeros seems to be on
> the read side (based on the man page description). On extraction, it
> wouldn't make sense to expand the unused portions to strings of zeros,
> then follow that by code that detects the zeros and seeks past them to
> write a sparse file.

My expectation was:  on extraction, detect strings of 0's, and make them
holes.
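
Something like the following, roughly.  Just a sketch of what I pictured,
not what tar actually does; the block size and the all-zeros test are only
illustrative:

import sys

BLOCK = 512 * 1024
ZEROS = b"\0" * BLOCK

# Copy src to dst, but seek over all-zero blocks instead of writing them,
# then truncate so a trailing hole still counts toward the file length.
def copy_sparse(src_path, dst_path):
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        length = 0
        while True:
            chunk = src.read(BLOCK)
            if not chunk:
                break
            length += len(chunk)
            if chunk == ZEROS[:len(chunk)]:
                dst.seek(len(chunk), 1)   # leave a hole
            else:
                dst.write(chunk)
        dst.truncate(length)

if __name__ == "__main__":
    copy_sparse(sys.argv[1], sys.argv[2])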


> > ...you may be overestimating the time to read or md5sum all the 0's
> > in the hole of sparse files.
> 
> Perhaps, but...
> 
> > The hypothetical sparse_cat would improve performance, but just
> > marginally.
> 
> ...it would eliminate the need for a two-pass read with tar. And if
> summing zeros is fast, why is rsync so slow in your experiments?

Well, I've demonstrated you can sum the 0's very quickly, but I don't yet
know why rsync stinks at this.
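
(A quick illustration of how cheap the zeros are: hashing 400 MB of zeros
straight from memory, which is roughly what reading the hole of a sparse
file costs, since the kernel hands back zero-filled pages without touching
the disk.  Numbers will vary by machine, obviously.)

import hashlib, time

block = b"\0" * (1024 * 1024)
t0 = time.time()
m = hashlib.md5()
for _ in range(400):        # 400 MB of zeros
    m.update(block)
print("md5 of 400 MB of zeros: %s in %.2fs" % (m.hexdigest(), time.time() - t0))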


> (A literal sparse_cat (drop-in replacement for cat) wouldn't actually be
> that useful, as you need to communicate to the process receiving the
> stream the byte offset for each chunk of data, assuming you want to be
> able to reconstruct the sparse file later with the same holes. So
> practically speaking, this is something you'd have to integrate into
> tar, gzip, rsync, or whatever archiver you're using.
> 
> It sounds like it would be a small project to patch tar to use the
> fcntl, as it already has a data structure figured out for recording the
> holes. But you'd still need additional hacks to do incremental
> transfers. So the bigger win would be patching rsync.)

I do plan on writing an experimental Python script.  Not that it'll actually
be useful, but it should at least prove the concept.  And then maybe the
rsync guys will care.  Don't know.
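
Roughly what I have in mind, as a sketch only: walk the data extents of a
file and skip the holes entirely.  This assumes the kernel and filesystem
support SEEK_DATA/SEEK_HOLE through lseek and that Python's os module
exposes those constants; otherwise you'd have to go through the FIEMAP
ioctl instead.

import os, sys

# Yield (offset, data) for every data extent in the file, never reading the
# holes, by hopping between SEEK_DATA and SEEK_HOLE with lseek.  A real tool
# would read each extent in bounded chunks rather than one gulp, and would
# also record the apparent file size so the holes can be recreated.
def data_extents(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError:          # nothing but a hole to the end of file
                break
            end = os.lseek(fd, start, os.SEEK_HOLE)
            os.lseek(fd, start, os.SEEK_SET)
            yield start, os.read(fd, end - start)
            offset = end
    finally:
        os.close(fd)

if __name__ == "__main__":
    total = 0
    for off, chunk in data_extents(sys.argv[1]):
        total += len(chunk)
        print("data at offset %d, %d bytes" % (off, len(chunk)))
    print("%d data bytes total" % total)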






