
BLU Discuss list archive



[Discuss] rsnapshot vs. rdiff-backup



Thanks for the descriptions, Rich.
I did not find configuring rsnapshot tedious, and it is reasonably well
documented. One issue I had, which you mention, is the scheduling in
cron. At work we had a WD MyBook, which is a very, very slow device. Our
first backup took days. (The WD was connected to the same switch as our
NAS.) I also had to schedule the rsnapshot backups alongside an offsite
backup to our New York office. The one thing I really like about
rsnapshot is that it is a snapshot, so if someone trashes a file, it can
normally be retrieved quickly. Also, before I update my Linux system, I
first make sure that my most recent 'hourly' snapshot is current.
rsnapshot can also be used for Windows systems: if rsync is run in a
Unix-like environment such as Cygwin, you do get the advantage of hard
links.

On 12/02/2013 10:26 PM, Richard Pieri wrote:
> I've been using rsnapshot for several years now and I'm reasonably
> familiar with it. It was recently suggested to me to use rdiff-backup
> to copy files to a FAT32 file system because it is aware of FAT32 and
> exFAT file name restrictions. Since then I've been experimenting with
> rdiff-backup. Here are some of the high and low points of the two.
>
>
> rsnapshot is, as the name suggests, a snapshot system. It uses a
> combination of GNU cp's hard link directory replication and rsync
> itself to maintain time-based snapshots. It functions similarly to
> Apple's Time Machine with one notable difference. Where Time Machine's
> snapshots run back forever until disk runs out then the oldest are
> pruned to make room, rsnapshot's snapshots are rotated at fixed
> points: hourly, daily, weekly, monthly, yearly with pruning managed by
> a retention policy. While I've repeatedly stated -- and still maintain
> -- that sync is not backup, maintaining many sync-based snapshots is
> close enough for some uses. rsnapshot shines when you have many users
> who want to be able to pluck single files from arbitrary points in time
> out of a backup system.
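To make the hard-link rotation concrete, here is a minimal Python sketch of the idea, not rsnapshot's actual code. The function names `rotate`, `sync_into`, and `snapshot` are hypothetical; the copy step uses `shutil.copytree(..., copy_function=os.link)` as a stand-in for `cp -al`, and `sync_into` is a crude stand-in for rsync that breaks the hard link whenever a file changes, so older snapshots keep their own version.

```python
import filecmp
import os
import shutil
from pathlib import Path


def rotate(root: Path, level: str, retain: int) -> None:
    """Delete the oldest snapshot, then rename level.N to level.N+1."""
    oldest = root / f"{level}.{retain - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    for i in range(retain - 2, -1, -1):
        src = root / f"{level}.{i}"
        if src.exists():
            src.rename(root / f"{level}.{i + 1}")


def sync_into(source: Path, dest: Path) -> None:
    """Crude stand-in for rsync: replace changed files, delete vanished ones."""
    for src_path in source.rglob("*"):
        dst_path = dest / src_path.relative_to(source)
        if src_path.is_dir():
            dst_path.mkdir(parents=True, exist_ok=True)
        elif not dst_path.exists() or not filecmp.cmp(src_path, dst_path, shallow=False):
            # Unlinking first breaks the hard link, so older snapshots
            # keep their own copy of the previous contents.
            if dst_path.exists():
                dst_path.unlink()
            dst_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_path, dst_path)
    # Deepest paths first, so files are removed before their directories.
    for dst_path in sorted(dest.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if not (source / dst_path.relative_to(dest)).exists():
            if dst_path.is_dir() and not dst_path.is_symlink():
                shutil.rmtree(dst_path)
            else:
                dst_path.unlink()


def snapshot(source: Path, root: Path, level: str = "hourly", retain: int = 3) -> Path:
    """Rotate, hard-link the previous snapshot into level.0, then sync changes."""
    rotate(root, level, retain)
    newest, previous = root / f"{level}.0", root / f"{level}.1"
    if previous.exists():
        # Same idea as `cp -al hourly.1 hourly.0`: a new tree of hard links,
        # so unchanged files cost no extra disk space.
        shutil.copytree(previous, newest, copy_function=os.link)
    else:
        newest.mkdir(parents=True)
    sync_into(source, newest)
    return newest
```

After two runs, an unchanged file in `hourly.0` and `hourly.1` shares one inode, while a changed file has two independent copies. That sharing is what makes browsing any snapshot for a single file cheap.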
>
> There are two big drawbacks to rsnapshot. The first is setup. It's
> tedious. You need to configure the increments and retention in a
> configuration file. You need to match up the increments with
> associated cron jobs. And you need to make sure that the cron jobs are
> staggered so that they don't step on each other. rsnapshot is smart
> enough not to let that be destructive but it can mean missing snapshot
> runs and that's not good for a backup system.
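Checking that cron jobs are staggered amounts to checking that run windows do not overlap. The following is a hypothetical Python sketch: `Job` and `clashes` are illustrative names, the start times would come from your own crontab, and the durations are estimates you would have to measure. Runs that spill past midnight are not wrapped around.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Job:
    name: str           # an rsnapshot interval name such as "hourly"
    starts: list[int]   # start times in minutes after midnight (from your crontab)
    duration: int       # estimated run time in minutes: measure it, don't guess


def clashes(jobs: list[Job]) -> list[tuple[str, int, str, int]]:
    """List pairs of runs whose [start, start + duration) windows overlap.

    Runs that spill past midnight are not wrapped around; this is a sketch.
    """
    runs = [(job.name, start, start + job.duration) for job in jobs for start in job.starts]
    found = []
    for (n1, s1, e1), (n2, s2, e2) in combinations(runs, 2):
        if s1 < e2 and s2 < e1:
            found.append((n1, s1, n2, s2))
    return found


hourly = Job("hourly", starts=list(range(0, 24 * 60, 240)), duration=30)  # every 4 hours
daily = Job("daily", starts=[210], duration=45)                          # 03:30
print(clashes([hourly, daily]))  # → [('hourly', 240, 'daily', 210)]
```

Here the 04:00 "hourly" run would start while the 03:30 "daily" run is still going; moving "daily" to 04:30 (start 270) clears the clash under these assumed durations.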
>
> The second is that it is terrible for things like databases that grow
> forever. Any file that changes between runs is copied in full, so each
> run stores an entire database dump or log file or whatever, which can
> lead to massively inflated disk usage.
>
> The third -- okay, three big drawbacks -- is that it only works on
> Unix file systems and their network equivalents. The hard link
> mechanism won't work on either NTFS or FAT*, which makes it unusable
> for either Windows clients (being backed up) or storage.
>
>
> rdiff-backup, as the name suggests, is a backup mechanism that uses
> diffs. Specifically, it uses the rsync algorithm to calculate deltas
> (rdiff) and uses these deltas to build backup histories. Operation is
> more like Time Machine: each run adds new deltas to the history until
> you run out of space (at which point the whole thing falls apart) or
> you invoke a dedicated cleanup run to prune based on relative or
> absolute time or number of backup runs. As with rsnapshot, sync is not
> backup but a history of snapshots is close enough.
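The rsync algorithm that rdiff builds on can be sketched in a few dozen lines. This is a simplified Python illustration, not librsync (which rdiff-backup actually uses): the old file is split into fixed blocks, each indexed by a cheap rolling checksum plus a strong hash (MD5 here), and a window slides over the new file one byte at a time, emitting "copy block N" where it finds a match and literal bytes elsewhere. The tiny `BLOCK` size and the names `signature`, `delta`, and `patch` are assumptions for readability; only full blocks of the old file are matched.

```python
import hashlib

BLOCK = 4          # tiny for illustration; real tools use much larger blocks
MOD = 1 << 16


def weak(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum: a = sum of bytes, b = position-weighted sum."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def strong(block: bytes) -> str:
    return hashlib.md5(block).hexdigest()


def signature(old: bytes) -> dict:
    """Map weak checksum -> [(block index, strong hash)] for every full block of `old`."""
    sig = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        sig.setdefault(weak(block), []).append((start // BLOCK, strong(block)))
    return sig


def delta(sig: dict, new: bytes) -> list:
    """Emit ("copy", idx) for blocks found in the old file, ("data", bytes) otherwise."""
    ops, literal, i, ab = [], bytearray(), 0, None
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        if ab is None:
            ab = weak(window)
        idx = next((n for n, h in sig.get(ab, []) if h == strong(window)), None)
        if idx is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))
            i, ab = i + BLOCK, None          # jump past the match and recompute
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):         # roll the checksum forward one byte
                a, b = ab
                out, inc = new[i], new[i + BLOCK]
                a = (a - out + inc) % MOD
                b = (b - BLOCK * out + a) % MOD
                ab = (a, b)
            i += 1
    literal += new[i:]
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file plus a delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)
```

The rolling update is what makes this practical: sliding the window by one byte costs a couple of arithmetic operations instead of re-summing the block, and the expensive strong hash is computed only when the weak checksum already matches.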
>
> There is practically no setup with rdiff-backup. Everything is command
> line arguments or external files (e.g., exclude lists) noted in the
> arguments. This makes a backup script literally a sequence of
> rdiff-backup commands. As I noted in the introduction, rdiff-backup is
> smart about escaping characters that are prohibited on target file
> systems. It also maintains a log of file ownerships and attributes
> including NTFS ACLs. That's a huge win for disaster recovery.
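Escaping prohibited characters is a reversible mapping on file names. The following Python sketch is loosely modeled on rdiff-backup's approach of writing an escape character followed by a three-digit decimal code. Treat the exact scheme (`;` as the escape, the `FORBIDDEN` set, and the `quote`/`unquote` names) as an assumption, since rdiff-backup's real rules may differ. The escape character itself is always escaped, which keeps decoding unambiguous.

```python
import re

# Characters FAT32/exFAT long file names cannot contain, plus control characters.
FORBIDDEN = set('"*:<>?\\|') | {chr(c) for c in range(32)}
QUOTE = ";"  # escape character (an assumption for this sketch)


def quote(name: str) -> str:
    """Replace each forbidden character (and QUOTE itself) with QUOTE + 3-digit code."""
    return "".join(
        f"{QUOTE}{ord(ch):03d}" if ch in FORBIDDEN or ch == QUOTE else ch
        for ch in name
    )


def unquote(name: str) -> str:
    """Invert quote(): every QUOTE in a quoted name starts a 3-digit escape."""
    return re.sub(re.escape(QUOTE) + r"(\d{3})", lambda m: chr(int(m.group(1))), name)


print(quote("report: Q1?.txt"))  # → report;058 Q1;063.txt
```

Because a literal `;` is never left unescaped, `unquote(quote(name)) == name` holds for any name, even one that already contains something that looks like an escape.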
>
> Another win is that because it's based on deltas, and those deltas are
> compressed, it is vastly more efficient for continuously growing files
> like databases and logs and VM images. Since the rdiff algorithm is
> based on rsync it doesn't matter if the files are text or binary data.
> It's all just bits to rdiff.
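A rough worked comparison shows why this matters for continuously growing files. The model below is a simplification with assumed numbers: hard-link snapshots of a file that changes every run cannot share it, so each retained snapshot holds a full copy; a delta history holds one full copy of the newest version plus one delta per older run, with each delta assumed to be about the size of the data appended in that run, before compression.

```python
def hardlink_mb(initial_mb: int, growth_mb: int, snapshots: int) -> int:
    """Each retained snapshot keeps its own full copy of the changed file."""
    return sum(initial_mb + growth_mb * i for i in range(snapshots))


def delta_mb(initial_mb: int, growth_mb: int, snapshots: int) -> int:
    """Newest version in full, plus (assumed) one growth-sized delta per older run."""
    newest = initial_mb + growth_mb * (snapshots - 1)
    return newest + growth_mb * (snapshots - 1)


# A 10,000 MB database dump that grows by 100 MB per run, 30 runs retained:
print(hardlink_mb(10_000, 100, 30))  # → 343500
print(delta_mb(10_000, 100, 30))     # → 15800
```

Under these assumptions the snapshot approach needs roughly 20 times the space; compression of the deltas would widen the gap further.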
>
> Now the bad. The big one is that it isn't so obvious how to find a
> specific file at a specific date and time. Only the most recent backup
> run is in the target directory. All of the compressed deltas are
> stored in a subdirectory under the target. Getting at those requires
> invoking the rdiff-backup command.
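Why restoring requires the tool becomes clear from how such a history is laid out: the newest version is a plain mirror, and each older version exists only as a reverse diff applied on top of the version after it. The following is a hypothetical Python sketch of that layout; it uses `difflib.SequenceMatcher` line diffs rather than rdiff deltas, and `History`, `reverse_diff`, and `apply` are illustrative names.

```python
from difflib import SequenceMatcher


def reverse_diff(newer: list, older: list) -> list:
    """Ops that rebuild `older` from `newer`: copy line ranges or insert literal lines."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=newer, b=older, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace"/"insert": the older lines must be stored
            ops.append(("lines", older[j1:j2]))
        # "delete": lines only in the newer version are simply not copied
    return ops


def apply(newer: list, ops: list) -> list:
    out = []
    for op in ops:
        out.extend(newer[op[1]:op[2]] if op[0] == "copy" else op[1])
    return out


class History:
    def __init__(self, first: list):
        self.mirror = list(first)  # newest version, stored in full
        self.increments = []       # reverse diffs, newest first

    def backup(self, current: list) -> None:
        self.increments.insert(0, reverse_diff(current, self.mirror))
        self.mirror = list(current)

    def restore(self, runs_ago: int) -> list:
        # Walk back from the mirror one reverse diff at a time.
        version = self.mirror
        for ops in self.increments[:runs_ago]:
            version = apply(version, ops)
        return version
```

Restoring a version from N runs ago means applying N reverse diffs in sequence, which is why an old file cannot simply be copied out of the target directory.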
>
> rdiff-backup runs are slower than comparable rsnapshot runs.
> Calculating and compressing deltas is more CPU intensive than GNU cp
> and rsync runs. rdiff-backup's efficiency comes at a price.
>
>
> There they are. Two very different backup systems built on the same
> rsync algorithm.
>


-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90 
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66  C0AF 7CEA 30FC 3BC1 EB90





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org