
BLU Discuss list archive



[Discuss] Summary of data-protection



Derek Atkins wrote:
> Sounds like you want RAID1..

RAID is already underlying all of this, and it's there to make disk hardware
swaps painless.  This thread is mainly about the steps needed to protect
ourselves from our own occasional mistakes.  I'll summarize some thoughts:

1) Set up a backup system that includes three copies of every file, made as
soon as practical after any new files are added or old ones changed.  The
first copy should be transferred on-site, without delay.  The second copy
should be made to a remote location such as a backup service, perhaps with a
polling interval of 15 minutes to 24 hours.  The third copy should be kept
totally offline at a protected location.  (Optional:  keep more copies going
back a year or 5 years so you can revert to earlier versions.)
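
Something like the following crontab is one way to express that cadence; it is
only a sketch, and the three sync scripts are hypothetical placeholders, not
tools I'm actually running:

    # first copy: on-site mirror, run often
    */15 * * * *  /usr/local/bin/sync-to-onsite-mirror
    # second copy: remote/backup service, every 6 hours (anywhere in the
    # 15-minute-to-24-hour window is reasonable)
    0 */6 * * *   /usr/local/bin/sync-to-remote-service
    # third copy: offline media, rotated weekly and stored elsewhere
    0 3 * * 0     /usr/local/bin/dump-to-offline-media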

2) When doing system upgrades/maintenance, it will occasionally be tempting to
make seemingly-harmless changes to your software, scripts, or physical disks.
But before wiping any data to implement one of those "harmless" changes, ask
yourself:  do I have *3* extra copies of *every* file I'm now putting at risk?

3) There will inevitably be periods when you've made an exception to item #1,
mainly due to time/money constraints:  didn't have time to do the offline
backup, the software failed, there's more data than there used to be, etc.
Accept that reality and *do* something about it before any disruptive work.

4) Use at least two different varieties of software to make your backups.  (In
my case I use CrashPlan's software for the first two copies and rsync for the
offline ones, but I think I need a replacement for CrashPlan, and improvements
to the way I use rsync, because it doesn't include any good sequestration
mechanism to use in place of '--delete'; one possible workaround is sketched
below.  Perhaps toss it in favor of rsnapshot, perhaps unison, perhaps bacula.)
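
The closest built-in thing I know of is rsync's --backup/--backup-dir pair,
which moves anything --delete would remove (or an update would overwrite) into
a dated directory instead of discarding it.  A rough, untested sketch with
placeholder paths:

    DATE=$(date +%Y%m%d)
    rsync -a --delete \
          --backup --backup-dir=/backup/sequestered/$DATE \
          /data/  /backup/current/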

5) Know what's in your backups:  keep a near-line copy of a listing of the
filenames, modification timestamps, and md5/sha1/whatever checksums.
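
A manifest can be as simple as a couple of find/sha1sum runs dumped somewhere
other than the backup volume itself; the paths below are placeholders:

    # filenames + modification times (GNU find)
    find /data -type f -printf '%T@ %p\n' | sort -k2 > /backup/manifest-mtimes.txt
    # checksums
    find /data -type f -print0 | xargs -0 sha1sum > /backup/manifest-sha1.txt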

6) Test the backups at least annually, and again before any disruptive change.
 (This is where I really want to see better scripts from software authors:
something I can stick in my local nagios to test the backups every 3 minutes
instead of whenever I need to do a restore...)
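
In the meantime, a crude freshness check is easy to fake up as a Nagios-style
plugin (exit 0/1/2 for OK/WARNING/CRITICAL); this is an untested sketch with
invented paths and thresholds, not a finished plugin:

    #!/bin/bash
    # check_backup_age: complain if the newest file in the backup tree is stale.
    BACKUP_DIR=/backup/current
    WARN_HOURS=26
    CRIT_HOURS=50
    newest=$(find "$BACKUP_DIR" -type f -printf '%T@\n' | sort -n | tail -1)
    age_hours=$(( ( $(date +%s) - ${newest%.*} ) / 3600 ))
    if [ "$age_hours" -ge "$CRIT_HOURS" ]; then
        echo "CRITICAL: newest backup file is ${age_hours}h old"; exit 2
    elif [ "$age_hours" -ge "$WARN_HOURS" ]; then
        echo "WARNING: newest backup file is ${age_hours}h old"; exit 1
    else
        echo "OK: newest backup file is ${age_hours}h old"; exit 0
    fi

(A real test would also restore a sample file and compare checksums, not just
look at timestamps.)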

7) Understand your use cases and pick filesystem types (jfs, ext3, xfs, btrfs,
ext4, zfs) according to your own performance and data-protection requirements.
The one with the most data-recovery tools out there is ext3, I believe; its
main performance limitation lies in file deletion (reclaiming space from a
10GB+ file takes a lot longer on ext3 than on ext4), which you probably don't
care about; but it's your system, so think about these things and make
informed decisions.  There *will* come a time when you wish undelete were
available and easy.
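
(The recovery-tool situation is part of why: ext3/ext4 at least have tools
like extundelete.  From memory, so check the man page, usage is roughly the
following, with the file path being just an example:

    # run against an unmounted (or read-only) filesystem; recovered files
    # land in ./RECOVERED_FILES/
    umount /dev/sdb1
    extundelete /dev/sdb1 --restore-file home/rich/thesis.tex
    extundelete /dev/sdb1 --restore-all
)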

8) I've always sung the praises of RAID, but if you use it, monitor it.
Make sure you have an alert system set up to tell you within minutes of any
drive going offline.
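
For Linux md arrays the stock tools can already do that; for instance (the
address is a placeholder):

    # in /etc/mdadm.conf: monitor mode mails on Fail/DegradedArray events
    MAILADDR you@example.com

    # or run the monitor by hand / from an init script:
    mdadm --monitor --scan --daemonise --delay=60 --mail=you@example.com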

9) One thing I'm not yet doing but should is monitoring free space for sudden
/increases/ in available space.  And arguably I should set up a tripwire for
long-term storage:  if you're keeping terabytes of permanent files, you don't
want to discover a pile of them missing months or years after a loss, very
likely after you've rotated through and wiped all the older save sets.  These
would be simple-to-craft scripts, but I don't see them as part of any existing
backup software suite; it's one of those someday-I'll-get-to-it weekend
projects.
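
The free-space half could be as small as this (untested sketch; the mount
point, state file, and threshold are invented):

    #!/bin/bash
    # Alert if available space on the archive volume suddenly jumps,
    # which can mean a pile of files quietly went missing.
    VOLUME=/archive
    STATE=/var/tmp/free_kb.last
    THRESHOLD_KB=$((10 * 1024 * 1024))   # alert on a jump of more than ~10GB
    free_kb=$(df -Pk "$VOLUME" | awk 'NR==2 {print $4}')
    if [ -f "$STATE" ]; then
        last_kb=$(cat "$STATE")
        if [ $(( free_kb - last_kb )) -gt "$THRESHOLD_KB" ]; then
            echo "free space on $VOLUME jumped from ${last_kb}K to ${free_kb}K" |
                mail -s "free-space tripwire: $VOLUME" you@example.com
        fi
    fi
    echo "$free_kb" > "$STATE"

The long-term-storage tripwire is the same idea applied to the manifest from
item #5: diff today's listing against last month's and scream if thousands of
names have disappeared.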

10) You could punt all of this to The Cloud, but if you do, use at least two
different services, probably three.  Customer service at these companies is
usually poor, and you care a lot more about your data than they do.

My $.02.

-rich




