
One of the issues with having lots of data is the worry that a hard drive will fail and I’ll lose something important. That happened to me earlier this year, and I am now determined not to let it happen again. I’ve been focusing a lot on resiliency – making it so that there are no single points of failure. Since I have been collecting a lot of hard drives, I decided to put them to good use, and set up some data replication onto multiple drives.
You might ask, why not just set up a RAID5? Or, alternatively, why not buy a Drobo and be done with it? I hear the first one a lot, especially since I deal with RAID day in and day out for my day job. The main reason I don’t do either of these things is that I’m cheap! I want to be able to use all of the hardware I have without tossing out old stuff. On my server, I have the following drives:
- disk1: 120 GB ATA
- disk2: 200 GB ATA
- disk3: 300 GB USB
- disk4: 400 GB Firewire
- disk5: 600 GB SATA
- disk6: 1000 GB USB
- disk7: 1500 GB USB
Try to set up a RAID with all of that! The problem is the disparate drive sizes (RAID expects disks of the same size; if they differ, every disk is treated as if it were only as large as the smallest one). I’ve been ‘collecting’ hard drives for quite a while, and I like the fact that I can just go to the local electronics store and pick up another drive when I run out of space. This configuration does have a downside though: how do you decide where to put everything? I decided to sit down and figure out how much space I actually need for each of my data types, and came up with this list (taking approximate growth rates into consideration):
- Photos: 100 GB
- Music: 300 GB
- Movies: 1 TB
- TV: 400 GB
- Software: 150 GB
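To put rough numbers on the RAID problem mentioned above, here is a small Python sketch (not part of my setup, just arithmetic). It assumes a classic RAID5 layout, where every member is cut down to the size of the smallest disk and one disk's worth of space goes to parity. The function name `raid5_usable_gb` is made up for this example.

```python
# Drive sizes in GB, copied from the drive list in this post.
DRIVES_GB = {
    "disk1": 120, "disk2": 200, "disk3": 300, "disk4": 400,
    "disk5": 600, "disk6": 1000, "disk7": 1500,
}


def raid5_usable_gb(sizes):
    """Usable space of a classic RAID5 set (hypothetical helper).

    Assumes every member is truncated to the smallest disk and that
    one member's worth of space is used for parity.
    """
    sizes = list(sizes)
    if len(sizes) < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (len(sizes) - 1) * min(sizes)


total = sum(DRIVES_GB.values())
usable = raid5_usable_gb(DRIVES_GB.values())
print(f"raw capacity: {total} GB")   # 4120 GB
print(f"RAID5 usable: {usable} GB")  # 720 GB
```

Under those assumptions, a single RAID5 across all seven drives would give me about 720 GB out of 4120 GB, which is why I'd rather place data on individual drives.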
I then tried to map things out onto the drives I have:
- disk1: Photos
- disk2: Software
- disk3: Music
- disk4: TV
- disk6: Movies
This leaves two disks, disk5 (600 GB) and disk7 (1500 GB), to mirror onto. Since I can’t use standard mirroring software, I’m not able to have a real-time mirror. This is OK for my purposes, as I don’t need the drives to be hot-swappable; I just want another copy of my data out there, without having to worry about making backups all the time.
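I did the mapping by hand, but the same result falls out of a simple "best fit" rule: give each data set the smallest unused drive that can hold it. The Python sketch below is only an illustration of that rule. It assumes 1 TB = 1000 GB, and the name `best_fit` is invented for the example.

```python
# Drive sizes and space requirements in GB, copied from the lists in this post.
DRIVES_GB = {
    "disk1": 120, "disk2": 200, "disk3": 300, "disk4": 400,
    "disk5": 600, "disk6": 1000, "disk7": 1500,
}
NEEDS_GB = {
    "Photos": 100, "Music": 300, "Movies": 1000,  # assumes 1 TB = 1000 GB
    "TV": 400, "Software": 150,
}


def best_fit(needs, drives):
    """Assign each data set to the smallest unused drive that can hold it.

    Returns (placement, spare): which drive each data set lands on, and
    the drives left over afterwards (candidates for the mirror).
    """
    free = dict(drives)
    placement = {}
    # Place the smallest data sets first.
    for name, need in sorted(needs.items(), key=lambda kv: kv[1]):
        candidates = [d for d, size in free.items() if size >= need]
        if not candidates:
            raise ValueError(f"no free drive can hold {name} ({need} GB)")
        chosen = min(candidates, key=lambda d: free[d])
        placement[name] = chosen
        del free[chosen]
    return placement, free


placement, spare = best_fit(NEEDS_GB, DRIVES_GB)
for name, disk in placement.items():
    print(f"{disk}: {name}")
print("left for mirroring:", spare)
print("mirror space is enough:", sum(spare.values()) >= sum(NEEDS_GB.values()))
```

With these numbers the leftover drives are disk5 and disk7, 2100 GB in total against 1950 GB of planned data, so the mirror copies fit with a little room to spare.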
I set up my server to export only the data disks (I don’t want to be able to write to the backups from my desktop, as the changes would be overwritten by the next sync). I also set up my filesystem in a convenient manner, with everything as a directory off of /data:
- /data/Photos
- /data/Video
- /data/Software
- /data/Audio
This makes it very easy to get to any of my data (try that in Windows!)
One inherent advantage of having an asynchronous mirror is the fact that it can serve as a temporary backup if you accidentally delete a file/directory, whereas with a normal mirror, you’d still have to back up to tape or some other drive to be able to get this data back.
To do the actual mirroring, I opted to create a simple rsync script that will copy the data nightly from cron. Nothing overly complex, but something that makes it easy to add new filesystems via a config file:
#!/usr/bin/perl
# Mirror
# Asynchronously mirror filesystems on a local machine
# by Ed Salisbury (ed@edsalisbury.net)
# http://www.edsalisbury.net
# (c)2009 Ed Salisbury, Some Rights Reserved
#
# External Utilities Required:
# * rsync
#
# License:
# Except where otherwise noted, this work is licensed under Creative Commons
# Attribution ShareAlike 3.0.
#
# You are free:
# * to Share — to copy, distribute and transmit the work
# * to Remix — to adapt the work
#
# Under the following conditions:
# * Attribution. You must attribute the work in the manner specified by the
# author or licensor (but not in any way that suggests that they endorse
# you or your use of the work).
# * Share Alike. If you alter, transform, or build upon this work, you may
# distribute the resulting work only under the same, similar or a
# compatible license.
# * For any reuse or distribution, you must make clear to others the license
# terms of this work. The best way to do this is with a link to the
# license's web page (http://creativecommons.org/licenses/by-sa/3.0/)
# * Any of the above conditions can be waived if you get permission from the
# copyright holder.
# * Nothing in this license impairs or restricts the author's moral rights.
use warnings;
use strict;
# Configuration file for mirrors
# Example Config:
# /data/Photos/ /backup/Backup/Photos/
# /data/Software/ /backup/Backup/Software/
# /data/Audio/ /backup/Backup/Audio/
# /data/Video/ /backup/Backup/Video/
# /data/Misc/ /backup/Backup/Misc/
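my $CONFIG = "/usr/local/etc/mirror.cfg";
# Three-argument open with a lexical filehandle; stop if the config is missing
open(my $cfg, '<', $CONFIG) or die "Cannot open $CONFIG: $!\n";
my @lines = <$cfg>;
close($cfg);
foreach my $line (@lines)
{
    # Skip blank lines and comment lines
    next if $line =~ /^\s*(#|$)/;
    # split(' ', ...) ignores leading whitespace and the trailing newline
    my ($src, $dest) = split(' ', $line);
    next unless defined $dest;
    # List form of system() bypasses the shell, so odd characters in paths are safe
    system("/usr/bin/rsync", "-av", "--delete", $src, $dest) == 0
        or warn "rsync failed for $src (status $?)\n";
}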
I put this script into /usr/local/bin, and then created the config file it reads, /usr/local/etc/mirror.cfg, similar to this:
/data/Photos/ /backup/Backup/Photos/
/data/Software/ /backup/Backup/Software/
/data/Audio/ /backup/Backup/Audio/
/data/Video/ /backup/Backup/Video/
/data/Misc/ /backup/Backup/Misc/
I then added a cronjob to have it run nightly:
0 0 * * * /usr/local/bin/mirror > /dev/null 2>&1
A word of caution: Make *sure* you have the directories right! rsync is set up to *DELETE* files in the destination that aren’t in the source directory, so if you point it at the wrong destination, or sync two different source directories into the same destination, it will delete whatever it needs to in order to make the destination match the source. You have been warned! I take no responsibility for you deleting your data!
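If you want a safety net before the first real run, something like the following Python sketch could help. It is not part of my script: the helper names (`parse_config`, `clashing_destinations`, `rsync_command`) and the /data/Music path are made up for illustration. It checks the config for two sources that point at the same destination, and builds the same rsync command with `--dry-run` added, which makes rsync report what it would copy or delete without changing anything.

```python
import posixpath


def parse_config(text):
    """Return (src, dest) pairs from 'src dest' lines, skipping blanks and # comments."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 2:
            raise ValueError(f"expected 'src dest', got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def clashing_destinations(pairs):
    """Return destinations that more than one source would be synced into."""
    seen = {}
    for src, dest in pairs:
        # Normalise so '/backup/x/' and '/backup/x' count as the same place.
        seen.setdefault(posixpath.normpath(dest), []).append(src)
    return {dest: srcs for dest, srcs in seen.items() if len(srcs) > 1}


def rsync_command(src, dest, dry_run=True):
    """Build the rsync argument list; dry_run adds --dry-run (no changes made)."""
    cmd = ["/usr/bin/rsync", "-av", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd + [src, dest]


# Hypothetical config with a mistake: two sources share one destination.
example = """
/data/Audio/ /backup/Backup/Audio/
/data/Music/ /backup/Backup/Audio
/data/Photos/ /backup/Backup/Photos/
"""
pairs = parse_config(example)
clashes = clashing_destinations(pairs)
for dest, srcs in clashes.items():
    print(f"WARNING: {dest} is the target of {', '.join(srcs)}")
print(" ".join(rsync_command("/data/Photos/", "/backup/Backup/Photos/")))
```

The sketch only prints the command; to actually preview a sync you would run that command yourself (or pass the list to `subprocess.run`) and read through the output for lines starting with "deleting".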
Please let me know if you found this guide/script useful.

Why not a “mkdir -p DIRECTORY” to solve the DELETE problem?
Regards
Wolf
Hmm, I guess that would help with the initial sync, but not if you happened to have 2 directories named “music” (with different contents) and tried to sync them. Thanks for the suggestion!