One size never fits all, but here are some helpful tips that may make sense for you or your organization. As an extra measure of safety, you can also set up an offsite destination in dest.conf.
Keep Your Key Safe!
This is the number one rule for any encrypted backup. Without your key.conf file, HashBackup cannot read your backup. There are no back doors, and no way we know of to access your backup data without the key, so don't put it off: copy your key.conf and dest.conf files to a few USB flash drives, print them, give them to friends and family, and store them in safe places separate from your backup data. If your computer is stolen, you don't want the only copy of your key to be on a flash drive that was inserted in your PC!
Straighten Up (but only a little)
Everyone has some clutter on their computer, and you can save a lot of backup space by spending an hour or so looking through the files on your computer and removing things you don't need. We're talking big files, like a 650MB disk image of an operating system distribution from 3 years ago that is no longer supported, stuff like that. Doing this once a year can keep your computer from running out of disk space and will help prevent fragmentation on your hard drive (fragmentation is where files get scattered all over the drive and take longer to access). It's easy to go overboard here and get obsessive about deleting every tiny file you don't need. To resist that urge, use the ls -lS command to list your biggest files first. After you get down to the 1MB or 100KB files, it's probably not worth your time to keep going. Just as an example, I removed 20GB from my 4-year-old Mac PowerBook in about an hour.
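If you'd rather get one sorted list across a whole directory tree, du and sort can do it; here's a sketch (the starting path and the head count are just examples, and the -h sort flag needs a sort that understands human-readable sizes, such as GNU coreutils):

```shell
# Show the 20 largest files and directories under your home
# directory, biggest first, with human-readable sizes.
du -ah "$HOME" 2>/dev/null | sort -rh | head -20
```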
This is a great time to make a few new directories (folders) to store things like downloaded disk images. A little organization will allow you to exclude large files from your backup that are easily recovered.
Flash Drives For Backups
Flash drives can be a great backup solution. They are inexpensive and getting cheaper each year, are easy to transport, have high (and increasing) capacities, and are energy efficient (hey, GREEN technology!) For an easy, secure personal backup, flash drives are hard to beat.
Here's the "Friend Backup" system: buy a couple of high-capacity flash drives and ask a friend to do the same. Label the drives. Keep one of your own drives in your pocket or on your key ring and give the other to your friend to keep. Do the same with your friend's drives: he keeps one, you keep one. Set up HashBackup to run each day (or hour, or whatever) with a cron job, and use the dest.conf file to copy the backup to the flash drive you're carrying (well, inserted of course!). When you see your friend, swap drives. Now you will have a backup on your local machine, a recent backup in your pocket, and an offsite backup in your friend's pocket. Because all data is encrypted and the key is never copied, neither of you can read the other's backup, and if a flash drive gets lost, just replace it and copy your backup data from your local backup directory to the new flash drive.
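A dest.conf for the flash drive might look something like this (the destination name and mount path are made up; check the dest.conf documentation for the exact keywords your HashBackup version supports):

```
# Copy each backup to the inserted flash drive
destname myflash
type dir
dir /Volumes/FRIENDBACKUP
```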
The beauty of using Friend Backup is:
- you always have a local backup, plus a recent backup in your pocket
- there is an offsite backup in your friend's pocket for disaster recovery
- everything is encrypted, so neither of you can read the other's backup
- a lost drive is cheap to replace: just copy your local backup directory to a new one
Friend Backup, Take Two
A disadvantage of Friend Backup is that you have to fiddle with flash drives. Nowadays, most folks have lots of available hard disk space, and it's always easy and cheap to acquire more. The problem is, all that local disk space doesn't do much good if your computer gadgets are stolen and you lose your original data and your backup!
So, here's a more automated version of Friend Backup, using USB hard drives: you and your friend create user accounts for each other, set up ssh (remote login) so you can access each other's machines, and add an ssh destination in HashBackup's dest.conf file. Each of you buys an external USB hard drive to store the other's data. Save your own backup on a local drive, let HashBackup copy it over ssh to the external drive at your friend's house, and he does the same. With this setup you have a local backup in case you delete a file by mistake, and if your hard drive fails, your backup is also on the external USB drive at your friend's house. If your data were only saved on an Internet service, you'd be looking at a very long download to do a complete restore; instead, you can pick up the external USB drive from your friend and do a quick local restore without the Internet.
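The ssh destination in dest.conf might look like this (the hostname, account, and path are hypothetical; keyword details are in the dest.conf documentation):

```
# Copy backups to the external USB drive at your friend's house
destname friend
type ssh
host friend-house.example.com
userid mybackup
dir /mnt/usbdrive/mymachine
```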
Companies can use the same idea to copy backups between pairs of machines inside the company, or even better, to a remote company server.
As an alternative to configuring ssh, which can be a bit of a chore, set up an FTP server or buy a NAS (Network Attached Storage) unit. Be sure to disable anonymous FTP access, use long, hard-to-guess usernames, and set very good passwords, or you'll find your hard drive or NAS being used as a dumping ground.
Friend Backup, Take Three
Okay, so you like the idea of keeping your own personal backup drive at a friend's house, but don't want to wait three weeks for your huge picture and music collection to upload over your Internet connection. No problem: seed your initial backup locally.
Run HashBackup with your USB hard drive connected locally, and set up a destination for it in dest.conf using the Dir destination type, pointing to your USB hard drive. This will create a local backup on your hard drive and then copy it to your USB drive. Now take the USB drive to your friend's house and set it up so you can access it over ssh, FTP, rsync, or what have you. Back on your computer, change dest.conf to reflect the new transfer method, but don't change the destination name! On your next incremental backup, HashBackup will create new incremental archives containing just the changed data and transfer them to your USB hard drive at your friend's house. Easy!
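As a sketch (names and paths hypothetical), the seeding step and the switch-over might look like this; the important part is that the destname stays the same in both versions of dest.conf:

```
# dest.conf while seeding: USB drive attached locally
destname offsite
type dir
dir /Volumes/SEEDDRIVE

# dest.conf after the drive moves to your friend's house:
# same destname, new transfer method
destname offsite
type ssh
host friend.example.com
userid mybackup
dir /mnt/usbdrive
```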
Virtual Machine Backups
Don't forget to include your virtual machines in your backup routine! Well, unless you actually enjoy all that time spent getting your VMs set up just the way you want them. HashBackup can run inside the virtual machine, and this is the fastest way to do your backups because it can skip unmodified files very quickly.
Or, you can back up entire VM images by running HashBackup on the VM host. This is slower because the entire virtual machine image has to be read, even for incremental backups, but HashBackup does a great job of finding the changes since your last backup. This gives you a smaller incremental backup image than many other backup tools, such as rsync and Apple's Time Machine, greatly reducing the disk space required to store VM incrementals. And your backup images are compressed and encrypted.
Single-Server Backups
Sometimes you may need to back up a server, and all you have to work with is that one server. If the server has RAID drives, you have some protection against disk failure but no protection against accidents such as deleting a file or directory by mistake. Versioned backups will help you recover from both drive failures and mistakes, even without RAID, if the data you are backing up is stored in a backup directory on a different disk. Using two physical drives, create a backup directory on each disk, back up disk A's data to disk B's backup directory using the -c option (which sets the backup directory location), and back up disk B's data to disk A's backup directory. Then you can easily recover from mistaken deletions or file overwrites, and if either disk fails completely, you still have a copy of its data in the backup directory on the working drive.
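As a sketch, with hypothetical mount points, the cross-disk arrangement is just two backup jobs:

```
# Back up disk A's data into a backup directory on disk B,
# and disk B's data into a backup directory on disk A, so a
# single disk failure never takes out both data and backup.
hb backup -c /diskB/backups/diskA /diskA/data
hb backup -c /diskA/backups/diskB /diskB/data
```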
Onsite Backup Server
Every company's backup needs are different of course, but a common backup solution is a local backup server that forwards data offsite to a remote backup server for disaster recovery. This is easy to set up with HashBackup. Take a machine with enough disk space to hold all of your local backups and install an FTP server; this will be your local backup server. Set up each individual machine with a dest.conf that has the userid and password of the FTP server, plus a unique directory for each machine. Every time a machine is backed up, it will transmit its incremental to the local FTP server.
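On each client machine, the dest.conf might look like this (the host, userid, and directory are examples, and every machine gets its own dir; check the dest.conf documentation for the exact keywords your HashBackup version supports):

```
# Send each incremental to the local FTP backup server
destname office
type ftp
host backupserver.internal
userid backups
password use-a-long-random-password
dir laptop-sales-03
```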
Remember that HashBackup keeps a local backup on each machine, so it isn't necessary to use a fancy RAID setup on your backup server. RAID is great if you have the ability to manage it, but with HashBackup it isn't as critical: if you lose the local backup server and have to replace it, each machine still has a local backup directory with that machine's entire backup history, and it can simply be copied back onto the new backup server.
To ensure disaster recovery, the backup server containing multiple machine backups can arrange to send them offsite or back them up to tape and send the tapes offsite.
Mail Server Backups
Large mail servers can be notoriously difficult to back up. Mail is typically stored either in the mbox format, where each mail account has all of its mail stored in one file, or in the maildir format, where each mail account is a directory and each individual email is a file. There are pros and cons to both formats, and both have their issues when it's time to back up.
The main problem with the mbox format is that each time mail is received, it is appended to the end of the file. Because the data is in one file, that file is marked as "changed" whenever new mail arrives, and most backup programs will want to save the entire file during the next backup. If a client is on vacation for a week and is receiving email every day, and the server is backed up every day, then what happens is:
- there will be 7 copies of Monday's email stored in the backup
- there will be 6 copies of Tuesday's email
- and so on
Obviously this is an inefficient use of backup space. Because HashBackup can do incremental backups even within one file, each day's new email is only stored once in the backup, leading to quicker backups, less backup space required, and less bandwidth to transmit backups offsite.
For the maildir format, the main issue is that mail is initially delivered to a "new" directory, and moved to a "cur" directory after the mail is read. When a traditional backup program runs, it makes copies of both directories. When the mail is read and moved to the "cur" directory, it will be backed up again, because the message has moved to a new location. With HashBackup, this double backup does not occur because it recognizes that the file has only moved; the data is not stored twice, so again you get quicker backups, and use less space and bandwidth.
Email Backups

You're working on your doctoral dissertation, you have all kinds of data related to the study you've been conducting for the last 2 years, and ... your hard drive dies. Ugh - not good! Or maybe it's just a paper that's due in 2 days and you've been working on it for a week.
Everyone has or can get an email account, and if it supports IMAP, HashBackup can copy your backup to your email account. Because your backup is encrypted, your backup is safe and secure no matter where you put it! Many email providers give away free storage with their accounts, and most, including Gmail, have options for buying more storage. For $50/year, Gmail will increase your account limit from 7GB to 25GB. For spreadsheets, word processing documents, and presentations, an offsite or university email account can work very well to ensure you have a secure backup in the event of a drive failure or theft.
Cloud Server Backups

Have your head in the clouds? These days, having a virtual server "in the cloud" makes a lot of sense: they're easy to set up, cost effective, usually run on RAID drives, and usually come with a dedicated technical staff to help you out when things get sticky. Many virtual server companies provide some kind of disaster recovery, so that even if you totally wipe your drive by mistake, you can get something back. But often this is image-based, meaning you have to restore the whole drive. If just one file gets deleted, restoring just that file may be hard or impossible. Or you may need to recover a file from 2 weeks ago when your provider can only recover data from yesterday. And keep in mind, RAID is not a backup solution: it only protects against hardware failure. If you delete a file by mistake, RAID instantly deletes the file from all drives in the array, with no way to recover it. Adding HashBackup to your cloud provider's disaster recovery features lets you restore individual files, and older versions of them, yourself.
Amazon S3 Backups

Amazon's S3 (Simple Storage Service) is a simple and relatively inexpensive way to store your backups offsite. At 12.5 cents per GB per month for storage (10 cents for reduced redundancy), the price is hard to beat. Amazon provides storage in many geographical regions, and HashBackup can store your backups in any of them.
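An S3 destination in dest.conf might look like the following; double-check the field names against the dest.conf documentation for your HashBackup version, and substitute your own credentials and bucket:

```
# Store backups in an Amazon S3 bucket
destname s3
type s3
accesskey <your-access-key-id>
secretkey <your-secret-access-key>
bucket my-backup-bucket
dir myhost
```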
Automatic Backups

If you want to make sure you always have a backup, automatic backups are the way to go. You may have the best intentions of doing regular backups, but most people forget or get busy, and before you realize it, weeks have gone by without a backup. The easy solution is to automate your backup with a cron job, and it's easy to set up.
First, create a file like this:
# MIN HOUR DAY MONTH DAYOFWEEK COMMAND
0 * * * * hb backup / /home >>backup.out
30 0 * * * hb retain -t12w -x7d >>retain.out
Save it as cronfile, then use this command to activate it:
$ crontab cronfile
Now, every hour at the top of the hour, cron will start the hb command to do an incremental backup of your / (root) and /home filesystems, with output appended to the file backup.out. Add other paths or options you usually use with HashBackup. At 12:30am, the retain command will run and keep all backups for the last 12 weeks, except for deleted files: these are removed from the backup 7 days after being deleted from your computer.
If there are errors, they will be sent to your regular email account; you can change this by setting the MAILTO variable in your crontab file. Check your man pages for more detailed information about cron and crontab. (Because daylight saving time starts at 2am, it is best to avoid this hour in cron jobs; otherwise, your job may run twice or not at all.)
Backup Performance Tuning
People usually want backups to happen one of two ways: get it finished as soon as possible so I can continue working, or do it in the background so I don't notice it and it doesn't slow my computer down. HashBackup normally uses the "as soon as possible" method.
To run HashBackup slower so that you don't notice it while working, do this:
$ nice -19 ionice -c3 hb backup /
This tells your computer to run HashBackup only when it has nothing better to do. If you are compiling programs or running simulations to find the cure for baldness while your backup is running, your backup will take longer than usual - maybe a lot longer: in tests, a backup that took 10 seconds on an idle computer took 13 minutes when the nice command was used with hb while just one CPU-intensive program was also running. You can use smaller adjustments with nice too; for example, nice -10 still slows your backup down, but not as much as nice -19. The nice and ionice commands can also be used in your crontab file if you have set up automatic backups.
Backup Your Backup
You may already have a backup solution in place, and just need a way to create a redundant backup for disaster recovery. HashBackup is a great solution for this!
Connect an external USB drive to your existing backup server and use the -c option with hb backup to generate your backup directly on the USB drive. For offsite protection, use the dest.conf file and hb will transmit your backup while it is being created.
To backup directly to a remote server, mount the target server directory on your backup server with NFS, Samba/CIFS, or sshfs, and use the -c option with hb backup to write your backup directly to the target server.
For large sites, installing a backup program on every machine to be protected can be a chore. An alternative is "clientless backup". In this setup, you have a backup server with lots of disk space and HashBackup runs on this server. On each machine to be protected, use NFS to export a directory you want backed up, like /home, and mount this on the backup server. When the backup server runs HashBackup, it can see all of the data to be protected and back it up as if it were local to the backup server.
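A clientless backup run might look like this on the backup server (hostnames and paths are hypothetical):

```
# Mount the client's NFS-exported /home, then back it up as
# if it were local to the backup server.
mount -t nfs client1:/home /mnt/client1-home
hb backup -c /backups/client1 /mnt/client1-home
```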
Today HashBackup doesn't run on Windows, but you can use the clientless backup method for Windows backups. On the Windows machines, share the directory you want backed up, and use Samba on the backup server to mount this directory.
Dedup Across Multiple Machines
Today HashBackup doesn't dedup data across multiple machines if you run it on each machine individually. But, if you use the clientless backup method, HashBackup will effectively dedup across multiple machines. Don't forget to use -D to enable dedup!
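For example, with clientless NFS mounts (paths hypothetical), one backup command covers every machine and dedups across all of them; -D takes the dedup table size, shown here as 1GB (check the backup command documentation for the exact form):

```
# One deduplicated backup across all mounted client directories
hb backup -c /backups/all -D1g /mnt/client1-home /mnt/client2-home
```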