Tips

One size never fits all, but here are some tips that might help with your backups.

Keep Your Key Safe!

This is the number one rule for any encrypted backup.  Without your key.conf file, HashBackup cannot read your backup.  There are no back doors, and no way to access your backup data without the key, so don't put it off: copy your key.conf and dest.conf files to a few USB flash drives, print these files, give copies to friends and family, and store them in safe places separate from your backup data.  If your computer is stolen, you don't want the only copy of your key to be on a flash drive that was left inserted!
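As a rough sketch of the copying step, the shell function below copies key.conf (and dest.conf, if present) from a backup directory to one or more other directories, such as mounted flash drives, and verifies each copy. The function name `copy_keys` and the example paths are hypothetical; it assumes both files live in your backup directory.

```shell
# Copy key.conf and dest.conf from a backup directory to one or more
# destination directories (for example, mounted USB flash drives) and
# verify each copy byte-for-byte.
# Usage: copy_keys BACKUP_DIR DEST_DIR [DEST_DIR ...]
copy_keys() {
    src=$1
    shift
    # Without key.conf there is nothing worth copying.
    [ -f "$src/key.conf" ] || { echo "no key.conf in $src" >&2; return 1; }
    for dest in "$@"; do
        mkdir -p "$dest" || return 1
        for f in key.conf dest.conf; do
            # dest.conf only exists if destinations have been configured.
            [ -f "$src/$f" ] || continue
            cp -p "$src/$f" "$dest/$f" || return 1
            cmp -s "$src/$f" "$dest/$f" || {
                echo "verify failed: $dest/$f" >&2
                return 1
            }
        done
        echo "copied to $dest"
    done
}

# Example (hypothetical paths):
#   copy_keys /home/jim/hb /media/usb1/hb-keys /media/usb2/hb-keys
```

Printed copies and copies held by friends and family still need to be made by hand; the function only covers the flash drive part.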

Straighten Up (but only a little)

Computers accumulate clutter, and you can save a lot of backup space by spending an hour or so removing files you don't need.  We're talking big files, like a 650MB disk image of an operating system distribution from 3 years ago that is no longer supported - files like that.  Doing this once a year can keep your computer from running out of disk space and will help prevent fragmentation on your hard drive.  Fragmentation is when files get scattered all over the drive and take longer to access, and nearly-full disks increase fragmentation.  It's easy to go overboard and get obsessive about deleting every tiny file you don't need.  To resist that urge, use the ls -l -Ss command to list your biggest files first in the directories where you store most of your data.  Once you get down to the 1MB or 100KB files, it's not worth your time to keep going.  Just as an example, it only took an hour to remove 20GB from a 4-year-old Mac PowerBook.
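The ls command only looks at one directory at a time. As an alternative sketch, the function below uses the standard find, du, sort, and head commands to list the largest files anywhere under a directory. The function name `biggest_files` is made up for this example.

```shell
# List the N largest regular files under a directory, biggest first.
# Each output line is the size in KB, a tab, then the path.
# Usage: biggest_files [DIR] [N]    (defaults: current directory, 20 files)
biggest_files() {
    dir=${1:-.}
    n=${2:-20}
    # du -k prints disk usage in KB for each file; sort -rn puts the
    # largest first; head keeps only the top N lines.
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
}

# Example:
#   biggest_files "$HOME" 10
```

Stopping at the top 10 or 20 files is a simple way to avoid chasing files that are too small to matter.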

This is a great time to make a few new directories (folders) to store files like downloaded disk images.  A little organization will allow you to exclude large files from your backup that are easily recovered by other means and don't need to be backed up regularly.

Local Backup

One of the central tenets of backups is that more copies are better.  HashBackup can store backup data many ways, both locally and offsite.  Nowadays, most computers have lots of available hard disk space, and it's always easy and cheap to acquire more, so keeping one copy of backup data locally is a very good strategy.  A local copy of the backup makes restores much faster, adds redundancy vs an offsite-only backup, and can even lower remote storage costs by enabling more frequent "packing" of backup data.  Local backups can be stored on:

- the same system being backed up
- another local server via rsync, FTP, SSH, or a SAN
- a NAS via Samba / CIFS / SMB (or FTP, SSH, etc)
- an external USB drive

Offsite Backup

A local backup doesn't help if your computers are stolen or destroyed by a flood or fire: you lose your original data and your local backup!  For disaster recovery, it's important to maintain up-to-date offsite backups.  HashBackup supports many types of offsite storage, either to servers you maintain at other locations or to cloud storage services such as Amazon S3 and Google Storage.

HashBackup also supports fast "seeding" of offsite backups.  A copy of the initial backup can be written to an external USB drive using a Dir destination in HashBackup's dest.conf file.  That drive can be transported to the offsite backup location and copied to a server.  Then, back on the original server being backed up, the Dir destination is changed to an SSH or FTP destination pointing to the offsite server (don't change the destination name!).  The next incremental backup will only transmit changed backup data.  For large disaster-situation restores, the reverse can be applied to avoid a potentially long download over a slow network link.

Virtual Machine Backups

Don't forget to include your virtual machines in your backup routine!  Well, unless you actually enjoy spending all that time getting your VMs set up just how you want them.  HashBackup can run inside the virtual machine, and this is usually the fastest way to do your backups because it allows skipping unmodified files very quickly.

Or, you can back up entire VM images by running HashBackup on the VM host.  This is sometimes slower because the entire virtual machine image has to be read, even for incremental backups, but HashBackup does a great job finding and saving only the changes since your last backup.  On the other hand, it can be faster to back up an entire VM image, because all of the I/O is sequential.  HashBackup creates smaller incremental backup images than many other backup tools like rsync and Apple's Time Machine, greatly reducing the disk space required to store VM incrementals.  And, your backup images are compressed and encrypted.

Single Server Backups

Sometimes you may need to back up a server, and all you have to work with is that one server.  If the server has RAID drives, you have some protection against disk failure but no protection against accidents like deleting a file or directory by mistake.  Versioned backups will help you recover from both drive failures and mistakes, even without RAID, if the data you are backing up is stored to a backup directory on a different disk.  Using two physical drives, create a backup directory on each disk, back up disk A's data to disk B's backup directory using the -c option, and then back up disk B's data to disk A's backup directory.  This enables you to easily recover from mistaken deletions or file overwrites, and if either disk fails completely, you still have a copy of its data in the backup directory on the other (working) drive.
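A minimal sketch of the two-disk arrangement might look like the following. The mount points /mnt/diskA and /mnt/diskB, the data and hb-* directory names, and the `run` and `cross_backup` helpers are all hypothetical, and each backup directory is assumed to have been created beforehand with hb init -c. Only the backup command and its -c option come from the description above.

```shell
# Cross-backup two disks on one server: each disk's data is saved to a
# backup directory that lives on the other disk.
# Mount points and directory names are hypothetical examples.
DISK_A=${DISK_A:-/mnt/diskA}
DISK_B=${DISK_B:-/mnt/diskB}

# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

cross_backup() {
    # Disk A's data goes to the backup directory on disk B ...
    run hb backup -c "$DISK_B/hb-diskA" "$DISK_A/data" || return 1
    # ... and disk B's data goes to the backup directory on disk A.
    run hb backup -c "$DISK_A/hb-diskB" "$DISK_B/data"
}

# Example: show the commands without running them
#   DRY_RUN=1; cross_backup
```

Backing up only the data directory on each disk, rather than the whole disk, keeps each backup from including the other disk's backup directory.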

Central Onsite Backup Server

Every company's backup needs will be different of course, but a commonly seen backup solution is a local backup server that forwards data offsite to a remote backup server for disaster recovery.  This can be easily set up with HashBackup.  Take a machine with enough disk space to hold all of your local backups, and install an FTP server; this will be your local backup server.  Set up each individual machine with a dest.conf that has the userid and password of the FTP server, and a unique directory for each machine.  Every time a machine is backed up, it will transmit its incremental to the central FTP server.  HashBackup scales very well in this situation because the hard backup work - deduplication, compression, and encryption - is done on the client machines.  The central backup server only has to accept and store files.

If you keep a local backup on each machine, it isn't even necessary to use a fancy RAID setup on your backup server.  RAID is great if you have the ability to manage it, but with HashBackup and a local backup on the client computer, it isn't as critical: if you lose and have to replace the local backup server, each machine still has a local backup directory with that machine's entire backup history, and it can simply be copied back onto the new backup server.

To ensure disaster recovery, each client can be configured for an offsite backup or the backup server containing multiple machine backups can send them offsite or back them up to tape and send the tapes offsite.

Mail Server Backups

Large mail servers can be notoriously difficult to back up.  Mail is typically stored either in the mbox format, where each mail account has all of its mail stored in one file, or in the maildir format, where each mail account is a directory and each individual email is a file.  There are pros and cons to both formats, and both have their issues when it's time to back up.

The main problem with the mbox format is that each time mail is received, it is appended to the end of the file.  Because the data is in one file, that file is marked as "changed" whenever new mail arrives, and most backup programs will want to save the entire file during the next backup.  If a client is on vacation for a week and is receiving email every day, and the server is backed up every day, then what happens is:

- there will be 7 copies of Monday's email stored in the backup
- there will be 6 copies of Tuesday's email
- and so on

Obviously this is an inefficient use of backup space.  Because HashBackup can do incremental backups even within one file, each day's new email is only stored once in the backup, leading to quicker backups, less backup space required, and less bandwidth to transmit backups offsite.
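To put numbers on the vacation example, the short sketch below counts how many days' worth of mail ends up stored after a week of daily backups. It assumes, for simplicity, that each day adds one day's worth of new mail to the mbox file; the function name `mbox_copies` is made up for this example.

```shell
# Count how many days' worth of mail is stored after DAYS daily backups
# of an mbox file that grows by one day's mail per day.
#   whole-file:  the entire mbox is saved again every day (1 + 2 + ... + DAYS)
#   incremental: only each day's new mail is saved (DAYS)
mbox_copies() {
    days=$1
    whole_file=0
    day=1
    while [ "$day" -le "$days" ]; do
        # On day N the mbox holds N days of mail, and all of it is saved.
        whole_file=$((whole_file + day))
        day=$((day + 1))
    done
    echo "whole-file: $whole_file, incremental: $days"
}

mbox_copies 7    # prints "whole-file: 28, incremental: 7"
```

For the one-week example, whole-file backups store four times as much mail data as backups that only save the new mail.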

For the maildir format, the main issue is that mail is initially delivered to a "new" directory, and moved to a "cur" directory after the mail is read.  When a traditional backup program runs, it makes copies of both directories.  When the mail is read and moved to the "cur" directory, it will be backed up again, because the message has moved to a new location.  With HashBackup, this double backup does not occur because it recognizes that the file has only moved; the data is not stored twice, so again you get quicker backups, and use less space and bandwidth.

Cloud Servers

Have your head in the clouds?  These days, having a virtual server "in the cloud" makes a lot of sense: they're easier to set up, cost effective, usually run on RAID drives, and usually come with a dedicated technical staff to help you out when things get sticky.  Many virtual server companies provide some kind of disaster recovery, so that even if you totally wipe your drive by mistake, you can get something back.  But often this is image based, meaning you have to restore the whole drive.  If just one file gets deleted, restoring just that file may be hard or impossible.  Or, you may need to recover a file from 2 weeks ago and your provider can only recover data from yesterday.  And keep in mind, RAID is not a backup solution: it only protects from a hardware failure.  If you delete a file by mistake, RAID will instantly delete the file from all drives in the array, with no way to recover it.  Adding HashBackup to the disaster recovery features of your cloud server provider gives you versioned, file-level backups that cover these gaps.

Cloud Storage Services

These days there are many cloud storage services competing to provide offsite storage: Amazon S3, Google Storage, and Backblaze B2, to name just a few of the most economical services.  These make it simple and relatively inexpensive to store your backups offsite.  At pennies per GB per month for storage, the prices are hard to beat.  And if you decide to switch providers, HashBackup makes it easy to migrate your backup data over a period of time.

Automating User Backups

If you want to make sure you always have a backup, automatic backups are the way to go.  You may have the best intentions of doing regular backups, but most people will forget or get busy, and before you realize it, weeks will go by without making a backup.  The easy solution is to automate your backup with a cron job, and it's easy to set up.

First, create a user cronfile in your home directory, like this:

# MIN HOUR DAY MONTH DAYOFWEEK   COMMAND
 0 * * * * hb backup -c /home/jim/hb /home/jim >>/home/jim/hb/backup.out
45 * * * * hb retain -c /home/jim/hb -t12w -x7d >>/home/jim/hb/retain.out

Save it as cronfile, then use this command to activate it:

    $ crontab cronfile

Now, every hour at the top of the hour, cron will start the hb command to do an incremental backup of /home/jim, with output appended to the file backup.out.  Add other paths or options you usually use with HashBackup.  At 45 minutes past each hour, the retain command will run and keep all backups for the last 12 weeks, except for deleted files: these are removed from the backup 7 days after being deleted from your computer.

If there are errors, cron will mail them to you; you can change the address by setting the MAILTO variable in your crontab file.  Check your man pages for more detailed information about cron and crontab.  (Because daylight saving time starts at 2am, it is best to avoid this hour in cron jobs.  Otherwise, your job may run twice or not at all.)

Automating System Backups

The previous example is for automating a per-user backup.  To automate a system backup, use hb init -c /hbbackup to create a backup directory, set whatever options you need with hb config, set excludes in the inex.conf file, and use a system-wide cron job, usually in /etc/crontab:

# MIN HOUR DAY MONTH DAYOFWEEK USER   COMMAND
00 02 * * * root /usr/local/bin/hb log backup -c /hbbackup /; /usr/local/bin/hb log retain -c /hbbackup -s30d12m; /usr/local/bin/hb log selftest -c /hbbackup -v4 --inc 1d/30d
 
A system-wide cron entry has an additional field for userid, in this case root.  You will need to be root to add an entry to /etc/crontab.

With this command, at 2AM every morning, a system-wide backup (of /) will run, followed by retain keeping the last 30 days plus a backup each month, then a selftest to verify your backup.  A log option has been added to each HB command to save all output in timestamped log files in /hbbackup/logs.

The selftest command will check all of your backup data (arc files) over a period of 30 days.  The -v4 --inc 1d/30d options can be omitted to do a quicker selftest that only verifies the local database, or the selftest can be omitted altogether.  If all backup data is stored locally (config option cache-size-limit is -1, the default), you can use -v3 instead of -v4 to check the local arc files without downloading any remote arc files.

Backup Performance Tuning

People usually want backups to happen one of two ways: get it finished as soon as possible so I can continue working, or do it in the background so I don't notice it and it doesn't slow my computer down.  HashBackup normally uses the "as soon as possible" method. 

To run HashBackup slower so that you don't notice it while working, do this:

  $ nice -19 ionice -c3 hb backup /

This tells your computer system to only run HashBackup when it has nothing better to do.  If you are compiling programs or running simulations to find the cure for baldness while your backup is running, your backup will take longer than usual - maybe a lot longer:  in tests, a backup that took 10 seconds on an idle computer took 13 minutes when the nice command was used with hb while just one CPU-intensive program was also running. 

On a busy server, ionice -c3 (a Linux command) may prevent your backup from finishing, because there is never a time when the disks are idle.  In that case, you can use ionice -c2 -n7 to give the backup a lower I/O priority without "starving" it.  You can use lower numbers with nice too: for example, nice -10 would run your backup slower, but not as slow as nice -19.  The nice and ionice commands can be used in your crontab file if you have set up automatic backups.
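Since ionice is only available on Linux, a small wrapper can add it only where it exists. The sketch below is one way to do that; the function name `low_priority` and the DRY_RUN switch are made up for this example, and `nice -n 10` is the POSIX spelling of `nice -10`.

```shell
# Run a command at reduced CPU priority, and at reduced I/O priority when
# ionice is available (Linux only).
# Usage: low_priority COMMAND [ARGS ...]
low_priority() {
    if command -v ionice >/dev/null 2>&1; then
        # Best-effort class at its lowest priority: I/O is slowed down
        # but, unlike the idle class (-c3), it is not starved.
        set -- ionice -c2 -n7 "$@"
    fi
    set -- nice -n 10 "$@"
    # With DRY_RUN=1, print the command line instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example:
#   low_priority hb backup /
```

The same wrapper could be saved as a script and called from a crontab entry in place of a bare hb command.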

Backup Your Backup

You may already have a backup solution in place, and just need a way to create a redundant backup for disaster recovery.  HashBackup is a great solution for this!

Connect an external USB drive to your existing backup server and use the -c option with hb backup to generate your backup directly on the USB drive.  For offsite protection, use the dest.conf file and hb will transmit your backup while it is being created.

To back up directly to a remote server, mount the target server directory on your backup server with NFS, Samba/CIFS, or sshfs, and use the -c option with hb backup to write your backup directly to the target server.


"Clientless" Backup

For large sites, installing a backup program on every machine to be protected can be a chore.  An alternative is "clientless backup".  In this setup, you have a backup server with lots of disk space and HashBackup runs on this server.  On each machine to be protected, use NFS or Samba to export a directory you want backed up, like /home, and mount this on the backup server.  When the backup server runs HashBackup, it can see all of the data to be protected and back it up as if it were local to the backup server.  A big advantage of this setup is that HashBackup is able to dedup files across all machines.
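The clientless setup could be scripted roughly as follows for NFS exports. The host names alpha and beta, the mount root /mnt/clients, the backup directory /hbbackup, and the `run` and `clientless_backup` helpers are all hypothetical; the mount commands need root privileges, and each client is assumed to export /home to the backup server.

```shell
# Clientless backup sketch: mount each client's exported /home on the
# backup server, back up all of the mounts in one HashBackup run, then
# unmount them again. Names and paths are hypothetical examples.
CLIENTS=${CLIENTS:-"alpha beta"}
MOUNT_ROOT=${MOUNT_ROOT:-/mnt/clients}
BACKUP_DIR=${BACKUP_DIR:-/hbbackup}

# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

clientless_backup() {
    set --    # collect the mount points to back up in "$@"
    for host in $CLIENTS; do
        run mkdir -p "$MOUNT_ROOT/$host" || return 1
        # Read-only NFS mount of the client's exported /home
        run mount -t nfs -o ro "$host:/home" "$MOUNT_ROOT/$host" || return 1
        set -- "$@" "$MOUNT_ROOT/$host"
    done
    # Each mount point is listed explicitly; add the options you normally use.
    run hb backup -c "$BACKUP_DIR" "$@"
    status=$?
    for host in $CLIENTS; do
        run umount "$MOUNT_ROOT/$host"
    done
    return $status
}

# Example: show the commands without running them
#   DRY_RUN=1; clientless_backup
```

Because every client's data goes into the same backup directory, a single HashBackup run sees the files from all machines.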

Windows Backups

Today HashBackup doesn't run on Windows, but you can use the clientless backup method for Windows backups.  On the Windows machines, share the directory you want backed up, and use Samba on the backup server to mount this directory.  Then run HashBackup on the backup server to save the Windows client data.

Dedup Across Multiple Machines

HashBackup doesn't dedup data across multiple machines if you run it on each machine individually.  But, if you use the clientless backup method, HashBackup will effectively dedup across multiple machines.  Don't forget to use -D to enable dedup!

Flash Drives For Personal Backups

Flash drives can be a great backup solution.  They are inexpensive and getting cheaper each year, are easy to transport, have high and increasing capacities, and are energy efficient.  For an easy, secure personal backup,  flash drives are hard to beat.

Here's the "Friend Backup" system:  buy a couple of high capacity flash drives and ask a friend to do the same.  Label the drives with your names.  Keep one of your own drives in your pocket or on your key ring and give the other to your friend to keep.  Do the same with your friend's drives: he keeps one, you keep one.  Set up HashBackup to run each day (or hour, or whatever) with a cron job, and use the dest.conf file to copy the backup to the flash drive you're carrying (inserted of course!).  When you see your friend, give him your latest backup and get back your older backup.  He does the same with you.  Next time you back up, HashBackup will "catch up" your flash drive.  You will always have the latest backup on your local machine (in the backup directory), another copy on the flash drive you have, and an older offsite backup with your friend.  Because all data is encrypted and the key is never copied, neither of you can read the other's backup, and if a flash drive gets lost, just replace it and copy your backup data from your local backup directory to the new flash drive.

The advantages of using Friend Backup are:
  • you have an easy-to-access offsite backup in case of theft or a real disaster
  • your privacy is maintained with encryption technology
  • you don't have to give your data to an Internet company
  • you don't have to wait days for an Internet backup to upload
  • restores are much faster than waiting for an Internet backup service download

As an extra measure of safety, you could set up an offsite destination to a cloud storage service in dest.conf.

Student Backups

You're working on your doctoral dissertation, have all kinds of data related to the study you've been conducting for the last 2 years, and ... your hard drive dies.  Ugh - not good!  Or maybe it's just a paper that's due in 2 days and you've been working on it for a week.

Everyone has or can get an email account, and if it supports IMAP, HashBackup can copy your backup to your email account.  Because your backup is encrypted, it's safe and secure no matter where you put it!  Many email providers give away free storage with their accounts (Google gives away 15GB), and most, including Gmail, have options for buying more storage.  For $25/year, Google will increase your account storage limit to 100GB.  For spreadsheets, word processing documents, and presentations, an offsite or university email account can work very well to ensure you have a secure backup in the event of a drive failure or theft.