Tips

One size never fits all, but here are some tips that might help with your backups.

Keep Your Key Safe!

This is the number one rule for any encrypted backup. Without your key.conf file, HashBackup cannot read your backup. There are no back doors, and no way to access your backup data without the key, so don’t put it off: copy your key.conf and dest.conf files to a few USB flash drives, print these files, give copies to friends and family, and store them in safe places separate from your backup data. If your computer is stolen, you don’t want the only copy of your key to be on a flash drive that was left inserted!
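For example, assuming your backup directory is /home/jim/hb and a flash drive is mounted at /media/usb (both hypothetical paths), saving the key is just copying two small files:

$ cp /home/jim/hb/key.conf /home/jim/hb/dest.conf /media/usb/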

Straighten Up (a little)

Computers accumulate clutter, and you can save a lot of backup space by spending an hour or so removing files you don’t need. We’re talking big files, like a 650MB disk image of an operating system distribution from 3 years ago that is no longer supported. Doing this once a year can keep your computer from running out of disk space and will help prevent fragmentation on your hard drive. Fragmentation is when files get scattered all over the drive and take longer to access (on spinning hard drives), and nearly-full disks increase fragmentation.

It’s easy to go overboard and get obsessive about deleting every tiny file you don’t need. To resist that urge, use HashBackup’s count command to find the biggest files and directories on your system and focus on those. As an example, it took only an hour to remove 20GB from a 4-year-old Mac system.
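A minimal invocation might look like this (the path is just an example; see the count command’s documentation for its options):

$ hb count /home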

This is a great time to make a few new directories (folders) for files like downloaded disk images. A little organization makes it easy to exclude large files that can be recovered by other means and don’t need to be backed up regularly.
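For example, if downloaded disk images all live in one directory, a single exclude line in inex.conf keeps them out of the backup (the path is hypothetical; see the inex.conf documentation for the exact syntax):

ex /home/jim/ISOs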

Local Backup

One of the central tenets of backups is that more copies are better. HashBackup can store backup data many ways, both locally and offsite. Nowadays, most computers have lots of available hard disk space, and it’s easy and cheap to acquire more, so keeping at least one copy of backup data locally is a very good strategy. A local copy of the backup makes restores much faster, adds redundancy to an offsite backup, and can even lower remote storage costs by enabling more frequent packing of offsite backup data. Local backups can be stored on:

  • the same system being backed up

  • an external USB drive

  • another local server via rsync, FTP, SSH, or LAN

  • a NAS via Samba / CIFS / SMB (or FTP, SSH, etc)
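For the external-drive case, a Dir destination in dest.conf can be as simple as this sketch (the destination name and mount point are just examples):

destname usbcopy
type dir
dir /media/usbdrive/hb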

Offsite Backup

A local backup doesn’t help if your computers are stolen or destroyed by a flood or fire: you lose your original data and your local backup! For disaster recovery, it’s important to maintain up-to-date offsite backups. HashBackup supports many types of offsite storage, either to servers you maintain at other locations or to cloud storage services such as Amazon S3 and Backblaze B2.

HashBackup also supports fast "seeding" of offsite backups. A copy of the initial backup can be written to an external USB drive using a Dir destination in HashBackup’s dest.conf file. The drive is then transported to the offsite location and copied to a server there. Finally, on the original machine being backed up, change the Dir destination to an SSH or FTP destination pointing at the offsite server (but don’t change the destination name!). The next incremental backup will transmit only the changed backup data. For a large disaster-recovery restore, the same trick works in reverse to avoid a potentially long and expensive download over a network link.
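As a sketch, the dest.conf entry might start as a Dir destination and later be edited to an SSH destination with the same destname (names, hosts, and paths here are hypothetical, and the exact SSH keywords should be checked against the dest.conf documentation):

# dest.conf while seeding to the USB drive
destname offsite
type dir
dir /media/usbdrive/seed

# dest.conf after the drive has been copied to the offsite server
destname offsite
type ssh
host backup.example.com
userid jim
dir /backups/jim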

Virtual Machine Backups

Don’t forget to include your virtual machines in your backup routine! Well, unless you actually enjoy all that work getting your VMs set up just how you want them. HashBackup can run inside the virtual machine, and this is usually the fastest way to do your backups because it can skip unmodified files very quickly.

Or, you can back up entire VM images by running HashBackup on the VM host. This is sometimes slower because the entire virtual machine image has to be read, even for incremental backups, but HashBackup does a great job finding and saving only the changes since your last backup. It can also be faster to back up an entire VM image, because most of the I/O is sequential. With dedup enabled, HashBackup creates smaller incremental backup images than many other backup tools like rsync and Apple’s Time Machine, greatly reducing the disk space required to store VM incrementals. And your backup images are compressed and encrypted.
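For example, a host-side backup of a directory of VM images with dedup enabled might look like this sketch (the image path and dedup table size are just examples):

$ hb backup -c /backups/vmhost -D1g /var/lib/libvirt/images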

Single Server Backup

Sometimes you may need to back up a server, and all you have to work with is that one server. If the server has RAID drives, you have some protection against disk failure but none against accidents like deleting a file or directory by mistake. Versioned backups let you recover from drive failures and mistakes, even without RAID, if the backup data is stored on a different disk. Using two physical drives, create a backup directory on each disk, back up disk A’s data to disk B’s backup directory using the -c option, and then back up disk B’s data to disk A’s backup directory. This makes it easy to recover from mistaken deletions or file overwrites, and if either disk fails completely, you still have a copy of its data on the other (working) drive.

Be sure to exclude the "other" backup directory in each backup’s inex.conf so the two backups don’t include each other’s backup data.
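Here is a sketch, assuming the two drives are mounted at /diska and /diskb (hypothetical mount points):

# create a backup directory on each disk
$ hb init -c /diskb/hb-for-diska
$ hb init -c /diska/hb-for-diskb

# back up disk A's data to disk B, and disk B's data to disk A
$ hb backup -c /diskb/hb-for-diska /diska
$ hb backup -c /diska/hb-for-diskb /diskb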

Central Onsite Backup Server

Every company’s backup needs are different of course, but a common setup is a local backup server that forwards data offsite to a remote backup server for disaster recovery. This is easy to set up with HashBackup. Take a machine with enough disk space to hold all of your local backups and install an FTP server; this will be your local backup server. Set up each individual machine with a dest.conf that has the userid and password of the FTP server and a unique directory (Dir keyword) for that machine. Every time a machine is backed up, it transmits its incremental to the central FTP server. HashBackup scales very well in this situation because the hard backup work of deduplication, compression, and encryption is done on the client machines; the central backup server only runs an FTP server to store and retrieve files.
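On each client, the dest.conf entry might look something like this sketch (host, credentials, and directory are examples; check the dest.conf documentation for the exact FTP keywords):

destname central
type ftp
host backupserver.example.com
userid backup
password secret
dir machines/client1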

If you keep a local backup on each machine, it isn’t even necessary to use a fancy RAID setup on your backup server. RAID is great if you have the ability to manage it, but with HashBackup and a local backup on each client, it isn’t as critical: if the central backup server dies and has to be replaced, each machine still has a local backup directory with that machine’s entire backup history, and it can simply be copied back onto the new backup server.

To ensure disaster recovery, each client can be configured with an offsite backup or the backup server itself can be backed up offsite, either with HashBackup or with tapes sent offsite.

Mail Server Backup

Large mail servers can be notoriously difficult to back up. Mail is typically stored either in the mbox format, where each mail account has all of its mail in one file, or in the maildir format, where each mail account is a directory and each individual email is a file. There are pros and cons to both formats, and both have their own backup issues.

The main problem with the mbox format is that each time mail is received, it is appended to the end of the file. Because the data is in one file, that file is marked as "changed" whenever new mail arrives, and most backup programs will want to save the entire file during the next backup. If a client is on vacation for a week and is receiving email every day, and the mail server is backed up every day, then here’s what happens:

  • there will be 7 copies of Monday’s email stored in the backup

  • there will be 6 copies of Tuesday’s email

  • and so on

Obviously this is an inefficient use of backup space. Because HashBackup with dedup does incremental backups even within one file, each day’s new email is only stored once in the backup, leading to quicker backups, less backup space required, and less bandwidth to transmit backups offsite.

For the maildir format, the main issue is that mail is initially delivered to a "new" directory, and moved to a "cur" directory after the mail is read. When a traditional backup program runs, it makes copies of both directories. When the mail is read and moved to the "cur" directory, it will be backed up again, because the message has moved to a new location. With HashBackup, this double backup does not occur because it recognizes that the file has only moved; the data is not stored twice, so again you get quicker backups, and use less space and bandwidth.

Cloud Servers

Have your head in the clouds? These days, having a virtual server "in the cloud" makes a lot of sense: they’re easier to set up, cost-effective, usually run on RAID drives, and usually come with a dedicated technical staff to help you out when things get sticky. Many virtual server companies provide some kind of disaster recovery backup service, so that if you totally wipe your drive by mistake, you can get it back. But often this is image-based, meaning you have to restore the whole drive; if just one file gets deleted, restoring only that file may be hard or impossible. Or you may need to recover a file from 2 weeks ago when your provider can only recover data from yesterday. And keep in mind, RAID is not a backup solution: it only protects against hardware failure. If you delete a file by mistake, RAID instantly deletes it from all drives in the array, with no way to recover it. Adding HashBackup to your provider’s disaster recovery features keeps your data safe.

Cloud Storage Services

These days there are many cloud storage services competing to provide offsite storage: Amazon S3, Google Storage, and Backblaze B2, to name just a few of the more popular ones. These make it simple and relatively inexpensive to store your backups offsite: at pennies per GB per month, the prices are hard to beat. And if you decide to switch providers, HashBackup makes it easy to migrate your backup data between cloud storage services.

Automating User Backups

To make sure you always have a backup, automatic backups are the way to go. You may have the best intentions of doing regular backups, but most people forget or get busy, and before you realize it, weeks go by without a backup. The easy solution is to automate your backup with a cron job, and it’s easy to set up.

First, use hb init to create a backup directory. Then create a user cronfile in your home directory, like this:

MAILTO=user@email.com
# MIN HOUR DAY MONTH DAYOFWEEK   COMMAND
00 * * * * hb backup -c /home/jim/hb /home/jim; hb retain -c /home/jim/hb -t12w -x7d; hb log -s -e -x1

Save it as cronfile, then use this command to activate it:

$ crontab cronfile

Now, every hour at the top of the hour, cron will run the hb command to do an incremental backup of your home directory. Add other paths or options you usually use with HashBackup. After the backup, retain runs and keeps all backups for the last 12 weeks, except deleted files are removed from the backup 7 days after being deleted from your computer.

To automate system backups, see the Automation page.

Backup Performance Tuning

People usually want backups to happen one of two ways: get it finished as soon as possible so I can continue working, or do it in the background so I don’t notice it and it doesn’t slow my computer down. HashBackup normally uses the "as soon as possible" method.

To run HashBackup slower so that you don’t notice it while working, do this for Linux:

$ nice -19 ionice -c3 hb backup /

This tells your Linux computer to only run HashBackup when it has nothing better to do. If you are compiling programs or running simulations to find the cure for baldness while your backup is running, your backup will take longer than usual - maybe a lot longer: in tests, a backup that took 10 seconds on an idle computer took 13 minutes when the nice command was used with hb while just one CPU-intensive program was also running.

On a busy server, ionice -c3 (a Linux command) may prevent your backup from ever finishing, because there is no time when the disks are idle. In that case, use ionice -c2 -n7 to give the backup a lower I/O priority without starving it completely. You can use lower numbers with nice too: for example, nice -10 runs your backup slower, but not as slow as nice -19. The nice and ionice commands can also be used in your crontab file if you have set up automatic backups.
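For example, a middle-of-the-road crontab entry (reusing the backup paths from the cron example above) might look like:

00 * * * * nice -10 ionice -c2 -n7 hb backup -c /home/jim/hb /home/jim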

For OSX, the taskpolicy command can be used to lower the priority of HashBackup. For example:

$ taskpolicy -d standard hb backup /

In tests, a backup of /usr that took 2 minutes while another I/O-heavy command (tar) was running took 45 minutes when taskpolicy was also used. If you are doing hourly backups throughout the day and don’t want them to interfere with your regular work, you can try other settings like utility and throttle, but on a busy system your backup may not run at all.

Adding -p0 to your hb backup command will cause HashBackup to use only 1 CPU, leaving the other CPUs available for other work.

Backup Your Backup

You may already have a backup solution in place and just need a way to create a redundant offsite backup for disaster recovery. HashBackup is a great solution for this!

Connect an external USB drive to your existing backup server and use the -c option with hb backup to generate the backup directly on the USB drive, or even better, use -c with a built-in drive and set up a Dir destination in dest.conf for the USB drive. For offsite protection, add a storage service to the dest.conf file so HashBackup transmits your backup as it is being created.

To send backups to a remote server, mount the target server directory on your backup server with NFS, Samba/CIFS, or sshfs; use the -c option with a local backup directory; add a Dir destination in dest.conf pointing to the mounted directory; and set the cache-size-limit config option so only a small amount of backup data is kept locally.
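As a sketch, with the remote server mounted at /mnt/remote and the local backup directory at /backups/hb (both hypothetical paths), the dest.conf entry might be:

destname remote
type dir
dir /mnt/remote/hb

Then limit how much backup data is kept in the local directory (the 5gb limit is just an example):

$ hb config -c /backups/hb cache-size-limit 5gb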

"Clientless" Backup

For large sites, installing a backup program on every machine to be protected can be a chore. An alternative is "clientless backup". In this setup, you have a backup server with lots of disk space and HashBackup runs on this server. On each machine to be protected, use NFS or Samba to export a directory you want backed up, like /home, and mount this on the backup server. When the backup server runs HashBackup, it can see all of the data to be protected and back it up as if it were local to the backup server. A big advantage of this setup is that HashBackup is able to dedup files across all machines.
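A minimal sketch for one protected machine, assuming it exports /home over NFS (the hostname and paths are examples):

# on the backup server
$ mkdir -p /mnt/host1/home
$ mount -t nfs host1:/home /mnt/host1/home
$ hb backup -c /backups/host1 /mnt/host1/home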

Windows Backup

HashBackup doesn’t run on Windows, but you can use the clientless backup method for Windows backups. On the Windows machines, share the directory you want backed up, and use Samba on the backup server to mount this directory. Then run HashBackup on the backup server to save the Windows client data.
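For example, with a Windows share mounted read-only via CIFS (the hostname, share, and account are hypothetical):

$ mkdir -p /mnt/winpc
$ mount -t cifs //winpc/Users /mnt/winpc -o username=backupuser,ro
$ hb backup -c /backups/winpc /mnt/winpc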

Dedup Across Multiple Machines

HashBackup doesn’t dedup data across multiple machines if you run it on each machine individually. But, if you use the clientless backup method, HashBackup will effectively dedup across multiple machines. Don’t forget to use the -D backup option or dedup-mem config option to enable dedup!
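For example, backing up two mounted machines into one backup directory with dedup enabled might look like this (the paths and dedup table size are examples):

$ hb backup -c /backups/all -D4g /mnt/host1/home /mnt/host2/home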

Flash Drives For Personal Backup

Flash drives can be a great backup solution. They are inexpensive and getting cheaper each year, are easy to transport, have high and increasing capacities, and are energy efficient. For an easy, secure personal backup, flash drives are hard to beat.

Here’s the "Friend Backup" system: buy a couple of high capacity flash drives and ask a friend to do the same. Label the drives with your names. Keep one of your own drives in your pocket or on your key ring and give the other to your friend to keep. Do the same with your friend’s drive: he keeps one, you keep one. Setup HashBackup to run each day (or hour, or whatever) with a cron job, and use the dest.conf file to copy the backup to your flash drive you’re carrying (inserted of course!). When you see your friend, give him your latest backup, get back your older backup. He does the same with you. Next time you backup, HashBackup will "catch up" your flash drive. You will always have the latest backup on your local machine (in the backup directory), another copy on the flash drive you have, and an offsite older backup with your friend. Because all data is encrypted and the key is never copied, neither of you can read the other’s backup, and if a flash drive gets lost, just replace it and copy your backup data from your local backup directory to the new flash drive.

The advantages of Friend Backup are:

  • you have an easy-to-access offsite backup in case of theft or a real disaster

  • your privacy is maintained with encryption technology

  • you don’t have to give your data to an Internet company

  • you don’t have to wait days for an Internet backup to upload

  • restores are much faster than waiting for an offsite download

As an extra measure of safety, you could add an offsite destination to a cloud storage service in dest.conf.