Backup

Performs a backup.  The advanced options are described later, but commonly-used options are:

  $ hb backup [-c backupdir] [-v level] [-n]
             [-t tagname] [-D mem]
             path1 path2 ...


Backup will not cross mount points (separate filesystems) unless -X is used, so you have to explicitly list each filesystem you want to back up (use the df command to see your filesystems).  The backup is stored in the local backup directory specified with the -c option, or in a default backup directory if -c isn't used.
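
To decide which paths need to be listed separately, it can help to check which filesystem each path lives on.  The sketch below is only an illustration: mount_point_of is a hypothetical helper (not part of hb) that reads POSIX "df -P" output, and it assumes mount points contain no spaces.

```shell
# mount_point_of is a hypothetical helper, not part of hb.
# It prints the mount point of the filesystem that holds a path.
# Assumption: the mount point (6th column of "df -P") has no spaces.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Example use:
#   mount_point_of /
#   mount_point_of /home
```

Paths that report different mount points are on separate filesystems, so each one has to be named on the hb backup command line unless -X is used.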

-t    tag the backup with a name that hb ls will display
-v    control how much output is displayed, higher numbers for more:

  • -v0 = print version, backup directory, copy messages, error count
  • -v1 = print names of skipped filesystems
  • -v2 = print names of files backed up (this is the default)
  • -v3 = print names of excluded files
-n    do not copy the main database after the backup, because rm, retain or another backup is running next

-D    look for duplicate data blocks and only save 1 copy.  -D is followed by the amount of memory you want to dedicate to the dedup operation, for example: -D1g would use 1 gigabyte of RAM, -D100m would use 100 megabytes.  The more memory you have available, the more duplicate data HashBackup can detect.  See doc/dedup.info for more detailed information and recommendations.

Each backup creates a new snapshot or version in the backup directory, containing all files modified since the last backup.  An easy way to think of this is a series of incremental backups, stacked one on top of the other.  HashBackup presents the illusion that every version is a full backup, while performing the backups at incremental speeds and using incremental backup storage space.  While full backups are an option, there is really no need to do weekly or monthly full backups, and you will experience huge savings of time, disk space, and bandwidth compared to traditional backup methods.  The RETAIN command explains how to maintain weekly or monthly snapshots, or both.
 
The Backup Directory

All commands accept an optional -c option to specify the backup directory.  HashBackup will store your backup data and the encryption key in this directory.  If -c is not specified on the command line, the environment variable HASHBACKUP_DIR is checked.  If this environment variable doesn't exist, the directory ~/hashbackup is used if it exists.  If ~/hashbackup doesn't exist, /var/hashbackup is used.  Backup directories must first be initialized with the hb init command.
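
The lookup order above can be sketched as a small shell function.  This is an illustration of the documented order, not HashBackup's own code: resolve_backup_dir is a hypothetical name, and treating an empty HASHBACKUP_DIR the same as an unset one is an assumption.

```shell
# resolve_backup_dir is a hypothetical helper, not part of hb.
# $1 is the value given with -c, or empty if -c was not used.
resolve_backup_dir() {
    if [ -n "$1" ]; then
        # 1. -c on the command line wins
        printf '%s\n' "$1"
    elif [ -n "${HASHBACKUP_DIR:-}" ]; then
        # 2. then the environment variable (assumption: empty = unset)
        printf '%s\n' "$HASHBACKUP_DIR"
    elif [ -d "$HOME/hashbackup" ]; then
        # 3. then ~/hashbackup, but only if it exists
        printf '%s\n' "$HOME/hashbackup"
    else
        # 4. otherwise the system-wide default
        printf '%s\n' /var/hashbackup
    fi
}
```

For example, calling resolve_backup_dir "" with HASHBACKUP_DIR unset and no ~/hashbackup directory prints /var/hashbackup.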

HashBackup saves all file properties in a database, so backups can be stored on any type of "dumb" storage: FAT, VFAT, CIFS/Samba, NFS, USB thumb drives, SSHFS, WebDAV, FTP servers, etc.  All file attributes, including hard links, ACLs, and extended attributes, will be accurately saved and restored, even if not supported by the storage system.

It's possible to back up directly to a mounted remote filesystem or USB drive with the -c option.  However, backup performance may be slower than with a local filesystem, and remote filesystems and USB drives may not provide robust sync and lock facilities.  Sync facilities are used to ensure that the backup database doesn't get corrupted if your computer halts unexpectedly during the backup process, for example, if there is a power outage.  Lock facilities ensure that two backups don't run at the same time, possibly overwriting each other's backup files.  It is better to specify remote storage using the dest.conf file in your backup directory.

Important Security Note: your encryption key is stored in the backup directory.  Usually this is a local directory, but if you are writing your backup directly to remote storage, for example, Google Drive, be sure to set a passphrase when initializing your backup directory.  This is done with the -p ask option to hb init.  Both the passphrase and key are required to access your backup.  If you are writing directly to remote storage that you control, such as an NFS server, a passphrase may not be as important.

Destinations

The backup command creates backup archive files in the backup directory.  As archives fill up, they can be transmitted to one or more remote destinations.  Transmissions run in parallel with the backup process.  After the backup has completed, it waits for all files to finish transmitting.  By default, all archives are kept in the local backup directory.  This makes restores very fast since data does not have to be downloaded from a remote site.  It is also possible to delete archives from the local backup directory after they are sent to all remotes.  See the cache-size-limit config option.

Remote destinations are set up in the dest.conf file, located in the backup directory.  See the example dest.conf files in the doc directory for more detailed info on how to configure destinations for SSH, RSYNC, and FTP servers, Amazon S3, Gmail, IMAP/email servers, and simple directory destinations.  Directory destinations are used to store extra copies of your backup on USB thumb drives, NFS, WebDAV, Samba, and any other remote storage that can be mounted as a filesystem.


NOTE: if you are storing your backup on a USB or other removable drive with the -c option, and that's the only backup copy you want, you would not need to use the dest.conf file at all.  dest.conf is only used when you want to create more than 1 copy of your backup, usually at a remote site.  It's good practice to use a passphrase when writing backups directly to removable drives with -c, because the key will also be copied to the removable drive.  If the -c backup directory is local and the removable drive is in dest.conf, the key is not copied to the removable drive.

Excludes

A handful of system files and directories are automatically excluded, like swapfiles, /tmp, core files, /proc, /sys, and hibernation files.  /var/hashbackup and the -c backup directory itself are also automatically excluded.  An inex.conf file is created by the init command in the backup directory showing which files are excluded.
 

To exclude other files, edit the inex.conf file in the backup directory.  The format of this file is:

# comment lines
E(xclude) <pathname>

Example inex.conf:

# exclude all .wav files:
Exclude *.wav

# exclude the /Cache directory and its contents:
e /Cache

# save the /Cache directory itself, but don't save the contents.
# Requesting a backup of /Cache/xyz still works.
e /Cache/

# save the /Cache directory itself, but don't save the contents.
# Requesting a backup of /Cache/xyzdir will save the directory itself
# since it was explicitly requested, but will not save xyzdir's
# contents
e /Cache/*

Any abbreviation of the exclude keyword at the beginning of the line is recognized.
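
Exclude rules can also be appended from a script.  The sketch below is only an illustration under stated assumptions: add_exclude is a hypothetical helper (not an hb command), and it uses the one-letter "e" abbreviation shown in the example above.

```shell
# add_exclude is a hypothetical helper, not part of hb.
# $1: backup directory, $2: pathname or pattern to exclude.
# The rule is appended only if the same line is not already present.
add_exclude() {
    conf="$1/inex.conf"
    line="e $2"
    grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Example use (placeholder backup directory):
#   add_exclude /hbbackup '*.wav'
```

Quoting the pattern keeps the shell from expanding *.wav before it reaches the file.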

There are several other ways to exclude files from the backup:
  • config variable no-backup-tag can be set to a list of filenames, separated by commas.  If a directory contains any of these files, only the directory itself and the tag files will be saved.  A typical value here is .nobackup
  • files with the nodump flag set are not backed up
  • config variable no-backup-ext can be set to a list of comma-separated extensions, with or without dots.  Any file ending in .ext (one of the extensions listed) is not backed up.  This test is done without regard to uppercase or lowercase, so it isn't possible to exclude .TXT but back up .txt.
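
The extension test described in the last item can be sketched as follows.  This is not HashBackup's code: ext_is_excluded is a hypothetical helper that mimics the documented rules (extensions listed with or without dots, case ignored).

```shell
# ext_is_excluded is a hypothetical helper, not part of hb.
# $1: file name, $2: comma-separated extension list.
# Returns 0 if the name ends in one of the extensions, ignoring case.
ext_is_excluded() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    list=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr ',' ' ')
    for ext in $list; do
        ext=${ext#.}                  # accept both "wav" and ".wav"
        case $name in
            *."$ext") return 0 ;;     # name ends in .ext
        esac
    done
    return 1
}
```

With this sketch, ext_is_excluded notes.txt ".TXT" succeeds, which matches the statement that .TXT and .txt cannot be treated differently.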

Advanced Options

The backup command has a few more options that may be useful:

-m maxfilesize  any file larger than maxfilesize will be skipped.  The size can be specified with values like 100m or 100mb for 100 megabytes, 3g or 3gb for 3 gigabytes, etc.

-p procs        specifies how many additional processes to use for the backup.  Normally HashBackup will use 1 process on single-CPU systems, and several processes on multi-core systems with more than 1 CPU.  Multiple processes speed up the backup but also put a higher load on your system.  To reduce the performance impact of a backup on your system, you may want to use -p0 to force using only 1 CPU.  Or, with a very fast hard drive and many CPU cores, you may want to use more than just a few processes, and can experiment with -p4 or -p5.  Higher numbers will probably reduce performance, so experimenting is the best guide.

-n              if backup, rm, or retain is going to be run after this backup, the -n option can be used.  This will prevent transmitting the main HashBackup database, which is unnecessary since the next command is also going to change it.  But be careful here: if you don't run one of these hb commands after this backup, the database will not be up to date on your remote destinations and whatever you have saved in this backup will not be restorable.  Previous backups will still be there of course.

-B blocksize         the backup command splits files into blocks of data.  It chooses a "good" blocksize based on the type of file being saved and often uses variable-sized blocks.  But for some data, you may want to force a specific block size.  The block sizes allowed are: 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M.  In general, a larger block size will compress better but dedup less; small block sizes will dedup better but compress less.  Since this is very data dependent, experiment with your own data to determine the best block size to use.  Using -B1m will usually speed up backups on a single CPU system because it disables variable block sizes.

--full          the backup command usually backs up only modified files, and determines this by looking at mtime, ctime, and other file attributes for each file.  This option overrides that behavior and causes every file to be saved, even if it has not changed.  Taking a full backup adds redundancy to your backup data.  Another way to accomplish this is to create a new backup directory on both local and remote storage.  But this is harder to manage and retain will not work across multiple backup directories.  Full backups also limit the number of incremental archives required to do large restores, though in actual practice, restoring a complete OSX system from 3 years of daily incrementals took only 15% longer than restoring from a full backup.

--no-mtime      there are two timestamps on every Unix file: mtime is the last time a file's data was modified, and ctime is the last time a file's attributes were modified.  If a file's data has changed, both the data and attributes are saved.  If only the attributes are changed, only the attributes are saved.  Because mtime can be set by user programs, some administrators may not trust it to indicate that a file's data has changed.  The --no-mtime option tells the backup command to verify file contents with a strong checksum rather than trust mtime.  This is not recommended for normal use.

-X              backup usually does not descend into other filesystems, but will if -X is used.  Be careful, because backup does not distinguish between local and remote filesystems, so it's easy to back up an NFS server unintentionally with -X.

-Z level        sets the compression level.  Generally, the higher the level, the less backup space will be used, but the backup and restore may also take longer.  The default level is 3, which gives good compression and good speed.  The compression level ranges from 0 to 9, where -Z0 means to disable compression.  A more flexible way to control compression is to exclude certain file extensions with the no-compress-ext config variable.  See the config command for details.
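
As an illustration of the -n option described above, here is a sketch of two backups run back to back, where only the last one sends the main database.  The paths /hbbackup, /usr, and /home are placeholders, and HB is a hypothetical variable added so the sequence can be dry-run with HB=echo.

```shell
# HB is a hypothetical variable: it defaults to the real hb command,
# and HB=echo prints the commands instead of running them.
HB=${HB:-hb}

backup_pair() {
    # First backup: -n skips sending the main database, because the
    # next command is going to change it again.
    $HB backup -c /hbbackup -n /usr || return
    # Last backup: no -n, so the database is sent and both backups
    # become restorable from the remote destinations.
    $HB backup -c /hbbackup /home
}
```

If the second command is never run, the database on the remote destinations stays out of date, which is the risk the -n description warns about.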

Raw Block Device, Partition, and Logical Volume Backups

If a block device name is used on the backup command line, all data stored on the block device will be backed up.  The block device should either be unmounted or mounted read-only before taking the backup, or, with logical volumes, a snapshot must be taken and then the snapshot backed up.  To maximize dedup, a block size of 4K is good for most block device backups.  For very large devices, a block size of 1M or more may give a smaller backup because of the higher compression.  You have to experiment with your data to decide.
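
The logical volume case can be sketched with the standard LVM tools lvcreate and lvremove.  Everything specific here is an assumption for illustration: the volume group vg0, the logical volume data, the 1G of snapshot space, the /hbbackup directory, and the -B4K spelling of the block size.  RUN is a hypothetical prefix so the steps can be printed with RUN=echo rather than executed.

```shell
# RUN is a hypothetical prefix: empty runs the commands for real
# (root is needed), RUN=echo only prints them.
RUN=${RUN:-}

backup_lv_snapshot() {
    # Take a snapshot so the data cannot change during the backup.
    $RUN lvcreate --snapshot --size 1G --name data_snap /dev/vg0/data || return
    # Back up the snapshot device with 4K blocks to favor dedup.
    $RUN hb backup -c /hbbackup -B4K /dev/vg0/data_snap
    status=$?
    # Remove the snapshot whether or not the backup succeeded.
    $RUN lvremove -f /dev/vg0/data_snap
    return $status
}
```

The snapshot only needs enough space to hold the changes made to the original volume while the backup runs.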

Example Backup

Backup of Fedora 12's /usr directory running in a Mac OSX VM:

# create the backup directory
[root@localhost /]# hb init -c /hbbackup

HashBackup build 485 (C) HashBackup, LLC
Backup directory: /hbbackup
Permissions set for owner access only
Created key file: /hbbackup/key.conf
Key file set to read-only
Setting include/exclude defaults: /hbbackup/inex.conf

VERY IMPORTANT: your backup is encrypted and can only be accessed with
the encryption key, stored in the file:
   /hbbackup/key.conf
You MUST make copies of this file and store them in a secure location,
separate from your computer and backup data.  If your hard drive fails,
you will need this key to restore your files.  If you setup any
remote destinations in dest.conf, that file should be copied too.

Backup directory initialized

# what does the encryption key look like?
[root@localhost /]# cat /hbbackup/key.conf
fb7b 3247 dc7b 6a63 8e75 7222 ab7d 8dc1 2689 ebdc f91b c515 e1b2 25a3 af58 a274

# run the 1st backup, which will be a full backup
[root@localhost /]# time hb backup -c /hbbackup -D1g -v1 /usr

HashBackup build 485 (C) HashBackup, LLC
Backup directory: /hbbackup
This is backup version: 0
Backing up: /usr
Writing archive 0.0
Writing archive 0.1

Time: 685.4s, 11m 25s
Stored: 173254
Excluded: 0
Bytes: 3515836864, 3.5 GB
Dupbytes: 213343393, 213 MB, 6%
Compression: 61%, 2.6:1
Space: 1.4 GB, 1.4 GB total
No errors

real    11m25.958s
user    4m7.223s
sys     1m49.423s

# release all filesystem buffers for incremental timing test
[root@localhost /]# echo 3 >/proc/sys/vm/drop_caches


# incremental backup of /usr
[root@localhost /]# time hb backup -c /hbbackup -D1g -v1 /usr

HashBackup build 485 (C) HashBackup, LLC
Backup directory: /hbbackup
This is backup version: 1
Backing up: /usr
Removing archive 1.0

Time: 53.5s
Stored: 0
Excluded: 0
Bytes: 0
No errors

real    0m55.590s
user    0m21.693s
sys     0m11.863s
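
The summary figures in the transcript above can be cross-checked with a little arithmetic.  The sketch below only recomputes numbers already shown.  Reading "Compression: 61%" as the fraction of space saved is an assumption, though it is consistent with the 2.6:1 ratio printed next to it.

```shell
# check_summary recomputes figures from the transcript above.
check_summary() {
    awk 'BEGIN {
        bytes = 3515836864     # "Bytes" line of the first backup
        dup   = 213343393      # "Dupbytes" line of the first backup
        printf "dup: %.0f%%\n", 100 * dup / bytes
        # assumption: 61% is the fraction saved, so ratio = 1 / (1 - 0.61)
        printf "ratio: %.1f:1\n", 1 / (1 - 0.61)
        # first backup time divided by incremental backup time
        printf "speedup: %.1fx\n", 685.4 / 53.5
    }'
}
check_summary
# prints:
#   dup: 6%
#   ratio: 2.6:1
#   speedup: 12.8x
```

In this run the incremental backup, which found nothing to store, finished roughly 13 times faster than the initial full backup.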