

Restores files and directories from a backup.

  $ hb get [-c backupdir] [-r version] [-v level]
           [--no-ino] [--orig] [--delete]
           [--todev blockdev] [--cache cachedir[,lock]]
           pathname ...

Pathnames may be full paths starting with /, relative paths starting with ./, or simple filenames not containing a slash.  Relative paths and simple filenames are changed to full paths using the current directory.

By default, the get command restores the last component of the pathname to the current directory. For example, if you restore /etc/rc.conf, a new rc.conf file will be created in the current directory.  If you restore /etc, then a new etc directory will be created in the current directory.  For extra safety, you may want to cd to /tmp before doing a restore.
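As a quick sketch (the backup directory and file names here are hypothetical), a cautious single-file restore looks like this:

```shell
# work from /tmp so nothing in /etc is touched;
# the restored copy appears as /tmp/rc.conf
cd /tmp
hb get -c /backup /etc/rc.conf
```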

--orig restores files to their original locations instead of the current directory.

Get always asks before a file or directory is restored over an existing file or directory.  For safety, get may sometimes refuse to do a restore over existing data if it believes the restore may not be what you really want.  For example, it will not restore a file over an existing non-empty directory with the same name.  This can be avoided by restoring to a different location, or removing or renaming the existing file or directory before the restore.

When restoring over existing directories, get normally merges the backup contents with the directory contents, overwriting files that already exist.  The --delete option deletes existing files and directories that are not in the backup.  This is similar to rsync's --delete option and can be used to "sync" a directory to the backup contents.  However, the get command is not yet smart enough to skip existing files that are already identical to the backup contents.
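For example (the paths here are hypothetical), the difference between a merge restore and a "sync" restore looks like this:

```shell
# merge: restore into the existing directory, overwriting files
# that exist in both and leaving extra files alone
hb get -c /backup --orig /home/jim/project

# sync: same restore, but also delete files and directories under
# /home/jim/project that are not in the backup
hb get -c /backup --orig --delete /home/jim/project
```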

The -r option selects a specific version to restore.  This will restore files that are backed up at this version or an earlier version as necessary.  With -r1, some of the files may have been backed up in version zero and some backed up in version 1; the get command will automatically select the latest version for each file.  If -r is not used, the get command restores files from their most recent backup.
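For example (assuming a hypothetical backup directory /backup), these two commands differ only in which version is restored:

```shell
# restore /etc as of version 1: each file comes from version 1,
# or from version 0 if it was unchanged in version 1
hb get -c /backup -r1 /etc

# with no -r, each file comes from its most recent backup
hb get -c /backup /etc
```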

The -v option controls how much output is displayed:
  • -v0 = don't print filenames as they are restored
  • -v1 = print names of files as they are restored

Cache Directory

When --cache-size-limit is set, some arc (backup data) files are stored locally in the backup directory and some are on remote destinations and have to be downloaded to the backup directory during restore.  This poses challenges:
  • the backup directory may be on a small disk, like an SSD, and might not have room to download all of the data needed to do a large restore.
  • during a restore, the backup directory has to be locked since the cache of arc files might be changing.  This lock prevents a backup and a restore, or two different restores, from running simultaneously.
The --cache option solves these problems.  It specifies a separate directory used for storing downloaded data needed during the restore.  If the cache directory already exists and matches the backup directory version, some downloaded data can be reused and the cache directory will remain after the get command.  If the cache directory doesn't exist when get starts, it will be created and then deleted when get finishes.  So to use the cache directory for several restores, create it before using get.

The --cache option has an optional modifier ,lock that can be added to request locking the main backup directory.  This allows using arc files already in the backup directory instead of downloading them again.  If ,lock is not used, backup directory arc files can only be reused if the cache directory is on the same disk as the backup directory.
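Putting this together (the paths here are hypothetical), a restore using a persistent cache on a larger disk might look like:

```shell
# create the cache directory first so it is kept after the restore
# and can be reused by later restores
mkdir /bigdisk/hbcache

# ,lock locks the backup directory so arc files already there can
# be reused instead of being downloaded again
hb get -c /backup --cache /bigdisk/hbcache,lock /data/projects
```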

Restore Integrity

During backups, HashBackup stores SHA1 checksums for each block of data within a file and a separate SHA1 checksum for the entire file.  During a restore, both of these checksums are verified to ensure that the data restored is identical to the data that was backed up.

HashBackup handles deleted files correctly: if a directory contains files A and B in backup versions 0 and 1, and file B is deleted and a backup occurs (version 2), then restoring the directory with -r0 or -r1 will restore both A and B, but restoring with -r2 will only restore file A.  If you try to restore file B with -r2, an error message is displayed indicating that the file was deleted in version 2.  You can restore the deleted file B by using -r0 or -r1.

Unstable Inode Numbers and --no-ino

The --no-ino option is used to back up and restore filesystems like CIFS, UnRAID, and sshfs (FUSE in general) that do not have stable inode numbers.  In typical Unix filesystems, every unique file has a unique inode number.  But with these filesystems, inode numbers are re-used, so two different files can end up with the same inode number.  This is a problem because during restores, HashBackup wants to hard link files with the same inode number.  To prevent incorrect hard linking, use the --no-ino option for both backups and restores of these types of filesystems.
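For example (the mount point here is hypothetical), both the backup and the restore of a CIFS share would use the option:

```shell
# back up a CIFS mount without trusting its inode numbers
hb backup -c /backup --no-ino /mnt/cifs-share

# restore it the same way, so unrelated files that happen to share
# an inode number are not hard linked together
hb get -c /backup --no-ino --orig /mnt/cifs-share
```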

Raw Block Device Restores

When a raw block device is restored, the data can be stored in 3 locations:
  • --orig  = restore the data to its original block device (must be unmounted)
  • --todev = restore the data to a different, unmounted block device
  • neither option means restore the data to a file in the current directory, using the device name as filename
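As a sketch (the device names here are hypothetical), the three cases look like this:

```shell
# restore to the original block device, which must be unmounted
hb get -c /backup --orig /dev/sda2

# restore to a different, unmounted block device
hb get -c /backup --todev /dev/sdb2 /dev/sda2

# neither option: write the image to a file in the current
# directory, named after the device
hb get -c /backup /dev/sda2
```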
Some systems (devmapper on Linux) use symbolic links to point to the actual block device.  When you back up the symlink, the actual block device is included in the backup.  For example:

# hb backup -c hb /dev/mapper/ub1264-swap_1
HashBackup build 971 Copyright 2009-2013 HashBackup, LLC
Backup directory: /home/jim/hb
Adding symlink target to backup: /dev/mapper/ub1264-swap_1 -> /dev/mapper/../dm-1
This is backup version: 0
Dedup is not enabled

If you restore the symlink, get tries to do something reasonable.  If the symlink already exists but points to a different device, get will restore to that device (but asks for confirmation).  The safest option is to use --todev and restore the block device itself, /dev/dm-1 in this example, not the symlink.

Restore Performance

Good restore performance is important for any backup system.  Below are a few different restores from an actual backup to give an idea of HashBackup's restore performance.  The backups and restores were run on a 2.66GHz Intel Core 2 Duo (2010) Mac Mini OSX server with 2x750GB hard drives.  The system has been backed up daily since October 2010 and is the HashBackup build server.  There are 5 virtual machines running, with disk images from 3-10GB each, 6 other virtual machines, and the usual OSX files.  One hard drive is the root drive, the other stores the backups.  The root drive has around 81GB of data being backed up daily.  Backup for this system uses ~350MB of RAM, 2/3rds of that for the dedup table.  Restores typically use 100-150MB of RAM, independent of the file or backup size.

Important note: these are not restores from a traditional "full" backup, or a version 0 backup where everything was just saved.  These restores are from 1746 incremental, deduped backups over a 6 year period.

Test 1: restore a complete copy of the OSX root drive from 1746 incremental backups for 6 years

sh-3.2# time hb get -c /backup / -v1
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Begin restore

Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    75m55.703s
user    51m20.498s
sys      7m57.109s

sh-3.2# du -ksc /test
81719612    /test
81719612    total

sh-3.2# find .|wc
578956  743762 57345665

In this test, 578,956 files and directories were restored, a total of 81.7GB in 76 minutes, averaging 17 MB/sec.  The system was also running 11 virtual machines so was not idle during the test (it averages 80% idle), and most of these files were on the small side so there is more seek overhead.  A similar test would be to do an initial backup of the root file system and then a complete restore.  This could be expected to run faster because the backup data is not scattered over 1746 different backup versions.

Test 2: restore a 2GB VM disk image with daily incremental backups for 5 years

sh-3.2# time hb get -c /backup /Users/.../Documents/Virtual\ Machines.localized/centos.vmwarevm/centos-s002.vmdk
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf

Restoring centos-s002.vmdk to /test
Restored /Users/.../Documents/Virtual Machines.localized/centos.vmwarevm/centos-s002.vmdk to /test/centos-s002.vmdk
No errors

real    3m26.150s
user    1m46.945s
sys     0m11.811s

sh-3.2# ls -l centos5532-s002.vmdk
-rw-------@ 1 jim  staff  2086993920 Apr  8 02:13 centos5532-s002.vmdk

For this test, a 2GB virtual machine (VM) disk image file was restored.  The file was originally saved August 2011 with daily incremental backups since then for the last 5 years.  The VM stays running all the time so the disk image is backed up nearly every day.  This was verified with the backup logs.  This VM image is saved with a small 4K block size to maximize dedup, but that is also the most challenging to restore: the file has 509,520 backup blocks scattered over a 5 year backup period.  HashBackup restored the file in 3 minutes, 26 seconds.

Test 3: restore a 498MB zip archive from a backup 3 years ago

# time hb get -c /backup /Users/.../tapes/                                                        
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf

Restoring to /test
Restored /Users/.../tapes/ to /test/
No errors

real    0m14.044s
user    0m13.496s
sys     0m1.732s

sh-3.2# ls -l
-rw-r--r--  1 jim  staff  498219045 May 19  2013

The 498MB zip archive was restored in 14 seconds, from backup #715 on May 19, 2013 - at an average rate of 35.6 MB/sec.  This file restored at a faster rate because it was contained in a single backup version and did not change after that, unlike the VM disk image.

Test 4: backup and restore the same drive as test #1

sh-3.2# time hb backup -c /data/test -v1 -D1g /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /Volumes/HD2/test
Copied HB program to /Volumes/HD2/test/hb#1496
This is backup version: 0
Dedup enabled, 0% of current, 0% of max
Backing up: /
Mount point contents skipped: /Network/Servers
Mount point contents skipped: /dev
Mount point contents skipped: /home
Mount point contents skipped: /net

Time: 5345.8s, 1h 29m 5s
Checked: 579014 paths, 86520028440 bytes, 86 GB
Saved: 578956 paths, 86326260319 bytes, 86 GB
Excluded: 32
Dupbytes: 20540224433, 20 GB, 23%
Compression: 49%, 2.0:1
Space: 43 GB, 43 GB total
No errors

real    89m6.195s
user    100m2.367s
sys    12m2.822s

sh-3.2# time hb get -c /data/test -v1 /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /data/test
Most recent backup version: 0
Restoring most recent version
Begin restore

Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    63m49.108s
user    53m11.719s
sys    10m29.799s

This is like test #1, restoring 86GB in 578,956 files, but the restore is from a single-version backup, similar to restoring from a traditional "full" backup.  The restore is faster, 64 minutes vs 76 minutes, because data is all stored in one version rather than the 1746 incremental versions in test #1.  But considering the extreme backup space savings, test #1 still has good restore performance.

This initial backup used 43GB of backup space.  The backup for test #1, covering a 6-year period of daily backups for the same drive, uses about 100GB of space.  Backup #1 has a retention policy of -s30d12m, meaning "keep the last 30 days of daily backups, plus 12 monthly backups".  So around 42 versions are being retained.  The "forever incremental" strategy has greatly reduced the amount of backup space required, to less than even 3 full backups would need.