Restores files and directories from a backup.
$ hb get [-c backupdir] [-r version] [-v level] [--no-ino] [--orig] [--delete] [-i] [--no-sd] [--todev blockdev] [--cache cachedir[,lock]] [--plan] [--no-local] [--no-mtime] [--splice] pathnames
Relative paths and simple filenames are changed to full paths using the current directory.
By default, the get command restores the last component of the
pathname to the current directory. For example, if /etc/rc.conf is
restored, a new rc.conf file is created in the current directory. If
/etc is restored, a new etc directory will be created in the current
directory. For extra safety, consider cd /tmp before a restore.
--orig restores files to their original locations instead of the current directory.
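The default target is simply the last path component dropped into the current directory, which can be sketched in shell terms (the paths here are examples):

```shell
# By default the last component of the restored path lands in the
# current directory; --orig restores to the original location instead:
#   cd /tmp && hb get -c /backup /etc/rc.conf    # creates /tmp/rc.conf
#   hb get -c /backup --orig /etc/rc.conf        # recreates /etc/rc.conf
# The default target is the basename of the requested path:
restored=$(basename /etc/rc.conf)
echo "restored to: $PWD/$restored"
```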
Get always asks before a file or directory is restored over an
existing file or directory. For safety,
get may sometimes refuse to
do a restore over existing data if it believes the restore may not be
what you really want. For example, it will not restore a file over an
existing non-empty directory with the same name. This can be avoided
by restoring to a different location, or removing or renaming the
existing file or directory before the restore.
When restoring over existing directories, get normally merges the
backup contents with the directory contents. If the existing files
match the backup files, they are left alone (unless --no-mtime is
used). Otherwise, they are overwritten with the backup file. The
--delete option deletes existing files and directories that are not
in the backup. This is similar to rsync's --delete option and can
be used to "sync" a directory to the backup contents.
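The merge and --delete semantics can be modeled with plain shell (this illustrates the behavior only, not how hb implements it; src stands in for the backup contents):

```shell
# Model of restoring with --delete over an existing directory:
# files in the target that are not in the backup are removed.
mkdir -p src tgt
touch src/a tgt/a tgt/b      # tgt/b is not in the "backup" (src)
for f in tgt/*; do
    [ -e "src/$(basename "$f")" ] || rm "$f"
done
ls tgt                       # only "a" remains
```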
The -r option selects a specific version to restore. This will
restore files that were backed up at this version or an earlier version
as necessary. With -r1, some of the files may have been backed up
in version 0 and some in version 1; the get command will
automatically select the latest version for each file. If -r is not
used, the get command restores files from their most recent backup.
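The per-file selection rule can be shown with a toy example (the file names and version numbers below are made up):

```shell
# Model of "hb get -r1": for each file, choose the newest backed-up
# version that is <= the requested version r.
r=1
printf '%s %s\n' f1 0 f1 1 f2 0 f2 2 |
awk -v r="$r" '$2 <= r && (!($1 in best) || $2 > best[$1]) { best[$1] = $2 }
               END { for (f in best) print f, best[f] }' | sort
# f1 1
# f2 0
```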
Using Local Files
By default, get will use local files to assist the restore. This
is sometimes called an incremental restore. It looks in both the
original backup directory and the target restore directory for files
with the same name, mtime (modification time), and size to select matches.
This makes restarted restores very fast, since data already downloaded
and restored does not have to be restored a second time.
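The match test is essentially "same name, same size, same mtime", which can be mimicked with shell primitives (illustration only; hb's actual check is internal):

```shell
# A local copy matches when its size and modification time equal the
# backup file's; such files are reused instead of being downloaded.
mkdir -p orig target
echo "hello" > orig/f
cp orig/f target/f
touch -r orig/f target/f     # give the copy the same mtime
if [ "$(wc -c < orig/f)" -eq "$(wc -c < target/f)" ] &&
   ! [ orig/f -nt target/f ] && ! [ orig/f -ot target/f ]; then
    echo "match: reuse target/f, skip the download"
fi
```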
For a more strict verification, the
--no-mtime option can be used.
This computes a strong cryptographic hash of the local file and
ensures it matches the hash of the backup file before using the local
file for the restore.
To disable local data use the
--no-local option. This is a useful
option when doing test restores immediately following a backup to
verify remote backup data.
When local files have been modified, it is still possible to use some
local data combined with some remote data to restore files using the
--splice option. This requires reading through local files and uses
more temp space in the backup directory, but can save significant
download bandwidth. During splicing, a strong file hash is computed
for the restored file and compared to the original backup file’s
strong hash to ensure it was correctly restored.
The -v option controls how much output is displayed:
-v0 = don’t print filenames as they are restored
-v1 = print some basic headings only
-v2 = print filenames as they are restored
-v3 = print local filenames not used in the restore
-v4 = print local files used in the restore
The -i option asks before starting the restore. This is useful for
comparing restore plans.
The --no-sd option disables selective download, which is how
HashBackup downloads only the needed parts of files. This option might
speed up large restores when cache-size-limit is set, because the
planning stage is faster and uses less RAM. More data is downloaded,
because whole arc files are downloaded instead of just the pieces
needed, but fewer requests are issued to remote servers. This option
might be useful on high-bandwidth and/or high-latency networks, or
when RAM is limited.
When the config option
cache-size-limit is set, some arc (backup
data) files are stored locally in the backup directory and some are on
remote destinations and have to be downloaded to the backup directory
during restore. This poses challenges:
the backup directory may be on a small disk, like an SSD, and might not have room to download the data needed to do a large restore.
during a restore, the backup directory has to be locked since the cache of arc files might be changing. This lock prevents backup during restore and keeps two different restores from running simultaneously.
The --cache option solves these problems. It specifies a separate
directory used for storing downloaded data needed during the restore.
If the cache directory already exists and matches the backup directory
version, some downloaded data can be reused and the cache directory
will remain after the get command. If the cache directory doesn’t
exist when get starts, it will be created and then deleted when get
finishes. So to use the cache directory for several restores, create
it before using get.
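A possible workflow (directory paths are examples): create the cache directory first so it survives across several restores:

```shell
# Keep a reusable restore cache on a disk with plenty of space.
mkdir -p /big/restore-cache
hb get -c /backup --cache /big/restore-cache /etc
# A later restore reuses already-downloaded arc files:
hb get -c /backup --cache /big/restore-cache /home
```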
The --cache option has an optional modifier, ,lock, that can be
added to request locking the main backup directory. This allows using
arc files already in the backup directory instead of downloading them
again. If ,lock is not used, backup directory arc files can only be
reused if the cache directory is on the same disk as the backup
directory, allowing hard linking.
During backup, HashBackup stores a SHA1 checksum for each block of data within a file and a separate SHA1 checksum for the entire file. During a restore, all of these checksums are verified to ensure that the data restored is identical to the data that was backed up.
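The layered-checksum idea can be illustrated with standard tools (hb's block format and checksum storage are internal; this only shows the concept of per-block plus whole-file hashes):

```shell
# Hash each 4K "block" of a file and the whole file; a restore is
# correct only if every block hash and the file hash match.
head -c 12288 /dev/zero > file   # a 12KB file = 3 x 4K blocks
split -b 4096 file blk.          # one file per block
sha1sum blk.*                    # per-block checksums
sha1sum file                     # whole-file checksum
cat blk.* | sha1sum              # reassembled blocks hash identically
```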
HashBackup handles deleted files correctly: if a directory contains
files A and B in backup versions 0 and 1, and file B is deleted and a
backup occurs (version 2), then restoring the directory with
-r1 will restore both A and B, but restoring with
-r2 will only
restore file A. If you try to restore file B with
-r2, an error
message is displayed indicating that the file was deleted in version
2. You can restore the deleted file B by using the -r1 option.
Unstable Inode Numbers and --no-ino
The --no-ino option is used to back up and restore filesystems like
CIFS, UnRAID, and sshfs (FUSE in general) that do not have stable
inode numbers. In typical Unix filesystems, every unique file has a
unique inode number. But with these filesystems, the inode numbers
are re-used, so two different files can end up with the same inode
number. This is a problem because during restores, HashBackup hard
links files with the same inode number. To prevent incorrect
hardlinking, use the
--no-ino option for both backups and restores
of these types of filesystems. After the initial backup, HashBackup
will stop with an error if it sees that file inode numbers are not
stable and --no-ino was not used.
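The hard-link mechanics behind this can be seen with basic shell commands:

```shell
# Hard-linked names share one inode; during a restore, hb re-links
# files whose saved inode numbers match, which is why re-used inode
# numbers on CIFS/FUSE filesystems cause incorrect hard links.
echo data > a
ln a b               # hard link: a and b are the same inode
ls -i a b            # both names print the same inode number
[ a -ef b ] && echo "same file"
```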
Restore Plans and Limited Cache
When local backup directory disk space is limited, the
cache-size-limit config option specifies how much backup data (arc
files) can remain in the backup directory. For restores with a
limited cache, HashBackup has to create a restore plan to decide what
data to download, the download order, and how long blocks have to stay
in the cache during the restore. For a very large restore with a lot
of small blocks, this restore planning can take some time. For
example, a 1TB VM image will contain over 240M blocks if saved at the
default 4K block size. Restore plans are saved in the backup directory
so that if the restore is restarted for some reason, the plan can be
reused.
To speed up restores in a disaster recovery situation, it is possible
to use the
--plan option to generate restore plans before they are
needed. This can be done for one or several different restores. It
is important to remember that each list of filenames to restore in one
get command creates a unique restore plan. So if restore plans are
created for restoring files f1 and f2 separately, a different
restore plan is created for restoring f1 and f2 together in one
restore. If the backup database is changed in any way, such as
when another backup occurs, all restore plans are invalidated. The typical
way to use
--plan would be as part of the daily backup procedure.
Restore plans are only saved if the cache-size-limit config option is set.
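One way to use --plan in practice (a hypothetical nightly job; the paths are examples):

```shell
# Run the backup, then pre-compute a restore plan for the restore
# most likely to be needed in a disaster (here, the whole system).
hb backup -c /backup /
hb get -c /backup --plan /
```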
Raw Block Device Restores
When a raw block device is restored, the data can be stored in 3 locations:
--orig = restore the data to its original block device (must be unmounted)
--todev = restore the data to a different, unmounted block device
neither option means restore the data to a file in the current directory, using the device name as filename
Some systems (devmapper on Linux) use symbolic links to point to the actual block device. When you backup the symlink, the actual block device is included in the backup. For example:

# hb backup -c hb /dev/mapper/ub1264-swap_1
HashBackup build 971 Copyright 2009-2013 HashBackup, LLC
Backup directory: /home/jim/hb
Adding symlink target to backup: /dev/mapper/ub1264-swap_1 -> /dev/mapper/../dm-1
This is backup version: 0
Dedup is not enabled
/dev/dm-1
/dev/mapper/ub1264-swap_1
If you restore the symlink,
get tries to do something reasonable.
If the symlink already exists but points to a different device,
get will restore to that device, but asks for confirmation. The safest
option is to use
--todev and restore directly to the block device
(/dev/dm-1 in this example), not the symlink. Then you know
exactly where your restored data is written.
Good restore performance is important for any backup system. Below are a few different restores from an actual backup to give an idea of HashBackup’s restore performance. The backup and restores are on a 2.66GHz Intel Core 2 Duo (2010) Mac Mini OSX server with 2x750GB hard drives. The system has been backed up daily since October 2010 and is the HashBackup build server. There are 5 virtual machines running, with disk images from 3-10GB each, 6 other virtual machines, and the usual OSX files. One hard drive is the root drive, the other stores the backups. The root drive has around 81GB of data being backed up daily. Backup for this system uses ~350MB of RAM, 2/3rds of that for the dedup table. Restores typically use 100-150MB of RAM, independent of the file or backup size, or the size of the dedup table (it is not used during restore).
Important note: these are not restores from a traditional "full" backup, or a version 0 backup where everything was just saved. These restores are from 1746 incremental, deduped backups saved over a 6 year period.
Test 1: restore a complete copy of the OSX root drive from 1746 incremental backups for 6 years
sh-3.2# time hb get -c /backup / -v1
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Begin restore
Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    75m55.703s
user    51m20.498s
sys     7m57.109s

sh-3.2# du -ksc /test
81719612  /test
81719612  total
sh-3.2# find .|wc
578956  743762  57345665
In this test, 578,956 files and directories were restored, a total of 81.7GB in 76 minutes, averaging 17 MB/sec. The system was also running 11 virtual machines so was not idle during the test (it averages 80% idle), and most of these files were on the small side so there is more seek overhead. A similar test would be to do an initial backup of the root file system and then a complete restore. This could be expected to run faster because the backup data is not scattered over 1746 different backup versions (see test 4 below).
Test 2: restore a 2GB VM disk image with daily incremental backups for 5 years
sh-3.2# time hb get -c /backup /Users/.../Documents/Virtual\ Machines.localized/centos.vmwarevm/centos-s002.vmdk
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Restoring centos-s002.vmdk to /test
/test/centos-s002.vmdk
Restored /Users/.../Documents/Virtual Machines.localized/centos.vmwarevm/centos-s002.vmdk to /test/centos-s002.vmdk
No errors

real    3m26.150s
user    1m46.945s
sys     0m11.811s

sh-3.2# ls -l centos5532-s002.vmdk
-rw-------@ 1 jim staff 2086993920 Apr 8 02:13 centos5532-s002.vmdk
For this test, a 2GB virtual machine (VM) disk image file was restored. The file was originally saved August 2011 with daily incremental backups since then for the last 5 years. The VM stays running all the time so the disk image is backed up nearly every day. This was verified with the backup logs. This VM image is saved with a small 4K block size to maximize dedup, but that is also the most challenging to restore: the file has 509,520 backup blocks scattered over a 5 year backup period. HashBackup restored the file in 3 minutes, 26 seconds.
Test 3: restore a 498MB zip archive from a backup 3 years ago
# time hb get -c /backup /Users/.../tapes/prime.zip
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Restoring prime.zip to /test
/test/prime.zip
Restored /Users/.../tapes/prime.zip to /test/prime.zip
No errors

real    0m14.044s
user    0m13.496s
sys     0m1.732s

sh-3.2# ls -l prime.zip
-rw-r--r-- 1 jim staff 498219045 May 19 2013 prime.zip
The 498MB zip archive was restored in 14 seconds, from backup #715 on May 19, 2013 - at an average rate of 35.6 MB/sec. This file restored at a faster rate because it was contained in a single backup version and did not change after that, unlike the VM disk image.
Test 4: backup and restore the same drive as test #1 from an initial backup
sh-3.2# time hb backup -c /data/test -v1 -D1g /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /Volumes/HD2/test
Copied HB program to /Volumes/HD2/test/hb#1496
This is backup version: 0
Dedup enabled, 0% of current, 0% of max
Backing up: /
Mount point contents skipped: /Network/Servers
Mount point contents skipped: /dev
Mount point contents skipped: /home
Mount point contents skipped: /net
Time: 5345.8s, 1h 29m 5s
Checked: 579014 paths, 86520028440 bytes, 86 GB
Saved: 578956 paths, 86326260319 bytes, 86 GB
Excluded: 32
Dupbytes: 20540224433, 20 GB, 23%
Compression: 49%, 2.0:1
Space: 43 GB, 43 GB total
No errors

real    89m6.195s
user    100m2.367s
sys     12m2.822s

sh-3.2# time hb get -c /data/test -v1 /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /data/test
Most recent backup version: 0
Restoring most recent version
Begin restore
Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    63m49.108s
user    53m11.719s
sys     10m29.799s
This is like test #1, restoring 86GB in 578,956 files, but the restore is from a single-version backup, similar to restoring from a traditional "full" backup. The restore is faster, 64 minutes vs 76 minutes, because data is all stored in one version rather than the 1746 incremental versions in test #1. But considering the extreme backup space savings, test #1 still has good restore performance.
This initial backup used 43GB of backup space. The backup for test
#1, covering a 6-year period of daily backups for the same drive, uses
about 100GB of space. Backup #1 has a retention policy
meaning "keep the last 30 days of backups, plus 1 monthly backup for
the year". So there are around 42 versions being retained. The
"forever incremental" strategy has greatly reduced the amount of
backup space required, even less than just 3 full backups would need.