Get
Restores files and directories from a backup.
$ hb get [-c backupdir] [-r version] [-v level]
[--no-ino] [--orig] [--delete] [-i] [--no-sd]
[--todev blockdev] [--cache cachedir[,lock]]
[--plan] [--no-local] [--no-mtime] [--splice]
pathnames
Relative paths and simple filenames are changed to full paths using the current directory.
By default, the get command restores the last component of the pathname to the current directory. For example, if /etc/rc.conf is restored, a new rc.conf file is created in the current directory. If /etc is restored, a new etc directory will be created in the current directory. For extra safety, consider cd /tmp before a restore.
--orig restores files to their original locations instead of the current directory.
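As a quick sketch (the backup directory /backup and the paths here are hypothetical):
$ cd /tmp                                  # restore into a scratch directory
$ hb get -c /backup /etc/rc.conf           # creates /tmp/rc.conf
$ hb get -c /backup --orig /etc/rc.conf    # restores over /etc/rc.conf instead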
Get always asks before a file or directory is restored over an existing file or directory. For safety, get may sometimes refuse to do a restore over existing data if it believes the restore may not be what you really want. For example, it will not restore a file over an existing non-empty directory with the same name. This can be avoided by restoring to a different location, or removing or renaming the existing file or directory before the restore.
When restoring over existing directories, get normally merges the backup contents with the directory contents. If the existing files match the backup files, they are left alone (unless --no-local is used). Otherwise, they are overwritten with the backup files. The --delete option deletes existing files and directories that are not in the backup. This is similar to the rsync --delete option and can be used to "sync" a directory to the backup contents.
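An illustrative sketch, assuming a backup directory /backup and a backed-up home directory /home/jim:
$ hb get -c /backup --orig --delete /home/jim    # make /home/jim match the backup, removing extra files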
The -r option selects a specific version to restore. This will restore files that were backed up at this version or an earlier version as necessary. With -r1, some of the files may have been backed up in version 0 and some in version 1; the get command will automatically select the latest version for each file. If -r is not used, the get command restores files from their most recent backup.
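For example (paths hypothetical):
$ hb get -c /backup -r1 /etc    # restore /etc as of version 1, pulling files from version 0 if needed
$ hb get -c /backup /etc        # restore /etc from the most recent backup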
Using Local Files
When possible, get will use local files to assist the restore. This is sometimes called incremental restore. It looks in both the original backup directory and the target restore directory for files with the same name, mtime (modification time), and size to select matches. This makes restarted restores very fast, since data already downloaded and restored does not have to be restored a second time.
For stricter verification, the --no-mtime option can be used. This computes a strong cryptographic hash of the local file and ensures it matches the hash of the backup file before using the local file for the restore.
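A hypothetical example:
$ hb get -c /backup --no-mtime /home/jim    # verify local files match by hash before reusing them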
To disable the use of local data, use the --no-local option. This is useful when doing test restores immediately following a backup, to verify remote backup data.
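For example, a test restore right after a backup (paths hypothetical):
$ hb backup -c /backup /etc
$ cd /tmp
$ hb get -c /backup --no-local /etc    # ignore local files; restore from backup data only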
When local files have been modified, it is still possible to use some local data combined with some remote data to restore files using the --splice option. This requires reading through local files and uses more temp space in the backup directory, but can save significant download bandwidth. During splicing, a strong file hash is computed for the restored file and compared to the original backup file’s strong hash to ensure it was correctly restored.
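A sketch, assuming a large, locally modified disk image at a hypothetical path:
$ hb get -c /backup --splice /vm/disk.img    # reuse unmodified local blocks, download only the changed ones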
The -v option controls how much output is displayed:
-v0 = don’t print filenames as they are restored
-v1 = print some basic headings only
-v2 = print filenames as they are restored
-v3 = print local filenames not used in the restore
-v4 = print local files used in the restore
The -i option asks before starting the restore. This is useful for comparing restore plans.
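For example (path hypothetical):
$ hb get -c /backup -i /data    # ask for confirmation before the restore begins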
The --no-sd option disables selective download, HashBackup’s mechanism for downloading only the needed parts of arc files. This option might speed up large restores when cache-size-limit is set, because the planning stage is faster and uses less RAM. More data is downloaded, because whole arc files are downloaded instead of just the pieces needed, but fewer requests are issued to remote servers. This option might be useful on high-bandwidth and/or high-latency networks, or when RAM is limited.
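A hypothetical example:
$ hb get -c /backup --no-sd /    # download whole arc files: fewer requests, more data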
Cache Directory
When the config option cache-size-limit is set, some arc (backup data) files are stored locally in the backup directory while others are on remote destinations and have to be downloaded to the backup directory during a restore. This poses challenges:
- the backup directory may be on a small disk, like an SSD, and might not have room to download the data needed for a large restore
- during a restore, the backup directory has to be locked since the cache of arc files might be changing; this lock prevents backup during restore
- restores, specifically shard restores, cannot run concurrently using the same backup directory
The --cache option solves these problems. It specifies a separate directory used for storing downloaded data needed during the restore. If the cache directory already exists and matches the backup directory version, some downloaded data can be reused, and the cache directory will remain after the get command. If the cache directory doesn’t exist when get starts, it will be created and then deleted after the restore finishes. So to use the cache directory for several restores, create it before using get.
The --cache option has an optional ,lock modifier that can be added to request locking the main backup directory. This allows using arc files already in the backup directory instead of downloading them again. If ,lock is not used, backup directory arc files can only be used if the cache directory is on the same disk partition as the backup directory, allowing hard linking.
To allow concurrent restores, consider adding $$ (process ID) to the cache pathname to make it unique for each restore.
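A sketch, assuming a hypothetical directory /bigdisk with room for the downloaded arc files:
$ mkdir /bigdisk/hbcache                                   # pre-create so the cache survives this restore
$ hb get -c /backup --cache /bigdisk/hbcache,lock /data    # also reuse arc files already in the backup directory
$ hb get -c /backup --cache /bigdisk/hbcache.$$ /data      # unique cache path, allowing concurrent restores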
Restore Integrity
During backup, HashBackup stores a SHA1 checksum for each block of data within a file and a separate SHA1 checksum for the entire file. During a restore, all of these checksums are verified to ensure that the data restored is identical to the data that was backed up.
HashBackup handles deleted files correctly: if a directory contains files A and B in backup versions 0 and 1, and file B is deleted and a backup occurs (version 2), then restoring the directory with -r0 or -r1 will restore both A and B, but restoring with -r2 will only restore file A. If you try to restore file B with -r2, an error message is displayed indicating that the file was deleted in version 2. You can restore the deleted file B by using the -r0 or -r1 options.
Unstable Inode Numbers and --no-ino
The --no-ino option is used to back up and restore filesystems like CIFS, UnRAID, and sshfs (FUSE in general) that do not have stable inode numbers. In typical Unix filesystems, every unique file has a unique inode number. But with these filesystems, inode numbers are re-used, so two different files can end up with the same inode number. This is a problem because during restores, HashBackup hard links files with the same inode number. To prevent incorrect hard linking, use the --no-ino option for both backups and restores of these types of filesystems. After the initial backup, HashBackup will stop with an error if it sees that file inode numbers are not stable and --no-ino was not used.
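For example, assuming a CIFS mount at the hypothetical path /mnt/nas:
$ hb backup -c /backup --no-ino /mnt/nas      # filesystem without stable inode numbers
$ hb get -c /backup --no-ino /mnt/nas/docs    # prevent incorrect hard linking on restore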
Restore Plans and Limited Cache
When local backup directory disk space is limited, the cache-size-limit config option specifies how much backup data (arc files) can remain in the backup directory. For restores with a limited cache, HashBackup has to create a restore plan to decide what data to download, the download order, and how long blocks have to stay in the cache during the restore. For a very large restore with a lot of small blocks, this restore planning can take some time. For example, a 1TB VM image will contain over 240M blocks if saved at the default 4K block size. Restore plans are saved in the backup directory so that if the restore is restarted for some reason, the plan can be reused.
To speed up restores in a disaster recovery situation, the --plan option can be used to generate restore plans before they are needed. This can be done for one or several different restores. It is important to remember that each list of filenames to restore in one get command creates a unique restore plan. So if restore plans are created for restoring files f1 and f2 separately, a different restore plan is created for restoring f1 and f2 together in one restore. If the backup database is changed in any way, such as when another backup occurs, all restore plans are invalidated. The typical way to use --plan would be as part of the daily backup procedure.
Restore plans are only saved if the --no-local option is used, because checking for modified local files has to be done just before every restore. The --plan option can be used with and without --no-local to see how this affects the amount of data downloaded.
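For example, a daily procedure might look like this (paths hypothetical):
$ hb backup -c /backup /home /etc                   # a new backup invalidates old plans
$ hb get -c /backup --plan --no-local /home /etc    # pre-compute the plan for this exact file list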
Raw Block Device Restores
When a raw block device is restored, the data can go to one of 3 locations:
- --orig = restore the data to its original block device (must be unmounted)
- --todev = restore the data to a different, unmounted block device
- neither option = restore the data to a file in the current directory, using the device name as the filename
Some systems (devmapper on Linux) use symbolic links to point to the actual block device. When you back up the symlink, the actual block device is included in the backup. For example:
# hb backup -c hb /dev/mapper/ub1264-swap_1
HashBackup build 971 Copyright 2009-2013 HashBackup, LLC
Backup directory: /home/jim/hb
Adding symlink target to backup: /dev/mapper/ub1264-swap_1 -> /dev/mapper/../dm-1
This is backup version: 0
Dedup is not enabled
/dev/dm-1
/dev/mapper/ub1264-swap_1
If you restore the symlink, get tries to do something reasonable. If the symlink already exists but points to a different device, get will restore to that device, but asks for confirmation. The safest option is to use --todev and restore directly to the block device itself, /dev/dm-1 in this example, not the symlink. Then you know exactly where your restored data is written.
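Following the devmapper example above (backup directory as in that transcript):
# hb get -c hb --todev /dev/dm-1 /dev/mapper/ub1264-swap_1    # device must not be in use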
Restore Performance
Good restore performance is important for any backup system. Below are a few different restores from an actual backup to give an idea of HashBackup’s restore performance.
The backup and restores are on a 2.66GHz Intel Core 2 Duo (2010) Mac Mini OSX server with 2x750GB hard drives. The system has been backed up daily since October 2010 and is the HashBackup build server. There are 5 virtual machines running, with disk images from 3-10GB each, 6 other virtual machines, and the usual OSX files. One hard drive is the root drive; the other stores the backups. The root drive has around 81GB of data being backed up daily.
Backup for this system uses ~350MB of RAM, 2/3rds of that for the dedup table. Restores typically use 100-150MB of RAM, independent of the file or backup size, or the size of the dedup table (it is not used during restore).
Important note: these are not restores from a traditional "full" backup, or a version 0 backup where everything was just saved. These restores are from 1746 incremental, deduped backups saved over a 6 year period.
Test 1: restore a complete copy of the OSX root drive from 1746 incremental backups for 6 years
sh-3.2# time hb get -c /backup / -v1
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Begin restore
Restoring / to /test
Restore? yes
Restored / to /test
No errors
real 75m55.703s
user 51m20.498s
sys 7m57.109s
sh-3.2# du -ksc /test
81719612 /test
81719612 total
sh-3.2# find .|wc
578956 743762 57345665
In this test, 578,956 files and directories were restored, a total of 81.7GB in 76 minutes, averaging 17 MB/sec. The system was also running 11 virtual machines, so it was not idle during the test (it averages 80% idle), and most of these files were on the small side, so there is more seek overhead. A similar test would be to do an initial backup of the root file system and then a complete restore. This could be expected to run faster because the backup data is not scattered over 1746 different backup versions (see test 4 below).
Test 2: restore a 2GB VM disk image with daily incremental backups for 5 years
sh-3.2# time hb get -c /backup /Users/.../Documents/Virtual\ Machines.localized/centos.vmwarevm/centos-s002.vmdk
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Restoring centos-s002.vmdk to /test
/test/centos-s002.vmdk
Restored /Users/.../Documents/Virtual Machines.localized/centos.vmwarevm/centos-s002.vmdk to /test/centos-s002.vmdk
No errors
real 3m26.150s
user 1m46.945s
sys 0m11.811s
sh-3.2# ls -l centos5532-s002.vmdk
-rw-------@ 1 jim staff 2086993920 Apr 8 02:13 centos5532-s002.vmdk
For this test, a 2GB virtual machine (VM) disk image file was restored. The file was originally saved August 2011 with daily incremental backups since then for the last 5 years. The VM stays running all the time so the disk image is backed up nearly every day. This was verified with the backup logs. This VM image is saved with a small 4K block size to maximize dedup, but that is also the most challenging to restore: the file has 509,520 backup blocks scattered over a 5 year backup period. HashBackup restored the file in 3 minutes, 26 seconds.
Test 3: restore a 498MB zip archive from a backup 3 years ago
# time hb get -c /backup /Users/.../tapes/prime.zip
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Restoring prime.zip to /test
/test/prime.zip
Restored /Users/.../tapes/prime.zip to /test/prime.zip
No errors
real 0m14.044s
user 0m13.496s
sys 0m1.732s
sh-3.2# ls -l prime.zip
-rw-r--r-- 1 jim staff 498219045 May 19 2013 prime.zip
The 498MB zip archive was restored in 14 seconds from backup #715 on May 19, 2013, an average rate of 35.6 MB/sec. This file restored at a faster rate because it was contained in a single backup version and did not change after that, unlike the VM disk image.
Test 4: backup and restore the same drive as test #1 from an initial backup
sh-3.2# time hb backup -c /data/test -v1 -D1g /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /Volumes/HD2/test
Copied HB program to /Volumes/HD2/test/hb#1496
This is backup version: 0
Dedup enabled, 0% of current, 0% of max
Backing up: /
Mount point contents skipped: /Network/Servers
Mount point contents skipped: /dev
Mount point contents skipped: /home
Mount point contents skipped: /net
Time: 5345.8s, 1h 29m 5s
Checked: 579014 paths, 86520028440 bytes, 86 GB
Saved: 578956 paths, 86326260319 bytes, 86 GB
Excluded: 32
Dupbytes: 20540224433, 20 GB, 23%
Compression: 49%, 2.0:1
Space: 43 GB, 43 GB total
No errors
real 89m6.195s
user 100m2.367s
sys 12m2.822s
sh-3.2# time hb get -c /data/test -v1 /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /data/test
Most recent backup version: 0
Restoring most recent version
Begin restore
Restoring / to /test
Restore? yes
Restored / to /test
No errors
real 63m49.108s
user 53m11.719s
sys 10m29.799s
This is like test #1, restoring 86GB in 578,956 files, but the restore is from a single-version backup, similar to restoring from a traditional "full" backup. The restore is faster, 64 minutes vs 76 minutes, because data is all stored in one version rather than the 1746 incremental versions in test #1. But considering the extreme backup space savings, test #1 still has good restore performance.
This initial backup used 43GB of backup space. The backup for test #1, covering a 6-year period of daily backups for the same drive, uses about 100GB of space. Backup #1 has a retention policy of -s30d12m, meaning "keep the last 30 days of backups, plus 12 monthly backups". So there are around 42 versions being retained. The "forever incremental" strategy has greatly reduced the amount of backup space required, even less than just 3 full backups would need.