

Restores files and directories from a backup.

  $ hb get [-c backupdir] [-r version] [-v level]
           [--no-ino] [--orig] [--delete]
           [--todev blockdev] [--cache cachedir[,lock]]
           pathname ...

Pathnames may be full paths starting with /, relative paths starting with ./, or simple filenames not containing a slash.  Relative paths and simple filenames are changed to full paths using the current directory.

By default, the get command restores the last component of the pathname to the current directory. For example, if you restore /etc/rc.conf, a new rc.conf file will be created in the current directory.  If you restore /etc, then a new etc directory will be created in the current directory.  For extra safety, you may want to cd to /tmp before doing a restore.
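As a quick sketch (the backup directory and file names here are hypothetical), a cautious single-file restore looks like this:

```shell
# work from /tmp so nothing in /etc is touched;
# the restored copy appears as /tmp/rc.conf
cd /tmp
hb get -c /backup /etc/rc.conf
```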

--orig restores files to their original locations instead of the current directory.

Get always asks before a file or directory is restored over an existing file or directory.  For safety, get may sometimes refuse to do a restore over existing data if it believes the restore may not be what you really want.  For example, it will not restore a file over an existing non-empty directory with the same name.  This can be avoided by restoring to a different location, or removing or renaming the existing file or directory before the restore.

When restoring over existing directories, get normally merges the backup contents with the directory contents, overwriting files that already exist.  The --delete option deletes existing files and directories that are not in the backup.  This is similar to rsync's --delete option and can be used to "sync" a directory to the backup contents.  However, the get command is not yet smart enough to skip existing files that are already identical to the backup contents.
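For example (the paths here are hypothetical), the difference between a merge restore and a "sync" restore looks like this:

```shell
# merge: restore into the existing directory, overwriting files
# that exist in both and leaving extra files alone
hb get -c /backup --orig /home/jim/project

# sync: same restore, but also delete files and directories under
# /home/jim/project that are not in the backup
hb get -c /backup --orig --delete /home/jim/project
```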

The -r option selects a specific version to restore.  This will restore files that are backed up at this version or an earlier version as necessary.  With -r1, some of the files may have been backed up in version zero and some backed up in version 1; the get command will automatically select the latest version for each file.  If -r is not used, the get command restores files from their most recent backup.
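For example (assuming a hypothetical backup directory /backup), these two commands differ only in which version is restored:

```shell
# restore /etc as of version 1: each file comes from version 1,
# or from version 0 if it was unchanged in version 1
hb get -c /backup -r1 /etc

# with no -r, each file comes from its most recent backup
hb get -c /backup /etc
```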

The -v option controls how much output is displayed:
  • -v0 = don't print filenames as they are restored
  • -v1 = print names of files as they are restored

Cache Directory

When --cache-size-limit is set, some arc (backup data) files are stored locally in the backup directory and some are on remote destinations and have to be downloaded to the backup directory during restore.  This poses challenges:
  • the backup directory may be on a small disk, like an SSD, and might not have room to download all of the data needed to do a large restore.
  • during a restore, the backup directory has to be locked since the cache of arc files might be changing.  This lock prevents a backup and a restore, or two different restores, from running simultaneously.
The --cache option solves these problems.  It specifies a separate directory used for storing downloaded data needed during the restore.  If the cache directory already exists and matches the backup directory version, some downloaded data can be reused and the cache directory will remain after the get command.  If the cache directory doesn't exist when get starts, it will be created and then deleted when get finishes.  So to use the cache directory for several restores, create it before using get.

The --cache option has an optional modifier ,lock that can be added to request locking the main backup directory.  This allows using arc files already in the backup directory instead of downloading them again.  If ,lock is not used, backup directory arc files can only be reused if the cache directory is on the same disk as the backup directory.
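Putting this together (the paths here are hypothetical), a restore using a persistent cache on a larger disk might look like:

```shell
# create the cache directory first so it is kept after the restore
# and can be reused by later restores
mkdir /bigdisk/hbcache

# ,lock locks the backup directory so arc files already there can
# be reused instead of being downloaded again
hb get -c /backup --cache /bigdisk/hbcache,lock /data/projects
```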

Restore Integrity

During backups, HashBackup stores SHA1 checksums for each block of data within a file and a separate SHA1 checksum for the entire file.  During a restore, both of these checksums are verified to ensure that the data restored is identical to the data that was backed up.

HashBackup handles deleted files correctly: if a directory contains files A and B in backup versions 0 and 1, and file B is deleted and a backup occurs (version 2), then restoring the directory with -r0 or -r1 will restore both A and B, but restoring with -r2 will only restore file A.  If you try to restore file B with -r2, an error message is displayed indicating that the file was deleted in version 2.  You can restore the deleted file B by using -r0 or -r1.

Unstable Inode Numbers and --no-ino

The --no-ino option is used to back up and restore filesystems like CIFS, UnRAID, and sshfs (FUSE in general) that do not have stable inode numbers.  In typical Unix filesystems, every unique file has a unique inode number.  But with these filesystems, inode numbers are re-used, so two different files can end up with the same inode number.  This is a problem because during restores, HashBackup wants to hard link files with the same inode number.  To prevent incorrect hard linking, use the --no-ino option for both backups and restores of these types of filesystems.
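For example (the mount point here is hypothetical), both the backup and the restore of a CIFS share would use the option:

```shell
# back up a CIFS mount without trusting its inode numbers
hb backup -c /backup --no-ino /mnt/cifs-share

# restore it the same way, so unrelated files that happen to share
# an inode number are not hard linked together
hb get -c /backup --no-ino --orig /mnt/cifs-share
```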

Raw Block Device Restores

When a raw block device is restored, the data can be stored in 3 locations:
  • --orig  = restore the data to its original block device (must be unmounted)
  • --todev = restore the data to a different, unmounted block device
  • neither option means restore the data to a file in the current directory, using the device name as filename
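As a sketch (the device names here are hypothetical), the three cases look like this:

```shell
# restore to the original block device, which must be unmounted
hb get -c /backup --orig /dev/sda2

# restore to a different, unmounted block device
hb get -c /backup --todev /dev/sdb2 /dev/sda2

# neither option: write the image to a file in the current
# directory, named after the device
hb get -c /backup /dev/sda2
```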
Some systems (devmapper on Linux) use symbolic links to point to the actual block device.  When you back up the symlink, the actual block device is included in the backup.  For example:

# hb backup -c hb /dev/mapper/ub1264-swap_1
HashBackup build 971 Copyright 2009-2013 HashBackup, LLC
Backup directory: /home/jim/hb
Adding symlink target to backup: /dev/mapper/ub1264-swap_1 -> /dev/mapper/../dm-1
This is backup version: 0
Dedup is not enabled

If you restore the symlink, get tries to do something reasonable.  If the symlink already exists but points to a different device, get will restore to that device (but asks for confirmation).  The safest option is to use --todev and restore the block device itself, /dev/dm-1 in this example, not the symlink.

Restore Performance

Good restore performance is important for any backup system.  Below are a few different restores from an actual backup to give an idea of HashBackup's restore performance.  The backups and restores were run on a 2.66GHz Intel Core 2 Duo (2010) Mac Mini OSX server with 2x750GB hard drives.  The system has been backed up daily since October 2010 and is the HashBackup build server.  There are 5 virtual machines running, with disk images from 3-10GB each, 6 other virtual machines, and the usual OSX files.  One hard drive is the root drive, the other stores the backups.  The root drive has around 81GB of data being backed up daily.  Backup for this system uses ~350MB of RAM, 2/3rds of that for the dedup table.  Restores typically use 100-150MB of RAM, independent of the file or backup size.

Important note: these are not restores from a traditional "full" backup, or a version 0 backup where everything was just saved.  These restores are from 1746 incremental, deduped backups over a 6 year period.

Test 1: restore a complete copy of the OSX root drive from 1746 incremental backups for 6 years

sh-3.2# time hb get -c /backup / -v1
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf
Begin restore

Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    75m55.703s
user    51m20.498s
sys      7m57.109s

sh-3.2# du -ksc /test
81719612    /test
81719612    total

sh-3.2# find .|wc
578956  743762 57345665

In this test, 578,956 files and directories were restored, a total of 81.7GB in 76 minutes, averaging 17 MB/sec.  The system was also running 11 virtual machines so was not idle during the test (it averages 80% idle), and most of these files were on the small side so there is more seek overhead.  A similar test would be to do an initial backup of the root file system and then a complete restore.  This could be expected to run faster because the backup data is not scattered over 1746 different backup versions.

Test 2: restore a 2GB VM disk image with daily incremental backups for 5 years

sh-3.2# time hb get -c /backup /Users/.../Documents/Virtual\ Machines.localized/centos.vmwarevm/centos-s002.vmdk
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf

Restoring centos-s002.vmdk to /test
Restored /Users/.../Documents/Virtual Machines.localized/centos.vmwarevm/centos-s002.vmdk to /test/centos-s002.vmdk
No errors

real    3m26.150s
user    1m46.945s
sys     0m11.811s

sh-3.2# ls -l centos5532-s002.vmdk
-rw-------@ 1 jim  staff  2086993920 Apr  8 02:13 centos5532-s002.vmdk

For this test, a 2GB virtual machine (VM) disk image file was restored.  The file was originally saved August 2011 with daily incremental backups since then for the last 5 years.  The VM stays running all the time so the disk image is backed up nearly every day.  This was verified with the backup logs.  This VM image is saved with a small 4K block size to maximize dedup, but that is also the most challenging to restore: the file has 509,520 backup blocks scattered over a 5 year backup period.  HashBackup restored the file in 3 minutes, 26 seconds.

Test 3: restore a 498MB zip archive from a backup 3 years ago

# time hb get -c /backup /Users/.../tapes/                                                        
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /backup
Most recent backup version: 1746
Restoring most recent version
Using destinations in dest.conf

Restoring to /test
Restored /Users/.../tapes/ to /test/
No errors

real    0m14.044s
user    0m13.496s
sys     0m1.732s

sh-3.2# ls -l
-rw-r--r--  1 jim  staff  498219045 May 19  2013

The 498MB zip archive was restored in 14 seconds, from backup #715 on May 19, 2013 - at an average rate of 35.6 MB/sec.  This file restored at a faster rate because it was contained in a single backup version and did not change after that, unlike the VM disk image.

Test 4: backup and restore the same drive as test #1

sh-3.2# time hb backup -c /data/test -v1 -D1g /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /Volumes/HD2/test
Copied HB program to /Volumes/HD2/test/hb#1496
This is backup version: 0
Dedup enabled, 0% of current, 0% of max
Backing up: /
Mount point contents skipped: /Network/Servers
Mount point contents skipped: /dev
Mount point contents skipped: /home
Mount point contents skipped: /net

Time: 5345.8s, 1h 29m 5s
Checked: 579014 paths, 86520028440 bytes, 86 GB
Saved: 578956 paths, 86326260319 bytes, 86 GB
Excluded: 32
Dupbytes: 20540224433, 20 GB, 23%
Compression: 49%, 2.0:1
Space: 43 GB, 43 GB total
No errors

real    89m6.195s
user    100m2.367s
sys    12m2.822s

sh-3.2# time hb get -c /data/test -v1 /
HashBackup build #1496 Copyright 2009-2016 HashBackup, LLC
Backup directory: /data/test
Most recent backup version: 0
Restoring most recent version
Begin restore

Restoring / to /test
Restore? yes
Restored / to /test
No errors

real    63m49.108s
user    53m11.719s
sys    10m29.799s

This is like test #1, restoring 86GB in 578,956 files, but the restore is from a single-version backup, similar to restoring from a traditional "full" backup.  The restore is faster, 64 minutes vs 76 minutes, because data is all stored in one version rather than the 1746 incremental versions in test #1.  But considering the extreme backup space savings, test #1 still has good restore performance.

This initial backup used 43GB of backup space.  The backup for test #1, covering a 6-year period of daily backups for the same drive, uses about 100GB of space.  Backup #1 has a retention policy of -s30d12m, meaning "keep the last 30 days of daily backups, plus 12 monthly backups".  So around 42 versions are being retained.  The "forever incremental" strategy has greatly reduced the amount of backup space required, to less than even 3 full backups would need.