Destinations

HashBackup uses a local backup directory to store backup metadata (file names, file sizes, etc.) in an encrypted database, hb.db. The local backup directory is specified with -c; if -c is omitted, ~/hashbackup is used. The encryption key is stored in key.conf in the backup directory. These files are created by the init command.
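For example, assuming the hb executable is on your PATH and using a hypothetical backup directory path:

$ hb init -c /var/hashbackup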

The backup command updates the database and creates one or more archive files in the backup directory, named arc.v.n, where v is the backup number and n is a counter starting at zero. The default size for arc files is 100MB, controlled by config option arc-size-limit.
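After a couple of backups, a listing of the backup directory might look like this sketch (housekeeping files omitted):

$ ls ~/hashbackup
arc.0.0  arc.0.1  arc.1.0  hb.db  key.conf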

There are 3 basic storage setups for HashBackup:

  1. backup data is kept only in the local backup directory

  2. backup data is kept locally and mirrored on remote storage

  3. backup data is kept mostly on remote storage with a local cache

In the default configuration, backup data is kept only in the local backup directory. If that’s your goal, you’re done and don’t need to read on.

You may want to keep a copy of your backup locally, but not inside the local backup directory. For example, your backup directory may be on a fast local SSD for good database performance, while you want backup data written to a directory on an NFS server or to a spinning disk. This requires setting up a Dir destination using the instructions below: it is set up like remote storage but accessed through an ordinary directory.
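As a preview, a minimal Dir destination in dest.conf looks something like this (the destname and path are hypothetical; all keywords are explained below):

destname nfscopy
type dir
dir /mnt/nfs/hbbackup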

Backup to Remote Storage

It’s a good idea to send backup data to one or more remote storage destinations to protect against loss of the local backup directory from theft or an on-site disaster. Remote backup storage is set up by creating a dest.conf file in the local backup directory. Backup data is sent to all remote storage services listed in dest.conf using multiple worker processes for each destination. Transfers occur while the backup is running to minimize total backup time.

Keeping a Local Copy of Backup Data

When backup data is sent to remote storage, you have the option of keeping a complete copy in the local backup directory or only a partial cache. Keeping the backup data local makes it much easier for HashBackup to manage remote storage, minimizes your remote storage costs, eliminates backup stalls caused by slower remote transfers, and makes restores much quicker.

Keeping a Local Cache of Backup Data

If it is not feasible to keep a complete copy of your backup in the local backup directory, HashBackup can also operate with a partial cache of backup data there. One config option, cache-size-limit, controls the size of this local cache. The default is -1, meaning keep a copy of all backup data in the local backup directory; setting cache-size-limit to 5GB limits local backup data to 5GB. It is recommended to set cache-size-limit as high as is reasonable: keeping more backup data locally allows HashBackup to better optimize remote storage costs and prevents backup stalls when files cannot be transferred to remote storage as fast as backup creates new arc files. The setting is easily changed at any time, so this can be decided later.
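For example, the cache size can be set or reverted with the config command (a sketch; backupdir is your -c directory):

$ hb config -c backupdir cache-size-limit 5GB
$ hb config -c backupdir cache-size-limit -1 # revert to the default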

The dest.conf file

The dest.conf text file describes a list of destinations, usually offsite, to receive copies of the backup. The dest.conf file is created in the local backup directory (the -c directory) with a text editor, using these notes as a guide. The dest.conf file is set up the same way whether you plan to keep a complete local copy or a partial cache.

IMPORTANT SECURITY PRECAUTIONS

  1. Because the key.conf and dest.conf files contain password and key information, they are never copied to any remote destination. Therefore, the HashBackup executable, key.conf, and dest.conf files should be copied to a safe place (or several safe places) in case your backup drive becomes inaccessible and you lose these critical files.

  2. When HB creates the key.conf file, it sets permissions to read-only for the owner, with no rights for everyone else. It is important for dest.conf to also have restrictive permissions because it contains passwords necessary to access remote services. You can do this with the chown and chmod commands:

$ chown root dest.conf # or whatever id runs HashBackup
$ chmod 600 dest.conf # read/write for owner only

General Concepts

As backup runs, the backup files created (arc files) are copied to every destination. You can specify more than one destination in dest.conf. Destinations should be listed in the order you want them used for a restore: for example, a local FTP server should be listed before Amazon S3, as in the sketch below. The same type of destination can be specified more than once, for example, two FTP servers. Each destination must have a unique destname keyword, because this is how HashBackup tracks files on each destination. Creating two destinations for the same physical server or remote storage service is fine as long as the storage itself does not overlap (the Dir keyword controls this). HashBackup manages unique ID tags for each destination to prevent accidental overwriting of backup storage.
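Here is a sketch of that ordering; all names, hosts, and credentials are hypothetical, and each type's exact keywords are covered in its example page linked below:

# listed first: highest restore priority
destname localftp
type ftp
host ftp.example.com
userid hbuser
password xxxxxx

# listed second: used when localftp is unavailable
destname amazon
type s3
# s3-specific keywords (credentials, bucket, etc.) go here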

Unavailable or Failing Destinations

If a destination is not available for some reason during a backup, for example, a USB drive is not plugged in, an error is displayed and files will not be copied there. The next time you do a backup and the destination is available, any missing or updated files will be copied to "catch up". Review the onfail keyword in this situation. If you have set up a limited cache with cache-size-limit, a failed destination may cause the local cache to fill up, which then causes backup to halt.

Adding Destinations to an Existing Backup

If you add a new destination to dest.conf, the next backup command will copy all backup data to the new destination, including data from previous backups. If not all backup data is stored locally because cache-size-limit is set, HashBackup may have to download old backup data from remote destinations to copy it to the new destination; the backup stalls until this remote-to-remote copy is finished. If cache-size-limit is -1 (all data kept in the backup directory), this "catch up" synchronization occurs during the backup.

Removing a Destination

To remove a destination (you no longer want data there), first use this command to clear all files from the destination:

$ hb dest -c backupdir clear destname

Then immediately remove the destination’s info from the dest.conf file, or add the off keyword. If you run another HB command before editing dest.conf, HB may try to copy all the files back to the destination you just cleared.

Creating dest.conf

To set up remote destinations, use a text editor to copy example destinations to the dest.conf file in your -c directory. Example destination setups are at the bottom of the page. Keep in mind that HashBackup downloads files from destinations based on their order in dest.conf, with the first destination having the highest download priority.

Conventions

The dest.conf text file has lines, each with a keyword and a value. Keywords are case-insensitive. Comment lines begin with a # and are ignored, as are blank lines. As explained below, some keywords are common to all destinations, some are unique to certain types of destinations, some are required, and some are optional.
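For example (a sketch with hypothetical values, showing that keyword case does not matter):

# comment lines and blank lines are ignored

DestName example1
type dir
dir /backups/example1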

Keywords For All Destinations

destname

This keyword begins a new destination. HB tracks destination contents using only this name. Because of this, it is possible to create a "seed" backup to a USB hard drive plugged into your computer, take that drive to a remote site, and change the other keywords to switch from a Dir type (local directory) to FTP, for example.

IMPORTANT NOTE: do not change destname once you have backed up files. If you do this with a limited cache (cache-size-limit is set >= 0), your backup will immediately become inaccessible, and the next backup command will fail because it tries to synchronize archives. Change the name back to what it was when you made the backups. If you change destname with cache-size-limit set to -1, which is the default, it will cause all of your previous archives to be uploaded again during the next backup.
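For example, a seed backup might start as a Dir destination and later become an FTP destination while keeping the same destname (all values hypothetical; the commented lines show what the entry would change to):

# while seeding to a local USB drive:
destname offsite
type dir
dir /mnt/usbdrive/hbbackup

# after the drive is moved to the remote site:
# destname offsite
# type ftp
# host ftp.example.com
# userid hbuser
# password xxxxxx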

off

Add this line to dest.conf to disable a destination. To enable it, remove this line or comment it out, like #off.

type

Specifies the HB driver used to access the destination. Every destination must have the type keyword. Use the links at the bottom of the page for details and examples, but read this entire page first or the examples won’t make sense. The types supported are:

  1. b2 - Backblaze B2

  2. dav - WebDAV

  3. dir - a directory (could be local, remote, USB stick, etc)

  4. do - Dreamhost Dream Objects (S3 compatible)

  5. ftp - FTP server

  6. ftps - FTP with TLS (SSL); userid, password & commands are encrypted

  7. gmail - gmail email account (automatically sets server & port)

  8. gs - Google Cloud Storage (S3 compatible)

  9. imap - imap (email) server

  10. rsync - rsync server

  11. s3 - Amazon S3 & compatibles

  12. shell - user-written driver, including Rclone

  13. ssh - sftp server over ssh

workers

Specifies the number of worker processes for a destination. The default of 2 is usually fine, but you may want to specify 1 worker to decrease HB’s network load or to decrease concurrency on the destination. Use more workers if you have a fast connection and want to transfer a lot of data quickly, or if you have a high-latency connection and want increased performance.

Setting workers too high is counterproductive, so it’s recommended to increase it gradually, by 2 or 4 at a time, and measure your throughput after each change to make sure the increase is actually beneficial; see the sketch after the list below.

Setting workers to 1 is recommended:

  • when debug is enabled, to avoid the confusion of multiple workers' interleaved debug output

  • to help prevent memory allocation problems in low-memory environments

  • to reduce seeking and improve performance on destinations backed by a single spinning hard drive
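A minimal sketch of the tuning approach above, in dest.conf:

# default is 2; raise in small steps and re-measure throughput
workers 4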

retry

Specifies the number of retries, the initial delay before a retry, and the delay factor for each retry. If omitted, destinations retry 8 times on errors (9 attempts altogether), delaying 5 seconds the first time, then multiplying the delay by 2 for each retry. This is equivalent to retry 8,5,2 and gives a total retry period of around 20 minutes.

The delay between retries is capped at 20 minutes, so retry 10 retries for about an hour in total. Add 3 retries for every additional hour: retry 13 would retry for 2 hours, etc. Up to 3 integers can be specified, with the defaults used for missing values. Some destination types have an internal retry loop, usually only a minute or two, that may increase the total retry time.
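A worked example: with the default retry 8,5,2, the delays are 5 + 10 + 20 + 40 + 80 + 160 + 320 + 640 seconds, about 21 minutes. With retry 10, the 9th and 10th delays are each capped at 20 minutes, adding roughly 40 minutes for about an hour in total:

# 10 retries, starting at 5 seconds and doubling, capped at 20 minutes
retry 10,5,2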

debug

This keyword takes an integer value, with higher numbers usually causing more debug output. It can be useful when a destination is having problems, but should not be used in production. The special value 99 means that an error in the destination will cause a traceback rather than a retry; this can be used to track down the cause of a difficult error. It’s advisable to set workers to 1 when enabling debug output, because with more than one worker, the interwoven debug output is hard to follow.
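For example, a temporary troubleshooting setup (remove these lines for production use):

workers 1
debug 1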

rate

Specifies the maximum upload rate (outgoing bandwidth) for each worker, in bytes per second. The minimum rate is 1024, since lower rates are probably typos. A suffix can be used, for example, 500k or 500KB. The KB and kb suffixes both mean kilobytes per second (multiply by 8 for the equivalent bits per second). If the rate keyword is not used (the default), there is no upload rate limit. Because the limit applies to each worker, a rate of 500k with n active workers allows a total upload rate of n times 500k, so setting workers to 1 may be useful when limiting upload rates. If you have a low-speed upload connection and want your network to remain usable during uploads, set the rate to 25% of your maximum upload speed, run some tests, then try 50% of your maximum, and so on.
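For example, to cap total upload bandwidth at roughly 500 kilobytes per second (about 4 megabits per second):

workers 1
rate 500k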

A better alternative to using the rate keyword is to enable QOS (Quality of Service) on your router. By telling your router your maximum upload bandwidth, it can allocate upload bandwidth fairly among all active connections. The advantage is that if HB is the only process using the network, it can upload at full speed; if other network connections become active, the router will make sure that each connection gets a fair share of the upload bandwidth.

maxsize (Obsolete after #3015)

The maxsize keyword is obsolete and will not work after release #3015. The release notes for #3015 explain how to migrate a backup with split files. The information below is for reference only.

The maxsize keyword puts a hard limit on the size of files uploaded to a destination. Any file exceeding this limit is split into parts, and each part is uploaded as a separate file. On retrieval, the parts are fetched and reassembled to recreate the original file. The value can be an integer, meaning bytes, or a number (with optional decimal point) with a suffix like K, M, G, T, P and an optional B. So 1.5KB means 1.5 times 1000, or 1500.

Ideally, the config option arc-size-limit should be smaller than maxsize to avoid splitting arc files into parts. arc-size-limit may need to be quite a bit smaller than maxsize to avoid splitting, because backup sometimes creates arc files larger than arc-size-limit. If the hard limit is 128MB for example, arc-size-limit should be set to something like 120MB to avoid splitting arc files unnecessarily.

onfail

If this keyword is not present on a destination and it fails, the backup will continue, an error will be counted, and the exit status will be non-zero.

If onfail ignore is used and a destination fails, the backup will continue, no error will be counted, and the exit status is not affected. This is useful with destinations used in rotation, for example, two USB drives where only one is normally present during the backup (see the sketch below).

If onfail stop is used and a destination fails, the backup will stop immediately and the exit status will be non-zero. This can be used when it is critical that the backup data is sent to the destination.
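A sketch of the rotation setup described above, with hypothetical mount points:

# only one drive is plugged in at a time; the missing one
# is skipped without counting an error
destname usb1
type dir
dir /Volumes/USB1/hbbackup
onfail ignore

destname usb2
type dir
dir /Volumes/USB2/hbbackup
onfail ignore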

randfail (testing only)

This keyword can be used to simulate remote failures. The value is an integer 0-100 representing the percentage of requests that should fail: 25 means 1 out of 4 requests will fail, 50 means 1 of 2 will fail, 75 means 3 of 4 will fail, 100 means every request will fail. Simulated failures do not generate remote traffic. A destination halts if all requests fail for one file.

Randfail is for testing HB’s error recovery and should not be used in normal operation.

randwait (testing only)

This keyword can be used to simulate remote delays. The value is an integer representing the maximum delay in seconds for a request. A message is printed with the actual delay. The delay occurs at the beginning of the request; use the rate keyword to simulate evenly-distributed delays throughout a request.

Randwait is for testing HB and should not be used in normal operation.
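For example, to stress-test error handling on a scratch backup (testing only; values are illustrative):

# fail about 1 in 4 requests, and delay each request up to 10 seconds
randfail 25
randwait 10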

timeout

This keyword sets the timeout for a destination that isn’t responding. The default is 300 (seconds), or 5 minutes. Destinations can override this default. For example, the default timeout for the rsync destination is 3600 (1 hour), because extremely slow rsync NAS servers, like an NSLU2, may take a long time to do a checksum verification.
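For example, to give a slow destination up to an hour before giving up:

timeout 3600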

Destination-Specific Keywords

Each type of destination has keywords peculiar to that type. These are documented in the examples for each destination type, listed in the links below.

Sharded Backups

With sharded backups, independent HashBackup processes run in parallel and divide work between themselves. They share a single dest.conf file in the main backup directory, but must keep their storage areas separate. To do this, {shard} can be used in dest.conf and is replaced by the shard number. A typical use of this is with the Dir keyword in dest.conf, for example:

Dir myhost/s{shard}

will save backup files in myhost/s1, myhost/s2, etc. {shard} can be used anywhere in dest.conf, not just on the Dir keyword.
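For example, a complete Dir destination using {shard} (hypothetical path):

destname sharddir
type dir
dir /backups/myhost/s{shard}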