Destinations
HashBackup uses a local backup directory to store backup metadata
(file names, file sizes, etc.) in an encrypted database, hb.db. The
local backup directory is specified by -c, or ~/hashbackup is used
if -c is omitted. The encryption key is stored in key.conf in the
backup directory. These files are created by the init command.
The backup command updates the database and creates one or more
archive files in the backup directory, named arc.v.n, where v is
the backup number and n is a counter starting at zero. The default
size for arc files is 100MB, controlled by config option
arc-size-limit.
There are 3 basic storage setups for HashBackup:
- backup data is kept only in the local backup directory
- backup data is kept locally and mirrored on remote storage
- backup is kept mostly on remote storage with a local cache
In the default configuration, backup data is kept only in the local backup directory. If that’s your goal, you’re done and don’t need to read on.
You may want to keep a copy of your backup locally, but not inside the
local backup directory. For example, your backup directory may be on
a local, fast SSD for good database performance, but you want backup
data written to a directory on an NFS server or to a spinning disk.
This requires setting up a Dir destination, with instructions below.
It’s set up like remote storage but accessed through an ordinary
directory.
Backup to Remote Storage
It’s a good idea to send backup data to one or more remote storage
destinations to protect against loss of the local backup directory
from theft or an on-site disaster. Remote backup storage is set up by
creating a dest.conf file in the local backup directory. Backup
data is sent to all remote storage services listed in dest.conf
using multiple worker processes for each destination. Transfers occur
while the backup is running to minimize total backup time.
Keeping a Local Copy of Backup Data
When backup data is sent to remote storage, you have the option to keep a complete copy in the local backup directory, or only a partial cache. Keeping the backup data local makes it much easier for HashBackup to manage remote storage, minimizes your remote storage costs, eliminates stalls during backup because of slower remote transfers, and makes restores much quicker.
Keeping a Local Cache of Backup Data
If it is not feasible to keep a complete copy of your backup in the
local backup directory, HashBackup can also operate with a partial
cache of backup data in the local backup directory. One config
option, cache-size-limit, controls the size of this local cache.
The default is -1, meaning to keep a copy of all backup data in the
local backup directory. Setting cache-size-limit to 5GB will limit
the size of local backup data to 5GB. It is recommended to set
cache-size-limit as high as reasonable, because keeping more backup
data locally allows HashBackup to better optimize remote storage costs
and prevent backup stalls if files cannot be transferred to remote
storage as fast as backup creates new arc files. The setting is
easily changed at any time, so this can be decided later.
The dest.conf file
The dest.conf text file describes a list of destinations, usually
offsite, to receive copies of the backup. The dest.conf file is
created in the local backup directory (the -c directory) with a text
editor, using these notes as a guide. The dest.conf file is set up
the same way whether you plan to keep a complete local copy or a
partial cache.
IMPORTANT SECURITY PRECAUTIONS
- Because the key.conf and dest.conf files contain password and key information, they are never copied to any remote destination. Therefore, the HashBackup executable, key.conf, and dest.conf files should be copied to a safe place (or several safe places) in case your backup drive becomes inaccessible and you lose these critical files.
- When HB creates the key.conf file, it sets permissions to read-only for the owner, with no rights for everyone else. It is important for dest.conf to also have restrictive permissions because it contains passwords necessary to access remote services. You can do this with the chown and chmod commands:
$ chown root dest.conf # or whatever id runs HashBackup
$ chmod 700 dest.conf
General Concepts
As backup runs, the backup files created (arc files) are copied to
every destination. You can specify more than one destination in
dest.conf. Destinations should be listed in the order you want them
used for a restore. So for example, a local FTP server should be
listed before Amazon S3. The same type of destination can be
specified more than once, for example, two FTP servers could be
listed. Each destination must have a unique destname keyword, as
this is how HashBackup tracks files on each destination. Creating two
destinations for the same physical server or remote storage service is
fine so long as the storage itself does not overlap (the Dir keyword
controls this). HashBackup manages unique ID tags for each destination
to prevent accidental overwriting of backup storage.
Unavailable or Failing Destinations
If a destination is not available for some reason during a backup, for
example, a USB drive is not plugged in, an error is displayed and
files will not be copied there. The next time you do a backup and the
destination is available, any missing or updated files will be copied
to "catch up". Review the onfail
keyword in this situation. If you
have set up a limited cache with cache-size-limit, a failed
destination may cause the local cache to fill up, which then causes
backup to halt.
Adding Destinations to an Existing Backup
If you add a new destination to dest.conf, the next backup command
will copy all backup data to the new destination, including data from
previous backups. If not all backup data is stored locally, because
cache-size-limit is set, HashBackup may have to download old backup
data from remote destinations to copy it to new destinations. The
backup is stalled until the remote-to-remote copy is finished. If
cache-size-limit is -1 (all data kept in the backup directory), this
"catch up" synchronization will occur during the backup.
Removing a Destination
To remove a destination (you no longer want data there), first use this command to clear all files from the destination:
$ hb dest -c backupdir clear destname
Then immediately remove the destination’s info from the dest.conf
file, or add the off keyword. If you run another HB command before
editing dest.conf, HB may try to copy all the files back to the
destination you just cleared.
Creating dest.conf
To set up remote destinations, use a text editor to copy
example destinations to the dest.conf file in your -c directory.
Example destination setups are at the bottom of the page. Keep in
mind that HashBackup downloads files from destinations based on their
order in dest.conf, with the first destination having the highest
download priority.
Conventions
The dest.conf text file has lines with a keyword and value. The
keyword is case-insensitive. Comment lines begin with a # and are
ignored, as are blank lines. As explained below, some keywords are
common to all destinations, some keywords are unique to only certain
types of destinations, some keywords are required, and some are
optional.
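As a rough illustration of these conventions, the sketch below parses dest.conf-style text in Python. It is not HashBackup's parser: the function name parse_dest_conf, the sample paths, and the choice to store every value as a string are assumptions made for the example. It relies only on rules stated on this page: keywords are case-insensitive, blank lines and lines beginning with # are ignored, and destname begins a new destination.

```python
def parse_dest_conf(text):
    """Parse dest.conf-style text into a list of destination dicts.

    Illustrative only: mirrors the documented conventions, not
    HashBackup's actual implementation.
    """
    destinations = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines (starting with #) are ignored.
        if not line or line.startswith("#"):
            continue
        # First word is the keyword (case-insensitive); the rest is the value.
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "destname":
            # destname begins a new destination.
            current = {"destname": value}
            destinations.append(current)
        elif current is None:
            raise ValueError(f"keyword {keyword!r} appears before any destname")
        else:
            current[keyword] = value
    return destinations


# Hypothetical sample; destination names and paths are made up.
sample = """\
# two directory destinations
destname usbdrive
Type dir
Dir /mnt/usb/backup

destname nfscopy
type dir
dir /mnt/nfs/backup
workers 1
"""

for dest in parse_dest_conf(sample):
    print(dest)
```

Because the list keeps the order of the file, the first entry corresponds to the destination with the highest download priority.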
Keywords For All Destinations
destname
This keyword begins a new destination. HB tracks destination contents using only this name. Because of this, it is possible to create a "seed" backup to a USB hard drive plugged in to your computer, take that drive to a remote site, and change the other keywords to switch from a Dir type (local directory) to FTP for example.
IMPORTANT NOTE: do not change destname once you have
backed up files. If you do this with a limited cache
(cache-size-limit is set >= 0), your backup will immediately
become inaccessible, and the next backup command will fail because it
tries to synchronize archives. Change the name back to what it was
when you made the backups. If you change destname with
cache-size-limit set to -1, which is the default, it will cause all
of your previous archives to be uploaded again during the next
backup.
off
Add this line to dest.conf to disable a destination. To
enable it, remove this line or comment it out, like #off.
type
Specifies the HB driver used to access the destination. Every destination must have the type keyword. Use the links at the bottom of the page for details and examples, but read this entire page first or the examples won’t make sense. The types supported are:
- b2 - Backblaze B2
- dav - WebDAV
- dir - a directory (could be local, remote, USB stick, etc)
- do - Dreamhost Dream Objects (S3 compatible)
- ftp - FTP server
- ftps - FTP with TLS (SSL); userid, password & commands are encrypted
- gmail - gmail email account (automatically sets server & port)
- gs - Google Cloud Storage (S3 compatible)
- imap - imap (email) server
- rsync - rsync server
- s3 - Amazon S3 & compatibles
- shell - user-written driver, including Rclone
- ssh - sftp server over ssh
workers
Specifies the number of worker processes for a destination. The default of 2 is usually fine, but you may want to specify 1 worker to decrease HB’s network load or decrease concurrency on the destination. Use more workers if you have a fast connection and want to transfer a lot of data quickly or have a high latency connection (high delay) and want increased performance.
Setting workers too high is counterproductive, so it’s recommended to
increase it gradually by 2 or 4 then measure your throughput again to
make sure it is actually beneficial.
Setting workers to 1 is recommended:
- when debug is enabled, to prevent the confusion of mixing together multiple workers' debug output
- to help prevent memory allocation problems in low memory environments
- to reduce seeking and improve performance for single spinning hard drive destinations
retry
Specifies the number of retries, the initial delay before a retry, and
the delay factor for each retry. If omitted, destinations will retry
8 times on errors (9 times altogether), delaying 5 seconds the first
time, then multiplying the delay by 2 for each retry. This is
equivalent to retry 8,5,2 and gives a total retry period of around
20 minutes.
The delay between retries is limited to 20 minutes, so retry 10
retries for a total of about an hour. Add 3 retries for every
additional hour: retry 13 would retry for 2 hours, etc. Up to 3
integers can be specified, with the defaults used for missing values.
Some destination types have an internal retry loop, usually only a
minute or two, that may increase the total retry time.
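The retry arithmetic above can be checked with a short Python sketch. The function name retry_delays is hypothetical, and the model is an assumption: each delay is capped individually at 20 minutes, and time spent on the failed attempts themselves (or in any internal retry loop) is ignored.

```python
def retry_delays(retries=8, initial=5, factor=2, max_delay=20 * 60):
    """Return the delay in seconds before each retry.

    Defaults follow the documented "retry 8,5,2"; max_delay models the
    20-minute limit on the delay between retries.
    """
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


for n in (8, 10, 13):
    total = sum(retry_delays(retries=n))
    print(f"retry {n}: {total} seconds, about {total / 60:.0f} minutes")
```

Under these assumptions the totals are 1275, 3675, and 7275 seconds, which matches the figures of roughly 20 minutes, one hour, and two hours.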
debug
This keyword takes an integer value, with higher numbers usually
causing more debug output. It can be useful when a destination is
having problems, but should not be used in production. The special
value 99 means that an error in the destination will cause a traceback
rather than a retry. This can be used to track down the cause of a
difficult error. It’s advisable to set workers to 1 when enabling
debug output, because with more than one worker the debug output is
interleaved and hard to follow.
rate
Specifies the maximum upload rate (outgoing bandwidth) for each
worker, in bytes per second. The minimum rate is 1024, since lower
rates are probably typos. A suffix can be used, for example, 500k
or 500KB. The KB and kb suffixes both mean "kilobytes per
second"; to convert a rate to bits per second, multiply by 8. If the
rate keyword is not used (the default), there is no upload rate limit.
Setting workers to 1 may be useful when limiting upload rates. If
this is not done, a rate of 500k means each of the n workers can
upload at this rate, so if they are all active, the max upload rate
would be n times 500k. If you have a low speed upload connection and
want your network to be usable during uploads, set the rate at 25% of
your max upload speed, run some tests, try 50% of your max, etc.
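Since the limit applies to each worker, a per-worker value can be derived from the share of the link you want HB to use. The Python sketch below shows that arithmetic; the function name per_worker_rate and the 16 Mbit/s example link are assumptions for illustration, and the result is a plain number of bytes per second to put after the rate keyword.

```python
MIN_RATE = 1024  # lowest rate value accepted, in bytes per second


def per_worker_rate(max_upload_bytes_per_sec, fraction, workers):
    """Per-worker rate so that all workers together use about
    `fraction` of the upload link when they are all active."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rate = int(max_upload_bytes_per_sec * fraction / workers)
    return max(rate, MIN_RATE)


# Hypothetical link: 16 Mbit/s upload = 2,000,000 bytes per second.
link = 16_000_000 // 8
rate = per_worker_rate(link, fraction=0.25, workers=2)
print("rate", rate)                         # value for each worker
print("all workers active:", 2 * rate)      # combined upload rate
```

With 2 workers and a 25% target this gives 250000 bytes per second per worker, or 500000 in total.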
A better alternative to using the rate keyword is to enable QOS (Quality Of Service) on your router. By telling your router your maximum upload bandwidth, it is able to allocate upload bandwidth fairly among all active connections. The advantage is that if HB is the only process using the network, it can upload at full speed. If other network connections become active, the router will make sure that each connection gets a fair share of the upload bandwidth.
maxsize (Obsolete after #3015)
The maxsize keyword is obsolete and will not work after release
#3015. The release notes for #3015 explain how to migrate a backup
with split files. The information below is for reference only.
The maxsize keyword puts a hard limit on the size of files uploaded
to a destination. Any file exceeding this limit is split into parts
and each part uploaded as a file. On retrieval, the parts are fetched
and reassembled to create the original file. The value can be an
integer, meaning bytes, or can be a number (with optional decimal
point) with a suffix like K, M, G, T, P, with optional B. So 1.5KB
means 1.5 times 1000, or 1500.
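For reference, the size notation described above can be modeled with a small Python sketch. This is an interpretation, not HB's code: the names parse_size and parts_needed are hypothetical, suffixes are treated as decimal multiples (consistent with 1.5KB being 1500), and accepting lowercase suffixes is an assumption.

```python
import re

# Decimal multipliers, consistent with "1.5KB means 1.5 times 1000".
_MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}


def parse_size(text):
    """Convert a size such as '1500', '1.5KB' or '128M' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGTP]?)B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIERS[suffix])


def parts_needed(file_size, maxsize):
    """Number of parts a file is split into under a hard size limit."""
    return max(1, -(-file_size // maxsize))  # ceiling division


print(parse_size("1.5KB"))                                   # 1500
print(parts_needed(parse_size("300MB"), parse_size("128MB")))  # 3
```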
Ideally, the config option arc-size-limit should be smaller than
maxsize to avoid splitting arc files into parts. arc-size-limit
may need to be quite a bit smaller than maxsize to avoid splitting,
because backup sometimes creates arc files larger than
arc-size-limit. If the hard limit is 128MB for example,
arc-size-limit should be set to something like 120MB to avoid
splitting arc files unnecessarily.
onfail
If this keyword is not present on a destination and it fails, the backup will continue, an error will be counted, and the exit status will be non-zero.
If onfail ignore is used and a destination fails, the backup will
continue, no error will be counted, and the exit status is not
affected. This is useful with destinations used in rotation, for
example, two USB drives are used but only one is normally present
during the backup.
If onfail stop is used and a destination fails, the backup will stop
immediately and the exit status will be non-zero. This can be used
when it is critical that the backup data is sent to the destination.
randfail (testing only)
This keyword can be used to simulate remote failures. The value is an integer 0-100 representing the percentage of requests that should fail: 25 means 1 out of 4 requests will fail, 50 means 1 of 2 will fail, 75 means 3 of 4 will fail, 100 means every request will fail. Simulated failures do not generate remote traffic. A destination halts if all requests fail for one file.
Randfail is for testing HB’s error recovery and should not be used
in normal operation.
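When choosing a randfail value for a test, it can help to estimate how often a single file would exhaust all of its attempts and halt the destination. The Python sketch below does this under two assumptions that are not confirmed by this page: each attempt fails independently with the configured percentage, and a file gets one initial attempt plus the configured number of retries (8 by default). The function name is hypothetical.

```python
def prob_file_exhausts_retries(randfail_percent, retries=8):
    """Chance that every attempt for one file fails, assuming
    independent simulated failures (an assumption, see lead-in)."""
    if not 0 <= randfail_percent <= 100:
        raise ValueError("randfail must be between 0 and 100")
    attempts = retries + 1  # initial attempt plus retries
    return (randfail_percent / 100) ** attempts


for percent in (25, 50, 75, 100):
    p = prob_file_exhausts_retries(percent)
    print(f"randfail {percent}: {p:.6f} chance a given file halts the destination")
```

Under this model, low values mostly exercise the retry path, while values near 100 mostly exercise the halt path.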
randwait (testing only)
This keyword can be used to simulate remote delays. The value is an
integer representing the maximum delay in seconds for a request. A
message is printed with the actual delay. The delay occurs at the
beginning of the request; use the rate keyword to simulate
evenly-distributed delays throughout a request.
Randwait is for testing HB and should not be used in normal
operation.
timeout
This keyword sets the timeout for a destination that isn’t responding. The default is 300 (seconds), or 5 minutes. Destinations can override this default. For example, the default timeout for the rsync destination is 3600 (1 hour), because extremely slow rsync NAS servers, like an NSLU2, may take a long time to do a checksum verification.
Destination-Specific Keywords
Each type of destination has keywords peculiar to that type. These are documented in the examples for each destination type, listed in the links below.
Sharded Backups
With sharded backups, independent HashBackup processes run in parallel
and divide work between themselves. They share a single dest.conf
file in the main backup directory, but must keep their storage areas
separate. To do this, {shard} can be used in dest.conf and is
replaced by the shard number. A typical use of this is with the Dir
keyword in dest.conf, for example:
Dir myhost/s{shard}
will save backup files in myhost/s1, myhost/s2, etc. {shard} can be
used anywhere in dest.conf, not just on the Dir keyword.
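The substitution can be pictured as a plain text replacement on each dest.conf line. The Python sketch below is only a model of that behavior, with a hypothetical function name; it is not how HB is implemented.

```python
def expand_shard(line, shard):
    """Replace every {shard} placeholder with the shard number."""
    return line.replace("{shard}", str(shard))


template = "Dir myhost/s{shard}"
for shard in (1, 2, 3):
    print(expand_shard(template, shard))
```

Each shard therefore sees its own Dir value (myhost/s1, myhost/s2, myhost/s3), which keeps the storage areas separate.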