Config
This describes the preview version of the config command with
regard to the db-history-days option.
Sets and displays configuration options. A history of all options used for each backup is stored in the backup database.
$ hb config [-c backupdir] [-r version] [key] [value]
Without command line options, displays all current config options that will be used for the next backup.
Command Options
-r
displays the config options used for a specific version
key
the config option to display or change
value
the new value for a config option. If omitted, the option’s current value is displayed.
Examples
Display current configuration for the next backup:
$ hb config -c backupdir
Display the configuration used for backup version 5:
$ hb config -c backupdir -r5
Set arc-size-limit to 1gb (used for the next backup):
$ hb config -c backupdir arc-size-limit 1gb
Display arc-size-limit:
$ hb config -c backupdir arc-size-limit
Display arc-size-limit used for backup version 5:
$ hb config -c backupdir -r5 arc-size-limit
Config changes take effect on the next backup. To send config options to remotes immediately, use:
$ hb dest -c backupdir sync
Config Options
admin-passphrase
An admin passphrase restricts certain actions unless the passphrase is entered. This option is used to:
- protect the config data from changes
- protect dest.conf data stored in the database (see Dest command)
- restrict commands with the enable-commands and disable-commands options.
Unlike all other config options, the current admin passphrase cannot be displayed: it is stored as a hash code so displaying it is impossible. Also unlike other options, a new admin passphrase cannot be listed on the command line. Instead, wait for the prompt to enter the new passphrase, then enter it twice:
$ hb config -c hb admin-passphrase
HashBackup #2677 Copyright 2009-2022 HashBackup, LLC
Backup directory: /Users/jim/hb
Current config version: 0
New admin passphrase? <= enter the passphrase; it won't echo
New admin passphrase again? <= enter it again
Admin passphrase set
$
After an admin passphrase is set, the environment variable
HB_ADMIN_PASSPHRASE
can be used to pass the admin passphrase to
HashBackup for automated operations. For example, to change the
passphrase on a backup that currently has one, and enter the new
passphrase on the keyboard:
$ HB_ADMIN_PASSPHRASE=oldp hb config -c backupdir admin-passphrase
To change the passphrase using only environment variables, without keyboard input:
$ HB_ADMIN_PASSPHRASE=oldp HB_NEW_ADMIN_PASSPHRASE=newp hb config -c backupdir admin-passphrase
Environment variables are useful for automation but may have security risks that could expose sensitive data to other locally running processes.
arc-size-limit
Specifies the maximum size of archive files and hb.db.N incremental
backup files. The default is 100mb. The minimum size is 10K, used
for internal testing. At least 2 x arc-size-limit bytes of free
disk space are required in the local backup directory.
When arc-size-limit
is changed, the new limit is used on the next
backup but HashBackup does not resize existing arc files. Future
retain
and rm
operations will use the new limit when combining
older, small arc files into larger arc files.
HashBackup may go slightly over the limit specified, especially on a
multi-core system or with large block sizes. If a hard limit is
needed for some reason, specify 5-10MB less or 3-4 block sizes less,
whichever is greater, and run some experiments to make sure your hard
limit isn’t exceeded. When using large block sizes or saving very
large files with the auto block size, arc-size-limit may be
exceeded even more than usual, e.g., instead of 100MB arc files,
300MB arc files may be created. For the most precise control of arc
file size, the backup option -p0 can be used to force
single-threaded backups, though this may also decrease backup
performance.
For multi-TB backups with mostly large files that do not change often,
a large archive size may be more efficient. A practical limit is 4GB
because many online storage services do not allow file sizes over 5GB
without special handling. Also, a limit higher than 4GB doubles RAM
usage when creating restore plans (when cache-size-limit
is set)
because 64-bit file offsets are required. Be aware that some online
services have a maximum upload file size.
For VM image backups with a lot of dedup, a high rate of change,
and/or small block sizes, a smaller arc size may be more efficient
because it will be packed more often. A smaller arc size may also
reduce disk space cache requirements during a restore, if local disk
space is tight. This is only a concern when cache-size-limit
is >=
0 and the destination does not support selective downloads (retrieving
parts of arc files), so entire arc files are downloaded.
As user files are removed from older backups with the rm
or retain
commands, "holes" are created in arc files. When enough data is
removed (more than pack-percent-free), archives are automatically
packed to remove empty space (see pack-*
options). Packing requires
a download, pack, upload cycle. If all data from an archive is
removed, the archive is deleted without a pack operation. This is why
smaller archive files can be more space efficient than larger files:
when data is removed, smaller arc files increase the probability that
entire arc files can be removed without a pack.
A disadvantage of very small arc files, besides creating many arc
files for large backups, is that very small arc files may defeat
download optimizations. For example, with 1MB arc files, the maximum
request size to remote storage services is limited to 1MB. A backup
of a 1GB file will create 1000 arc files and downloading these will
require 1000 x 1MB remote requests. If the storage service has high
latency (delay for each request) it can cause performance problems.
Using arc-size-limit
below 4MB is not recommended, and in general,
arc-size-limit
should be somewhat proportional to your backup size.
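The download-request arithmetic above can be sketched with shell arithmetic (a rough model only; real request counts depend on packing and on whether the destination supports selective downloads):

```shell
# Rough model: remote requests needed to restore a large file when
# each small arc file must be fetched with its own request.
file_bytes=$((1024 * 1024 * 1024))   # a 1GiB file
arc_limit=$((1024 * 1024))           # arc-size-limit of 1MB
requests=$(( (file_bytes + arc_limit - 1) / arc_limit ))
echo "$requests requests of ~1MB each"
```

At, say, 100ms of latency per request, those 1024 requests add well over a minute of pure waiting on a high-latency storage service.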
backup-linux-attrs
When True
, backup will save Linux file attributes, also called flags
in BSD Unix. On Linux, file attributes are set with the chattr
command and displayed with lsattr
. They are little used and poorly
implemented on Linux, requiring an open file descriptor and an ioctl
call. This can cause permission problems, especially in shared
hosting environments, so the default is False
.
File attributes are not the same as extended attributes, also called
xattrs. Extended attributes are always backed up if present.
Extended attributes are handled by the attr, getfattr, and
setfattr commands on Linux, as well as the Linux ACL commands.
block-size
The variable block size, default is auto
. Can be set to 16K, 32K,
64K, 128K, 256K, 512K or 1M. The effective average block size is
usually 1.5 times larger. Smaller block sizes dedup better than
larger block sizes but also cause more block metadata to be stored in
the database and can make restores slower, so there is a trade-off.
Previously the variable block size was 32K, but auto
scales better
because it uses larger block sizes for larger files.
For files larger than 2GB, the auto
setting switches to fixed block
sizes. To back up large files >2GB with a variable block size, such as
for a large database SQL dump, the block size
must be set to a
specific size, or the backup option -Vn
can be used to force a
variable block size, or the block-size-ext
option can be set to
-V128K .sql
to force all .sql
files to be saved with a 128K
variable block size.
If this option is changed on an existing backup at rev r, files
changed after rev r will be saved with the new block size and cannot
dedup against files saved before rev r with a different block size.
A block size warning is displayed during backup when this happens.
Once files are saved with the new block size, dedup will occur on
subsequent backups when files change. This option cannot be used to
set a large fixed block size. To do that, use -B4M
for example on
the backup command line.
block-size-ext
Sets the backup block size for specific filename extensions
(suffixes), overriding backup’s -B
command line option and other
block size settings. For example, with this option set to '-B4M
mov,avi -V128K .sql -B16K ibd -B23K xyz'
, backup will use:
- large fixed-size 4M blocks for movie and video files
- variable 128K blocks for text dumps of SQL databases with the .sql extension
- fixed 16K blocks for an InnoDB database file (the default page size)
- fixed 23K blocks for .xyz files.
Commas and periods are optional. The entire value must be quoted on
the config
command line since it contains spaces. Unlike most
config options, this option uses 1024 as a multiplier so 8K
means
8192 bytes.
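For example, a setting like the one described above would be entered like this (backupdir is a placeholder; note the quotes around the entire value since it contains spaces):

```shell
$ hb config -c backupdir block-size-ext '-B4M mov,avi -V128K .sql -B16K ibd'
```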
cache-size-limit
Specifies the maximum amount of space to use in the local backup
directory to cache archive files. By default, this is -1
, meaning
keep local copies of all archive files. Setting cache-size-limit
>=
0 will restrict the number of archives kept locally. Remote
destinations must be setup in dest.conf
or this config option is
ignored.
The value can be set two ways:
- a number less than 1000 indicates how many arc files to keep locally; the number is multiplied by arc-size-limit to get a space limit
- a space limit can be set directly, for example, 10GB
A reasonable value for this limit is the average size of your
incremental backups. A reasonable minimum is the highest workers
setting for any destination plus 1. The default number of workers is
4, so the minimum would be 5, but a more reasonable limit would be 1GB
for 100MB arc files. If you plan to use the mount command
extensively, a larger cache will prevent multiple downloads of the
same data.
The cache size limit is more of a desired goal than a hard limit. The limit may often be exceeded by the size of 1 archive file, and may need to be greatly exceeded for restores to prevent downloading the same archive file more than once. The mount command respects the limit but might have to download the same data more than once.
If you can afford the disk space, keeping a copy of archives locally has many advantages:
- backups won’t stall if the cache fills up before destinations can upload the data
- mount does not need to download data and can run concurrently with backups
- restores will not have to wait for lengthy downloads
- archive pack operations are faster and less expensive because only uploads are required - no downloads
- less remote storage space is used because pack operations occur more frequently
- you have a redundant copy of your backup
Thanks to compression and dedup, total backup space is usually about 50%-100% the size of the original data, even when multiple versions are maintained.
Examples of cache size limits:
- -1 means copies of all archives are kept in the local backup directory. This is the default setting.
- 0 means no archives should be kept locally. During backup, up to 2 archives are kept to ensure overlap between backup and network transmission. After the backup completes, all local arc files that have been sent to all destinations are deleted.
- 1-999 means to size the cache to hold this many full archives. If the archive size is 100MB and the limit is set to 10, the effective cache size would be 1gb. A guideline is to use workers (in dest.conf) + 1 as a minimum.
- 10gb specifies the cache size directly. Other suffixes can be used: mb, etc.
If cache-size-limit
was -1 (all arc files kept locally) and then is
changed to 10GB
for example, the next backup will trim the cache
down to 10GB after the backup completes. You can force this trim to
happen immediately by backing up a small file like /dev/null after
changing cache-size-limit
.
If there are multiple destinations configured in dest.conf
, an arc
file must be stored successfully on all destinations before the
local copy can be deleted. For configurations where 2 USB drives are
setup and 1 is always offsite, there must be enough local cache space
to hold all backups that occur before swapping drives.
copy-executable
When True
, copy the hb program to remote destinations. The default
is False
. The hb program is always copied to the local backup
directory as hb#b
, where b is the build number. If using HB only
occasionally, more as an archive, it is important to set this true so
that you will have a copy of HB stored with your archive.
db-check-integrity
Controls when a database integrity check occurs. The default is
selftest
. This is fine when backups are run and stored on
enterprise-class hardware: non-removable drives, ECC RAM, and
hardwired networks.
For extra safety, set this option to upload
on consumer-grade
hardware using removable drives, non-ECC RAM, and wireless networks
where there is a higher possibility of database damage. This will
verify database integrity before each upload to destinations,
preventing a damaged local database from overwriting a good version
stored remotely.
db-history-days
Keep incremental database backup files (hb.db.N
) for this many days
before the latest backup. The default is 30. Combined with recover
--check
, this allows recovering earlier versions of the main backup
database. Lower values tend to use less remote storage but create
larger incrementals after each backup. A value of 1 will create one
large incremental of the entire database, usually about 50% the size
of hb.db itself, but carries some risk since there are no historical
versions. Larger values use more remote storage but create smaller
hb.db.N files after each backup. The special value 0 creates the
smallest hb.db.N files but can only recover the latest version of
hb.db. Please note that it will take 2x db-history-days days for
hb.db.N storage space to reach equilibrium.
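Assuming the usual config syntax shown earlier on this page, the option is set like any other (backupdir is a placeholder; 60 is an illustrative value):

```shell
$ hb config -c backupdir db-history-days 60
```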
dedup-mem
The amount of RAM to be used for the dedup table, similar to the -D
backup option. The -D
backup option overrides this config option
for a single backup. See Dedup Info
for more detailed information about sizing the dedup table. The
default is 100MB.
The dedup table does not immediately use all memory specified; it starts small and doubles in size when necessary, up to this limit. The backup command shows how full the current dedup table is, and how full it is compared to the maximum table size. When the dedup table is full and cannot be expanded, backups will continue to work correctly. Dedup is still very effective even with a full table.
In a sharded backup with N shards, each shard has its own dedup table.
To use 1GB of RAM for a backup with 4 shards, set dedup-mem
to
250MB.
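The per-shard arithmetic is simple division; as a sketch (treating 1GB as 1000MB, matching the example above):

```shell
total_mb=1000    # total dedup table RAM budget, ~1GB
shards=4         # number of shards in the backup
per_shard_mb=$(( total_mb / shards ))
echo "set dedup-mem to ${per_shard_mb}MB"
```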
disable-commands
List of commands that should be disabled, separated by commas; all
other commands are enabled. If the admin-passphrase
option is set,
these commands ask for the passphrase and run only if it is correctly
entered. If there is no admin-passphrase
set, these commands refuse
to run. The upgrade
and init
commands cannot be disabled because
when these commands execute there is no database to check whether they
are enabled or disabled. To cancel this option, use the value
''
(two single quotes).
SECURITY NOTE: Disabling commands without admin-passphrase
set is a
poor configuration since commands can easily be re-enabled.
enable-commands
List of commands that should be enabled, separated by commas; all
other commands are disabled. An example would be to enable only the
backup command. Then it would not be possible to list, restore, or
remove files without the admin passphrase. To cancel this option, use
the value ''
(two single quotes).
SECURITY NOTE: Enabling commands without setting an
admin-passphrase
is a poor configuration since more commands can
easily be enabled.
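A locked-down setup might look like this sketch (the order matters: set the admin passphrase first so the restriction cannot be trivially undone; backupdir is a placeholder):

```shell
$ hb config -c backupdir admin-passphrase      # prompts twice, does not echo
$ hb config -c backupdir enable-commands backup
$ hb config -c backupdir enable-commands ''    # cancel the restriction later
```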
no-backup-ext
List of file extensions, with or without dots, separated by spaces or commas. Files with these extensions will not be backed up.
no-backup-tag
List of filenames separated by spaces or commas. Directories
containing any of these files will not have their contents backed up,
other than the directory itself and the tag file(s). Typical values
are .nobackup
and CACHEDIR.TAG. Each entry in this list causes an
extra stat system call for every directory in the backup, so keep the
size of this list to a minimum.
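For example (the value is shown comma-separated; spaces also work):

```shell
$ hb config -c backupdir no-backup-tag '.nobackup,CACHEDIR.TAG'
```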
no-compress-ext
List of file extensions of files that should not be compressed. Most common extensions like .zip, .bz2, and .gz are already programmed into HashBackup, but less common extensions can be added. Extensions may or may not have dots, and are separated by spaces or commas. HashBackup does not compress incompressible data within a file, so setting this option is not that important.
no-dedup-ext
List of file extensions of files with data that does not dedup well
with variable block sizes, for example, photos. If exact copies of
files are backed up, they will still be deduped if dedup is
enabled. Files with these extensions are deduped with fixed-block
sizes, either automatically selected if block-size
is auto
, or 1MB
otherwise.
pack-age-days
The minimum age of an archive before it is packed. Some storage
services charge penalties for early removal of files, for example,
Amazon S3 Infrequent Access and Google Nearline. This setting
prevents packing archives until they have aged, avoiding the delete
penalty charge, but you are still paying more for storage of data that
might have been deleted earlier. All other conditions must be
satisfied before an archive is packed. The default for
pack-age-days
is 30. To disable this feature, set it to zero.
Higher numbers will reduce the number of pack operations but may
increase the backup storage requirements. If you do not pay delete
penalties and have low or zero download fees, a free download
allowance, or have cache-size-limit
set to -1 (local copy of arc
files), setting this to zero will cause more frequent packing to lower
storage costs, improve restore times, and lower restore cache
requirements.
This option also controls the frequency of packing operations. However, if a packing operation cannot complete because of a download or time limit, it will resume the next time rm or retain are run. If set to zero, rm and retain will try to pack archives on every run.
pack-bytes-free
The minimum free space in an archive before it is packed. The default
is 1MB. This setting prevents packing very small archives when data
is removed, i.e., once an archive gets down to 1MB (the default
setting), it is not packed until it is empty, then is deleted. All
other conditions such as pack-percent-free must also be satisfied
before an archive is packed. To disable this feature, set it to zero.
Higher numbers will increase the backup storage requirements and
reduce the number of pack operations.
pack-combine-min
The minimum size of an arc file before it is merged into a larger arc
file. The default is 1MB, meaning that when possible, arc files
smaller than 1MB will be merged together. For combining to occur,
packing must occur (see the other pack-*
config settings). For
technical reasons, some arc files cannot be merged into larger arc
files. Set this to 0 to disable combining. This may be cheaper
for storage services that have delete penalties.
pack-download-limit
The maximum amount of data that can be downloaded during a single run
of rm or retain for packing. The default is 950MB. To prevent any
downloading of remote arc files for packing, set this limit to 0 (but
see below). For "unlimited" downloading, set the limit to a high
value like 1TB. This limit only applies when cache-size-limit
is >=
0, meaning that not all arc files are stored locally; local arc files
are packed without a download.
It is recommended that pack-download-limit
not be set to zero.
When files are removed from the backup with rm and retain, "holes" are
created in arc files. Over time, this causes older backup data to be
stored inefficiently, making restores slower. Packing reorganizes arc
files into more efficient storage. To reduce the amount of packing,
raise pack-percent-free
to a high number like 95, meaning that 95%
of an arc file must be free before it is packed. This will prevent
nearly all downloading except tiny files smaller than
pack-combine-min
and very inefficient arc files with mostly empty
space.
If pack-download-limit
is set to a value smaller than the largest
archive in the backup, a warning is given that this archive cannot be
packed. Over time, if more space is freed within the archive, it will
eventually be packed or deleted.
pack-percent-free
When archive files have this percentage or more as free space, the
archive is packed to recover space. This is used by rm and retain
after they have deleted backup data. The default is 50. If
cache-size-limit
is >= 0 and you pay for downloaded data, consider
raising this percentage to avoid frequent packing or set
pack-download-limit
. Most storage services have high download
charges compared to their storage charge. If cache-size-limit
is -1
(the default), packing more often is fine because no download is
required. Higher numbers increase the backup storage requirements but
reduce the number of pack operations and reduce the number of
downloads when cache-size-limit
is >= 0.
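The interaction of these thresholds can be sketched as a simple check (illustrative only, not HB's actual logic; default values assumed from this page, and pack-age-days and pack-download-limit are omitted for brevity):

```shell
# Does an arc file qualify for packing under the default thresholds?
arc_bytes=$((100 * 1024 * 1024))     # 100MB arc file
free_bytes=$((60 * 1024 * 1024))     # 60MB of "holes" left by rm/retain
pct_free=$(( free_bytes * 100 / arc_bytes ))
# pack-percent-free default 50, pack-bytes-free default 1MB
if [ "$pct_free" -ge 50 ] && [ "$free_bytes" -ge $((1024 * 1024)) ]; then
  echo "pack (${pct_free}% free)"
else
  echo "skip (${pct_free}% free)"
fi
```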
Setting pack-percent-free to a very low percentage and
pack-age-days to a low number can trigger excessive packing
operations.
remote-update
When set to normal
(the default), HB sends new data before deleting
old data, for example, during packing. This preserves the integrity
of remote backup areas. But if a remote disk becomes full it may be
necessary to delete the old data before sending new data. To enable
this behavior, set to unsafe
.
Another way to handle a full remote backup area is to delete recent
backup versions with clear -r
. This quickly removes entire versions
and their archives and makes room for a retain operation to remove
older data. The next backup will re-save the recent files that were
deleted if they are still present in the filesystem.
When set to unsafe, an interrupted HB command may leave
the remote backup area temporarily inconsistent. The next successful
HB command should correct it.
retain-extra-versions
Adds an extra time period to each -s interval as a retention safety
cushion, so -s7d4w becomes -s8d5w. The default is True. See the
retain command for a better explanation.
shard-output-days
The number of days to keep shard output in the sout
subdirectory.
The default is 7 days. Set to 0 to keep all shard output.
simulated-backup
If set to true
before the first backup, no arc files are created by
the backup command. This allows modeling backup options such as -B
(block size) and -D (dedup table size), even for very large
(dedup table size), even for very large
backups, without using a lot of disk space. Simulated backups also
run faster because there is less I/O. Incremental backups, rm
, and
retain
all work correctly, and the stats
command can be used to
view statistics showing the backup space that would be used by a real
backup. The pack-* config options control whether the simulated arc
files are packed when files are removed with rm
or retain
.
Daily backups work as expected with simulated backups, saving only modified files and deduping against previous backups.
Differences for simulated backups:
- must be set before the initial backup
- cannot be changed after the initial backup
- no arc files are created (not a real backup)
- no arc files are sent to destinations
- selftest is limited to -v2 or below
- mount works but cannot read data from files
- get exits with an error message
- recover will not try to download arc files
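A typical modeling session might look like this (a sketch; the directory name and the -B and -D values shown are illustrative):

```shell
$ hb init -c modeldir                       # create a new backup directory
$ hb config -c modeldir simulated-backup true
$ hb backup -c modeldir -B1M -D1g /data     # model block size and dedup table
$ hb stats -c modeldir                      # space a real backup would use
```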