Config

This describes the preview version of config regarding the db-history-days option.

Sets and displays configuration options. A history of all options used for each backup is stored in the backup database.

$ hb config [-c backupdir] [-r version] [key] [value]

Without command line options, displays all current config options that will be used for the next backup.

Command Options

-r displays the config options used for a specific version

key the config option to display or change

value the new value for a config option. If omitted, the option’s current value is displayed.

Examples

Display current configuration for the next backup:
$ hb config -c backupdir

Display the configuration used for backup version 5:
$ hb config -c backupdir -r5

Set arc-size-limit to 1gb (used for the next backup):
$ hb config -c backupdir arc-size-limit 1gb

Display arc-size-limit:
$ hb config -c backupdir arc-size-limit

Display arc-size-limit used for backup version 5:
$ hb config -c backupdir -r5 arc-size-limit

Config changes take effect on the next backup. To send config options to remotes immediately, use:
$ hb dest -c backupdir sync

Config Options

admin-passphrase

An admin passphrase restricts certain actions unless the passphrase is entered. This option is used to:

  • protect the config data from changes

  • protect dest.conf data stored in the database (see Dest command)

  • restrict commands with the enable-commands and disable-commands options.

Unlike all other config options, the current admin passphrase cannot be displayed: it is stored as a hash code so displaying it is impossible. Also unlike other options, a new admin passphrase cannot be listed on the command line. Instead, wait for the prompt to enter the new passphrase, then enter it twice:

$ hb config -c hb admin-passphrase
HashBackup #2677 Copyright 2009-2022 HashBackup, LLC
Backup directory: /Users/jim/hb
Current config version: 0
New admin passphrase?   <= enter the passphrase; it won't echo
New admin passphrase again?   <= enter it again
Admin passphrase set
$

After an admin passphrase is set, the environment variable HB_ADMIN_PASSPHRASE can be used to pass the admin passphrase to HashBackup for automated operations. For example, to change the passphrase on a backup that currently has one, and enter the new passphrase on the keyboard:

$ HB_ADMIN_PASSPHRASE=oldp hb config -c backupdir admin-passphrase

To change the passphrase using only environment variables, without keyboard input:

$ HB_ADMIN_PASSPHRASE=oldp HB_NEW_ADMIN_PASSPHRASE=newp hb config -c backupdir admin-passphrase

Environment variables are useful for automation but may have security risks that could expose sensitive data to other locally running processes.

arc-size-limit

Specifies the maximum size of archive files and hb.db.N incremental backup files. The default is 100mb. The minimum size is 10K, used for internal testing. At least 2 x arc-size-limit bytes of free disk space are required in the local backup directory.

When arc-size-limit is changed, the new limit is used on the next backup but HashBackup does not resize existing arc files. Future retain and rm operations will use the new limit when combining older, small arc files into larger arc files.

HashBackup may go slightly over the limit specified, especially on a multi-core system or with large block sizes. If a hard limit is needed for some reason, specify 5-10MB less or 3-4 block sizes less, whichever is greater, and run some experiments to make sure your hard limit isn’t exceeded. When using large block sizes or saving very large files with the auto block size, arc-size-limit may be exceeded even more than usual; for example, 300MB arc files may be created instead of 100MB arc files. For the most precise control of arc file size, the backup option -p0 can be used to force single-threaded backups, though this may also decrease backup performance.

For multi-TB backups with mostly large files that do not change often, a large archive size may be more efficient. A practical limit is 4GB because many online storage services do not allow file sizes over 5GB without special handling. Also, a limit higher than 4GB doubles RAM usage when creating restore plans (when cache-size-limit is set) because 64-bit file offsets are required. Be aware that some online services have a maximum upload file size.

For VM image backups with a lot of dedup, a high rate of change, and/or small block sizes, a smaller arc size may be more efficient because it will be packed more often. A smaller arc size may also reduce disk space cache requirements during a restore, if local disk space is tight. This is only a concern when cache-size-limit is >= 0 and the destination does not support selective downloads (retrieving parts of arc files), so entire arc files are downloaded.

As user files are removed from older backups with the rm or retain commands, "holes" are created in arc files. When enough data has been removed from an archive (pack-percent-free or more), the archive is automatically packed to remove empty space (see pack-* options). Packing requires a download, pack, upload cycle. If all data from an archive is removed, the archive is deleted without a pack operation. This is why smaller archive files can be more space efficient than larger files: when data is removed, smaller arc files increase the probability that entire arc files can be removed without a pack.

A disadvantage of very small arc files, besides creating many arc files for large backups, is that very small arc files may defeat download optimizations. For example, with 1MB arc files, the maximum request size to remote storage services is limited to 1MB. A backup of a 1GB file will create 1000 arc files and downloading these will require 1000 x 1MB remote requests. If the storage service has high latency (delay for each request) it can cause performance problems. Using arc-size-limit below 4MB is not recommended, and in general, arc-size-limit should be somewhat proportional to your backup size.
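
As a rough illustration of the arithmetic above, the sketch below estimates how many arc files (and therefore remote requests) a backup produces for a given arc-size-limit, and the minimum free local disk space. The names arc_count and min_free_mb are hypothetical helpers, not part of HashBackup, and the estimate ignores compression and dedup, which reduce the stored size.

```shell
#!/bin/sh
# Hypothetical helper: number of arc files for data_mb megabytes of backup
# data split into arc files of arc_mb megabytes (ceiling division).
arc_count() {
    data_mb=$1
    arc_mb=$2
    echo $(( (data_mb + arc_mb - 1) / arc_mb ))
}

# Free local disk space needed: at least 2 x arc-size-limit (in MB here).
min_free_mb() {
    echo $(( 2 * $1 ))
}

arc_count 1000 1      # 1GB file with 1MB arc files: prints 1000
arc_count 1000 100    # same file with the default 100MB: prints 10
min_free_mb 100       # prints 200
```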

backup-linux-attrs

When True, backup will save Linux file attributes, also called flags in BSD Unix. On Linux, file attributes are set with the chattr command and displayed with lsattr. They are little used and poorly implemented on Linux, requiring an open file descriptor and an ioctl call. This can cause permission problems, especially in shared hosting environments, so the default is False.

File attributes are not the same as extended attributes, also called xattrs. Extended attributes are always backed up if present. Extended attributes are handled by the attr, getfattr, and setfattr commands on Linux, as well as the Linux ACL commands.

block-size

Sets the variable block size. The default is auto. It can also be set to 16K, 32K, 64K, 128K, 256K, 512K or 1M. The effective average block size is usually 1.5 times larger. Smaller block sizes dedup better than larger block sizes but also cause more block metadata to be stored in the database and can make restores slower, so there is a trade-off. Previously the default variable block size was 32K, but auto scales better because it uses larger block sizes for larger files.

For files larger than 2GB, the auto setting switches to fixed block sizes. To back up large files >2GB with a variable block size, such as for a large database SQL dump, the block size must be set to a specific size, or the backup option -Vn can be used to force a variable block size, or the block-size-ext option can be set to -V128K .sql to force all .sql files to be saved with a 128K variable block size.

If this option is changed on an existing backup at rev r, files changed after rev r will be saved with the new block size and cannot dedup against files saved before rev r with a different block size. A block size warning is displayed during backup when this happens. Once files are saved with the new block size, dedup will occur on subsequent backups when files change. This option cannot be used to set a large fixed block size. To do that, use -B4M for example on the backup command line.

block-size-ext

Sets the backup block size for specific filename extensions (suffixes), overriding backup’s -B command line option and other block size settings. For example, with this option set to '-B4M mov,avi -V128K .sql -B16K ibd -B23K xyz', backup will use:

  • large fixed-size 4M blocks for movie and video files

  • variable 128K blocks for text dumps of SQL databases with the .sql extension

  • fixed 16K blocks for an InnoDB database file (the default page size)

  • and fixed 23K blocks for .xyz files.

Commas and periods are optional. The entire value must be quoted on the config command line since it contains spaces. Unlike most config options, this option uses 1024 as a multiplier so 8K means 8192 bytes.
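
To make the 1024 multiplier concrete, the sketch below converts a block size such as 8K or 4M to bytes. block_bytes is a hypothetical helper written for this illustration; it is not an hb command and only handles the K and M suffixes used in this section.

```shell
#!/bin/sh
# Hypothetical helper: convert a block size like 8K or 4M to bytes,
# using 1024 as the multiplier as described for block-size-ext.
block_bytes() {
    num=${1%[KM]}            # strip a trailing K or M, if present
    case $1 in
        *K) echo $(( num * 1024 )) ;;
        *M) echo $(( num * 1024 * 1024 )) ;;
        *)  echo "$num" ;;
    esac
}

block_bytes 8K     # prints 8192
block_bytes 23K    # prints 23552
block_bytes 4M     # prints 4194304
```

On the config command line, the whole value is passed as one quoted argument, for example:
$ hb config -c backupdir block-size-ext '-B4M mov,avi -V128K .sql -B16K ibd -B23K xyz'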

cache-size-limit

Specifies the maximum amount of space to use in the local backup directory to cache archive files. By default, this is -1, meaning keep local copies of all archive files. Setting cache-size-limit >= 0 will restrict the number of archives kept locally. Remote destinations must be set up in dest.conf or this config option is ignored.

The value can be set two ways:

  • a number less than 1000 indicates how many arc files to keep locally. The number is multiplied by arc-size-limit to get a space limit.

  • a space limit can be set directly, for example, 10GB

A reasonable value for this limit is the average size of your incremental backups. A reasonable minimum is the highest workers setting for any destination plus 1. The default number of workers is 4, so the minimum would be 5, but a more reasonable limit would be 1GB for 100MB arc files. If you plan to use the mount command extensively, a larger cache will prevent multiple downloads of the same data.

The cache size limit is more of a desired goal than a hard limit. The limit may often be exceeded by the size of 1 archive file, and may need to be greatly exceeded for restores to prevent downloading the same archive file more than once. The mount command respects the limit but might have to download the same data more than once.

If you can afford the disk space, keeping a copy of archives locally has many advantages:

  1. backups won’t stall if the cache fills up before destinations can upload the data

  2. mount does not need to download data and can run concurrently with backups

  3. restores will not have to wait for lengthy downloads

  4. archive pack operations are faster and less expensive because only uploads are required - no downloads

  5. less remote storage space is used because pack operations occur more frequently

  6. you have a redundant copy of your backup

Thanks to compression and dedup, total backup space is usually about 50%-100% the size of the original data, even when multiple versions are maintained.

Examples of cache size limits:

  • -1 means copies of all archives are kept in the local backup directory. This is the default setting.

  • 0 means no archives should be kept locally. During backup, up to 2 archives are kept to ensure overlap between backup and network transmission. After the backup completes, all local arc files that have been sent to all destinations are deleted.

  • 1-999 means to size the cache to hold this many full archives. If the archive size is 100MB and the limit is set to 10, the effective cache size would be 1gb. A guideline is to use workers (in dest.conf) + 1 as a minimum.

  • 10gb specifies the cache size directly. Other suffixes can be used: mb, etc.
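
The cases above can be sketched as a small shell function. This is only an illustration of how the values are read according to this section: cache_limit and min_cache_count are hypothetical helpers, and the handling of a plain number of 1000 or more (passed through unchanged) is an assumption, since that case is not described here.

```shell
#!/bin/sh
# Hypothetical helper: interpret a cache-size-limit value.
# $1 = cache-size-limit (-1, 0, a count below 1000, or a size such as 10gb)
# $2 = arc-size-limit in MB
cache_limit() {
    case $1 in
        -1) echo "all archives kept locally" ;;
        0)  echo "no archives kept locally after backup" ;;
        *[!0-9]*) echo "$1" ;;               # has a suffix: direct space limit
        *)  if [ "$1" -lt 1000 ]; then
                echo "$(( $1 * $2 ))mb"      # archive count x arc-size-limit
            else
                echo "$1"                    # assumption: taken as a direct limit
            fi ;;
    esac
}

# Suggested minimum archive count: highest workers setting plus 1.
min_cache_count() {
    echo $(( $1 + 1 ))
}

cache_limit 10 100     # prints 1000mb (10 archives of 100MB, i.e. 1GB)
cache_limit 10gb 100   # prints 10gb
min_cache_count 4      # prints 5
```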

If cache-size-limit was -1 (all arc files kept locally) and then is changed to 10GB for example, the next backup will trim the cache down to 10GB after the backup completes. You can force this trim to happen immediately by backing up a small file like /dev/null after changing cache-size-limit.

If there are multiple destinations configured in dest.conf, an arc file must be stored successfully on all destinations before the local copy can be deleted. For configurations where 2 USB drives are set up and 1 is always offsite, there must be enough local cache space to hold all backups that occur before swapping drives.

copy-executable

When True, copy the hb program to remote destinations. The default is False. The hb program is always copied to the local backup directory as hb#b, where b is the build number. If using HB only occasionally, more as an archive, it is important to set this to True so that you will have a copy of HB stored with your archive.

db-check-integrity

Controls when a database integrity check occurs. The default is selftest. This is fine when backups are run and stored on enterprise-class hardware: non-removable drives, ECC RAM, and hardwired networks.

For extra safety, set this option to upload on consumer-grade hardware using removable drives, non-ECC RAM, and wireless networks where there is a higher possibility of database damage. This will verify database integrity before each upload to destinations, preventing a damaged local database from overwriting a good version stored remotely.

db-history-days

Keep incremental database backup files (hb.db.N) for this many days before the latest backup. The default is 30. Combined with recover --check, this allows recovering earlier versions of the main backup database. Lower values tend to use less remote storage but create larger incrementals after each backup. A value of 1 will create one large incremental of the entire database, usually about 50% the size of hb.db itself, but carries some risk since there are no historical versions. Larger values use more remote storage but create smaller hb.db.N files after each backup. The special value 0 creates the smallest hb.db.N files but can only recover the latest version of hb.db. Please note that it will take 2x db-history-days days for hb.db.N storage space to reach equilibrium.

dbid

A read-only config option that is unique for each backup database.

dbrev

A read-only config option with the database revision level.

dedup-mem

The amount of RAM to be used for the dedup table, similar to the -D backup option. The -D backup option overrides this config option for a single backup. See Dedup Info for more detailed information about sizing the dedup table. The default is 100MB.

The dedup table does not immediately use all memory specified; it starts small and doubles in size when necessary, up to this limit. The backup command shows how full the current dedup table is, and how full it is compared to the maximum table size. When the dedup table is full and cannot be expanded, backups will continue to work correctly. Dedup is still very effective even with a full table.

In a sharded backup with N shards, each shard has its own dedup table. To use 1GB of RAM for a backup with 4 shards, set dedup-mem to 250MB.
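
A minimal sketch of the division described above, assuming the total RAM budget is given in MB. dedup_mem_per_shard is a hypothetical helper, not an hb command.

```shell
#!/bin/sh
# Hypothetical helper: dedup-mem value to set so that all shards together
# use about the given total. $1 = total RAM in MB, $2 = number of shards.
dedup_mem_per_shard() {
    echo "$(( $1 / $2 ))MB"
}

dedup_mem_per_shard 1000 4    # 1GB across 4 shards: prints 250MB
```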

disable-commands

List of commands that should be disabled, separated by commas; all other commands are enabled. If the admin-passphrase option is set, these commands ask for the passphrase and run only if it is correctly entered. If there is no admin-passphrase set, these commands refuse to run. The upgrade and init commands cannot be disabled because when these commands execute there is no database to check whether they are enabled or disabled. To cancel this option, use the value '' (two single quotes).

SECURITY NOTE: Disabling commands without admin-passphrase set is a poor configuration since commands can easily be re-enabled.

enable-commands

List of commands that should be enabled, separated by commas; all other commands are disabled. An example would be to enable only the backup command. Then it would not be possible to list, restore, or remove files without the admin passphrase. To cancel this option, use the value '' (two single quotes).

SECURITY NOTE: Enabling commands without setting an admin-passphrase is a poor configuration since more commands can easily be enabled.

no-backup-ext

List of file extensions, with or without dots, separated by spaces or commas. Files with these extensions will not be backed up.

no-backup-tag

List of filenames separated by spaces or commas. Directories containing any of these files will not have their contents backed up, other than the directory itself and the tag file(s). Typical values are .nobackup and CACHEDIR.TAG. Each entry in this list causes an extra stat system call for every directory in the backup, so keep the size of this list to a minimum.

no-compress-ext

List of file extensions of files that should not be compressed. Most common extensions like .zip, .bz2, and .gz are already programmed into HashBackup, but less common extensions can be added. Extensions may or may not have dots, and are separated by spaces or commas. HashBackup does not compress incompressible data within a file, so setting this option is not that important.

no-dedup-ext

List of file extensions of files with data that does not dedup well with variable block sizes, for example, photos. If exact copies of files are backed up, they will still be deduped if dedup is enabled. Files with these extensions are deduped with fixed block sizes, either automatically selected if block-size is auto, or 1MB otherwise.

pack-age-days

The minimum age of an archive before it is packed. Some storage services charge penalties for early removal of files, for example, Amazon S3 Infrequent Access and Google Nearline. This setting prevents packing archives until they have aged, avoiding the delete penalty charge, but you are still paying more for storage of data that might have been deleted earlier. All other conditions must be satisfied before an archive is packed. The default for pack-age-days is 30. To disable this feature, set it to zero. Higher numbers will reduce the number of pack operations but may increase the backup storage requirements. If you do not pay delete penalties and have low or zero download fees, a free download allowance, or have cache-size-limit set to -1 (local copy of arc files), setting this to zero will cause more frequent packing to lower storage costs, improve restore times, and lower restore cache requirements.

This option also controls the frequency of packing operations. However, if a packing operation cannot complete because of a download or time limit, it will resume the next time rm or retain is run. If set to zero, rm and retain will try to pack archives on every run.

pack-bytes-free

The minimum free space in an archive before it is packed. The default is 1MB. This setting prevents packing very small archives when data is removed, i.e., once an archive gets down to 1MB (the default setting), it is not packed until it is empty, then is deleted. All other conditions such as pack-percent-free must also be satisfied before an archive is packed. To disable this feature, set it to zero. Higher numbers will increase the backup storage requirements and reduce the number of pack operations.

pack-combine-min

The minimum size of an arc file before it is merged into a larger arc file. The default is 1MB, meaning that when possible, arc files smaller than 1MB will be merged together. For combining to occur, packing must occur (see the other pack-* config settings). For technical reasons, some arc files cannot be merged into larger arc files. Set this to 0 to disable combining. This may be cheaper for storage services that have delete penalties.

pack-download-limit

The maximum amount of data that can be downloaded during a single run of rm or retain for packing. The default is 950MB. To prevent any downloading of remote arc files for packing, set this limit to 0 (but see below). For "unlimited" downloading, set the limit to a high value like 1TB. This limit only applies when cache-size-limit is >= 0, meaning that not all arc files are stored locally; local arc files are packed without a download.

It is recommended that pack-download-limit not be set to zero. When files are removed from the backup with rm and retain, "holes" are created in arc files. Over time, this causes older backup data to be stored inefficiently, making restores slower. Packing reorganizes arc files into more efficient storage. To reduce the amount of packing, raise pack-percent-free to a high number like 95, meaning that 95% of an arc file must be free before it is packed. This will prevent nearly all downloading except tiny files smaller than pack-combine-min and very inefficient arc files with mostly empty space.

If pack-download-limit is set to a value smaller than the largest archive in the backup, a warning is given that this archive cannot be packed. Over time, if more space is freed within the archive, it will eventually be packed or deleted.

pack-percent-free

When archive files have this percentage or more as free space, the archive is packed to recover space. This is used by rm and retain after they have deleted backup data. The default is 50. If cache-size-limit is >= 0 and you pay for downloaded data, consider raising this percentage to avoid frequent packing or set pack-download-limit. Most storage services have high download charges compared to their storage charge. If cache-size-limit is -1 (the default), packing more often is fine because no download is required. Higher numbers increase the backup storage requirements but reduce the number of pack operations and reduce the number of downloads when cache-size-limit is >= 0.

Setting pack-percent-free to a very low percentage and pack-age-days to a low number can trigger excessive packing operations.
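
The pack-* options work together: an archive is packed only when every condition holds. The sketch below is a simplified model of that rule as described in this section, not HashBackup's actual logic. should_pack and its variables are hypothetical, the thresholds are the documented defaults (with 1MB taken as 1,000,000 bytes), pack-download-limit and pack-combine-min are left out, and the model assumes an empty archive is deleted regardless of its age.

```shell
#!/bin/sh
# Simplified model (an assumption, not HashBackup code) of when an archive
# qualifies for packing. Sizes are in bytes, age is in days.
PACK_PERCENT_FREE=50      # documented default
PACK_BYTES_FREE=1000000   # documented default of 1MB
PACK_AGE_DAYS=30          # documented default

# $1 = archive size, $2 = free bytes inside it, $3 = archive age in days
should_pack() {
    size=$1; free=$2; age=$3
    if [ "$free" -ge "$size" ]; then
        echo delete           # all data removed: deleted without a pack
    elif [ $(( free * 100 / size )) -ge "$PACK_PERCENT_FREE" ] &&
         [ "$free" -ge "$PACK_BYTES_FREE" ] &&
         [ "$age" -ge "$PACK_AGE_DAYS" ]; then
        echo pack
    else
        echo keep
    fi
}

should_pack 100000000 60000000 45    # 60% free, 45 days old: prints pack
should_pack 100000000 60000000 10    # too young: prints keep
should_pack 100000000 30000000 45    # only 30% free: prints keep
should_pack 100000000 100000000 45   # empty: prints delete
```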

remote-update

When set to normal (the default), HB sends new data before deleting old data, for example, during packing. This preserves the integrity of remote backup areas. But if a remote disk becomes full it may be necessary to delete the old data before sending new data. To enable this behavior, set to unsafe.

Another way to handle a full remote backup area is to delete recent backup versions with clear -r. This quickly removes entire versions and their archives and makes room for a retain operation to remove older data. The next backup will re-save the recent files that were deleted if they are still present in the filesystem.

When set to unsafe, an interrupted HB command may leave the remote backup area temporarily inconsistent. The next successful HB command should correct it.

retain-extra-versions

Adds an extra time period to each -s interval as a retention safety cushion, so -s7d4w becomes -s8d5w. The default is True. See the retain command for a better explanation.
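
To illustrate the adjustment, this sketch adds one to each count in an -s value, reproducing the 7d4w to 8d5w example. add_extra_period is a hypothetical helper that only models the string transformation described here; see the retain command for what the intervals mean.

```shell
#!/bin/sh
# Hypothetical helper: add 1 to every number in a retention spec,
# e.g. 7d4w becomes 8d5w. Uses awk because sed cannot do arithmetic.
add_extra_period() {
    echo "$1" | awk '{
        out = ""
        # Copy text up to each number, then append the number plus one.
        while (match($0, /[0-9]+/)) {
            out = out substr($0, 1, RSTART - 1) (substr($0, RSTART, RLENGTH) + 1)
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

add_extra_period 7d4w    # prints 8d5w
```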

shard-output-days

The number of days to keep shard output in the sout subdirectory. The default is 7 days. Set to 0 to keep all shard output.

simulated-backup

If set to true before the first backup, no arc files are created by the backup command. This allows modeling backup options such as -B (block size) and -D (dedup table size), even for very large backups, without using a lot of disk space. Simulated backups also run faster because there is less I/O. Incremental backups, rm, and retain all work correctly, and the stats command can be used to view statistics showing the backup space that would be used by a real backup. The pack-* options control whether the simulated arc files are packed when files are removed with rm or retain.

Daily backups work as expected with simulated backups, saving only modified files and deduping against previous backups.

Differences for simulated backups:

  • must be set before the initial backup

  • cannot be changed after the initial backup

  • no arc files are created (not a real backup)

  • no arc files are sent to destinations

  • selftest is limited to -v2 or below

  • mount works but cannot read data from files

  • get exits with an error message

  • recover will not try to download arc files