Releases

#2679 Jan 22, 2022 preview

Preview notes:

  • Jan 22 #2679: NFS spans.tmp bug

  • Jan 20 #2678: log -x0 and -x1

Summary:

  • log: handle -x0 and -x1 better

  • get/mount: fix spans.tmp Directory not empty

Details:

  • log: the -x option controls how many lines of context to include around backup error messages, with a note showing how many lines were skipped. Now, if only 1 line was skipped, log just includes it. So this:

    2022-01-19 Wed 21:44:00| /
    ... (1 lines)
    2022-01-19 Wed 21:44:04| /Users/jim/.sh_history

    is now shown as:

    2022-01-19 Wed 21:44:00| /
    2022-01-19 Wed 21:44:00| /Users
    2022-01-19 Wed 21:44:04| /Users/jim/.sh_history

    This also fixes a bug where, if -x0 was used, meaning no context, log summaries did not show how many lines were skipped. With -x0, this:

    2022-01-19 Wed 21:44:04| Unable to list directory: Operation not permitted: /Users/jim/Library/Application Support/CallHistoryTransactions
    2022-01-19 Wed 21:44:05| Unable to list directory: Operation not permitted: /Users/jim/Library/Application Support/com.apple.TCC

    is now correctly summarized as:

    2022-01-19 Wed 21:44:04| Unable to list directory: Operation not permitted: /Users/jim/Library/Application Support/CallHistoryTransactions
    ... (37 lines)
    2022-01-19 Wed 21:44:05| Unable to list directory: Operation not permitted: /Users/jim/Library/Application Support/com.apple.TCC
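
    As an illustration, invocations might look like this (the backup directory and the command being summarized are just examples):

    $ hb log -c /hbbackup -x0 backup
    $ hb log -c /hbbackup -x1 backup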
  • get/mount: if the -c backup directory is on an NFS mount, then after get finishes and says "No errors", it could generate the traceback below. HashBackup left files open in the spans.tmp directory and then tried to delete the directory. This works on a local filesystem, but fails on NFS mounts because instead of deleting open files, NFS renames them to hidden files. When HB then tries to delete the spans.tmp directory, the hidden files are still there, so the directory delete fails. When HB exits, NFS deletes the hidden files, so if you look after the failure, the directory is actually empty. Confusing! Now HB closes the files before the delete to prevent the traceback. Thanks Ware and Paul!

    Traceback (most recent call last):
      File "/opt/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
      File "/misc.py", line 1054, in rmx
      File "/misc.py", line 1043, in _rmtree
    OSError: [Errno 39] Directory not empty: '.../spans.tmp'

#2677 Jan 18, 2022

Summary:

  • count: fix traceback with permission errors

  • dest edit: bug fixes

  • backup: note for large extended attributes

  • export: traceback after export

  • rclone: recover error reading --backupdir

  • selftest: -v4 local archive migration

  • stats: add stats for db space utilization

  • stats: new --xmeta option

  • recover: more guidance, set cache-size-limit

  • compare: add -r option

  • B2 dest: handle bad bucket ID

  • dest clear: remove orphaned files

  • S3: disable multipart if necessary

  • check HB_UPGRADE_URL in installer

  • upgrade: --force option is obsolete

  • upgrade: better output handling for cron

  • recover: don’t allow -n with -o

  • NFS partial writes

Details:

  • count: if a directory is the first entry in its parent directory and a permission error occurs, it caused this traceback:

      File "/count.py", line 121, in dosubdirs
      File "/count.py", line 160, in walktree
    UnboundLocalError: local variable 'path' referenced before assignment

    Related to this, the pathname printed with the error message "error stating directory" was incorrect. Thanks Arthur!

  • dest edit: an empty file was causing "AttributeError: 'NoneType' object has no attribute 'strip'", and a non-empty file was not getting saved. Thanks Arthur!

  • backup: extended attributes, also called resource forks on OSX, are usually small bits of data attached to files. They are saved in the HB database, not in arc files, because they are usually small. However, they can be very large and this will cause the database to be unreasonably large. A notice is now displayed during backup for extended attributes or ACLs larger than 100KB. Thanks Arthur!

  • export: after exporting a backup, export could raise an error if the original backup had a passphrase protecting the key. The export worked fine; this error was related to updating the audit log after export finished, to set the status code.

    Traceback (most recent call last):
      File "/hb.py", line 284, in <module>
      File "/db.py", line 383, in opendb
    Unable to access HashBackup database: file is not a database
  • rclone: the recover command failed on rclone destinations with an error:

    Exception: shell(gcd): error reading --backupdir /hashbackup-data:
    [Errno 2] No such file or directory: '/hashbackup-data/hb.db'

    The rclone.py script was checking for the existence of hb.db in the backup directory, but it won’t exist yet for recover. The purpose of this check is to ensure that --backupdir really is an HB backup directory. Now, instead of checking for hb.db, rclone.py checks for key.conf since that must be present in all situations. Since rclone.py is not automatically updated, users will need to download the new version from http://www.hashbackup.com/shells/rclone.py or edit their local copy. Thanks Ben!

  • selftest: when cache-size-limit is -1, all arc files should be kept locally. If cache-size-limit was previously >= 0 (arcs not all local) but is now -1 (arcs should be local), some arc files may not have local copies. Since selftest -v4 already downloads remote arc files to verify them, it will now save a copy if the local copy was missing. This allows sites to migrate arc files back locally with -v4 and optional --inc incremental testing.

  • stats: statistics have been added to show why the hb.db database can get "too big". This is almost always caused by large extended attributes (OSX) or by backing up browser caches and other directories full of content-defined (hash) pathnames. The new stats are:

     5,303 extended metadata entries
    278 MB database size
     88 MB space used by files (31%)
     47 MB space used by paths (17%)
    133 MB space used by blocks (47%)
    5.9 MB space used by extended metadata (2%)
    495 KB largest extended metadata entry
    226 MB database space utilization (81%)
  • stats: a new option --xmeta <size> shows files with extended metadata larger than <size> (usually extended attributes) to allow excluding or removing them from the backup if they are unimportant.
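
    For example, to list files whose extended metadata is larger than about 100KB (the backup directory and size syntax are illustrative):

    $ hb stats -c /hbbackup --xmeta 100KB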

  • recover: adds more guidance about archive downloading before the recover starts, based on the -a and -n options. After recover, resets cache-size-limit if it conflicts with -a or -n.

  • compare: the -r option can be used to compare an older backup to the filesystem. This can be useful to see why an incremental was unexpectedly large. For example, if backup 939 was normal but backup 940 was very big, then hb compare -r939 can often be used to find out what changed in the live filesystem. For now this only works if the large files are still present in the filesystem. It would be nice to compare two backup versions without looking at the live filesystem. Thanks Krisztian!

  • B2 dest: by deleting and re-creating B2 buckets with the same name, renaming destinations in dest.conf, and using lots of setid commands, it’s possible to get a backup in a very weird state that causes tracebacks when files are deleted. After 10 retries, the destination shuts down:

    dest b2: error #1 of 9 in rm arc.1.0: b2(b2): http status 400 (Bucket 589c09e5d773b10967850313 does not exist) deleting fileid 4_z589c09e5d773b10967850313_f11867765b241ab9b_d20220110_m173350_c001_v0001097_t0047: test/arc.1.0
    Traceback (most recent call last):
      File "/basedest.py", line 90, in retry
      File "/b2dest.py", line 633, in rm
      File "/b2dest.py", line 311, in b2deletefile
    Exception: b2(b2): http status 400 (Bucket 589c09e5d773b10967850313 does not exist) deleting fileid 4_z589c09e5d773b10967850313_f11867765b241ab9b_d20220110_m173350_c001_v0001097_t0047: test/arc.1.0

    The problem is that files were stored in a B2 bucket, that bucket was deleted and re-created on the B2 web site (which assigns a new B2 bucket ID), and when HB tries to remove files it thinks are still there, it is sending delete requests for the original bucket ID that no longer exists. Now HB ignores this error as if the delete worked, allowing it to escape from this catch-22. Thanks Alexander!

  • dest clear: previously a destination had to be active before it could be cleared or a traceback would occur: "No active destination xyz in dest.conf". This can be a problem if the account is no longer accessible. Now, dest clear asks if you want to delete files anyway (unless --force is used) and removes them without any network I/O. Thanks Alexander!

  • S3: multipart get was added in #2641 using the multiprocess library. Some systems (FreeBSD 12, maybe others) generate a traceback during initialization and the destination is disabled:

    OSError: [Errno 78] Function not implemented

    Now HB tests for this during initialization and disables multipart operations with a message "multipart does not work on this system". Thanks Henrik!

  • the installer on the Download page checks the environment variable HB_UPGRADE_URL when fetching the real HashBackup binary. This enables private upgrade servers to also use the installer. The installer hashes shown on the Download page have been updated. Thanks Michael!

  • upgrade: previously, upgrade would ask before installing a new version. This made it necessary to have a --force option to avoid asking when running from a cron job. Since there is already a -n option to avoid installing the upgrade, the --force option is redundant and is now obsolete. Using --force will cause an extra message, so it should be removed from crontabs after upgrading to this release to avoid that.

  • upgrade: cron sends all job output, either stdout or stderr, to the job owner or MAILTO email address. Previously, hb upgrade wrote the version / copyright message to stdout but wrote the "You already have the latest version" message to stderr. This is a problem because a daily upgrade cron job will send email every day.

    Now upgrade writes errors, release notes, and upgrade messages to stderr and writes "You already have the latest version" to stdout. This allows redirecting stdout to /dev/null in crontab so only important mail is received.
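
    A crontab entry following this advice might look like (the schedule and install path are illustrative):

    30 3 * * * /usr/local/bin/hb upgrade >/dev/null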

  • recover: the -n (no arc download) and -o (overwrite arc files) options now raise an error if used together.

#2641 Jan 1, 2022

Summary:

  • INCOMPATIBILITY NOTICE: recover -a

  • recover: new options -o and -a

  • SECURITY NOTICE: use dest load

  • new dest edit command

  • S3: enable SNI for S3 compatibles

  • upgrade: add troubleshooting code

  • S3: new partsize keyword in dest.conf

  • backup: suppress error if file disappears

  • log: new option -X to exclude lines

  • upgrade: secure private upgrade servers

  • config: HB_NEW_ADMIN_PASSPHRASE

  • dest: new "test" subcommand

  • mount: update FUSE info when not installed

  • S3: added multipart get for downloads

  • S3: accommodate MinIO peculiarity

  • backup: removed SIGTERM handler

  • S3: disable multipart transfers if only 1 worker

  • destinations: default for workers is now 4 vs 2

  • shards: only make 1 copy of HB

  • S3: require multipart for large file uploads

  • rm: succeed with a full local disk

  • dir destination: fix type error on disk full

  • destinations: always print a traceback on errors

  • selftest: checking 0% of backup

  • dest clear: don’t refuse if unused arc files

  • selftest -v4: don’t download missing arc file twice

  • ls: handle bracket wildcard patterns

  • backup: don’t halt on missing path

Details:

  • INCOMPATIBILITY NOTICE: previously recover -a meant to overwrite existing files. This option has been renamed to -o and a new -a option has been added.

  • recover: -o means overwrite existing files (was -a), and -a means download all arc files, ignoring cache-size-limit

  • SECURITY NOTICE: HashBackup allows specifying destination credentials either in the dest.conf text file or by loading this text file into the encrypted hb.db database. Because of ransomware, it is highly recommended that the dest.conf text file not be used for remote credentials and that the dest load command be used to store them in hb.db instead. In a future release, the dest.conf text file will be forced into hb.db.
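
    A rough sketch of the recommended setup (the backup directory is illustrative):

    $ hb dest -c /hbbackup load    # store dest.conf inside the encrypted hb.db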

  • a new dest subcommand, edit, will edit the dest.conf file stored in the encrypted hb.db database without using the unload and load commands. It uses the EDITOR environment variable to choose an editor, or uses vi if EDITOR is not set. If there is no dest.conf loaded into the database, a new one can be created with the editor. Changes must be saved before quitting the editor.
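
    For example (nano is just one editor choice; the backup directory is illustrative):

    $ EDITOR=nano hb dest -c /hbbackup edit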

  • S3: some S3 services (Storj) may have multiple hosts configured at the same DNS name. Connecting to these with SSL requires SNI (Server Name Indication), which is now supported.

  • upgrade: the retry loop is now 60 seconds vs 120 seconds. On a failure to download, a traceback is dumped to help diagnose problems.

  • S3: a new keyword "partsize" can be used to specify a fixed part size for multipart S3 uploads and downloads. The default is 0, meaning that HB chooses a reasonable part size from 5MB (the smallest allowed) to 5GB (the largest allowed), based on the file size. When the new partsize keyword is used, HB uses this part size to determine the number of parts needed, then "levels" the part size across all parts. For example, when uploading a 990MB file with a partsize of 100M, HB will use 10 parts of 99M each. This option was added for the Storj network because it prefers part sizes that are a multiple of 64M (the Storj default segment size). The size can be specified as an integer number of bytes or with a suffix; a suffix like 100M or 100MB is always interpreted as MiB, i.e., 100 * 1024 * 1024 bytes.
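
    A sketch of what a dest.conf entry using partsize might look like; the destination name, host, and credential keywords here are illustrative assumptions, and only the partsize and workers keywords come from these notes:

    destname storj
    type s3
    host gateway.storjshare.io
    accesskey <access key>
    secretkey <secret key>
    bucket mybackup
    workers 8
    partsize 64M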

  • backup: if a file disappears after scanning a directory, backup suppressed this error if there was a previous backup and printed "pathname (deleted)", but for files not previously saved, it generated an error. Now it displays "pathname (deleted)" in both cases, suppressing the error. Thanks Michael!

  • log: a new -X option allows excluding lines when summarizing log files. All lines not starting with a slash (pathname) are normally included in log summaries. But some sites may want to exclude lines like "Copied arc.v.n to blah". Adding -X "Copied arc." would do that. The exclude list items are simple strings - no regular expressions. Thanks Michael!
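
    For example (the backup directory and the summarized command are illustrative):

    $ hb log -c /hbbackup -X "Copied arc." backup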

  • upgrade: HashBackup uses RSA4096 signatures to verify that an upgraded version is authentic, so using SSL (secure http) to fetch upgrades is not necessary. However, SSL may be a policy requirement at some sites. To use SSL (https) with a private server, use:

    # SSL_CERT_FILE=/etc/ssl/cert.pem hb upgrade https://my.server/path

    If you aren’t sure where your root certificates are stored, use the pathname of the cacerts.crt file in a HashBackup backup directory. Thanks Michael!

  • config: previously the environment variable HB_ADMIN_PASSPHRASE could be set to avoid having to enter the admin passphrase. Now the admin passphrase can be set or changed with the environment variable HB_NEW_ADMIN_PASSPHRASE. To enter the new passphrase with the keyboard, use:

    $ HB_ADMIN_PASSPHRASE=oldp hb config -c backupdir admin-passphrase

    To enter the new passphrase with an environment variable, use:

    $ HB_ADMIN_PASSPHRASE=oldp HB_NEW_ADMIN_PASSPHRASE=newp hb config -c backupdir admin-passphrase

    WARNING: environment variables are useful for automation but may have security risks that could expose sensitive data to other locally running processes.

  • dest: a new "test" subcommand tests a single destination or all currently configured destinations. It performs 3 rounds of upload, download, and delete tests for many file sizes, displaying the performance of each and an average performance for each file size.

    Test all destinations:      $ hb dest -c backupdir test
    Test specific destinations: $ hb dest -c backupdir test <dest1> <dest2> ...
    Run specific tests:         $ hb dest -c backupdir test -t up del
    Test sizes 1K and 128M:     $ hb dest -c backupdir test -s 1k 128m
    Run 10 rounds instead of 3: $ hb dest -c backupdir test -r 10
    Delay 5 mins and repeat:    $ hb dest -c backupdir test -d 5m (or 5s or 5h)
    Repeat test 12 times:       $ hb dest -c backupdir test -n 12
  • S3: multipart get was added to S3 destinations. This scales very well with the workers and partsize keywords in dest.conf. In tests with the Storj S3 Gateway, 1GB download performance increased from 20 MB/s to over 200 MB/s using multipart gets, and Amazon S3 scaled up to over 300 MB/s with 16 threads. Multipart uploads and downloads are enabled by default unless multipart false is used in dest.conf. Thanks Dominick!

    NOTE: some S3-compatibles such as the Google Cloud Storage proxy do not support S3 multipart operations.

  • S3: if a file is copied to the MinIO object store with filesystem commands, everything works fine except that MinIO serves the file with an etag of 00000000000000000000000000000000-1. Instead of complaining and aborting the download, HB now ignores these etags. If there was a download error, it will be caught later during the restore with a block hash mismatch. Thanks Ian!

  • backup: in #1946, a SIGTERM handler was added to the backup command so that if the backup program was terminated, it finished the current file and then stopped cleanly. However, this does not play well with the new S3 multipart download feature, so the SIGTERM handler has been removed.

  • destinations: after testing with several object store services, the default number of workers has been changed from 2 to 4. This seems to be a sweet spot of increasing performance without adding too much overhead for managing threads.

  • shards: copy the HB program to the backup directory instead of each shard subdirectory

  • S3: check when uploading files larger than 5 GiB that multipart is enabled, and raise an error if not

  • rm: if the local backup directory disk is full, it’s important to be able to remove backup data to correct the disk full. Previously rm would fail trying to update the dedup table. Now it avoids this and can remove arc files to free up disk space. Thanks restic!

  • dir destination/backup: if all directories are full for a dir destination, it raised the error:

    TypeError: %d format: a number is required, not NoneType

    but now raises the correct error:

    d1(dir): file not copied: /Users/jim/hbrel/hb/arc.3.1

  • destinations: previously, a traceback was only printed after a destination exceeded all retries. Now a traceback is printed on every error. Instead of using "debug 99" to stop retries, use "retry 0" (debug 99 is now obsolete). Thanks Grant!
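
    That is, in the affected destination's dest.conf entry, replace the old debug 99 line with:

    retry 0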

  • selftest: if a -v4 selftest used --inc, had a download limit specified with ",xxxMB", and would have exceeded the limit, selftest would say "Checking 0% of backup" instead of the actual percentage. Also, if only one version was being incrementally checked (-r used with --inc), the percentage checked is not of the whole backup but just of the version requested, so the message was changed to "Checking xx% of version r".

  • dest clear: when a destination has the only copy of a file, HB refuses to clear it. But it’s possible for an arc file to be unused, for example, it was removed while a destination was down. Now dest clear does not refuse on files that aren’t needed.

  • selftest: with -v4, if an arc file isn’t local and cache-size-limit is -1 (all arc files should be local), selftest would download the file twice. This situation can happen for example when switching from remote arc files (cache-size-limit >= 0) to local (the limit is -1) and some arc files are not yet migrated locally.

  • ls: use bracket wildcard patterns to match specific characters. [atJ] matches either a or t or J, [0-9] matches a single digit. This example now works whereas before it said "Path not in backup: /.fil[ef]":

    $ hb ls -c /hbbackup '/.fil[ef]'
    HashBackup #2640 Copyright 2009-2022 HashBackup, LLC
    Backup directory: /hbbackup
    Most recent backup version: 2952
    Showing most recent version, use -ad for all
    /.file
  • backup: when path /a/b was backed up and /a did not exist, backup would halt with a traceback before it got started:

    Traceback (most recent call last):
      File "/hb.py", line 129, in <module>
      File "/backup.py", line 3033, in main
      File "/misc.py", line 545, in fixpath
    OSError: [Errno 2] No such file or directory: '/a'

    Now it will do the backup and show an error during the backup:

    /a (deleted)
    Unable to backup: No such file or directory: /a/b

#2552 Jul 10, 2021

Summary:

  • new destination type "dirs" (multiple directories)

  • config: fix strange interactions

  • config: don’t ask for admin passphrase twice

  • retain/rm: always verify hashes during packing

  • retain/rm/selftest: improve bad block messages

Details:

  • a new variation of the dir destination with type "dirs" allows specifying multiple destination directories. This is useful with multi-drive external disk enclosures configured as JBOD (Just a Bunch Of Disks). It is possible to configure these units as one large RAID0 drive and use a "dir" destination, but that reduces reliability: if any disk fails, all data is typically lost. A RAID > 0 configuration is most reliable, but also reduces capacity. A JBOD configuration using the dirs destination with multiple directories is a middle ground: if a disk is lost, only the backup data on that drive is lost. This is an especially reasonable configuration if you have another copy of the backup data at another destination.

    The dirs destination requires a dirs keyword specifying multiple target directory paths, separated by colons. When reading from the destination, HB will check all directories until it finds the file it needs. When writing arc files, HB will check available space in each directory and copy to a directory with enough space to hold the file. When copying non-arc files, HB makes copies in every directory that has room, to increase the chances of a successful recover in case the local backup directory is lost and one or more JBOD disks are also lost.

    A new "copies" option specifies how many copies of each arc file to create, to increase redundancy. The default is 1.

    A new "spread" option causes arc files to be distributed over all disks rather than filling up disk 1 before using disk 2.
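
    A sketch of what a dirs destination entry in dest.conf might look like (the destination name, paths, and the copies/spread values are illustrative; the destname and type keywords follow the usual dest.conf layout):

    destname jbod
    type dirs
    dirs /mnt/disk1/hb:/mnt/disk2/hb:/mnt/disk3/hb
    copies 2
    spread true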

  • config: it is possible to set enable/disable-commands without setting an admin-passphrase; only a warning is printed. But if enable-commands was set to 'backup' for example, then it was not possible to set an admin-passphrase because the config command was disabled. This created a backup where no config options could ever be changed again, which is a little strange and unintended.

    Now, the config command with the admin-passphrase subcommand is always allowed. If you really want a backup that is unchangeable like before, you can set an admin-passphrase to some random string that you then forget.

  • config: the admin passphrase was sometimes requested twice, depending on whether the config command was enabled or disabled.

  • retain/rm: during packing (removing empty space from an arc file), a block sometimes needs to be decrypted & decompressed, and this also verifies the block’s hash. If the block is bad, a hash mismatch error occurs and retain/rm abort. Sometimes these steps are not necessary and are skipped as a performance optimization. However, if a block is bad, skipping the hash verification allows the bad block to propagate without error during packing. The first time the bad block would be noticed would be either a -v4 selftest of the arc file or a restore needing the bad block. With this release, all block hashes are verified while packing an arc file and retain/rm will abort if any blocks are bad. This may slow down packing somewhat; if it becomes a problem, please send an email.

  • retain/rm/selftest: improve advice given when bad blocks are detected. For example, instead of:

    getblock: hash mismatch blockid 3 in arc.0.0

    the message is now:

    getblock: hash mismatch blockid 3; run selftest --fix arc.0.0

#2546 Jun 2, 2021

Summary:

  • database upgrade to dbrev 34

  • backup: fix arc size mismatches

  • backup: inex.conf exclude bug

  • get: fix restore tracebacks

  • get: fix symlink restore on Linux

  • dir dest: bug fixes

  • selftest: remove extraneous size message

  • selftest: really delete blocks with --fix

  • selftest: expand notice message about -v4 --sample

  • selftest: explain database integrity errors

  • versions: fix traceback on interrupted backups

  • log: fix index error with hb log

Details:

  • this release does an automatic database upgrade to remove unused columns from the database

  • backup: if a destination failed a certain way, it could cause one or more remote archives for that version to appear corrupted after subsequent backups, and also generate selftest errors:

    Error: arc.V.N size mismatch on XXX destination: db says YYY, is ZZZ

    A selftest -v4 would give hash mismatch errors on the interrupted arc files. If the arc files were stored correctly either locally or on another destination, selftest -v4 --fix could fix the errors. But sometimes the only correct version was local. If cache-size-limit was also set, the correct arc files could be deleted when the cache filled up, leading to permanent corruption.

  • backup: a bug in exclude processing caused some files to be backed up that should have been excluded

  • get: restores could fail with a "list index out of range" error in certain circumstances. Thanks Ian!

  • get: could fail with a traceback: NameError: global name 'errmsg' is not defined when scanning local files to find a match for a file to be restored. From internal testing.

  • get: fix "Unable to set extended attributes" error when restoring symlinks on Linux, related to the order things are restored, if the symlink target had extended attributes. Thanks Ian!

  • dir dest: fix hang if dir destination file is shorter than expected when using selective download. From internal testing.

  • dir dest: retry selective download on I/O error. From internal testing.

  • selftest: the message "Note: arc.V.N is correct size on ..." was being displayed if an arc file had not yet been transferred to one or more destinations. It should only be displayed if an arc file’s size is incorrect somewhere.

  • selftest: with -v3, -v4, and/or an arc file on the command line, selftest verifies block hashes. If it sees a block with a bad hash, it tries to get the same block from another destination. If all destinations have the same bad block, selftest deletes the block. However, this delete was not happening, even with --fix, so a subsequent selftest would give the same errors. Now the delete is working properly. As usual, it may take 3-4 more selftests without -v3, -v4 and/or an arc filename to get rid of follow-on errors. Thanks Vincent!

  • selftest: if -v4 --sample is used on a backup where no remote destinations support sampling or destinations are not setup, add a hint to use -v3 --sample to sample just the local arc files.

  • selftest: explain that database integrity errors are usually hardware problems and cannot be bugs in HashBackup. These can sometimes be fixed with selftest --fix, but it’s important to find out what caused them (bad RAM, USB drives, cables, etc)

  • versions: prevent traceback error: TypeError: unsupported operand type(s) for -: 'NoneType' and 'int' if a backup was interrupted.

  • log: if hb log is used without a command, it caused an IndexError. Thanks Arthur!

#2525 Jun 15, 2020

Summary:

  • database upgrade to dbrev 33

  • versions: show backup time and space

  • backup: show percent progress on large files

  • mount: performance increase for large files

  • dest: lowercase dest name on the command line

  • backup: race condition when file removed

  • dest clear/setid: shard fix

  • retain: error if -r is out of range

  • backup: --maxtime restart hang

  • backup: rare retain AssertionError

  • backup: progress stats showing 0

  • backup: broken pipe saving fifos

  • backup: unusual block sizes could cause mount problems

Details:

  • this release does an automatic database upgrade when any HB command is used, fixing datesaved timestamps on hard-linked files that could cause retain to fail with an AssertionError (rare)

  • versions: two new columns, backup time and backup space, have been added. Some versions may have a blank backup space if all blocks have been consolidated into other versions because of packing.

  • backup: when saving large files (>500MB), show a percentage progress line during the save if output is to a screen

  • mount: reading large files saved with the new "auto" block size (#2490 on Mar 16, 2020) caused mount performance problems because of the larger block size.

  • dest: destination names are always lowercase inside HB even if they are mixed case in dest.conf. If uppercase letters were used with the dest setid or clear commands, they gave incorrect errors that a destination did not exist. Thanks Francisco!

  • backup: if backup tries to save a file but it’s been removed, it could cause a traceback. Thanks Bruce!

    Traceback (most recent call last):
      File "backup.py", line 3474, in <module>
      File "backup.py", line 3317, in main
      File "backup.py", line 1392, in backupobj
      File "backup.py", line 339, in markgonelog
    IndexError: No item with that key
  • dest: with a sharded backup, the dest clear and setid commands failed unless --force was used. Now they ask if it’s okay to proceed just once, then execute the command. Thanks Josh!

  • retain: if -r (rev) is out of range, a traceback occurred instead of an error message about -r. From internal testing.

  • backup: sometimes could hang when restarting a backup with --maxtime. Thanks Maciej and Dave!

  • backup: a rare race condition with hard-linked files changing during a backup could cause retain to fail with an AssertionError. This occurred with 2 files in a backup of 16M files. Thanks Daniele!

  • backup: progress stats were showing 0 bytes saved during backup

  • backup: saving a fifo could cause a broken pipe error and the fifo was only partially saved

  • backup: if a file was saved with an odd block size, like -B70000, mount could have trouble reading the file

#2490 Mar 16, 2020

Summary:

  • remove expiration date

  • config: new default block size "auto"

  • backup: show progress stats

  • backup: zero block size (-B0 / -V0) are errors

  • backup: improve exclude scalability

  • get: check local file type before using it

  • get: fix OSX HFS+ restore performance problem

  • export: now works if dest.db doesn’t exist

  • export: -c is not required

  • export: -k encryption caused error message

  • selftest: fix --sample error with -v3

  • dest verify: clarify size error message

  • config: remove obsolete audit-commands keyword

  • retain: remove obsolete --dryrun and --force

Details:

  • though I wasn’t afraid of getting hit by a bus, having an expiration date on HashBackup during a pandemic seems like a really bad idea.

  • config: previously the default block size (config option block-size) was 32K variable. This will not change on existing backups. For newly-created backups, the default is now "auto". This will choose an efficient block size, either variable or fixed, for each file saved. This may reduce dedup somewhat (5% in tests), but greatly reduces block overhead in the main database (from 2.2M blocks to 780K in the same test), for faster selftests and restores. For more information see the block-size config option on the HashBackup web site. To get the old behavior on new backups, use hb config to set block-size to 32K or use -V32K on the backup command line. Use hb config to set block-size to "auto" on existing backups to get the new behavior. This will cause more data to be saved for modified files because of the block size change.
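
    For example, assuming the usual hb config syntax of option name followed by value (the backup directory and path are illustrative):

    $ hb config -c /hbbackup block-size auto    # new behavior on an existing backup
    $ hb config -c /hbbackup block-size 32K     # old default behavior on a new backup
    $ hb backup -c /hbbackup -V32K /home        # or override block size for one backup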

  • backup: if output is being sent to a screen, show files checked and saved every 10K files. This is useful on huge backups with mostly unmodified files, so users know files are being checked.

  • backup: using -V0 or -B0 (zero block sizes) should have failed; now they do

  • backup: excluding many files with inex.conf had some scalability problems. For testing, 200K pathnames were added to inex.conf, half with wildcards, half without. Then a directory of 10K small files was backed up on a Core 2 Duo system:

          Excludes   Backup init      Backup time    Backup rate (files/sec)
    ----------------------------------------------------------------------
    OLD:  default       0 sec              3 sec        >3300
    OLD:    200K       37 sec           2060 sec            5
    NEW:    200K       24 sec              3 sec        >3300 (660x faster)
  • get: by default, get will copy local files if the size and time match the file to be restored. But it also needs to ensure the file type matches: if a file was saved, but at restore time the file is a symlink to a file with matching size and type, the symlink must be deleted and replaced by an actual file. Now hb get checks the file type before using a local file. From internal review.

  • get: a couple of changes this year made restores of large directories containing small files run 5x slower on OSX HFS+ filesystems. This was noticed during internal benchmarking and has been corrected.

  • export: now works if dest.db doesn’t exist (destinations never used)

  • export: fixed an error saying -c was required even when HASHBACKUP_DIR was set correctly. Thanks Maciej!

  • export: if a backup was created or rekeyed with -k to use a specific key, export worked but would give an error "Unable to access HashBackup database: file is not a database" when trying to update export’s exit status in the audit log. Thanks Ralph!

  • selftest: if no destinations were configured, --sample gave an error "No destinations support sampling" even if -v3 was used. Thanks Gabriel!

  • dest verify: there are 2 types of confusing remote file size errors detected with dest verify, and the final status message has been changed to clarify this:

    1. HB uploaded a file, noted the uploaded size, but now dest verify (a remote "list") shows the file is a different size. The dest ls command shows the size HB uploaded, but the remote storage service says it is a different size. This can happen if a remote file is overwritten after the HB upload for example, and is a verify error.

    2. HB knows an arc file should be a certain size, but a different size file was uploaded to the remote service. This is called a db size mismatch and is very unusual. It can happen if a local arc file is overwritten while HB is transmitting it for example.

  • config: as mentioned July 10 2019, #2377, the audit-commands config keyword is now obsolete. Previously this was used to select which commands were audited. All commands are audited as of #2377.

  • retain: as mentioned June 6, 2019, #2347, the --force option is now obsolete and the --dryrun option has been replaced by -n.
