#2552 July 10, 2021


* new destination type "dirs" (multiple directories)
* config: fix strange interactions
* config: don't ask for admin passphrase twice
* retain/rm: always verify hashes during packing
* retain/rm/selftest: improve bad block messages


- a new variation of the dir destination with type "dirs" allows
  specifying multiple destination directories.  This is useful with
  multi-drive external disk enclosures configured as JBOD (Just a
  Bunch Of Disks).  It is possible to configure these units as one
  large RAID0 drive and use a "dir" destination, but that reduces
  reliability: if any disk fails, all data is typically lost.  A
  RAID > 0 configuration is most reliable, but also reduces capacity.
  A JBOD configuration using the dirs destination with multiple
  directories is a middle ground: if a disk is lost, only the backup
  data on that drive is lost.  This is an especially reasonable
  configuration if you have another copy of the backup data at another

  The dirs destination requires a dirs keyword specifying multiple
  target directory paths, separated by colons.  When reading from the
  destination, HB will check all directories until it finds the file
  it needs.  When writing arc files, HB will check available space in
  each directory and copy to a directory with enough space to hold the
  file.  When copying non-arc files, HB makes copies in every
  directory that has room, to increase the chances of a recover in
  case the local backup directory is lost and one or more JBOD disks
  are also lost.

  A new "copies" option specifies how many copies of each arc file to
  create, to increase redundancy.  The default is 1.

  A new "spread" option causes arc files to be distributed over all
  disks rather than filling up disk 1 before using disk 2.  
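
  The directory selection described above can be sketched roughly as
  follows.  This is an illustration only, not HB's code: the function
  name, the free-space dictionary, and the "most free space first"
  rule used for spread are all assumptions.

```python
def choose_dirs(free_bytes, file_size, copies=1, spread=False):
    """Pick target directories for one arc file (hypothetical helper).

    free_bytes: dict of directory path -> free bytes, in configured order
    file_size:  size of the arc file to store
    copies:     how many directories should receive the file
    spread:     if True, prefer the emptiest disks instead of filling
                the first configured disk before using the next
    """
    # Keep only directories with enough room, preserving configured order.
    candidates = [d for d, free in free_bytes.items() if free >= file_size]
    if spread:
        # Assumption: "spread" means the disk with the most free space wins.
        candidates.sort(key=lambda d: free_bytes[d], reverse=True)
    return candidates[:copies]
```

  With free space of 100, 500 and 50 on three disks and an 80-byte
  file, the default picks the first disk, while spread picks the
  second.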

- config: it is possible to set enable/disable-commands without
  setting an admin-passphrase; only a warning is printed.  But if
  enable-commands was set to 'backup' for example, then it was not
  possible to set an admin-passphrase because the config command was
  disabled.  This created a backup where no config options could ever
  be changed again, which is a little strange and unintended.

  Now, the config command with the admin-passphrase subcommand is
  always allowed.  If you really want a backup that is unchangeable
  like before, you can set an admin-passphrase to some random string
  that you then forget.

- config: the admin passphrase was sometimes requested twice,
  depending on whether the config command was enabled or disabled.

- retain/rm: during packing (removing empty space from an arc file), a
  block sometimes needs to be decrypted & decompressed, and this also
  verifies the block's hash.  If the block is bad, a hash mismatch
  error occurs and retain/rm abort.  Sometimes these steps are not
  necessary and are skipped as a performance optimization.  However,
  if a block is bad, skipping the hash verification allows the bad
  block to propagate without error during packing.  The first time the
  bad block would be noticed would be either a -v4 selftest of the arc
  file or a restore needing the bad block.  With this release, all
  block hashes are verified while packing an arc file and retain/rm
  will abort if any blocks are bad.  This may slow down packing
  somewhat; if it becomes a problem, please send an email.
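
  A minimal sketch of the idea, not HB's implementation: the block
  layout, the names, and the use of SHA1 are assumptions, and the real
  code also decrypts and decompresses before hashing.  Here every
  block copied into the packed file is verified first.

```python
import hashlib


class BadBlockError(Exception):
    """Raised when a block's data does not match its recorded hash."""


def pack_blocks(blocks, live_ids):
    """Copy live blocks to a new list, verifying each block copied.

    blocks:   iterable of (blockid, expected_sha1_hex, data) tuples
    live_ids: set of block ids still referenced by the backup
    """
    packed = []
    for blockid, expected, data in blocks:
        if blockid not in live_ids:
            continue  # unreferenced block: the empty space packing removes
        if hashlib.sha1(data).hexdigest() != expected:
            # Abort rather than carry the bad block into the packed file.
            raise BadBlockError("hash mismatch blockid %d" % blockid)
        packed.append((blockid, expected, data))
    return packed
```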

- retain/rm/selftest: improve advice given when bad blocks are
  detected.  For example, instead of:
    getblock: hash mismatch blockid 3 in arc.0.0
  the message is now:
    getblock: hash mismatch blockid 3; run selftest --fix arc.0.0

#2546 June 2, 2021


* database upgrade to dbrev 34
* backup: fix arc size mismatches
* backup: inex.conf exclude bug
* get: fix restore tracebacks
* get: fix symlink restore on Linux
* dir dest: bug fixes
* selftest: remove extraneous size message
* selftest: really delete blocks with --fix
* selftest: expand notice message about -v4 --sample
* selftest: explain database integrity errors
* versions: fix traceback on interrupted backups
* log: fix index error with hb log


- this release does an automatic database upgrade to remove unused
  columns from the database

- backup: if a destination failed a certain way, it could cause one or
  more remote archives for that version to appear corrupted after
  subsequent backups, and also generate selftest errors:

    Error: arc.V.N size mismatch on XXX destination: db says YYY, is ZZZ

  A selftest -v4 would give hash mismatch errors on the interrupted
  arc files.  If the arc files were stored correctly either locally or
  on another destination, selftest -v4 --fix could fix the errors.
  But sometimes the only correct version was local.  If
  cache-size-limit was also set, the correct arc files could be
  deleted when the cache filled up, leading to permanent corruption.

- backup: a bug in exclude processing caused some files to be backed
  up that should have been excluded

- get: restores could fail with a "list index out of range" error in
  certain circumstances.  Thanks Ian!

- get: could fail with a traceback:
    NameError: global name 'errmsg' is not defined
  when scanning local files to find a match for a file to be restored.
  From internal testing.

- get: fix "Unable to set extended attributes" error when restoring
  symlinks on Linux, related to the order things are restored, if the
  symlink target had extended attributes.  Thanks Ian!

- dir dest: fix hang if dir destination file is shorter than expected
  when using selective download.  From internal testing.

- dir dest: retry selective download on I/O error.  From internal
  testing.

- selftest: the message "Note: arc.V.N is correct size on ..." was
  being displayed if an arc file had not yet been transferred to one
  or more destinations.  It should only be displayed if an arc file's
  size is incorrect somewhere.

- selftest: with -v3, -v4, and/or an arc file on the command line,
  selftest verifies block hashes.  If it sees a block with a bad hash,
  it tries to get the same block from another destination.  If all
  destinations have the same bad block, selftest deletes the block.
  However, this delete was not happening, even with --fix, so a
  subsequent selftest would give the same errors.  Now the delete is
  working properly.  As usual, it may take 3-4 more selftests without
  -v3, -v4 and/or an arc filename to get rid of follow-on errors.
  Thanks Vincent!

- selftest: if -v4 --sample is used on a backup where no remote
  destinations support sampling or destinations are not set up, add a
  hint to use -v3 --sample to sample just the local arc files.

- selftest: explain that database integrity errors are usually
  hardware problems and cannot be bugs in HashBackup.  These can
  sometimes be fixed with selftest --fix, but it's important to find
  out what caused them (bad RAM, USB drives, cables, etc.).

- versions: prevent traceback error:
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
  if a backup was interrupted.

- log: if hb log is used without a command, it caused an IndexError.
  Thanks Arthur!

#2525 Jun 15, 2020



* database upgrade to dbrev 33

* versions: show backup time and space

* backup: show percent progress on large files

* mount: performance increase for large files

* dest: lowercase dest name on the command line

* backup: race condition when file removed

* dest clear/setid: shard fix

* retain: error if -r is out of range

* backup: --maxtime restart hang

* backup: rare retain AssertionError

* backup: progress stats showing 0

* backup: broken pipe saving fifos

* backup: unusual block sizes could cause mount problems


- this release does an automatic database upgrade when any HB command

  is used, fixing datesaved timestamps on hard-linked files that could

  cause retain to fail with an AssertionError (rare)

- versions: two new columns, backup time and backup space, have been

  added.  Some versions may have a blank backup space if all blocks

  have been consolidated into other versions because of packing.

- backup: when saving large files (>500MB), show a percentage progress

  line during the save if output is to a screen

- mount: reading large files saved with the new "auto" block size

  (#2490 on Mar 16, 2020) caused mount performance problems because of

  the larger block size.

- dest: destination names are always lowercase inside HB even if they

  are mixed case in dest.conf.  If uppercase letters were used with

  the dest setid or clear commands, they gave incorrect errors that a

  destination did not exist.  Thanks Francisco!

- backup: if backup tries to save a file but it's been removed, it

  could cause a traceback.  Thanks Bruce!

    Traceback (most recent call last):

      File "", line 3474, in <module>

      File "", line 3317, in main

      File "", line 1392, in backupobj

      File "", line 339, in markgonelog

    IndexError: No item with that key

- dest: with a sharded backup, the dest clear and setid commands

  failed unless --force was used.  Now they ask if it's okay to

  proceed just once, then execute the command.  Thanks Josh!

- retain: if -r (rev) is out of range, a traceback occurred instead of

  an error message about -r.  From internal testing.

- backup: sometimes could hang when restarting a backup with

  --maxtime.  Thanks Maciej and Dave!

- backup: a rare race condition with hard-linked files changing during

  a backup could cause retain to fail with an AssertionError.  This

  occurred with 2 files in a backup of 16M files.  Thanks Daniele!

- backup: progress stats were showing 0 bytes saved during backup

- backup: saving a fifo could cause a broken pipe error and the fifo

  was only partially saved

- backup: if a file was saved with an odd block size, like -B70000,

  mount could have trouble reading the file

#2490 Mar 16, 2020



* remove expiration date

* config: new default block size "auto"

* backup: show progress stats

* backup: zero block sizes (-B0 / -V0) are errors

* backup: improve exclude scalability

* get: check local file type before using it

* get: fix OSX HFS+ restore performance problem

* export: now works if dest.db doesn't exist

* export: -c is not required

* export: -k encryption caused error message

* selftest: fix --sample error with -v3

* dest verify: clarify size error message

* config: remove obsolete audit-commands keyword

* retain: remove obsolete --dryrun and --force


- though I wasn't afraid of getting hit by a bus, having an expiration

  date on HashBackup during a pandemic seems like a really bad idea.

- config: previously the default block size (config option block-size)

  was 32K variable.  This will not change on existing backups.  For

  newly-created backups, the default is now "auto".  This will choose

  an efficient block size, either variable or fixed, for each file

  saved.  This may reduce dedup somewhat (5% in tests), but greatly

  reduces block overhead in the main database (from 2.2M blocks to

  780K in the same test), for faster selftests and restores.  For more

  information see the block-size config option on the HashBackup web

  site.  To get the old behavior on new backups, use hb config to set

  block-size to 32K or use -V32K on the backup command line.  Use hb

  config to set block-size to "auto" on existing backups to get the

  new behavior.  This will cause more data to be saved for modified

  files because of the block size change.

- backup: if output is being sent to a screen, show files checked and

  saved every 10K files.  This is useful on huge backups with mostly

  unmodified files, so users know files are being checked.

- backup: using -V0 or -B0 (zero block sizes) should have failed; now

  they do

- backup: excluding many files with inex.conf had some scalability

  problems.  For testing, 200K pathnames were added to inex.conf, half

  with wildcards, half without.  Then a directory of 10K small files

  was backed up on a Core 2 Duo system:

        Excludes   Backup init   Backup time   Backup rate (files/sec)

  OLD:  default          0 sec         3 sec   >3300
  OLD:  200K            37 sec      2060 sec   5
  NEW:  200K            24 sec         3 sec   >3300 (660x faster)

- get: by default, get will copy local files if the size and time

  match the file to be restored.  But it also needs to ensure the file

  type matches: if a file was saved, but at restore time the file is a

  symlink to a file with matching size and time, the symlink must be

  deleted and replaced by an actual file.  Now hb get checks the file

  type before using a local file.  From internal review.

- get: a couple of changes this year made restores of large

  directories containing small files run 5x slower on OSX HFS+

  filesystems.  This was noticed during internal benchmarking and has

  been corrected.

- export: now works if dest.db doesn't exist (destinations never used)

- export: fixed an error saying -c was required even when

  HASHBACKUP_DIR was set correctly.  Thanks Maciej!

- export: if a backup was created or rekeyed with -k to use a specific

  key, export worked but would give an error "Unable to access

  HashBackup database: file is not a database" when trying to update

  export's exit status in the audit log.  Thanks Ralph!

- selftest: if no destinations were configured, --sample gave an error

  "No destinations support sampling" even if -v3 was used.  Thanks


- dest verify: there are 2 types of confusing remote file size errors

  detected with dest verify, and the final status message has been

  changed to clarify this:

  1. HB uploaded a file, noted the uploaded size, but now dest verify

  (a remote "list") shows the file is a different size.  The dest ls

  command shows the size HB uploaded, but the remote storage service

  says it is a different size.  This can happen if a remote file is

  overwritten after the HB upload for example, and is a verify error.

  2. HB knows an arc file should be a certain size, but a different

  size file was uploaded to the remote service.  This is called a db

  size mismatch and is very unusual.  It can happen if a local arc

  file is overwritten while HB is transmitting it for example.

- config: as mentioned July 10 2019, #2377, the audit-commands config

  keyword is now obsolete.  Previously this was used to select which

  commands were audited.  All commands are audited as of #2377.

- retain: as mentioned June 6, 2019, #2347, the --force option is now

  obsolete and the --dryrun option has been replaced by -n.

#2461 Dec 3, 2019 - expires Apr 15, 2020



* bump expiration date

* selftest: bug fixes with missing arc file

* selftest: --inc download limit only works for -v4

* file permissions fine-tuned

* export: fix traceback

* selftest: delete extraneous .tmp file after -v4 check

* dest verify: add --force option

* dest ls: add -v option

* backup: check for partial arc file write


- selftest: if an arc file is missing and deleted with --fix, selftest

  could fail later with a traceback:

      TypeError: 'NoneType' object is not iterable

  A 2nd selftest showed errors about extra blocks in the dedup table,

  a 3rd selftest ran clean.  This traceback is now fixed and the 2nd

  selftest runs clean.  Thanks Alex!

- selftest: if cache-size-limit is set, the only copy of an arc file

  is missing, and hb verify is run, a subsequent hb selftest -v5

  without --fix would fail with a traceback error in the prefetcher.

  This was happening because -v5 with a limited cache starts fetching

  arc files very early, before it knows that some arc files are

  missing.  To fix this, the prefetch plan is computed just before it

  is needed rather than during initialization.

  Related to this, if a missing arc file is not deleted with selftest,

  the check level is reduced from -v5 to -v2 to avoid the prefetcher

  traceback error and a warning is issued.  Thanks again Alex!

- when HB created new files, it sometimes left the execute (x) bit set

  in the file permissions because this is the default for Python

  programs that use  An example of this is the log files.  In

  other cases where HB did specify the permission, it used 0644, the

  permissions for a typical Unix setup with a default umask of 022.

  An example of this is arc files.  However, if a site sets a umask of

  02, the group id should also have write permission.

  All uses of were reviewed and when files are created, a

  default permission of 0666 is now used.  The bits set in umask

  (usually 022) are then turned off, making the new file's permissions

  0644.  With a umask of 2, new files will have permission 0664, and

  group users will have write access.  This was not possible before

  this change, which affected 12 calls.  Thanks Francisco!
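
  The umask arithmetic above, as a small sketch.  os.open is used here
  only as a stand-in for whichever call HB uses, and the helper names
  are made up for illustration.

```python
import os


def mode_after_umask(requested, umask):
    """Return the permission bits left after the umask bits are cleared."""
    return requested & ~umask & 0o777


def create_file(path):
    """Create an empty file, requesting 0666 and letting the umask decide.

    With umask 022 the file ends up 0644; with umask 002 it ends up 0664.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    os.close(fd)
```

  For example, mode_after_umask(0o666, 0o022) is 0o644 and
  mode_after_umask(0o666, 0o002) is 0o664.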

- selftest: --inc (incremental testing) works for -v3 to -v5, but the

  optional download limit currently only works for -v4.  A download

  limit doesn't make sense for -v3 since it doesn't download files.

  For -v5, a download limit is more complex to implement, because a

  single user file might require downloading many arc files.  A note

  has been added to the selftest doc page and selftest will halt if a

  download limit is used with a check level other than -v4.  Thanks Alex!

- export: fixed a traceback that prevented an export from being

  generated.  Thanks Frank!

- selftest: remove arc .tmp files that may be left after a -v4 check.

  Thanks Kuryan!

- dest verify: would not remove a file from HB's database if it was

  the only copy, even if the file didn't actually exist on the remote

  or was the wrong size.  Now with --force, it will remove the file

  from HB's database, allowing selftest --fix to make corrections.

- dest ls: a new -v option shows associated data HB tracks for

  uploaded files.  For example on B2, this shows the SHA1 hash and B2


- backup: halt on partial writes when creating arc files.  Partial

  writes are very unlikely, but possible.  Thanks Frank!

#2447 Oct 31, 2019 - expires Jan 15, 2020



* ssh: better error when dd unavailable

* selftest: truncate hash mismatch files with --fix


- ssh: the ssh destination uses remote "dd" commands to download

  pieces of arc files (selective download).  This is faster than

  downloading whole arc files, especially when restoring small files.

  Some 3rd-party providers of ssh services use jails or chroots to

  restrict the commands that can be used and dd is sometimes not

  available.  Now HB gives advice to either use sftp instead of ssh or

  enable dd.

- selftest: --fix will now truncate files if a hash mismatch is

  detected with -v5.  This is needed when bad hardware caused errors

  in the backup, the hardware has been fixed, and the user wants a

  clean selftest.

  IMPORTANT NOTE: if HB is used with defective hardware, especially

  non-ECC RAM that is defective, a clean selftest doesn't mean the

  backup is fine.  Some errors may not be detectable, for example, a

  bit flip in a database page in RAM, before being written to disk,

  could cause a file to be restored later with the wrong permissions.

  The best thing to do after a hardware problem is fixed is to start

  a new backup and keep the previous backup for historical reference

  if needed.

#2442 Sep 15, 2019 - expires Jan 15, 2020



* bump expiration date

* dir destination: remove debug output

* dest.conf: maxsize default is now unlimited

* mount: bug fix


- dir destination: remove "debug: fast copy" debug output when copying

  files to a dir destination with large iosize for NFS4.

- dest.conf: the maxsize keyword is the maximum size of files uploaded

  to a destination.  Any files over maxsize are split into parts

  before uploading, and reconstructed from parts when downloading.

  maxsize is used for small backups to email or WebDAV where there are

  often low limits on file sizes, like 25MB.

  Previously the maxsize default was 5GB, because B2 has a hard limit

  of 5GB for uploads unless special procedures are used, and HB

  doesn't use them.  However, some sites increase arc-size-limit to

  create large arc sizes, like 10GB.  If maxsize is not also changed,

  it causes inefficient splitting of arc files.  This is often not

  recognized until the backup is well underway or finished, and it's

  hard to correct split arc files once created.

  The new default for maxsize is unlimited, ie, files are never split.

  The B2 driver now raises an error when trying to upload files > 5GB.
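
  To see why a too-small maxsize hurts, here is a rough sketch of how
  a file might be divided into parts.  How HB actually sizes its parts
  is not described here; full-size parts plus a remainder is an
  assumption, and the function name is hypothetical.

```python
def split_sizes(file_size, maxsize=None):
    """Return the sizes of the parts a file would be uploaded as.

    maxsize=None models the new default (unlimited): never split.
    """
    if maxsize is None or file_size <= maxsize:
        return [file_size]
    full_parts, remainder = divmod(file_size, maxsize)
    # Assumption: every part is maxsize except possibly the last one.
    return [maxsize] * full_parts + ([remainder] if remainder else [])
```

  With the old 5GB default, a 10GB arc file becomes two 5GB parts;
  with the new default it is uploaded as a single file.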

- mount: fix "error getting block xxx logid xxx ix 0: -xxxxxxxxxx"

  Thanks Warz!

#2438 Aug 29, 2019 - expires Oct 15, 2019



* recover: avoid HMAC warnings with --check, improve performance

* log: avoid "Database is locked" on Ctrl-C

* audit: avoid double audit entries when logging

* fix bug generating hb.db.N files


- recover: a couple of minor changes to hb.db.N files prevent

  (harmless) HMAC warnings during recover --check and speed it up.

- if hb log selftest was interrupted with Ctrl-C during "Checking

  database integrity" with more than 5 seconds to go, it would cause a

  "Database is locked" error.  A key aspect of this is that Ctrl-C is

  deferred during the database check, causing a long delay before the

  Ctrl-C is acknowledged by selftest.  Thanks Chris!

- audit: with "hb log backup ...", two audit entries were created

  instead of one.

- a bug generating hb.db.N files caused these files to flip between

  small and large sizes, when they should usually be on the small

  side.  It also could cause recover to fail, though recover --check

  worked.  This bug has been around for about 2 years and was found

  via internal testing.

#2428 Aug 19, 2019 - expires Oct 15, 2019



* recover: fix --check bug


- recover: if --check was used and there were 2 destinations in

  dest.conf, hb.db.N files were downloaded and applied twice by

  mistake.  This could also cause download problems if workers was > 1

  in dest.conf because the same hb.db.N file could be downloaded

  concurrently by multiple workers.

#2423 Aug 18, 2019 - expires Oct 15, 2019



* config: new option db-history-days

* rekey: rollback is automatic

* selftest: --fix corrects dedup table errors

* selftest: fix -v5 bug with dedup and sparse files

* recover: continue after incremental restore errors

* recover: suggest help after partial rekey upload

* rekey: commit locally then upload

* ls: sparse file sha1, isn't


- config: a new option, db-history-days, keeps incremental database

  backup files (hb.db.N files) for the specified number of days before

  the latest backup.  Combined with recover --check, this allows

  recovering earlier versions of the main backup database.  The

  default is 3.  HashBackup previously operated with 0.  The new

  minimum value is 1 so there is always at least one older version of

  the database stored on remotes.

  If the backup database is stored on networked or removable storage

  (not recommended!), or if using non-ECC RAM, lower values are not

  recommended because database damage is more of a possibility.  Also,

  db-check-integrity should be set to "upload" in these configurations.

  For remote storage such as Wasabi with delete penalties (90 days),

  setting db-history-days to 90 may lower total storage costs.  Byte

  storage costs will actually increase, but delete penalty costs

  should decrease more.  This is not a problem on Amazon S3 because

  HashBackup uses the regular storage class with no delete penalties

  for storing database increments, but Wasabi does not have storage

  without delete penalties.  Also set pack-age-days to 90 with Wasabi.

  Setting db-history-days higher may also have the effect of making

  each hb.db.N file smaller, though total storage for all hb.db.N

  files will be higher because more increments are kept.  Higher

  values might be useful on slow upload links to decrease the size of

  hb.db.N files uploaded after each backup.

- rekey: if rekey aborted, rekey had to be run again to do a rollback.

  Now the rollback happens automagically, either after the attempted

  rekey or on the next HB command.

- selftest: for technical reasons, the dedup table was checked if

  --fix was not used but was not checked or corrected if --fix was used.

  Issues with the dedup table are not critical and will not harm the

  backup because of other checks before deduping a block.  If selftest

  did notice problems, the dedup table had to be manually deleted.  Now

  selftest also checks the dedup table with --fix and if any problems

  occur the dedup table is deleted and will be rebuilt on the next


- selftest: when sparse files were backed up with dedup enabled and

  they contained sequential identical blocks with a hole between them,

  selftest -v5 would say there was a file hash mismatch.  The file

  restored correctly.  This was actually a selftest bug and the backup

  was fine.  Thanks Thomas!

- recover: previously, if an error occurred while applying an

  incremental database backup file (hb.db.N) to re-create the main

  database (hb.db), recover aborted.  With the new db-history-days

  option, HashBackup is retaining older increments longer.  If a rekey

  occurs during the db-history-days time window, older increments

  could be using an obsolete key.  A normal recover works fine in this

  situation, but a recover with the new --check option causes HMAC

  errors when the older increments are applied because the older

  increments were created with the previous key.conf file, before the

  rekey.  Now the error is shown, the older increment is ignored, and

  HB continues restoring the database.

- recover/rekey: following a rekey, new database increments (hb.db.N)

  and dest.db are uploaded to remotes, keyed with the new key.  If the

  upload fails for some reason, the local backup uses the new key.conf

  but the remote backup files still use the old key, saved in

  key.conf.orig.  This is fine and the upload will be retried on the

  next backup or sync.  But if recover is attempted before the upload

  completes, there may be confusion about which key file to use.

  Recover now tries to be helpful in this odd situation by suggesting

  to use key.conf.orig if key.conf doesn't work.

- rekey: previously, rekey did not commit (completely finish) until

  after all files were uploaded to remotes.  This can be a problem if

  some remotes finish and others don't, because it's harder to

  rollback a remote.  Now rekey commits locally first, then uploads

  files to remotes.  The next backup or sync will finish any

  incomplete uploads.

- ls: a sha1 hash is shown for every file with -lv.  For sparse files,

  this is not a true sha1 and is unique to HashBackup.  A prefix like

  7: or 11: is now displayed as a hint that these sha1's, aren't.

#2406 Aug 10, 2019 - expires Oct 15, 2019



* OpenSwift and Rackspace Cloud Files no longer supported

* ssh: handle missing file better

* dest verify: easier to use in scripts & cron jobs

* sync: don't stop on missing arc file

* selftest: show all database integrity errors

* selftest: --fix rebuilds damaged database indexes

* recover: fix halt with missing arc file

* recover: new --check option

* config: new option db-check-integrity

* rekey: rollback could get stuck if rekey was aborted

* rekey: fix setting a new passphrase over an old passphrase

* rekey: avoid rollback when -p passwords don't match

* rekey: don't ask for passphrase before doing rollback

* rekey: blank passphrase with -p ask removes passphrase

* export: fix two minor bugs with shards + temporarily disable


- as mentioned a month ago, the Rackspace Cloud Files native driver

  has been throwing errors lately, no one has complained, and Cloud

  Files object storage service is 4x more expensive than S3, so

  support for it has been dropped.  It should still be accessible

  using an rclone destination if necessary.

- the ssh / sftp destination checked error message text to decide

  whether a file was missing on a remote.  This is not reliable on

  non-English systems so a better method is now used.

- dest verify: previously, a lot of success / fail messages were

  displayed for dest verify, the messages weren't summarized, and the

  exit code was always zero.  This made it hard to use dest verify

  with very large backups and in scripts and cron jobs.  Now only

  error messages are displayed, with a summary of successful and

  failed file counts at the end.  The exit code is non-zero on any

  verify failures, even if the errors are corrected.
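
  A sketch of how a script or cron wrapper might use the new exit
  code.  It assumes hb is on the PATH and HASHBACKUP_DIR is set; the
  helper names are hypothetical.

```python
import subprocess


def command_succeeded(argv):
    """Run a command and report whether its exit code was zero."""
    return subprocess.run(argv).returncode == 0


def verify_or_alert():
    """Hypothetical cron wrapper around dest verify."""
    # Non-zero exit means at least one verify failure, even if corrected.
    if not command_succeeded(["hb", "dest", "verify"]):
        print("dest verify reported failures; check the output above")
```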

- when syncing, HB would halt with an error if an arc file was not

  found.  Now it displays the error and continues syncing.

- selftest: instead of showing the first database integrity error and

  stopping, selftest now shows all errors to allow better assessment

  of the type and extent of damage to a database.  A customer had a

  damaged database that was easily fixed, but it was initially "scary"

  without knowing all of the specific integrity errors.  Thanks Chris!

- selftest: --fix now rebuilds damaged database indexes.  This fixed

  the previous customer's "malformed" database.  Thanks Chris!

- recover: if recover tried to download an arc file but it was missing

  on the destination, it halted with an error message.  Now it

  continues to download the remaining arc files.

- recover: a new --check option will recover the database in a slower

  way that does an integrity check after each hb.db.N is applied and

  saves the latest hb.db possible.  This is useful if the local

  database has become damaged, to recover an older version of hb.db

  that is not damaged.  Thanks Chris!
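
  The idea behind --check, sketched with placeholder callables.  This
  is not HB's code: apply_increment and is_intact stand in for
  applying an hb.db.N file and running an integrity check, and
  stopping at the first failed check is an assumption.

```python
def recover_with_check(base_db, increments, apply_increment, is_intact):
    """Apply increments in order, keeping the latest database that checks out.

    base_db:         starting database state
    increments:      hb.db.N equivalents, oldest first
    apply_increment: callable(db, increment) -> new db (placeholder)
    is_intact:       callable(db) -> bool, the integrity check (placeholder)
    """
    last_good = base_db
    for increment in increments:
        candidate = apply_increment(last_good, increment)
        if not is_intact(candidate):
            break  # keep the last database that passed its check
        last_good = candidate
    return last_good
```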

- config: a new option db-check-integrity controls when database

  integrity is checked.  The default is "selftest".  This is how

  HashBackup has always operated and is fine when backups are run and

  stored on enterprise-class hardware: non-removable drives, ECC RAM,

  and hardwired networks.

  Setting db-check-integrity to "upload" will do an integrity check

  before new hb.db.N files are generated and uploaded to remotes.

  This is recommended for consumer-level hardware using wireless

  networks, removable USB drives, or non-ECC memory, to prevent a

  possibly damaged local database from being transferred to remote

  destinations.  Thanks Chris!

- rekey: in the previous release, auditing was enabled for all

  commands.  If hb.db is set to read-only, auditing cannot happen and

  an error about a read-only database occurs.  If the command was

  rekey and ctrl-c is used to abort it, the next command will say

  "Incomplete rekey, run rekey".  This usually does a rollback,

  putting things back in the state before rekey, but instead, HB said

  "run rekey" again.  Now it does the rollback as expected.  No data

  was lost with this; it was just very confusing.  Thanks Chris!

- rekey: if a key had a passphrase and rekey -p was used to change it,

  it caused a "file is not a database" error and the rekey didn't

  occur.  The backup was fine.  Thanks Chris!

- rekey: previously if rekey -p aborted, say because the two new

  passwords were different, a backup & rollback was needed even though

  nothing had changed.  Now a backup + rollback is only needed if

  rekey actually starts and doesn't finish.

- rekey: if rekey doesn't complete for whatever reason, the next rekey

  restores the backup files (rollback).  If the key had a passphrase,

  rekey would ask for the passphrase before doing the rollback.

  That's confusing and the passphrase isn't necessary to move original

  files back into place, so now it doesn't ask for a passphrase.

- rekey: previously when a blank passphrase was entered with -p ask in

  either init or rekey, an empty passphrase was placed on the key and

  HB would still prompt for a passphrase.  Now if a blank passphrase

  is entered with init or rekey, the passphrase is removed completely,

  as if -p ask was not used.  Existing keys with a blank passphrase

  will still prompt for it as before.  Thanks Chris!

- export: with shards enabled, a traceback could occur during export.

  The tar output filename was displayed incorrectly.  Export is only

  applying the passphrase to shard #1 and is temporarily disabled for

  sharded backups.

#2377 Jul 10, 2019 - expires Oct 15, 2019



* config: all commands are audited

* config: hfs-compress option removed

* ssh: use integers for scp rate option (-l)

* ssh: optimize selective download

* backup: fix stall on 32-bit systems

* retire 32-bit FreeBSD build

* mount: fix bug with limited cache & large files


- config: all commands are now audited.  The audit-commands config

  option will be retired in 2020 so scripts that set this should be

  adjusted before the end of 2019.

- config: the hfs-compress option (OSX) is being retired.  It was a

  clumsy implementation using the "ditto" command, and everything runs

  fine if previously-compressed files are restored uncompressed.

  Scripts that set this should be adjusted before the end of 2019.

- ssh: if the rate keyword was used with an ssh destination, the -l

  option had a floating point value.  Some versions of scp don't like

  that, so now it's an integer.  Thanks Warz!

- ssh: selective download always used a dd block size of 1K, which is

  inefficient for large transfers.  Now it uses a range of block sizes

  up to 64K to optimize large downloads.
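
  As an illustration only (this is not HB's actual code), one way to
  cover an arbitrary byte range with dd-style reads is to use small
  blocks until the position is aligned, then switch to the largest
  block size.  The function name dd_plan and the power-of-two stepping
  are assumptions made for this sketch:

```python
def dd_plan(offset, length, max_bs=64 * 1024):
    """Split the byte range [offset, offset+length) into dd-style reads.

    Returns a list of (bs, skip, count) tuples, where skip and count are
    in units of bs, as with dd's bs=, skip= and count= operands.
    max_bs must be a power of two.
    """
    plan = []
    pos, end = offset, offset + length
    while pos < end:
        # Largest power-of-two block size that is aligned at pos and fits
        # in the bytes that remain.
        bs = max_bs
        while bs > 1 and (pos % bs != 0 or end - pos < bs):
            bs //= 2
        # At full size, read every whole block that remains; smaller sizes
        # are single steps toward alignment or through the tail.
        count = (end - pos) // bs if bs == max_bs else 1
        plan.append((bs, pos // bs, count))
        pos += bs * count
    return plan
```

  An aligned request such as dd_plan(0, 128 * 1024) needs a single
  read of two 64K blocks, where a fixed 1K block size would need 128
  blocks.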

- backup: 32-bit systems with cache-size-limit -1 (all arc files are

  local - the default) stalled when the backup size was near 2GB.

- there have been no downloads of the 32-bit FreeBSD build of

  HashBackup in the last 12 months so it has been removed.

- mount: fixed intermittent error with cache-size-limit set:

    mount: error getting block 39608 logid 949 ix 2003: free variable

           'arcsize' referenced before assignment in enclosing scope

  This could occur when reading a user file through the mount that

  crossed an arc file boundary in the backup, so it was more likely to

  occur on large files.  Thanks Krisztian!
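
  The message above is the kind Python raises when a nested function
  reads a variable of its enclosing function before that variable has
  been assigned.  A minimal sketch of that failure mode follows; the
  names read_block and check are hypothetical and are not taken from
  HB, and the exact wording of the message varies between Python
  versions:

```python
def read_block(found):
    if found:
        arcsize = 4096  # only assigned on this path

    def check(offset):
        # arcsize is a free variable of check(): it belongs to read_block()
        return offset < arcsize

    return check(100)


def demo():
    """Return the outcome of the working and the failing call."""
    ok = read_block(True)
    try:
        read_block(False)
    except NameError as exc:
        return ok, str(exc)
    return ok, None
```

  The usual cure is to give the variable a value on every path before
  the nested function can run.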

#2367 Jun 25, 2019 - expires Oct 15, 2019



* Rackspace Cloud Files native support ending 

* selftest: test blocks listed on command line

* retain: show overall retention period

* retain: fix divide by zero if path not found

* retain: new -r option

* retain/rm: pack more often

* cacerts.crt updated

* upgrade: add retry loop

* retain: keep old versions longer


- HashBackup has had a native driver for Rackspace Cloud Files since

  2011.  Recently it started throwing SSL Certificate Verification

  errors.  No one has complained about this, so apparently there are

  very few if any HB customers still using Cloud Files (it's 4x more

  expensive than S3).  This driver also supports OpenSwift in theory.

  If anyone is using Cloud Files or OpenSwift with HashBackup, please

  send in an email.  Otherwise, support for this driver will be

  removed in the next release.  It should be possible to access these

  using rclone if necessary.

- selftest detected a block hash mismatch in a customer's dedup table,

  likely caused by a memory parity error.  The dedup table is easily

  corrected by deleting it (hash.db), and backup will re-create it on

  the next backup.  As a double check, blockids (integers) can now be

  listed on the selftest command line.  HB will find the corresponding

  arc file and test all blocks in that arc file with a -v4 test.

- retain: shows the overall retention period, i.e., the sum of all the

  -s intervals after adding 1 if retain-extra-versions is set.  Since

  retain intervals follow one after another, the overall retention

  time can be longer than expected, especially considering


- retain: if a path was listed on the command line but not found in

  the backup, the final statistics line caused a divide by zero when

  figuring out percentages deleted and kept (all zeroes)
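
  A guard of the following shape avoids the problem.  This is a sketch
  with a hypothetical helper name, not the actual fix:

```python
def percent(part, total):
    """Return part as a percentage of total, or 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0
```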

- retain: a new -r option performs retention only on files at or below

  a specified backup version.  This is for testing retain, but might

  be useful to thin out older files using a different retention policy.

- retain/rm: packing runs more often (pack-age-days / 4) but still

  respects the pack-xxx config option limits.

- cacerts.crt, the SSL CA bundle used by HashBackup, was updated.  HB

  will update the cacerts.crt file in every backup directory on the

  next backup if it is writable.

- upgrade: a 2-minute retry loop was added to handle upgrade server


- retain: if a file's backup was 5 years old, -s3y was used, and a new

  copy was saved, retain would delete the 5-year-old copy.  But it

  should have been kept so that restores as of 2 years ago would still

  have the file.  From internal testing.

#2347 Jun 6, 2019 - expires Oct 15, 2019



* bump expiration date

* upgrade: private upgrade servers

* upgrade: -p gets preview release

* dir destination: new iosize keyword to speed up NFSv4

* retain: overhaul -s file retention

* retain: display progress

* retain: new -v1 and -v2 options show more details

* config: new retain-extra-versions config option

* retain: more flexible -x handling of deleted files

* retain: --dryrun displayed incorrect stats

* retain: add -n to replace --dryrun

* retain: --force is being retired

* dest: show traceback if destination fails to start

* selftest: fix --sample bug with list of arcs

* selftest: halt if --sample requested with older arc formats

* selftest: partial selftest if arcs or pathnames requested

* selftest: fix --fix bug with incorrect reference counts

* selftest: fix --fix bug with missing arc file

* dir + ssh: not cleaning up remote temps on abort

* rclone: use deletefile instead of delete (v1.42 and up)

* upgrade: avoid repeated failed upgrades from cron jobs

* backup: only display max dedup usage in backup


- upgrade: accepts an optional http URL to a private HB upgrade

  server.  Sites with more than 10 copies of HashBackup are encouraged

  to set up their own upgrade server.

  IMPORTANT: only hb binaries at this release or later can upgrade

  from a private upgrade server.  So first upgrade to this release, 

  then create and start upgrading from your private server.

  To create an HB upgrade server, run a cron job on your private

  server - no more than once a day please - to mirror the HashBackup

  release with rsync.  Adjust the target for your http server.  The

  source is an rsync module, so two colons are used:

    rsync -a --delete-after --port 8010 /var/www/hbrelease

  To mirror the preview release, use ::preview instead of ::release.


  To upgrade using your local server:

    hb upgrade [--force] http[s]://

  This functions just like the official HB upgrade server, including

  using RSA4096 signature verification to ensure the binary is

  authentic.  To archive past releases omit --delete-after.

- upgrade: the new -p option upgrades to the preview release instead

  of the official release.  The preview release is to try out the

  upcoming release and provide early feedback with non-production

  backups.  -p cannot be used with an upgrade http address, but you

  can create both preview and release private server directories and

  upgrade from either using the http address.

- dir destination: the new iosize dest.conf keyword specifies the size

  of reads and writes when copying files to the destination.  The

  default iosize was 64K bytes, which can be slow if the target is an

  NFSv4 server where each write waits for physical disk I/O to

  complete on the server.  The new default size is 512K bytes and this

  can be increased using iosize.  For example, iosize 16M would use 16

  MB transfers.  Thanks Song for your extensive NFS testing!
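
  To illustrate what a transfer size means for a file copy, here is a
  minimal sketch.  It assumes K and M are binary multiples (1024 and
  1024*1024), which these notes do not state, and the function names
  are made up for the example:

```python
import shutil

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '512K' or '16M' to a byte count."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def copy_file(src, dst, iosize="512K"):
    """Copy src to dst using reads and writes of iosize bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, parse_size(iosize))
```

  Fewer, larger writes mean fewer round trips when every write has to
  wait for the server.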

- retain: -s has been rewritten in this release to fix several bugs

  found with internal testing.  Performance and memory usage are the

  same as before.  Customers have not reported these long-standing

  problems, but since files are modified and backed up at random

  times, retention bugs may be hard to notice.

  1. sometimes multiple versions of files outside the entire retention

     period but still "live" in the filesystem were being retained,

     when only the latest version should have been retained.  For

     older backups, the first run of retain with this release may

     delete a lot of old files.

  2. retain didn't do time rounding: if a file was saved one day at

     2:10 AM and then was saved the next day at 2:05 AM (not a full

     day apart), retain with -s5d (keep last 5 days) might incorrectly

     delete a backup.

  3. simulating 1200 daily backups from 2016-05-25 to 2019-09-06,

     retain with a schedule of -s 7d4w3m4q2y worked okay (but not

     great) when it was run once after all 1200 backups.

     (Columns are: backup number, date/time saved, and a note to show

     what each backup represents, even though the dates are not always

     as expected):

1200 2019-09-06 07:35:06 1d

1199 2019-09-05 07:51:03 2d

1198 2019-09-04 07:35:12 3d

1197 2019-09-03 07:28:11 4d

1196 2019-09-02 07:33:29 5d

1195 2019-09-01 07:33:54 6d

1194 2019-08-31 07:29:08 7d

1193 2019-08-30 07:40:04 1w

1186 2019-08-23 07:26:14 2w

1179 2019-08-16 07:18:04 3w

1172 2019-08-09 06:59:46 4w

1171 2019-08-08 07:04:31 1m

1143 2019-07-11 07:10:54 2m

1116 2019-06-14 06:35:25 3m

1115 2019-06-13 06:51:05 1q

1033 2019-03-23 09:10:48 2q

  952 2019-01-01 08:05:46 3q

  871 2018-10-12 06:06:38 4q

  870 2018-10-11 06:23:07 1y

  541 2017-11-16 01:16:02 2y

  540 2017-11-15 01:07:44 ??

     But when a file changed every day, running retain after each

     simulated backup only kept 8 days of history:

1200 2019-09-06 07:35:06 1d

1199 2019-09-05 07:51:03 2d

1198 2019-09-04 07:35:12 3d

1197 2019-09-03 07:28:11 4d

1196 2019-09-02 07:33:29 5d

1195 2019-09-01 07:33:54 6d

1194 2019-08-31 07:29:08 7d

1193 2019-08-30 07:40:04 ??

     On a live server backed up daily with -s30d12m it did better, but

     the older backup history is mostly missing and there are obvious

     gaps in the most recent 30 days (only 16 versions):

2821 2019-05-10 02:02:59 bsd732-s001.vmdk 1d

2820 2019-05-09 02:02:18 bsd732-s001.vmdk 2d

2815 2019-05-04 02:03:01 bsd732-s001.vmdk 7d

2813 2019-05-02 02:03:02 bsd732-s001.vmdk 9d

2812 2019-05-01 02:02:21 bsd732-s001.vmdk 10d

2810 2019-04-29 02:02:57 bsd732-s001.vmdk 12d

2808 2019-04-27 02:02:58 bsd732-s001.vmdk 14d

2807 2019-04-26 02:02:43 bsd732-s001.vmdk 15d

2803 2019-04-22 02:03:00 bsd732-s001.vmdk 19d

2802 2019-04-21 02:02:23 bsd732-s001.vmdk 20d

2800 2019-04-19 02:02:25 bsd732-s001.vmdk 22d

2798 2019-04-17 02:02:38 bsd732-s001.vmdk 24d

2794 2019-04-15 02:02:16 bsd732-s001.vmdk 26d

2792 2019-04-14 02:02:03 bsd732-s001.vmdk 27d

2788 2019-04-12 02:03:06 bsd732-s001.vmdk 29d

2787 2019-04-10 02:05:36 bsd732-s001.vmdk 31d

2780 2019-04-02 02:02:08 bsd732-s001.vmdk 2m

2479 2018-04-14 02:02:37 bsd732-s001.vmdk 13m

     Now retain does much better at keeping the requested history.

     Here is the earlier simulation with the new retain:

1200 2019-09-06 07:35:06 1d

1199 2019-09-05 07:51:03 2d

1198 2019-09-04 07:35:12 3d

1197 2019-09-03 07:28:11 4d

1196 2019-09-02 07:33:29 5d

1195 2019-09-01 07:33:54 6d

1194 2019-08-31 07:29:08 7d

1192 2019-08-29 07:27:01 1w

1185 2019-08-22 07:15:41 2w

1178 2019-08-15 07:12:49 3w

1171 2019-08-08 07:04:31 4w

1136 2019-07-04 06:51:57 1m

1108 2019-06-06 06:22:07 2m

1080 2019-05-09 06:25:52 3m

1045 2019-04-04 08:46:36 1q

  954 2019-01-03 07:54:18 2q

  863 2018-10-04 06:35:57 3q

  772 2018-07-05 03:58:38 4q

  590 2018-01-04 02:39:33 1y

  226 2017-01-05 00:08:56 2y

  Backup file retention is a thorny problem and reasonable people will

  disagree on exactly how it should work.  Retain -s uses time

  intervals without regard to midnight, time zones, or calendars, so

  the files it retains may appear arbitrary and confusing.  It retains

  a reasonable backup history over a long period of time (or it should

  now) but retention is not based on days of the week, month, or year.
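
  The time rounding problem in item 2 above can be seen with a little
  arithmetic.  This sketch only shows why truncating an age to whole
  days misbehaves; the rounding rule HB actually uses is not described
  here, and the function names are hypothetical:

```python
from datetime import datetime

DAY = 86400  # seconds


def age_days_truncated(older, newer):
    """Whole days between two backups, discarding any partial day."""
    return int((newer - older).total_seconds() // DAY)


def age_days_rounded(older, newer):
    """Days between two backups, rounded to the nearest whole day."""
    return int((newer - older).total_seconds() / DAY + 0.5)


first = datetime(2019, 5, 1, 2, 10)   # saved at 2:10 AM
second = datetime(2019, 5, 2, 2, 5)   # saved the next day at 2:05 AM
```

  The two saves are 23 hours 55 minutes apart, so truncation treats
  them as 0 days apart while rounding treats them as 1 day apart.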

- retain: display progress (total, kept, and deleted files) every 100K

  kept files or 10K deleted files if sending output to a screen

- retain: previously the -v option would show a brief reason why files

  were deleted.  This is still supported, but is replaced by -v1 to

  show deleted files and -v2 to show both deleted and kept files, with

  more details about why files are deleted or kept.  Directories are

  now suppressed since they are lengthy and mostly uninteresting.

- config: a new config option, retain-extra-versions, controls whether

  retain adds an extra period to each -s interval.  The default is

  True to add a safety margin for backup file retention.

  There are 2 mental models for file retention.  Using -s1m can mean:

  1. "keep 1 backup from the last 30 days"

  2. "restore files to their state 30 days ago"

  The difference is subtle, but to accomplish #2, there must be a

  backup older than 30 days. The retain-extra-versions config option

  changes -s7d3m to -s8d4m to satisfy this.  -s1m is changed to -s2m

  so there are 30-60 days of backup history, or "at LEAST 30 days".

  For model #1, set retain-extra-versions to False, but keep in mind

  that with -s1m for example, you could have a current backup from

  today and a "monthly" backup from yesterday, with no earlier backup

  history.  The monthly backup will age every day, from 1 to 30 days,

  depending on the day of the month of the latest backup, then will

  jump ahead 30 days and be next to the latest backup.  With

  retain-extra-versions set to False, -s1m would keep 1 to 30 days of

  backup history, or "at MOST 30 days".
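
  The schedule adjustment described above amounts to adding 1 to each
  count.  A minimal sketch, using a hypothetical function name and
  assuming a schedule is written as counts followed by unit letters:

```python
import re


def add_extra_versions(schedule):
    """Add 1 to every count in a schedule: '7d3m' becomes '8d4m'."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + 1), schedule)
```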

- retain: there are now 3 ways to handle files that are in the backup

  but have been deleted from the live filesystem:

  1. files deleted more recently than -x are treated like a live file:

     the most recent copy is always saved and older copies are retained

     according to -s, -t, and -m.

  2. files deleted longer ago than -x are removed entirely

  3. if -x is not used, it is determined by adding all of the -s and

     -t intervals together.  If neither -s nor -t are used (only -m),

     then deleted files are kept in the backup forever.

  Another difference is that -x can now be larger than -s and -t.  For

  example, -s30d without -x retains deleted files for 30 days.  Adding

  -x1y keeps the latest backup of deleted files for a year.
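
  The three cases can be sketched as follows.  All periods are given
  in days for simplicity, the function names are hypothetical, and the
  treatment of a file deleted exactly -x ago is an arbitrary choice in
  this example:

```python
def effective_x(x_days=None, s_days=0, t_days=0):
    """Return the deleted-file limit in days, or None to keep forever.

    s_days and t_days are the spans of the -s and -t options, 0 if unused.
    """
    if x_days is not None:
        return x_days              # explicit -x wins, even if larger
    if s_days or t_days:
        return s_days + t_days     # derived from -s and -t
    return None                    # only -m was used


def deleted_file_action(days_since_deleted, limit):
    """Decide how a file deleted from the live filesystem is handled."""
    if limit is None or days_since_deleted <= limit:
        return "treat as live"
    return "remove all versions"
```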

- retain: the number of items kept/deleted was wrong when --dryrun was used,

  and it was especially noticeable if there were a lot of directories

  that a normal retain would have deleted.  Also added -n as a synonym

  for --dryrun.

- retain: the --dryrun option is being renamed to -n for consistency.

  This is unlikely to be in scripts, but should be changed soon if it


- retain: the --force option is being retired in 2020 and should be

  removed from scripts & cron jobs.  If the previous backup aborts,

  retain will operate correctly without --force.

- dest: previously if a destination was unable to start, advice was

  given to use debug 99 to show a traceback.  Now a traceback is

  always shown on a start failure so debug 99 isn't needed.

- selftest: if --sample was used with a list of arc filenames it would

  incorrectly stop with an error message "Use --sample with -v3, -v4,

  or a list of arc filenames to sample".

- selftest: if --sample is used with an older format arc file listed

  on the command line (pre-2013), it can't sample the arc file because

  older arc formats don't support sampling.  Now this causes an error

  to make it clear why sampling isn't happening.

- selftest: if an arc or pathname is on the selftest command line it

  performs an extensive test of the items listed.  Previously this

  still performed a full -v2 selftest.  Now when an arc or pathname is

  listed, some parts of selftest -v2 are skipped to speed up testing

  the items requested, and a partial test note is displayed.

- selftest: a customer forcing a VM crash during a CIFS backup with

  the backup database on mergerfs (FUSE) had incorrect reference

  counts in the database.  Selftest is designed to fix incorrect

  reference counts, but there were a couple of cases that it didn't

  handle right.  Now it fixes bad reference counts without removing

  files.  Thanks Ben!

- selftest: if an arc file is missing, selftest could say:

      Error: block 1 requires arc.0.0

      Deleted block 1

  But it didn't actually delete the block, so the error persisted.

  Now it deletes the block.

- dir + ssh destination: if a backup aborts, it leaves .tmp files on

  the remotes for dir and ssh destinations.  These are supposed to get

  removed on the next backup, but that wasn't happening.  It's fixed

  now, but you may want to manually remove stray .tmp files from these

  destinations (ensure hb is not running).

- rclone: with rclone 1.42 and higher, "deletefile" avoids directory

  listings and is much faster than "delete".  With lower versions of

  rclone, "delete" doesn't do directory listings.  If you run rclone

  1.42 or higher, change "delete" to "deletefile" or get a new copy of from the HashBackup web site (bottom of Destinations).

  Thanks Ben!

- upgrade: if an error occurs downloading any file, display more

  detailed error information.  Also, upgrade now opens a temp file next

  to the current HB binary before doing any downloads to avoid the

  situation where the download works but the HB binary can't be

  upgraded.  If the upgrade is in a cron job and fails, maybe because

  of a permission problem, it can cause a download loop.
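
  The idea of checking writability before downloading can be sketched
  as below.  The function name and the temp file prefix are made up
  for this example and are not HB's:

```python
import os
import tempfile


def can_replace(binary_path):
    """Return True if a temp file can be created beside binary_path."""
    directory = os.path.dirname(os.path.abspath(binary_path))
    try:
        # The temp file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".upgrade-"):
            return True
    except OSError:
        return False
```

  If this returns False there is no point downloading anything, which
  is what breaks the loop in a cron job.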

- backup: the dedup table grows dynamically up to a limit set with the

  dedup-mem config option or the -D backup command line option.  When

  dedup is enabled, backup displays a percentage of how full the dedup

  table is at both its current and max size.  But other commands using

  the dedup table may not know its maximum size and displayed a

  confusing percentage.  Now only backup displays both percentages.

#2302 May 11, 2019 - expires Jul 15, 2019



* get: enable exception handling


- get: exception handling was mistakenly left disabled for development

  and is now enabled again.  When an exception occurs during restores,

  get is supposed to show an error message for the file affected and

  continue the restore, but instead it was halting.  Thanks Ian!

#2301 May 10, 2019 - expires Jul 15, 2019



* get: fix divide by zero


- get: when restoring files over 500M, get tries to report progress

  every 1%.  For block sizes > 4M (a recent feature), this could cause

  a divide by zero exception.  Thanks Ian!

#2298 May 8, 2019 - expires Jul 15, 2019



* upgrade: RSA key rotation

* boot binaries for automated deployment

* mount: cache size increased


- upgrade: the RSA key used to verify new versions of HashBackup has

  been upgraded to RSA 4096.  For the rest of 2019, #2295 and below

  can still use the upgrade command and will use the old RSA key to

  verify the download, while #2298 and up will use the new RSA key.

- previously, new releases of HB were posted to the

  Download page and also to the upgrade server.  To make automated

  deployments easier, an installer or "boot binary" that does not

  change from release to release is now posted on the Download page.

  The boot binary is run just like the regular HB command, but the

  initial run will do an hb upgrade, replacing the boot binary, then

  execute the original command.  The boot binary has the public key

  built in and does RSA 4096 verification of the latest version

  downloaded from the upgrade server, just like a regular hb upgrade.

  To verify the sha1 of the boot binary from a secure server, use:

- mount: the inode cache was increased from 10K to 100K entries to

  handle large directories better.  When this cache is full, mount

  uses around 275M of RAM.  The size of this cache should probably be

  configurable with a mount command line option.

#2295 May 2, 2019 - expires Jul 15, 2019



* count: display shard counts with -n

* export: fix/speed up log exports

* mount: fix bug, improve -f performance

* s3: multipart upload improvements


- count: with -n, count doesn't read file sizes to make the scan

  faster.  But if --shards is used, count should still show how

  files are divided between shards - just not the sizes.

- export: fix and speed up log file pathname mapping and try to map

  deleted pathnames to ??.  Some deleted pathnames might stay visible

  in exported log files because they are hard to distinguish from

  regular text.  Thanks Ben!

- mount: in #2062 (released as #2077), a bug was fixed where deleted

  files could be accessed even though they did not appear in a

  directory listing.  This fix had problems when -f was used: an ls

  inside a mount was sometimes missing files or displaying files

  deleted in an earlier version.  This is fixed, and mount -f can now

  read large directories 9x faster (100K entry directory with 30 full

  versions).  Thanks Dan!

- s3: if more than one s3-like destination was configured and they

  were using multipart uploads, uploads in one destination would block

  uploads in another destination because a lock inside HB was

  configured incorrectly.  Now multiple S3 destinations upload in

  parallel, as expected.

- s3: if multipart uploads are enabled, HB tries to abort previous MP

  uploads during initialization.  It was not using prefixes to list

  previous MP uploads, so if restricted S3 permissions were used,

  access had to be given to the entire bucket for the "list multipart

  uploads" S3 request.  Now this permission can be limited to the

  prefix (dir) being used.  Thanks Alexander!

#2282 Apr 25, 2019 - expires Jul 15, 2019



* backup: fix big performance bug with tiny files

* export: improve privacy of exported data & include logs

* shard partitioning method changed yet again

* count: don't count directories in size stats

* rm: fix rare interrupted packing bug

* handle missing arc files better with dir destination


- backup: a large directory of tiny files was taking 3m30s to backup

  and using 950MB of memory because of a bad interaction with the

  memory manager.  Using a very large block size made it worse.  Now

  the same backup takes 34 seconds and uses 82MB of memory.

- export: changed to improve privacy of exported databases:

  1. if an admin passphrase is set, it is verified before export

  2. all pathnames are mapped to numbers.  An ls of an exported database

     looks like this:

     HashBackup build #2262 Copyright 2009-2019 HashBackup, LLC

     Backup directory: /hb/export


       Loading /hb/export/hb.db.0

       Verified signature

     Verified hb.db signature

     Most recent backup version: 0

     Showing most recent version, use -ad for all

     /  (parent, partial)

     /1  (parent, partial)

     /1/2  (parent, partial)







  3. userid and groupid names are changed to numbers

  4. free and partial pages are removed from the exported database

     since they may contain remnants of removed or changed pathnames

  5. all log data is included with the export, with pathnames changed

     to numbers

  6. the admin passphrase is removed on the exported database.  It is

     stored as a salted hash, and on the customer support side, always

     had to be removed to view the backup config.
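
  The pathname mapping in item 2 might look something like this
  sketch.  It assumes each pathname component is replaced by a number
  handed out in order of first appearance, which matches the ls output
  above but is otherwise a guess; the class name is hypothetical:

```python
class PathMapper:
    """Replace each distinct pathname component with a small integer."""

    def __init__(self):
        self.ids = {}

    def map_path(self, path):
        parts = [p for p in path.split("/") if p]
        # setdefault assigns the next number the first time a name is seen.
        mapped = [str(self.ids.setdefault(p, len(self.ids) + 1))
                  for p in parts]
        return "/" + "/".join(mapped)
```

  With this scheme "/home/jim" would be shown as "/1/2" and the
  directory structure stays visible without revealing any names.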

- sharding: the partitioning method changed yet again, to divide files

  more evenly into shards.  The goal is to have close to the same number

  of files in each shard; files can't be spread evenly by size because

  file sizes constantly change.  Partitioning by size would cause

  files to move between shards, making incremental backups impossible.
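
  One common way to get a stable, roughly even split is to hash the
  pathname.  This is shown only to illustrate why the choice must not
  depend on file size; it is not necessarily the method HB uses, and
  the function name is hypothetical:

```python
import hashlib


def shard_for(path, nshards):
    """Pick a shard from the pathname alone, never from the file size."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nshards
```

  Because the result depends only on the pathname, a file stays in the
  same shard no matter how its contents change between backups.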

- count: for statistical purposes, directories are now counted as

  zero-size objects.  Several other minor changes.

- rm: if a packing operation was interrupted at exactly the right time

  with either cache-size-limit set to -1 or an arc file that wasn't on

  any destination yet, the backup data in an arc file could be lost.

  It's not clear this ever actually occurred with a customer backup,

  but it was verified with a forced test case.  Thanks Israel!

- the "dir" destination type did not handle missing arc files well

  when selective download was used.  It went through the retry cycle

  then stopped, which caused selftest to think that all subsequent arc

  files in the backup were bad, which caused it to delete them all if

  --fix was used.  Now when an arc file is missing on a dir

  destination, --fix only deletes the blocks in that arc file.  Thanks


#2262 Apr 19, 2019 - expires Jul 15, 2019



* ls: add a note about using -ad

* backup: fix trailing slash not backing up with -X

* selftest: fix recent traceback on error


- ls: when looking for a file, the default is to check only the

  current version.  For deleted files, nothing is displayed and it

  looks like the file is not present.  The file is present, because

  otherwise ls says "Not found in backup", but that's a subtle

  difference.  Now a note is added to use -ad to check all versions;

  then deleted files are displayed.

- backup: when a trailing slash was used on a command line pathname,

  backup did not always descend into the directory when -X was used.

  This is a recent bug caused by the changes to add -F, read pathnames

  from a file.  Thanks Dan!

- selftest: a recent change to make selftest work on read-only

  databases had the unintended side-effect of causing an

  UnboundVariable traceback if any error occurred in selftest.

  Thanks Ben!

#2258 Apr 18, 2019 - expires Jul 15, 2019



* new count command

* rm: not always removing packed arc files

* retain: allow both -t and -s

* retain: don't require -x

* retain: fix -v backup date on empty directories

* retain: -x10000y overflow fixed

* dir dest: throttled upload rate not always honored

* detect and explain how to upgrade old HB databases


- a new command, count, traverses directory trees and displays various


  * file counts and sizes by type (file, dir, symlink, etc)

  * top N files and directories with -tN

  * shard file distribution with --shards

  * files scanned per second (use -n for fastest rate)

  The count command may be useful for setting backup options and

  estimating the minimum time an incremental backup will require.

  There are other Unix tools that do similar things but count's real

  purpose is to serve as a proving ground for parallel directory

  scanning and other future optimizations to improve backup speed.
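
  For readers who want a feel for this kind of scan, here is a small
  stand-alone sketch that tallies counts and sizes by type.  It is not
  the count command's implementation; the function name and the type
  labels are assumptions, and directories are given size 0 here:

```python
import os
import stat
from collections import Counter


def count_tree(root):
    """Return (counts, sizes) keyed by object type for items under root."""
    counts, sizes = Counter(), Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so that symlinks are reported, not followed
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISDIR(st.st_mode):
                kind, size = "dir", 0
            elif stat.S_ISLNK(st.st_mode):
                kind, size = "symlink", st.st_size
            elif stat.S_ISREG(st.st_mode):
                kind, size = "file", st.st_size
            else:
                kind, size = "other", st.st_size
            counts[kind] += 1
            sizes[kind] += size
    return counts, sizes
```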

- rm: in some situations, rm would pack arc files but instead of

  removing the original unpacked arc files from remote storage, it

  only flagged them for removal and they would be deleted in the next

  backup.  Now the remote arc files are removed at the end of rm.

- retain: previously either -t or -s could be used, but not both.

  Now they can be used together.  -t retention occurs first, then

  files are retained according to -s.  So for example:

     -t30d -s7d4w12m

  means to keep all files backed up in the last 30 days, then keep 7

  daily backups after that, 4 weekly, and 12 monthly.  -t has the

  effect of shifting or delaying the schedule, so in this case without

  -x, the deleted file time would be 13 months: 12 months from the -s

  schedule then delayed another 30 days by -t.  Thanks Leo!

- retain: previously -x was required if neither -t nor -s were used.

  But this prevented using -m by itself to limit the number of copies.

  Now -x is not required, and the default is to keep all deleted

  files.  Thanks Alexander!

- retain: with -v, retain was not showing the correct backup date when

  deleting empty directories.

- retain: -x10000y halted with "-x must be <= -t" because of an

  overflow problem.  Thanks Alexander!

- with a Dir destination type, the dest.conf rate limit was not

  honored for files smaller than the rate.  For example, if the rate

  was 10M, for 10MB/s, files less than 10MB were copied at full speed

  because of a rounding error.
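
  A rounding error of this kind is easy to reproduce.  The sketch
  below is a hypothetical illustration, not HB's throttling code: if
  the time a copy should take is computed with integer division, any
  file smaller than the rate gets a delay of zero.

```python
def copy_delay(nbytes, rate):
    """Seconds a copy of nbytes should take at rate bytes per second."""
    return nbytes / rate


def copy_delay_truncated(nbytes, rate):
    """Same calculation with integer division: small files get 0."""
    return nbytes // rate
```

  For a 5MB file and a 10MB/s rate, the first function gives 0.5
  seconds and the second gives 0.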

- this version of HB will upgrade databases at dbrev 23 or later

  to the current version, dbrev 32.  Dbrev 23 is circa Nov 2016.  If

  the database is older than that, it has to be upgraded with an older

  version of HB first, then again with the current release.  This is

  now detected and explained when a very old database is accessed

  instead of giving an import error.

#2249 Apr 12, 2019 - expires Jul 15, 2019



* B2, WebDAV: fix warning message from updated library

* high CPU usage during file uploads on old versions of OSX


- B2, WebDAV: a library used by HB was updated in #2247 and displayed

  a long warning on some systems about SNIMissing.  This was an HB build

  problem and has been fixed.

- Mac: on old versions of OSX like 10.6.8, Snow Leopard, HB has high

  CPU usage during file uploads.  This is apparently an OS issue (a

  "busy wait") that doesn't occur on newer versions of OSX like 10.9,

  Mavericks.  To prevent this, add timeout None to dest.conf.

  However, doing this can cause HB to hang if there is a problem

  communicating with an unreliable remote, so it isn't a perfect


#2247 Apr 11, 2019 - expires Jul 15, 2019



* recover: fix recent traceback on non-sharded backups

* retain: fix recent traceback when packing caused by --secure

* rm: fix recent traceback with --secure

#2243 Apr 9, 2019 - expires Jul 15, 2019



* rm: new --secure option to force removal of confidential data

* backup: add -F to read file lists with paths to backup

* backup: minor change in deleted file handling

* backup: samples and shards command line and -F paths

* selftest: allow selftest -v2 on read-only database

* recover: works with sharded backups

* compare: add --sample option

* backup: fix shards backing up each other (major performance bug)

* dest: show, clear, load, unload, setid, etc. work with shards

* get: fix ACL-related cross-platform restore bug

* db upgrades only ask for database passphrase once


- rm: when files are deleted from the backup, the "formula" for

  reconstructing the files is immediately deleted from the database,

  making it impossible to restore the files using HashBackup.

  However, the files' encrypted data blocks, stored in arc files, are

  not deleted until the next pack operation.  This may make it

  difficult to say "yes, your confidential data has been removed".  A

  new --secure option repacks all affected arc files to remove

  deleted blocks, ignoring the pack-xxx config settings and limits.

- backup: a new -F option specifies sources of pathnames for files to

  backup.  This can be used to avoid scanning through an entire

  filesystem to find new, modified, or deleted files.  There can be

  more than 1 source after -F.  Each source is:

  a. pathname of a file containing pathnames to backup, 1 per line.

     Blank lines and lines beginning with # (comments) are ignored.

     Pathnames in the file can either be individual files or whole

     directories to save.  If a pathname doesn't exist in the

     filesystem, it is marked deleted in the backup.


  b. pathname of a directory, where each file in the directory

     contains a list of pathnames to backup as above

  The backup command line can still have pathnames of files to backup,

  as before, but -F and file sources must come after them.
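
  A script that prepares or checks -F sources could read them along
  these lines.  This is a sketch of the file format described above,
  with hypothetical function names; details such as the order in which
  files in a directory source are read are assumptions:

```python
import os


def read_path_list(filename):
    """Return pathnames from a list file, skipping blanks and # comments."""
    paths = []
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                continue
            paths.append(line)
    return paths


def read_source(source):
    """Read one -F source: a list file, or a directory of list files."""
    if os.path.isdir(source):
        paths = []
        for name in sorted(os.listdir(source)):
            full = os.path.join(source, name)
            if os.path.isfile(full):
                paths.extend(read_path_list(full))
        return paths
    return read_path_list(source)
```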

- backup: previously, if a pathname was:

  - listed on the command line

  - not present in the filesystem

  - and had a "current" version in the backup

  an error message was displayed that the file couldn't be found but

  the file stayed in the backup.  Now the file is marked deleted in

  the backup.  This allows sites to add deleted pathnames to a -F file

  list to get them removed from the backup without scanning the

  filesystem.


- backup: previously, command line pathnames that were files rather

  than directories were saved in every shard.  For example, backing up

  *.c would save all of the C files in every shard.  Now these are

  divided up among the shards so each file is only saved in 1 shard.

  Pathnames listed in -F files are also divided among shards.

  Command line and -F pathnames are also sampled, so backing up *.c

  --sample 10 will only save 10% of the C files.
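
  One way to divide pathnames so that each lands in exactly one shard
  is to hash the pathname.  The sketch below shows that idea only; the
  selection method HashBackup actually uses is not described here, and
  the SHA-256 hash and function names are assumptions.

```python
import hashlib

def shard_for(pathname, nshards):
    """Map a pathname to a shard number in 1..nshards (hypothetical scheme)."""
    digest = hashlib.sha256(pathname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nshards + 1

def split_among_shards(pathnames, nshards):
    """Group pathnames by shard; every pathname appears in one shard only."""
    shards = {n: [] for n in range(1, nshards + 1)}
    for p in pathnames:
        shards[shard_for(p, nshards)].append(p)
    return shards
```

  Because the shard depends only on the pathname, every shard process
  can make the same decision independently without coordination.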

- selftest: if a backup is on a read-only filesystem or the database

  is set to read-only, selftest no longer fails at the end with a

  traceback when the incremental selftest markers are updated.

- recover: now works with sharded backups.  The recovery procedure is

  similar: copy key.conf and dest.conf to an empty directory, run

  recover.  This recovers the main backup directory.  Then run recover

  again (recover tells you this) to recover all shards in parallel.

- compare: backup's --sample option was added to compare so that

  compare works with sampled backups

- backup: shards were mistakenly backing up each others' backup

  directories.  This caused major performance problems if

  cache-size-limit was not set and all backup files were local.

- dest: all of the dest subcommands (clear, load, setid, unload, etc)

  work with sharded backups.

- get: restoring a Mac/OSX or BSD backup to a Linux system could cause

  a traceback when ACLs are used:

    Traceback (most recent call last):

      File "/", line 149, in <module>

      File "/", line 2080, in main

      File "/", line 846, in restoreobj

      File "/", line 616, in restoredir

      File "/", line 932, in restoreobj

    NameError: global name 'acl' is not defined

  Now this fails in a better way: since ACLs are not portable across

  platforms, HashBackup issues this error on every file or directory

  with an ACL and then continues the restore:

     Unable to set ACL: this system does not support ACLs: (pathname)

  File data is still restored even if the ACL cannot be set.

- when an older backup is accessed with a newer version of HashBackup

  an automatic database upgrade might be necessary.  If the key has a

  passphrase, it is only queried once now instead of once for every

  version upgrade.

#2224 Apr 3, 2019 - expires Jul 15, 2019



SHARDING IS EXPERIMENTAL and should not be used in production because

mount & recover do not work yet.

* S3: class keyword accepts any value

* init: new --shards option

* config: new shard-output-days option

* most commands can be sharded

* backup: --shard option no longer needed

* backup: display correct shard percentage

* config: admin passphrase no longer on the command line

* HB_ADMIN_PASSPHRASE environment variable

* config: sets config options in all shards

* dest.conf: {shard} replacement

* compare: fix recent traceback

* get: fix recent traceback with --delete


- Amazon S3: the class keyword previously required a value of either

  standard or ia.  It still accepts those, but also any other value

  used here is passed directly to AWS, so you can use onezone_ia for

  example.  The value is uppercased as required by AWS.  Thanks Alex!

- init: a new --shards N option will create backup subdirectories s1

  to sN for each shard beneath the main backup directory.  Sharding

  allows multiple backups to run in parallel, automatically dividing

  files between shards.

  Shards are set up so that they share the same key.conf, inex.conf,

  dest.conf, and cacerts.crt files, located in the main backup

  directory.  Modifying any copy of these files takes effect in all

  shards.


  dest.conf is somewhat special because each shard needs a separate

  directory to store its backup, so whenever {shard} is seen in

  dest.conf, it is replaced by the shard number.  See further down for

  more details.

  A new directory "sout" is created for sharded backups.  It saves

  the output from each sharded command and is displayed when each

  shard finishes.

  Thanks Ian for inspiring sharded backups.

- config: a new config option, shard-output-days, is the number of

  days to keep shard output in the sout directory.  The default is 30

  days.  Set to 0 to keep all shard output.

- shards: in this release, most commands on a sharded backup directory

  will automatically start a process for each shard.  Output from each

  process is buffered in a file and displayed in order after each

  process finishes.  The log prefix works (hb log backup ...).  There

  are still rough edges:

  -- output should not be buffered in memory so that files in sout can

     be checked for job progress or problems

  -- commands that need input may halt or ask for each shard

  -- the prompts for input are not always displayed

  -- mount & recover do not work, recover being a necessity

- backup: the --shard option is no longer needed and has been removed.

  There are now 2 ways to start sharded backups:

  1. HB will start a process for each shard if the backup directory

     was created with the --shards option to init.

  2. To manually start shard backups, eg, with GNU parallel, the shard

     is specified with -c backupdir/sN.  For example, if the init

     command was -c backupdir --shards 5, then the backup command for

     shard #1 would use -c backupdir/s1, etc.
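
  For the manual method, the per-shard command lines can be generated
  rather than typed.  A minimal sketch, assuming the usual
  "hb backup -c <dir> <paths>" form; the helper name and the example
  paths are hypothetical, and nothing is executed here.

```python
import shlex

def shard_backup_commands(backupdir, nshards, paths):
    """Build one backup command per shard, using -c backupdir/sN."""
    return [
        ["hb", "backup", "-c", f"{backupdir}/s{n}", *paths]
        for n in range(1, nshards + 1)
    ]

# Print the commands, one per line, e.g. to feed to GNU parallel.
for cmd in shard_backup_commands("backupdir", 5, ["/home"]):
    print(shlex.join(cmd))
```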

- backup: when displaying the percentage of files being saved, backup

  would display 50% for the 1st shard then 100% for the 2nd shard in a

  2-shard setup.  It should have displayed 50% for each shard, and now does.

- config: previously, new admin passphrases were entered on the

  command line.  This is not secure since command lines can be

  observed.  Now the new admin passphrase is asked for with getpass(),

  echo is turned off, and it is entered twice for verification.

- HB will look in the environment variable HB_ADMIN_PASSPHRASE when an

  admin passphrase is needed.  This should NOT be used like this:

     $ HB_ADMIN_PASSPHRASE=jim hb rm ...

  because that command is visible to other users.  Instead, do this:

     $ HB_ADMIN_PASSPHRASE=`cat pwfile` hb rm ...

  where pwfile is secured and contains the password, or export the

  environment variable in .profile and make sure it is protected.

  Storing passwords is usually a bad idea, but if it's necessary for

  automation, this is one of the less bad ways to handle it.  On

  modern Unix systems, environment variables are not visible to other

  non-root processes.
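
  An automation script can do the same thing without a shell.  This
  is a sketch under assumptions: the helper names are hypothetical,
  and refusing a passphrase file that group or others can access is
  an extra precaution, not something HB requires.

```python
import os
import stat

def load_passphrase(pwfile):
    """Read the passphrase, refusing files accessible by group or others."""
    mode = os.stat(pwfile).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{pwfile} must only be accessible by its owner")
    with open(pwfile, encoding="utf-8") as f:
        return f.read().rstrip("\r\n")

def hb_environment(pwfile):
    """Return a copy of the environment with HB_ADMIN_PASSPHRASE set."""
    env = dict(os.environ)
    env["HB_ADMIN_PASSPHRASE"] = load_passphrase(pwfile)
    return env
```

  The returned dictionary can be passed as the env argument of
  subprocess.run() when starting hb, so the passphrase never appears
  on a command line.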

- config: config changes made to the main backup directory are

  propagated to all shard subdirectories.  Trying to change the config

  settings on an individual shard will throw an error.

- dest.conf: for sharded backups, each shard must have its own private

  storage area similar to the way each host must have its own private

  storage area.  This is usually done by using the Dir keyword with a

  destination.  For sharded backups, the special string {shard} in

  dest.conf is replaced by the shard number.  For example: 

    Dir hostname/s{shard}

  will store backups in hostname/s1, hostname/s2, etc.  {shard} is

  replaced anywhere it occurs, not just on Dir keywords, so it can be

  used with rsync, rclone, and other destinations that do not use the

  Dir keyword.
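
  The substitution amounts to a plain text replacement over the whole
  dest.conf file.  A minimal sketch; the function name is hypothetical
  and this is not HashBackup's own code.

```python
def expand_shard(dest_conf_text, shard):
    """Replace every {shard} in dest.conf text with the shard number."""
    return dest_conf_text.replace("{shard}", str(shard))

template = "Dir hostname/s{shard}\n"
for n in (1, 2):
    print(expand_shard(template, n), end="")
```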

- compare: fixed a traceback (since #2197)

- get: fixed a traceback when --delete was used (since #2197)

#2199 Mar 24, 2019 - expires Jul 15, 2019



* build with updated Python 

* build with updated OpenSSL

* backup: fix 2 obscure --maxtime bugs


- HashBackup is written in Python.  This release updates the Python

  environment used to build HB.  No performance changes have been

  noticed.  The main reason for the Python update is to be able to use

  newer versions of OpenSSL.

- HashBackup uses OpenSSL for secure network connections.  Previously

  OpenSSL was dynamically loaded at runtime on Mac OSX and statically

  linked on Linux and FreeBSD.  Now it is statically linked on all

  platforms and can be updated independently of OS updates, allowing

  HB to maintain security when running on older systems.

- backup: if --maxtime is used and one ever-changing file took longer

  than maxtime to backup, HB would get stuck backing up this file

  instead of proceeding to the next, ie, it started on the checkpoint

  rather than after the checkpoint.  This bug was found through a

  thought experiment and verified with a test case.

- backup: the inex.conf file is normally saved in every backup, but

  sometimes if --maxtime was exceeded, inex.conf was not saved.

#2197 Mar 20, 2019 - expires Jul 15, 2019



* database software upgraded for a 10% speed increase

* backup: --sample and --shard up to 2x faster

* Amazon S3: dualstack (IPv4 + IPv6) endpoints removed


- backup: the --sample and --shard options select some files for

  backup and reject others.  Rejecting files is now more efficient:

  for a test directory on SSD of 10K files, --sample 10 is twice as

  fast.  The selection method changed again so this release will not

  sample or shard the same as the previous release.  This should be

  the last time it changes.

- Amazon S3: in #2139 the S3 endpoints were changed from IPv4 to the

  "dual stack" endpoints to enable both IPv4 and IPv6.  For some

  regions this worked fine, but for others it caused a 400 Bad

  Request.  There doesn't seem to be a pattern of where it worked and

  where it failed.  For now this change has been reverted to IPv4

  endpoints.  If you understand why some regions worked and some

  didn't, please send an email.  Thanks Scott!

#2194 Mar 14, 2019 - expires Jul 15, 2019



* backup: new --sample option simplified and more accurate

* backup: new experimental --shard option


- backup: as an experiment, the --sample option had 2 forms: either a

  single percentage, meaning the percentage of files to sample, or a

  percentage range used to partition the backup.  This option now only

  takes a single percentage and the range option has been removed.

  There was also a bug in the sampling feature, where the sampling

  could sometimes be very uneven.  For example, in a 100-file

  directory, backing up with --sample 1,33 should have saved around 33

  files, but instead only saved 2.  --sample 67,100 should also have

  saved around 33 files, but instead saved 66.  In this release the

  sampling is more uniform.

- backup: EXPERIMENTAL: a new --shard option replaces the sample range

  option.  It takes 2 integers, "my shard id" and the total number of

  shards, separated by a slash.  For example, --shard 1/3, --shard

  2/3, or --shard 3/3.  This allows partitioning a huge backup into

  independent sections and running the backups in parallel using

  separate backup directories.  This option is only for

  experimentation and evaluation and is not likely to stay in this

  form as it becomes more integrated into HashBackup.  Feedback is

  welcome from any experiments with sharding a large backup.

#2189 Mar 9, 2019 - expires Jul 15, 2019



* rm/retain: fix hang condition during packing


- rm/retain: a couple of weeks ago, a fix was made to the packing code

  to avoid repacking the same set of small files over and over.  This

  had an unintended side effect that caused a hang when packing large

  files.  This new fix will hopefully accommodate both situations.

#2188 Mar 8, 2019 - expires Jul 15, 2019



* backup: fix minor sampling bugs


- backup was incorrectly displaying a message "Sampling 1% of backup"

  when no --sample option was used.  It was doing a regular backup so

  no backups were missed because of this bogus message.

- backup: --sample 10,5 was displaying a negative sample percentage

  instead of displaying an error message.

#2186 Mar 5, 2019 - expires Jul 15, 2019



* get: new --splice option to combine local and remote data

* get: delete .hberror file before restoring a file

* inex.conf: exclude /dev/fd on OSX

* selftest: remove new sparse file warning

* backup: new --sample option


- get: in the previous release, get used whole local files to assist

  restores.  This works very well for restoring a user directory with

  lots of files but didn't help much with restoring a large file that

  had changed, such as a VM image.

  With the new --splice option, get can combine data from parts of

  local files with remote backup data to restore files, sometimes

  called incremental restore.  This can be done even if the local

  files are changing, for example, a running VM image.  For very large

  restores, splicing can use temp space in the backup directory equal

  to the size of the restore, and it requires reading the local files.

  Splicing can reduce the amount of data downloaded significantly and

  is often faster than other non-spliced restores.  Thanks Jacob and


- inex.conf: /dev/fd are open file descriptors and should be excluded

  from the backup as they generate errors if used with -X (cross

  filesystems).  Thanks Harry!

- selftest: a new warning was recently added about sparse files with

  variable-sized blocks.  However, OSX and ZFS sometimes compress

  files and it causes these warnings, even though the backup is fine,

  so the warnings have been removed for now.  Thanks Alex & Robert!

- backup: a new --sample option can be used to sample huge backups.

  This is especially useful with the simulated-backup config option to

  determine the best settings for a huge backup, though it also works

  with non-simulated backups.  Directory records are never skipped,

  which may skew the results slightly.

  There are two forms for the --sample option:

  The first form, --sample P, is a percentage between 1 and 100 of how

  many files to backup.  If 5% of files are saved first, then 10% are

  saved 2nd, the same set of files are saved as if 10% were saved the

  first time.  In other words, each run is not a true random sample.

  This allows testing of incremental backups.

  The second form, --sample L,H gives a range of files to sample,

  where L and H must be 1-100.  For example, --sample 1,10 samples the

  first 10% of files, --sample 11,20 samples the next 10%, and so on,

  with --sample 91,100 sampling the last 10%.  This allows very large

  backups to be partitioned and done in parallel with multiple copies

  of HB, each using its own backup directory.  In the future, this

  concept may be expanded to appear to the user as a unified backup.

  Thanks Ian!
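
  Both forms behave as if every pathname were assigned a fixed bucket
  from 1 to 100.  The sketch below illustrates that model; hashing the
  pathname with SHA-256 is an assumption, not the method HB uses.

```python
import hashlib

def bucket(pathname):
    """Stable bucket in 1..100 derived from the pathname (assumed scheme)."""
    digest = hashlib.sha256(pathname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 + 1

def sampled(pathname, low, high):
    """True if the pathname falls in the bucket range low..high inclusive."""
    return low <= bucket(pathname) <= high

paths = [f"/data/file{i}.c" for i in range(1000)]
five = {p for p in paths if sampled(p, 1, 5)}    # like --sample 5
ten = {p for p in paths if sampled(p, 1, 10)}    # like --sample 10
```

  In this model the 5% set is always contained in the 10% set, and the
  ranges 1,10 through 91,100 partition the files with no overlap.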

#2164 Feb 26, 2019 - expires Jul 15, 2019



* database upgrade to dbrev 32

* get: use local files during restore

* get: new -v3 and -v4 options

* get: new --no-mtime option

* get: restores can be continued

* get: new --no-local option

* get: show download plan with --plan

* get: add .hberror file suffix on restore errors

* get: fix delay when restoring directories with --orig

* rm/retain: don't repack the same files


- this release does an automatic database upgrade when any HB command

  is used.

- get: uses local copies of backup files to assist restores.  This has

  several benefits:


  -- downloads less

  -- restores faster

  -- uses less RAM

  In this initial release, local data is matched for entire files.

  This may be improved in future releases, for example, to allow

  restoring a large VM image to an earlier state using both local files

  and downloaded data.  To compare restore plans without doing a

  restore, use get --plan with and without the --no-local option.

  Thanks Ben!

- get: two new levels are added to -v (the default is still -v2):

  -v3 displays messages when local files are not used for restore.

      This can happen because the file is missing or the timestamp,

      size, or hash does not match the file being restored

  -v4 displays messages when local files are used for a restore

- get: before using a local file for restore, HB checks that the

  local file's size and mtime match the backed-up file's size and

  mtime.  This is not a 100% guarantee (but maybe 99%) that the

  local file data is identical to the backed-up file data because

  mtime can be set to arbitrary values with the utime/utimes system

  calls.  The new --no-mtime option causes HB to compute a strong file

  hash for local files and compare that to the backup file's saved

  hash before believing mtime and using a local file.
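
  The check can be sketched as follows.  The function names, the
  whole-second mtime comparison, and the use of SHA-256 are
  assumptions for illustration; HB's actual hash is not stated here.

```python
import hashlib
import os

def file_hash(path, chunk=1 << 20):
    """Hash a file in chunks so large files do not need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def local_file_usable(path, saved_size, saved_mtime,
                      saved_hash=None, verify_hash=False):
    """Decide whether a local file may stand in for backed-up data."""
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing or unreadable local file
    if st.st_size != saved_size or int(st.st_mtime) != int(saved_mtime):
        return False
    if verify_hash:
        # Like --no-mtime: do not trust mtime alone, compare hashes too.
        return file_hash(path) == saved_hash
    return True
```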

- get: if a restore is restarted, get will restart much faster and

  does not download data it has already restored if files are restored

  to the same location.  This also uses the mtime + size check for

  identical files.  The --no-mtime option will verify the file hash if

  there is a possibility mtime has been altered.  This is unlikely,

  but may be necessary for extremely high security restores.

- get: a new --no-local option disables the use of local files.  This

  can be used to compare restores with/without using local files and

  also can be useful if local files might change during a restore.

  Changing local files could cause a restore using local files to fail

  with a hash error.  If that happens, the restore can be repeated.

- get: download information is now displayed with the --plan option.

  Previously the -i option (confirm restore) had to be used.

- get: if get detects file errors during a restore, it adds a .hberror

  suffix to the restored filename to make it clear there is a problem.

  This is better than deleting the restored file because partial data

  is often better than none.

- get: a poor database query could cause directory restores to be

  slow with --orig, especially on large backups.

- rm: a bug in #2139, released a few weeks ago, could sometimes cause

  the same small files to get packed repeatedly

#2154 Feb 16, 2019 - expires Apr 15, 2019



* bump backup expiration date to July 15th

* database upgrade to dbrev 31

* backup: fix dedup enabled with -D0

* backup: -B option takes arbitrary block sizes

* config: new block-size config option

* backup: new -V command line option for variable block sizes

* config: config option block-size-ext accepts -V

* get: save and re-use restore plans

* get: new --plan option to only plan a restore

* get: new --no-sd option to disable selective download


- this release does an automatic database upgrade when any HB command

  is used. The upgrade prevents access by earlier HB releases since

  they would not recognize new variable block sizes.

- backup: #2139, released about a week ago, fixed a traceback with

  too-small dedup table size but also enabled dedup even with -D0.  If

  dedup was previously disabled, backup might show this warning:

    Backup block size change, full backup: <pathname>

  This occurred with versions #2139-#2146 because dedup was mistakenly

  enabled.  It may occur again with this version because the bug is

  fixed and dedup is again disabled.

- backup: HB splits files into either fixed or variable-sized blocks

  during backup.  Smaller block sizes dedup better but generate more

  blocks so the block accounting overhead in hb.db is higher.  

  The -B option forces HB backup to split files into fixed-size

  blocks.  Previously only a handful of block sizes were supported,

  doubling from 4K to 16M.  Now any block size >= 128 bytes can be

  used for fixed-block backup.  This is useful for proprietary file

  formats that may have an unusual fixed block size (a Prime

  minicomputer disk image uses 2080-byte blocks for example), and to

  backup huge files with large block sizes, reducing the amount of

  block-tracking overhead in hb.db and making it faster to remove

  files and plan restores.  Large block sizes > 4MB are usually a bit

  slower for backup because there is less concurrency.

  NOTE 1: for now there is a limit of 2GB on the block size.  Ensure

  there is enough RAM for huge block sizes.  A reasonable estimate is

  (n+2)*blocksize, where n is the number of CPUs (or -p option). Using

  -p0 to disable multi-threading will use less memory and less CPU but

  is slower.

  NOTE 2: blocks are never split across arc files. If the block size

  is greater than config option arc-size-limit, arc files may be

  larger than expected.
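
  The RAM estimate in NOTE 1 is simple arithmetic.  A small sketch;
  the function name is hypothetical and the result is only the rough
  estimate given above, not a measured figure.

```python
GiB = 1024 ** 3

def block_ram_estimate(block_size, workers):
    """Rough RAM needed for block buffers: (n + 2) * blocksize."""
    return (workers + 2) * block_size

# Example: 1 GiB fixed blocks on a 4-CPU system.
print(block_ram_estimate(1 * GiB, 4) / GiB)  # 6.0
```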

- config: if HB decides to backup a file with variable-sized blocks,

  it previously used a variable block size of 32K, with an average

  size of 48K.  Now this is configurable with the config option

  block-size.  The possible values are: 16K 32K 64K 128K 256K 512K 1M.

  When the block size is increased, dedup becomes less effective.  The

  advantage of larger blocks is that there is less block tracking

  metadata: hb.db and the dedup table are both smaller.  Backup may

  run slightly faster with larger variable block sizes, though not

  always and not by a lot - maybe 8%.

  Larger blocks are good for backing up lots of huge files, or if

  there is a concern of exceeding 4 billion 48K blocks (211TB).  The

  dedup table currently is limited to 4B blocks for efficiency (though

  that limit is easy to change).  Using -V64K would double the backup

  size limit to 422TB, -V128K doubles it again to 844TB, and the max

  of -V1M would have a limit of 6.7PB.

  Different block sizes can be used with the same backup directory by

  using several backup commands with different block sizes.

  NOTE 1: if the default block size is changed, the next backup will

  save changed files without dedup if the new block size doesn't match

  the file's previous block size.  A warning is displayed.

  NOTE 2: If you are already using -B, changing this config option

  will have no effect.
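
  The capacity figures above can be reproduced with a short
  calculation.  It assumes that "4 billion" means 2**32 blocks and
  that the average block is 1.5 times the configured block size (as
  with 32K averaging 48K); both are inferred from the numbers quoted,
  not confirmed behavior.

```python
MAX_BLOCKS = 2 ** 32  # assumed meaning of the "4B blocks" limit
K = 1024

def backup_size_limit(block_size):
    """Approximate capacity in bytes for a given variable block size."""
    average_block = block_size * 3 // 2  # assumption: average is 1.5x
    return MAX_BLOCKS * average_block

for label, size in [("32K", 32 * K), ("64K", 64 * K),
                    ("128K", 128 * K), ("1M", 1024 * K)]:
    print(f"-V{label}: about {backup_size_limit(size) / 1e12:.0f} TB")
```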

- backup: a new -V option specifies the variable block size.  This

  overrides the block-size config option for a single backup.

- config: the block-size-ext config option sets the backup block size

  for files based on the file extension.  This now accepts both -B and

  -V.  For example, the value:

     -B4m .mov .avi -V64K .sql -B23K .xyz

  sets a large fixed block size for large video files that will not

  dedup well (except if the file is duplicated), a 64K variable block

  size for .sql files, and a 23K fixed block size for .xyz files.
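
  To make the format concrete, here is a sketch of how such a value
  could be parsed into per-extension rules.  The function names and
  the returned structure are hypothetical; sizes use a multiplier of
  1024 because these are block sizes.

```python
import re

MULTIPLIER = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_block_size(text):
    """Convert a size like 4m, 64K, or 23K to bytes (multiples of 1024)."""
    m = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if not m:
        raise ValueError(f"bad block size: {text!r}")
    return int(m.group(1)) * MULTIPLIER[m.group(2).lower()]

def parse_block_size_ext(value):
    """Map each extension to ('fixed' or 'variable', block size in bytes)."""
    rules = {}
    current = None
    for token in value.split():
        if token[:2] in ("-B", "-V"):
            kind = "fixed" if token[1] == "B" else "variable"
            current = (kind, parse_block_size(token[2:]))
        elif current is None:
            raise ValueError(f"extension before any -B or -V: {token!r}")
        else:
            rules[token] = current  # extension uses the latest -B/-V
    return rules

rules = parse_block_size_ext("-B4m .mov .avi -V64K .sql -B23K .xyz")
```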

- get: when cache-size-limit is set, some archive files must be

  downloaded for a restore.  Creating a plan for the optimum download

  of remote data is rather complex and can be on the slow side,

  especially for a very large restore with a lot of small blocks, for

  example, a large VM image saved with 4K blocks.  In this release, HB

  saves restore plans so that if there is a problem in the restore and

  it has to be tried again, the restore plan can be read in a few

  seconds rather than computed from scratch.  Keep in mind that a

  restore plan is dependent on the files being restored, so a plan can

  only be reused if the exact same files are being restored.

- get: a new --plan option creates a restore plan for a list of files

  but does not actually restore anything.  The restore plan is saved

  in the backup directory.  Using get with --plan can be done after

  the daily backup for example, so that if a restore is needed, the

  plan is already available.  --plan is especially useful for very

  large restores as in a disaster recovery situation or complete

  restore of a very large VM image saved with a small block size.

  Another advantage is that reading a saved plan uses around 40% less

  RAM than creating the plan from scratch.

- get: a new option --no-sd disables selective downloads during a

  restore, ie, entire arc files are downloaded.  Selective download

  (SD) allows HashBackup to read parts of arc files and is very useful

  for small restores.  SD also minimizes the amount of data downloaded

  and the local cache required for a restore.  However, HB needs more

  time and RAM to compute a restore plan with SD.  If you have fast

  and cheap access to remote archives and plenty of local disk space

  for the cache, it may be faster to use --no-sd to disable the more

  complex selective download plan.

#2146 Feb 7, 2019 - expires Apr 15, 2019



* better config error messages

* rm: when packing, handle non-selective download better


- config: when an error occurred, such as trying to assign a value

  'abc' to cache-size-limit, the error message was sometimes vague:

     int value is required

  Now it is much more detailed:

    Error converting config keyword cache-size-limit: could not convert string to float: abc

- rm: recent changes to the packing algorithm require knowing how much

  will have to be downloaded to estimate the cost of packing.  This

  estimate was wrong for destinations that don't support selective

  download (rclone and rsync).


#2143 Feb 5, 2019 - expires Apr 15, 2019



* retain: minor bug fix

* change KB, MB, GB from *1024 to *1000 for HB inputs

* B2: better handling of application key restrictions

* B2: protecting backups against accidents and ransomware


- retain: fix traceback that could occur when not packing:

    TypeError: 'NoneType' object is not iterable

  Thanks to the automated email bug reporter.

- previously, when HB displayed a number with a suffix like KB or MB,

  it meant *1000 or *1000000.  If a file was 123456 bytes, HB would

  display the size as 123K. But when a suffix was typed in, either as

  a config option, command line argument, or dest.conf keyword, HB

  used the older, computer geeky multiplier of 1024, so 123K meant

  123*1024 or 125952 bytes (also called 123KiB).  If displayed,

  that same number would be shown as 125KB, not 123KB.

  Obviously that was confusing, so now KB, MB, etc all mean *1000 for

  both input and output, with these exceptions that remain *1024:

  -- the -B backup command line option; this is a disk block size so

     -B4K still means 4096 bytes (usually the size of a disk block)

  -- the block-size-ext config option, for the same reason

  -- the -D backup command line option; this is the size of the dedup

     table.  If the units were changed, it would cause large dedup

     tables to rebuild.  And since it is specifying an amount of RAM

     to use, *1024 makes more sense.

  -- the dedup-mem config option, for the same reason

  Be aware that if you previously had an arc-size-limit of 100MB (the

  default), that meant 100*1024*1024 or 104857600 bytes.  That's why

  HB often displayed arc file sizes as 104MB.  Now the limit will

  change to 100000000 bytes, so new arc files might display as 99MB.

  If you set a pack-download-limit of 100MB, you may get warnings on

  old arc files because they are larger than the download limit.  To

  avoid this, set the download limit to 105-110MB.  Or, the error

  might go away on its own as more space is freed up in the arc file.
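
  The difference between the two multipliers is easy to check.  This
  sketch is not HB's parser; the function name and the accepted
  suffix syntax are assumptions.

```python
import re

def size_to_bytes(text, base=1000):
    """Convert a size such as 123K or 100MB to bytes using the given base."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?)B?", text.strip(),
                     re.IGNORECASE)
    if not m:
        raise ValueError(f"bad size: {text!r}")
    number, suffix = m.groups()
    exponent = " KMGT".index(suffix.upper()) if suffix else 0
    return int(float(number) * base ** exponent)

print(size_to_bytes("123K"))               # 123000 (new input rule)
print(size_to_bytes("123K", base=1024))    # 125952 (old input rule)
print(size_to_bytes("100MB"))              # 100000000
print(size_to_bytes("100MB", base=1024))   # 104857600
```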

- B2: several situations with application keys are handled better:

  -- if the B2 application key is restricted to a specific bucket but

     that bucket no longer exists, an informative error is displayed

     rather than a 401 error

  -- the bucket keyword in dest.conf is optional if the B2 application

     key is restricted to a bucket

  -- if the B2 application key is restricted to a specific bucket and

     that bucket isn't the one listed in dest.conf, an informative

     error is displayed instead of a 401

  -- if a B2 application key is restricted to a specific prefix and

     the dir keyword is not present in dest.conf, the B2 prefix is

     used as the dir keyword

  -- if a B2 application key is restricted to a specific prefix, it

     must be an initial substring of the "dir" keyword in dest.conf.

     For example, if the B2 prefix is foo (this will give a warning

     about a missing slash), the dir keyword must start with foo, or

     no accesses will work.  Previously this would cause a 401 error,

     but now an informative error is displayed

  -- if a bucket doesn't have lifecycle rules, HashBackup tries to set

     a rule "keep only the latest version".  Previously it was a fatal

     error if this failed, but now HB first verifies that it has

     permission to set rules before even trying

  -- if a B2 application key doesn't have deleteFiles permission, HB

     will hide files instead of deleting them.  Note: un-hiding files

     in the future requires deleteFiles permission.

- B2: there is always concern about a backup being deleted, encrypted,

  or corrupted by an attacker, ex-employee, or by accident.  Several

  new features address this concern with the B2 storage service:

  1. a B2 application key should be used that does not have the

  deleteFiles permission.  Then even if the B2 credentials are

  compromised, an attacker cannot delete backup files.

  2. the lifecycle rules on the B2 bucket should be set to preserve a

  number of days of history - enough to make sure that a backup

  problem is noticed before the history is deleted by the B2 service.

  3. to revert to a previous version of the backup:

     a. use the B2 web site to delete the history of the DESTID and

        dest.db files back to the desired recovery point

     b. create a new local backup directory, with key.conf and

        dest.conf copied from a safe copy

     c. run hb recover -c newbackupdir to recover the local backup


     d. you can now continue regular backups and restores

  While your backup has been recovered, it is in an unusual state

  because the current versions of HB files are still hidden on the B2

  web site.  This needs to be addressed in a future update by rolling

  back the entire B2 history to the recovery point.

#2139 Jan 31, 2019 - expires Apr 15, 2019



* config: new option pack-download-limit replaces pack-remote-archives

* rm & retain: improved arc file packing and combining

* rm & retain: pack-combine-max is obsolete, replaced by arc-size-limit

* minimum local arc cache size might increase

* retain: don't modify database or pack arc files with --dryrun

* rclone: don't fail when -v is used with --args

* backup: handle -D with very small number

* retain & rm: fix rare hang during packing

* stats: disable "file bytes currently stored"

* mount: fix race bug with destinations not supporting selective download

* S3: support new S3 regions & IPv4/IPv6 endpoints

* rclone: fix dest verify error loop on small RAM systems


- config: a new config option pack-download-limit limits the amount of

  data downloaded for packing in a single run of rm or retain.  The

  default is 950MB.  Several users asked for this option, especially B2

  users since Backblaze gives a 1GB free daily download allowance.

  Thanks Vincent!

  NOTE: in HB config, MB means MiB or 1024*1024, so 950MB is really

  950 * 1024 * 1024 = 996147200 bytes.  The limit can be set to

  1000000000 for exactly 1 billion bytes, but there should be a little

  slop in the download limit to account for downloading DESTID (32

  bytes), etc.  (Update: since #2143, config inputs are multiplied by 1000,

  so this is no longer true.)

  The pack-remote-archives True/False config option has been removed.

  To prevent all downloading of remote arc files for packing, set

  pack-download-limit to 0.  For "unlimited" downloading for packing,

  set pack-download-limit to a high limit like 1TB.

  Use hb config -c backupdir pack-download-limit 5GB to set the limit

  to 5GB.  Packing of remote arc files stops when the download limit

  is reached and will continue on the next run of rm or retain.

  It is recommended that pack-download-limit is not set to zero since

  over time, this can cause slower restore times as "holes" are

  created in older remote arc files by rm and retain.  Instead, raise

  pack-percent-free to something like 95, meaning an arc file must be

  95% free before it is packed.  This will prevent nearly all

  downloading except very small arc files (less than pack-combine-min)

  and very inefficient arc files.

- rm & retain: the packing algorithm has been improved and now packs

  the "best" arc files first, taking into account both the amount of

  data deleted in the arc file and the download bandwidth required to

  retrieve the file.  It also combines arc files more often while

  packing, reducing the total number of arc files and improving

  restore times by avoiding tiny retrieval requests to remote backup


- rm & retain: during packing, rm & retain combine very small arc

  files (less than pack-combine-min, default 1MB) into larger arc

  files.  When this was first implemented, HashBackup didn't support

  selective download and had to download entire arc files to retrieve

  small blocks of data. The config setting pack-combine-max was used

  to limit how large the new combined arc files could be.

  Now that HashBackup supports selective download, it is not as

  important to limit the size of a combined arc file and the

  pack-combine-max config option has been retired.  Instead, the

  arc-size-limit config option controls the maximum size of a combined

  arc file as it does with backups.

- previously the minimum local arc cache (cache-size-limit) was:

      2 * (arc-size-limit + 10MB)

  or roughly, 2 arc files.  But if arc-size-limit was set higher in

  the past, say 1GB, arc files were created, and then arc-size-limit

  is set lower to 100MB, this could cause problems during remote

  packing if cache-size-limit was set very low because the larger arc

  files would not fit into the cache.  Now instead, the minimum cache

  size is:

      2 * max(arc-size-limit + 10MB, largest existing arc)

  This might require a bit more local cache space if the backup

  contains arc files larger than arc-size-limit.
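
  As a worked example of the new formula, here is a small Python
  sketch.  The function name min_cache_size is illustrative, and MB
  is taken as 1024 * 1024 bytes as described for this release.

```python
MB = 1024 * 1024  # MB meant MiB in HB config at the time of this release

def min_cache_size(arc_size_limit, largest_existing_arc):
    """Minimum local arc cache: room for 2 arc files of the larger kind."""
    return 2 * max(arc_size_limit + 10 * MB, largest_existing_arc)

# arc-size-limit lowered to 100MB after 1GB arc files were already created
print(min_cache_size(100 * MB, 1024 * MB) // MB)  # 2048
# no oversized arc files: same result as the old formula
print(min_cache_size(100 * MB, 60 * MB) // MB)    # 220
```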

- retain: --dryrun could sometimes modify the database slightly,

  causing a database upload.  It also could pack arc files, which

  should not happen with --dryrun.

- rclone: if -v or --verbose was used with --args, rclone failed

  because the script also adds -q and the two can't be used

  together.  Now the script doesn't add -q if either -v or --verbose is

  used.  Thanks Chris!

  NOTE: get from 

- backup: -D with a very small number (like 1) would cause a traceback:

    OverflowError: can't convert negative value to unsigned PY_LONG_LONG

  Now instead, a small 4K dedup table is created - typically 1 disk page.

  Thanks to the automated email bug reporter.

- rm & retain: this somewhat unusual set of circumstances:

  -- cache size limit is set

  -- all destinations are unavailable during backup

  -- backup creates more than cache-size-limit new arc data

  -- rm or retain is executed & destinations are now available

  -- remote packing is enabled

  -- there are remote arc files that need packing

  could cause rm/retain to hang while trying to download the arc files

  to pack.  Running another backup (or dest sync) would usually

  correct the problem, but it's now fixed in rm/retain.  Thanks Frank!

- stats: the "file bytes currently stored" statistic could become

  negative if hard-linked or sparse files are removed from the

  backup.  This statistic is no longer displayed until it can be

  corrected.  Thanks Frank!

- mount: when reading from the backup mount via a destination that

  doesn't support selective download (like rclone or rsync), entire

  arc files have to be downloaded.  There was a race condition causing

  the same arc file to be downloaded more than once.  If the

  destination had multiple workers, they could "step on each other"

  while trying to download the same file, causing various errors.

  Thanks Ben!

- S3: both IP4 and IP6 S3 endpoints are supported as well as some new

  S3 regions:  ap-northeast-3, cn-northwest-1, and eu-north-1.

- dest verify: on systems with small RAM and swap (the test system had

  512MB of RAM and no swap), a dest verify command with an rclone

  destination could display an endless loop of errors like this, where

  "drop" is the destination name:

drop[26482]: error reading ls output from <backupdir>/drop.lsout.tmp: 

drop[26489]: error reading ls output from <backupdir>/drop.lsout.tmp: 

drop[26482]: error reading ls output from <backupdir>/drop.lsout.tmp: 

drop[26489]: error reading ls output from <backupdir>/drop.lsout.tmp: 

drop[26482]: error reading ls output from <backupdir>/drop.lsout.tmp:

  The problem was an inability to allocate 1GB of RAM for a read

  buffer that didn't need to be nearly that large.

#2125 Dec 8, 2018 - expires Apr 15, 2019



* bump expiration date

* Box is deprecating WebDAV access on Jan 31, 2019

* better error if destname in dest.conf has no value

* backup: handle I/O errors better


- Box is deprecating WebDAV access on Jan 31, 2019.

  Box does support FTP access for enterprise accounts, but they say in

  bold type that they do not recommend using this on a regular basis.

  If you have been using HashBackup with Box, you may want to migrate

  your backup to another storage service or reconfigure dest.conf to

  use the rclone destination.  HB does not support the native Box API,

  so after the deadline, the only way to access Box storage would be

  to use HB's rclone destination.

  See the rclone documentation to configure rclone for Box.

  See the HashBackup documentation to configure

  HashBackup with rclone.

- dest.conf: if the destname keyword had no value, an error:

    AttributeError: 'NoneType' object has no attribute 'lower'

  was displayed.  A better error message is now displayed:

    Destination name required at line x in dest.conf

  Thanks to the automated email bug reporter.

- backup: when doing a multi-threaded backup, specific I/O error codes

  were handled correctly, but unexpected I/O errors caused the backup

  to abort.  NTFS-3g on Linux sometimes raises error 75 "Value too

  large for defined data type" on compressed NTFS files that have

  garbage at the end, causing HB to abort.  A Linux "cp" command gets

  the same error, and it happened with multiple files, so this is

  likely a bug in NTFS-3g.

  Now when a read error occurs, an error message is displayed for that

  file, the file is flagged as partially backed up, and the backup

  continues.  The next backup will again try to save the file,

  probably get the same error, etc.  To permanently correct the NTFS

  file you can make a copy and rename it to the original though it's

  not clear whether this results in data loss.

#2118 Aug 22, 2018 - expires Jan 15, 2019



* bump expiration date

* B2: support application keys

* B2: better logging


- Backblaze B2 recently launched application keys, a feature that

  allows one B2 account to have multiple keys, each with its own

  permissions.  Previously a B2 account had only an account id and

  master key, which HB supported with the accountid and appkey

  keywords in dest.conf.

  Unfortunately, calling these new keys "application keys" is a bit

  confusing since that is also used earlier to describe the master

  application key.

  In dest.conf, B2 storage accounts can now be accessed in two ways:

  1. Use the accountid keyword with the B2 account id, and the master

     application key with the appkey keyword.  The account id is a

     12-digit hex string.

  2. Use the B2 website or B2 Python utility to create an application

     key and set the permissions (capabilities) of the key.  You will

     get back a key id and application key string.  Use the keyid and

     appkey keywords in dest.conf with these values.  The keyid is a

     25-digit hex string.  The appkey is 31 symbols that (for now)

     seem to start with K.

  IMPORTANT: when restricting a B2 application key to a certain prefix

  for use with HashBackup, make sure that the prefix you use when

  creating the application key has a trailing slash.  If a trailing

  slash is not used, then for example, a prefix of "a" would allow

  access to any files or pseudo-directories in the bucket that begin

  with the letter a, but you may have intended to restrict to a single

  pseudo-directory named "a".  In this case HashBackup will display a warning:


  b2(b2): warning: B2 restricted key prefix matches dir keyword but without trailing slash: (prefix)
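
  The effect of the trailing slash can be seen with plain string
  prefix matching.  The sketch below only models the rule described
  above; B2 applies the restriction on its side, and the function
  and file names here are made up for illustration.

```python
def allowed_by_prefix(file_name, key_prefix):
    """A prefix-restricted key matches any file name starting with the prefix."""
    return file_name.startswith(key_prefix)

names = ["a/arc.0.0", "abc/arc.0.0", "archive.txt"]

print([n for n in names if allowed_by_prefix(n, "a")])   # all three names match
print([n for n in names if allowed_by_prefix(n, "a/")])  # ['a/arc.0.0']
```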

- B2: when the debug keyword is used, HashBackup now dumps the JSON

  response on every request for easier troubleshooting.  Previously it

  only dumped the JSON response in some situations.

#2117 June 7, 2018 - expires Oct 15, 2018



* ssh: reuse ssh connections for faster performance


- the ssh destination does one connect per worker instead of one

  connect per file transfer.  With a local ssh server (3ms ping time),

  this doubles the speed of removes and short file transfers since

  connection overhead is eliminated.  With remote ssh servers, the

  speed improvement is even more visible because remote ssh

  connections can sometimes take up to a few seconds to finish.

  Thanks Tadas!

#2116 June 3, 2018 - expires Oct 15, 2018



* rm: use correct cache size


- rm: in yesterday's version, rm wasn't using the correct cache size

  if cache-size-limit was less than 2 * arc-size-limit, eg, zero.

  This limited combining arc files more than necessary.

#2115 June 2, 2018 - expires Oct 15, 2018



* bump expiration date

* ssh: support selective download and upload rate limits

* ssh: remove unnecessary server interaction on upload

* backup: new config option block-size-ext

* backup: reinstate disk full check

* get: fix restore bug when arc files >= 4GB

* get: clean last few arc files from cache

* dest.conf: better error message on workers keyword

* config: special handling for arc-size-limit 4GB

* rm/retain: fix hang when cache is limited

* config: allow setting pack-combine-min to 0


- ssh/sftp: previously, the ssh destination could have a type keyword

  (in dest.conf) of either ssh or sftp, and they were exactly the

  same: both used only the sftp command to talk to the remote storage


  In this release, a type of sftp is the same as before, using only

  sftp.  This is sometimes desirable for security purposes, if the ssh

  server is configured to only allow certain commands like sftp.

  A type of ssh now has 2 new features not supported with sftp: 

  * files are sent using the local scp command instead of sftp, so

    upload bandwidth can be limited with the rate keyword in

    dest.conf.  The sftp command does not support upload rate

    limiting.  scp may also be slightly faster because it doesn't do

    buffer acking.

  * selective download is supported, allowing HB to download only the

    data it needs from remote arc files instead of downloading entire arc

    files.  This is a much more efficient use of network bandwidth and

    local cache space, especially when large arc files are used and a

    restore is comparatively small.

- ssh: an unnecessary ssh operation was removed from file uploads

- backup: a new config option, block-size-ext, sets the backup block

  size for specific filename extensions (suffixes).  For example:

    hb config -c backupdir block-size-ext '-B4M mov,avi -B1M mp3'

  Commas and dots are optional.  Thanks Jacob!
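
  To illustrate how such a setting could be read, here is a small
  Python sketch that turns the string into an extension-to-size
  mapping.  It is an assumption about the format based only on the
  example above, not HB's parser; parse_block_size_ext is a
  hypothetical name.

```python
import re

def parse_block_size_ext(value):
    """Map each filename extension to the -B block size that precedes it."""
    sizes = {}
    current = None
    # Split on whitespace and commas, since commas are optional.
    for token in re.split(r"[\s,]+", value.strip()):
        if not token:
            continue
        if token.startswith("-B"):
            current = token[2:]                 # e.g. "4M"
        elif current is None:
            raise ValueError("extension listed before any -B size: " + token)
        else:
            sizes[token.lstrip(".")] = current  # a leading dot is optional
    return sizes

print(parse_block_size_ext("-B4M mov,avi -B1M mp3"))
# {'mov': '4M', 'avi': '4M', 'mp3': '1M'}
```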

- the disk full check that was recently removed (Feb) is back, but

  with a slight twist so that it doesn't interfere with draining the

  cache like it formerly did.  HashBackup will halt when a new arc

  file is created if there is less than 2 x arc-size-limit bytes

  available in the local backup directory.  This allows enough room

  for 2 arc files: one transmitting while another is created.
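
  The rule amounts to a single comparison.  The Python sketch below
  shows it under the assumption that free space is measured on the
  filesystem holding the backup directory; both function names are
  hypothetical and this is not HB's code.

```python
import shutil

def has_room_for_new_arc(free_bytes, arc_size_limit):
    """True if there is room for 2 arc files: one transmitting, one being created."""
    return free_bytes >= 2 * arc_size_limit

def check_backup_dir(backupdir, arc_size_limit):
    """Apply the rule to a real directory using the free space reported by the OS."""
    return has_room_for_new_arc(shutil.disk_usage(backupdir).free, arc_size_limit)
```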

- get: if cache-size-limit is set, an arc file needed for a restore is

  not local, the arc file is >= 4GB, and the portion of the arc file

  needed for the restore is after 4GB, get would fail like this:

    error: 'I' format requires 0 <= number <= 4294967295

  The backup is fine; this was a bug in the planner.  Thanks Jacob!
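
  The error text is what Python's struct module reports when a value
  does not fit in a 32-bit unsigned field.  The sketch below
  reproduces that situation with an offset just past 4GB; it is an
  illustration of the failure mode, not the planner's actual code.

```python
import struct

offset = 4 * 1024**3 + 10   # a position just past 4GB in an arc file

try:
    struct.pack("I", offset)          # 32-bit unsigned: max 4294967295
    fits_in_32_bits = True
except struct.error as exc:
    fits_in_32_bits = False
    print("error:", exc)

packed = struct.pack("Q", offset)     # a 64-bit unsigned field holds the offset
print(len(packed), struct.unpack("Q", packed)[0] == offset)
```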

- get: when cache-size-limit is set, get could sometimes leave a few

  arc files in the cache that should have been removed

- dest.conf: if the workers keyword was used with no value it caused

  an uninformative traceback error:

    TypeError: int() argument must be a string or a number, not 'NoneType'

  Now it gives an informative error message:

    int() argument must be a string or a number, not 'NoneType';

    Expected integer for dest.conf keyword: workers

  This bug was reported by the automated email system.

- config: backup sometimes will create arc files slightly over

  arc-size-limit.  If a user sets arc-size-limit to 4GB, it's very

  likely that at least one arc file will be slightly over 4GB.  If any

  arc file is over 4GB and cache-size-limit is set, the restore

  planner needs twice as much memory to create a plan.  To prevent

  this RAM doubling, HB will adjust arc-size-limit down slightly when

  arc-size-limit is 4GB so that arc files are never bigger than 4GB.

- rm/retain: when arc files are packed to remove empty space, rm may

  combine multiple archives into one larger archive.  Downloading

  multiple arc files may exceed cache-size-limit, causing rm to hang.

  Now rm will combine a smaller group of arc files that does fit into

  the cache rather than locking up.  Thanks Ben!

- config: pack-combine-min can be set to zero to disable combining

  small arc files into larger arc files.

#2103 Mar 29, 2018 - expires Jul 15, 2018



* decrease over-the-limit cache sooner rather than later


- if cache-size-limit was set to 10GB, the local backup directory

  contained 10GB of cached arc files, and the cache-size-limit was

  then lowered to 1GB, the next backup would leave the cache at 10GB

  for the duration of the backup and only trim it down to 1GB when the

  backup finished.  This was an intentional design decision: if the

  cache is already at 10GB, why not use it all and trim later?

  But this was confusing for users: if the local backup directory disk

  was nearly full (hence cache-size-limit was lowered), it's

  reasonable to expect HB to lower the cache size as soon as possible

  rather than wait until a completed backup.  The cache also was not

  trimmed down to the lower size if the backup was interrupted.

  Now, HB will lower the cache size when the next backup starts rather

  than when it finishes.  Of course, if there are arc files in the

  local backup directory that need to be sent to remotes, they will

  delay the cache trim.

#2102 Mar 26, 2018 - expires Jul 15, 2018



* remote-to-remote copy was ignoring cache-size-limit


- if cache-size-limit was set, a new destination was added, and the

  new destination's transfer rate was slower than the old

  destination's transfer rate, backup (and sync) were not respecting

  the cache size limit.  Arc files were downloaded as fast as possible

  from the source destination, possibly filling the backup directory's

  disk.  Now cache-size-limit is respected during a remote-to-remote

  copy.  This bug was recently introduced when the cache handling was

  rewritten to allow caching arc files for mount.

#2100 Feb 21, 2018 - expires Jul 15, 2018



* bump expiration date

* backup: local cache not being trimmed

* selftest: fix exception if local arc file size is wrong

* backup: removed disk full test

* selftest: display missing arc file size

* S3: only read ~/boto.cfg to prevent GCE segfault

* B2: display more info if a file isn't found

* backup: fix rare bug "IndexError: No item with that key"


- backup: if cache-size-limit was set to 5GB then reduced to 2GB,

  backup failed to reduce the cache size to 2GB. Thanks Tadas!

- selftest: if the local copy of an arc file is the wrong length,

  selftest is supposed to print some diagnostics.  But a recent change

  added a length test at a lower level and caused this traceback,

  preventing selftest from executing:

    Traceback (most recent call last):

      File "/", line 191, in <module>

      File "/", line 469, in main

      File "/", line 134, in proginit

      File "/", line 314, in init

    Exception: /media/backup/arc.253.0 should be 103378368 bytes, is 101257216 bytes

  This check has been removed so that selftest can display:

    Checking arcs I

    Error: arc.253.0 size mismatch on local file: db says 103378368, is 101257216

    NOTE: arc.253.0 is correct size on <destination>

    1 errors

    Checked  arcs I

- backup: previously HB checked to see if there was room for at least

  2 arc files before opening a new arc file and raised an error if the

  disk was nearly full:

    Backup directory is nearly full, disk has only 177 MB available

  This causes problems when the backup disk is nearly full and the

  cache has been resized down, so the test was removed.  Now HB fails

  with a write error when the disk is full.

- selftest: if an arc file is missing, display its size to aid

  troubleshooting, for example:

  Error: arc.4.22 1121232 bytes is not local nor on any active destination

- S3: HB uses the boto library to access S3.  GCE (Google Compute

  Engine) VM environments have a special /etc/boto.cfg that loads

  Google-customized Python code into boto (and hence HB).  This fails

  in various ways, often causing a seg fault or hang.  Now, HB only

  loads boto.cfg from the user's home directory.  Normally it will not

  exist, but if you want to make changes to boto config settings, you

  can create a boto.cfg file in your home directory.  Thanks Jose!

- B2: when a file isn't found, display the B2 fileid, expected SHA1,

  and expected size to aid troubleshooting.  Thanks William!

- backup: if a path listed on the command line changed from a

  directory to a non-directory, it could cause this traceback:

    Traceback (most recent call last):

      File "/", line 109, in <module>

      File "/", line 3077, in main

      File "/", line 773, in addallpaths

      File "/", line 2152, in backupobj

      File "/", line 362, in addlog

    IndexError: No item with that key

  The backup is fine, just re-run it.  Thanks Jacob!

#2089 Jan 20, 2018 - expires Apr 15, 2018



* S3: minor verify fix if Dir both used and not used

* DAV: HB executable copy & verify problem

* dest verify only verifying 1 destination

* backup: traceback on maxtime/maxwait timeout

* fix db upgrade failure on newer versions of OSX


- S3: if a bucket contains "subdirectory" backups that use the Dir

  keyword and also has a backup at the bucket's root that does not use

  the Dir keyword, a dest verify on the root backup listed the entire

  bucket contents instead of just the root contents.  It still worked,

  just not as efficiently.

- DAV: on WebDAV destinations with copy-executable set to True (the

  default is False), the hb#xxxx executable was being copied as just

  "hb" because the pathname was not url quoted.  Ie, only the latest

  version was stored on the remote.  Related bug: when verify checked

  the hb#xxxx files, it would say they were all verified when actually

  there was only one "hb" file on the remote.  Thanks Mehmet!

- dest verify: a bug was introduced in #2082: if more than one

  destination was configured only the last was being verified.

- backup: when --maxwait or --maxtime were used and the wait time

  limit was exceeded, this message was displayed:

      Warning: maxwait exceeded before all archives were copied

  but then a traceback would occur: AttributeError: 'module' object

  has no attribute 'close'.  The backup is fine, though one file is

  not sent to the remote and will be sent on the next backup or sync.

- with newer versions of OSX, a traceback could occur when upgrading

  the database of an older backup:

    Traceback (most recent call last):

      File "/", line 109, in <module>

      File "/", line 2595, in main

      File "/", line 286, in opendb

      File "/", line 556, in upgradedb

      File "/up29/", line 61, in upgrade

      File "/up29/", line 149, in opendb

      File "/up29/", line 409, in maxmem

    KeyError: 'pages_free'

#2084 Jan 9, 2018 - expires Apr 15, 2018



* S3: be more careful cleaning up multipart uploads

* S3: handle Dir / in dest.conf


- S3: HashBackup uses multipart upload to allow multiple worker

  threads to upload a single large file for better performance.  This

  creates temporary files on S3 during the upload and S3 deletes them

  when the upload finishes.  However, if the upload does not complete

  for some reason, these temporary files stay in your S3 account, you

  get charged for them, they don't ever go away, and they don't show

  up anywhere - not even in a listing of your S3 bucket.

  To keep your costs down, HashBackup cleans up these temporary files

  the next time backup runs.  But when multiple backups are stored in

  one S3 bucket using the Dir keyword and are run concurrently, a

  backup going to dir 1 could mistakenly abort an active multi-part

  upload to dir 2.  Now HB is more careful about cleaning up only

  multipart uploads belonging to the current backup.  Thanks Scott!
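
  One way to picture the more careful cleanup is to filter the
  unfinished uploads by the current backup's Dir before aborting
  anything.  How HB actually identifies its own uploads is not
  stated here, so the key-prefix test, the function name, and the
  sample data below are all assumptions.

```python
def uploads_to_abort(pending_uploads, backup_dir):
    """Select unfinished multipart uploads that belong to one backup's Dir.

    pending_uploads: list of (key, upload_id) pairs as listed from the bucket.
    backup_dir: the Dir value for the current backup, e.g. "dir1".
    """
    prefix = backup_dir.strip("/") + "/"
    return [(key, uid) for key, uid in pending_uploads if key.startswith(prefix)]

pending = [("dir1/arc.5.0", "id-1"), ("dir2/arc.9.0", "id-2"), ("dir10/arc.1.0", "id-3")]
print(uploads_to_abort(pending, "dir1"))  # [('dir1/arc.5.0', 'id-1')]
```

  Adding the trailing slash to the prefix keeps dir1 from also
  matching dir10.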

- S3: if Dir / was used in dest.conf, meaning to put files at the

  bucket's top-level as if there were no Dir keyword, it worked for

  backup but dest verify would fail: it said files didn't exist and

  re-uploaded all of the backup files.

#2082 Jan 7, 2018 - expires Apr 15, 2018



* ssh, rsync, ftp dest verify: 60x+ faster

* ssh: removing files from ssh destinations is 2x faster

* ssh, ftp: create directory path if it doesn't exist

* dest verify: don't verify uncommitted arc files

* clear: warn if destinations halted


- ssh, ftp, rsync dest verify: previously files were verified

  one-by-one with these destinations, but are now verified as a group.

  With 8 workers and 5500 arc files, ssh verify was taking about 3

  minutes over a local LAN connection, maxing out an 8-core box and

  causing a heavy load on the ssh server.  Now the same verify takes 3

  seconds.  Rsync and ftp destinations have similar performance

  improvements.  Backups on servers over "real" Internet connections

  will also take just a few seconds to verify now, whereas before,

  this same test with 5500 arc files could have easily taken 30


- ssh: removing files from ssh destinations is 2x faster

- ssh, ftp: previously the ssh and ftp destinations would only create

  a single subdirectory from the Dir keyword because of limitations in

  those tools' built-in mkdir command.  Now HB issues extra commands

  if necessary so that a full directory path can be created (a/b/c).
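
  The idea of issuing one mkdir per path level can be sketched in
  Python as follows.  This only computes the list of directories to
  create, parent first; the function name is hypothetical and the
  actual commands HB sends are not shown here.

```python
def mkdir_steps(path):
    """Return the directories to create, parent first, for a path like a/b/c."""
    root = "/" if path.startswith("/") else ""
    steps = []
    current = ""
    for part in path.split("/"):
        if not part:
            continue  # skip empty parts from leading, trailing, or doubled slashes
        current = current + "/" + part if current else part
        steps.append(root + current)
    return steps

print(mkdir_steps("a/b/c"))  # ['a', 'a/b', 'a/b/c']
```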

- dest verify: when a backup is interrupted, some arc files may have

  been sent to destinations but are not "committed", ie, they're not

  really part of the backup yet - this is normal.  The dest verify

  command was verifying these uncommitted arc files, then removing

  them all.  Nothing wrong with that, but it's very confusing.  Now

  the verify command doesn't verify uncommitted arc files.

- clear: if any destinations fail to start, the clear command warns

  that halted destinations won't be cleared.

#2077 Dec 31, 2017 - expires Apr 15, 2018



* Happy New Year!

* mount: prefetch enabled again, even better

* s3: add Paris region (eu-west-3)

* mount: ignore empty versions

* mount: bug fix for deleted files

* get: bug fix for deleted files

* rekey: don't accept -k ask or -k env

* fixed "Backup directory locked" during DB upgrade


* Happy New Year and good luck to everyone in 2018!  Send an email if

  you have a great HashBackup success story to post on the Customer

  web page.

* mount: prefetching is enabled again.  This only has an effect when

  cache-size-limit is set, ie, backup data has to be downloaded.

  On a small 512M single-CPU test VPS, S3 performance reading a 200MB

  file with dd using an HB mount (8 workers in dest.conf) increased

  from 2.1 MB/s to 35 MB/s - about 17x faster.  Google Storage is

  about 11x faster (3.8 MB/s to 41 MB/s), Backblaze B2 is 2-3x faster

  (4 MB/s to 10 MB/s).

  The mount prefetcher stays inactive on random reads to keep download

  costs low.

  The mount -B option can be used to increase the block size when

  downloading.  This usually increases performance but can have the

  potential negative effect of increasing download traffic and costs

  for random reads.  Be sure to test whether -B increases performance

  and/or costs for your usage.

  For B2 in particular, using -B4M increased mount sequential download

  performance from 10 MB/s (8 workers) to 22 MB/s (4 workers).  Using

  -B32M increased performance even further, to 40 MB/s (4 workers).

  B2 has higher latency (requests take longer to process), so making

  each request larger is more beneficial for B2 than the other storage

  services.  The downside of a large -B is it's likely wasteful, slow

  and expensive to download 40 MB on every random read.  Keep in mind

  that sequentially accessing a file deduped across many versions via

  an HB mount might actually cause a lot of random reads behind the

  scenes, so you don't want -B too high.

* S3: added Amazon's new Paris region, eu-west-3, to HashBackup

* mount: as retain removes old files from the backup, it's possible

  that a version becomes completely empty.  These empty versions

  should probably be deleted automatically by rm/retain.  In the

  meantime mount will ignore them instead of having a lot of empty

  top-level directories in the mount.

* mount: if a file existed in r2 but was deleted in r3, it was

  correctly missing from a r3 directory list.  But a cat command from

  r3 would display the r2 file instead of giving a "No such file"


* get: if a file was saved in r2 and deleted in r3, get -r2 gave an

  incorrect error that the path wasn't in r2, then showed that it was

  in r2, like this:

    Path is not in version 2: (pathname)

    Versions of this file:

      2017-12-18 19:16:12 in version 1

      2017-12-18 19:16:43 in version 2-2

      2017-12-18 19:17:30 in version 4

  The backup is fine, this was a get bug.  The error is correct for

  get -r3, except version 2-2 should just say 2, and now it does.

* rekey: -k ask and -k env now cause errors, because it's very likely

  that -p ask or -p env is what you really want

* a recent change could cause a "Backup directory is locked" error

  during db upgrades

#2060 Dec 17, 2017 - expires Apr 15, 2018



* mount: prefetch disabled

* rm/retain/selftest: optimize selective download for cost

* get/mount: --cache option

* mount: new --cachesize option

* cleanup & rename span.<pid> temp directories


- mount: prefetching was added in the previous release but could hang

  in certain configurations so has been disabled for now.  Use the

  mount -B option to improve mount performance by reading larger


- selftest/rm/retain: for these commands selective download is

  optimized to lower cost rather than maximize performance

- get/mount: the --cache option can be used when cache-size-limit is

  >= 0 (backup data is not local) to specify a different directory to be

  used for holding downloaded backup data instead of the backup

  directory.  The --cache option cannot be used if cache-size-limit is

  -1 (the default) because no data is downloaded: it is referenced

  directly from the backup directory.

  Use --cache when:

  a) there is not enough room in the backup directory to hold a large

     cache.  For example, the backup directory might be on a small SSD

     but a large cache is needed for a large restore, or you want to

     maintain a very large mount cache to improve performance.

  b) you can't or don't want to lock the backup directory for a

     restore, either because a backup is running or you want

     concurrent restores

  The option is --cache cachedir[,lock] where cachedir is the

  directory for the cache and ,lock is optional.  If ,lock is omitted,

  the main backup directory is not locked.  If ,lock is added, the

  backup directory is locked.

  Adding ,lock allows arc files already in the backup directory to be

  used in the restore or mount without downloading again.  If ,lock is

  not used, existing arc files can be used only if the cache directory

  is on the same disk as the backup directory.  When the backup

  directory is not locked, arc files can disappear at any time and get

  & mount can't handle that.

  The cachedir directory is always locked.  For concurrent restores or

  mounts, specify a unique cache directory for each, maybe using $$ in

  the cache directory name ($$ is replaced by the process id).

  If cachedir already exists, it will remain after running HB.  If it

  doesn't exist, HB will create it then delete it when finished.

- mount: a new --cachesize option will expand the mount cache beyond

  the normal cache-size-limit.  Example: --cachesize 1.5G  The default

  cache size is the larger of cache-size-limit or 2 x (arc-size-limit

  + 10MB).

- the span.<pid> temporary directory created for get & mount when

  cache-size-limit is set is now span.tmp and is deleted whenever the

  backup directory is locked if it was not previously deleted.

#2055 Dec 9, 2017 - expires Apr 15, 2018



* mount: use readahead to improve performance

* backup: add -B8M and -B16M

* get: fix spurious error message after restore

* get: fix restore bug introduced in #2040


- mount: when reading files from an HB mount, backup data is

  downloaded in the background before it is needed (readahead) to

  improve performance 3-4x in some cases.  Keep in mind that restoring

  a large file with the get command is still 25% faster than copying

  it from an HB mount with dd, cp, or rsync, and restoring a directory

  of small files is 3x faster with get.  Mount cannot predict which

  files will be accessed in the future so readahead does not help much

  with many small files.

- backup: two larger block sizes -B8M and -B16M are available for

  special backup situations, like huge backups of huge files

- get: a race condition could randomly cause this error message just

  before exiting:

    Unhandled exception in thread started by 

    sys.excepthook is missing

    lost sys.stderr

- get: #2040 could fail with errors "arc.v.n not in restore plan"

#2040 Nov 30, 2017 - expires Apr 15, 2018



* mount: uses selective download for faster & cheaper access

* mount: new -B option to set download blocksize

* mount: honor cache-size-limit


- mount: selective download is now used for destinations that support

  it.  For more details about selective download, see #2035's

  changelog entry.  Readahead is not implemented yet so performance

  will be slower for large sequential access.  The mount command is

  easy and convenient, but for large restores when not all arc files

  are local (cache-size-limit is set), the get command will be faster.

- mount: a new option -B specifies the minimum download block size.

  Higher values will increase throughput but may also increase latency

  and download fees because extra data may be downloaded that is not

  needed.  The default is 64K, which tends to minimize download fees.

- mount: previously mount downloaded whole arc files without

  respecting the limit on the size of the local arc cache,

  cache-size-limit.  Now it will try to respect the limit though it

  will temporarily increase the limit if any single download is bigger

  than cache-size-limit.  When the cache is full, the oldest items are

  discarded.  If they are referenced again, they will be downloaded

  again, so it's important that cache-size-limit is set to a

  reasonably large value if you use mount a lot.
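
  The cache behavior described above (discard the oldest items when
  full, and stretch the limit for a single oversized download) can
  be sketched in Python.  This is a simplified model under those two
  stated rules, not the mount implementation; the DownloadCache
  class and its fields are hypothetical.

```python
from collections import OrderedDict

class DownloadCache:
    """Oldest-first cache of downloaded pieces, bounded by a byte limit."""

    def __init__(self, limit):
        self.limit = limit
        self.items = OrderedDict()   # name -> size, oldest first
        self.used = 0

    def add(self, name, size):
        if name in self.items:       # re-downloaded item replaces the old copy
            self.used -= self.items.pop(name)
        # A single download larger than the limit temporarily raises it.
        limit = max(self.limit, size)
        while self.items and self.used + size > limit:
            _, old_size = self.items.popitem(last=False)  # discard the oldest
            self.used -= old_size
        self.items[name] = size
        self.used += size

cache = DownloadCache(limit=100)
cache.add("a", 60)
cache.add("b", 30)
cache.add("c", 50)          # evicts "a" to make room
print(list(cache.items))    # ['b', 'c']
cache.add("big", 150)       # bigger than the limit: everything else is discarded
print(list(cache.items))    # ['big']
```

  If "a" is read again after being evicted it has to be downloaded
  again, which is why a reasonably large limit matters for heavy
  mount use.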

#2035 Nov 28, 2017 - expires Apr 15, 2018



* get: use selective download for faster & cheaper restores

* fix slow performance with many (100K) arc files

* get: -i pauses after plan is displayed

* S3: use HB's cacerts.crt file for SSL (secure http)

* get: fix shared block restore error

* selftest: bug fix for -v5 <pathname>

* backup: save btrfs snapshots

* log: fix LookupError: unknown encoding: string-escape

* OpenStack: fix DNS errors if authurl had a port


- get: selective download is now used during restores for destinations

  that support it: S3 & compatibles, B2, Rackspace Cloud Files / Open

  Stack, WebDAV, Dir, and FTP.  Previously when HB needed data from a

  remote arc file for a restore, it downloaded the entire arc file

  then picked out the data it needed.  With selective download HB

  only downloads the data it needs.  This saves local restore cache

  space, decreases restore time, and decreases download data and fees.

  Selective download is most noticeable on:

  - backups with a lot of versions

  - backups using large arc files

  - files / directories with a high change rate

  - files / directories with a lot of dedup between versions

  - restoring just a handful of files / directories

  IMPORTANT: expect an increase in the number of storage requests /

  operations during a restore.  This extra cost is more than offset by

  the decrease in cost because less data is downloaded.  The new

  restore will never cost more than the old restore.

  EXAMPLE 1: an active HB dev directory that is 2 years old, with 4600

  files (200MB), has daily changes saved in 315 versions using 1GB arc

  files on a USB2 drive on a 2010 Core 2 Duo system.  The backup is

  packed regularly, so many of the arc files are smaller than 1GB.

  "cache" is the local disk space needed to perform the restore.

  Restore with old version (#1983):
    download: 7.4 GB
    cache:    4.4 GB
    time:     3m 19s

  Restore with new version:
    download: 168 MB (44x less)
    cache:    109 MB (40x less)
    time:      16s   (12x faster)

  EXAMPLE 2: an active user directory with 55K files (16GB) from the

  same backup has daily changes saved in 560 different backups.

  Restore with old version:
    download:  25 GB
    cache:     10 GB
    time:     10m 32s

  Restore with new version:
    download:    12 GB  (52% less)
    cache:      3.9 GB  (61% less)
    time:       7m 32s  (28% faster)

  These examples are "downloading" from a USB2 drive at 31MB/s,

  similar to a 250MBit/s Internet connection.  Internet download rates

  are usually a lot slower, so selective download will make an even

  bigger difference with restores from cloud storage services.

  During restores, a temporary cache directory spans.<pid> is created

  in the backup directory.  HB deletes this after the restore.

  This release is compatible with #1977 or later so you can run your

  own restore comparisons between versions of HB.

- stress tests with 100K arc files revealed slow O(n^2) performance in

  several places:

   1. the "Checking arcs I" phase of selftest.  It was taking 4 hours

      and now takes 20 seconds - 720x faster.

   2. creating a restore plan when cache-size-limit is >= 0: similar

      performance improvement

   3. backup, rm, and retain when cache-size-limit is set and there

      are many arc files in the local cache.  In a test backup with

      20K local arc files and 100K remote arc files, a 2nd backup of

      one small file took 55 minutes and now takes 9 seconds.


   4. creating 100K small arc files by saving a 2GB file with a 4k

      block size and cache-size-limit set to 0 took over 2 1/2 hours

      with #1983 before being killed.  It took 11 minutes to finish

      with this release.

  O(n^2) means n items need n times n (n squared) operations.  With

  1000 arc files, 1M operations are needed.  It isn't horrible for

  small n, but for larger n it gets ridiculous.  100K arc files is

  100x more files, but required 10B operations - 10,000x slower.
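
  To make the growth concrete, here is a small illustration (not HB
  code) that counts comparisons when every arc file name is found by
  scanning a list, versus looking it up in a dict.  The function names
  and the arc file names are made up for the example.

```python
def lookups_with_list(names):
    """Find every name by scanning a list: n*(n+1)/2 comparisons, O(n^2)."""
    comparisons = 0
    for wanted in names:
        for candidate in names:
            comparisons += 1
            if candidate == wanted:
                break
    return comparisons


def lookups_with_dict(names):
    """Find every name with a dict: one lookup per name, O(n)."""
    index = {name: i for i, name in enumerate(names)}
    return sum(1 for name in names if name in index)


names = ["arc.0.%d" % i for i in range(1000)]
print(lookups_with_list(names), lookups_with_dict(names))

# The ratio quoted above: 100x more files, 10,000x more operations.
print((100_000 ** 2) // (1_000 ** 2))
```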

- get: with -i, a Continue? question is asked after the restore plan

  is displayed but before any downloading starts.  This gives a chance

  to review cache sizes to ensure the restore will succeed.

- S3: always use the cacerts.crt file in the backup directory to

  validate SSL connections

- get: when cache-size-limit is set, a restore error on file A could

  cause a restore error on file B if they shared a block and file A

  was the block's first reference.  Bug found in internal testing with

  forced errors.

- selftest: if cache-size-limit is set, selftest -v5 <pathname>

  sometimes caused an error "list index out of range".  It continued

  after that, but arc files were not prefetched.  Also, instead of

  testing just the path specified, it tested the entire backup.

- backup: btrfs, a newish Linux filesystem, can create snapshots.  But

  all snapshots have a common inode number, so HB only backed up the

  main snapshot directory, not the directories containing data.

- log: if an error occurs while trying to start the hb command being

  logged, this error was displayed instead of the actual error:

    LookupError: unknown encoding: string-escape

  From built-in traceback exception reporter.

- OpenStack destinations require an authurl.  If the authurl included

  a port number, HashBackup was not handling it properly and generated

  an error:

    dest xxx: unable to start: DNS error getting IP address for [gaierror] [Errno -2] Name or

    service not known

#1982 Oct 11, 2017 - expires Jan 15, 2018



* improve hash function for very large dedup tables

* backup: expand & shrink dedup table

* backup: rebuild appropriate size dedup table

* audit: change fatal error back to warning, include traceback

* audit: don't cause 2nd traceback


- the dedup hash function was improved for dedup tables 10GB and up.

  There was a bias in the function causing lower positions of the hash

  table to be used more often than upper positions.  Everything still

  worked correctly but it could affect dedup performance.

- backup has always expanded the dedup table as it fills.  Now it

  resizes the dedup table both up and down within the limit set by -D

  or dedup-mem.

- backup: if the dedup table was missing, backup rebuilt it using a

  size just large enough for existing entries.  This could cause

  another resize in a short time.

- in #1846, it became a fatal error instead of a warning if a command

  could not be audited, but it was often hard to determine the reason

  the command couldn't be audited.  Now a traceback is printed.

  The auditing error is back to a warning now because if rekey is

  interrupted, it says "re-run rekey" but then won't let rekey run

  because of the auditing error.  Impossible loop to escape.  Thanks


- when auditing was enabled and an error occurred, it could cause a

  2nd traceback:

    Traceback (most recent call last):
      File "", line 255, in <module>
    cPickle.PicklingError: Can't pickle <type 'traceback'>:
        attribute lookup __builtin__.traceback failed

#1977 Sep 29, 2017 - expires Jan 15, 2018



* database upgrade to dbrev 30

* readkey: public key/asymmetric encryption for write-only backups

* get: new --cache option

* init, export: ask twice for passphrase with -p ask

* clear: don't clear dest.conf from the database

* backup: fix traceback at end of backup on Ubuntu 16.04


- this release does an automatic database upgrade when any HB command

  is used. The upgrade prevents access by earlier HB releases since

  they do not recognize the encrypted backup keys created by the new

  readkey command (below).  Most commands failed harmlessly and had

  the nice benefit of showing that backup keys actually were

  encrypted.  But running an older version of selftest -v4 --fix on a

  readkey backup would have deleted every file because no data blocks

  would be readable.

- readkey: this new command enables RSA public key encryption, also

  called asymmetric encryption, to provide write-only backups.  When

  readkey is enabled, a new file readkey.conf is created in the backup

  directory.  When readkey.conf is present, HashBackup acts as before,

  with no extra security, but does display this warning:


    WARNING: readkey is enabled and readkey.conf is present


  When readkey.conf is NOT present:

  - selftest is limited to -v2 since arc files cannot be read

  - rm & retain can delete unneeded arc files but cannot pack them

  - mount fails with an I/O error if user data files are read

  - get fails immediately to avoid creating 0-length .hberror files

  - all other commands work normally, including backup with dedup

  - backup performance is the same

  - restore may be slightly slower when accessing a lot of versions


  Readkey can be turned on or off at any time.  A typical setup would

  be to enable readkey, copy key.conf, readkey.conf, and dest.conf to

  secure locations, then delete readkey.conf.  Automated backups will

  still work but no data can be restored until readkey.conf is copied

  to the backup directory.

  The default RSA key length, 2048 bits, is recommended by NIST

  through the year 2030.  The -b option can set a key length from 1024

  to 4096 bits, but keep in mind longer keys make restores slower.

  For example, it takes .05 seconds to decrypt a key with RSA-2048 and

  .3 seconds (6x longer) with RSA-4096.  This happens once per version

  during restores.  On backups with many versions such as a 1-year old

  hourly backup (8700 versions), restores could take 45 minutes longer

  with 4096-bit keys but only 7 minutes longer with 2048-bit keys.
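
  The arithmetic behind these estimates, as a quick sketch.  It
  assumes one key decryption per version at the timings quoted above;
  extra_restore_minutes is a hypothetical helper, not part of HB.

```python
def extra_restore_minutes(versions, seconds_per_key):
    """Extra restore time if one key is decrypted for each version."""
    return versions * seconds_per_key / 60


versions = 8700  # roughly one year of hourly backups
print(extra_restore_minutes(versions, 0.05))  # RSA-2048: about 7 minutes
print(extra_restore_minutes(versions, 0.3))   # RSA-4096: about 45 minutes
```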

  The -p ask/env option adds a passphrase to the readkey.  For -p env,

  the readkey passphrase environment variable is HBREADPASS.  

  The readkey command with no on/off command displays the current

  readkey status and if readkey.conf is present, verifies the key.

  See the web site for more details.  Thanks Tobias!


  - a new readkey.conf is generated every time readkey is enabled so

    it's important to copy the new readkey.conf to secure locations

  - readkey.conf is not deleted when readkey is disabled in case there

    is a copy of the backup somewhere with readkey enabled

  - readkey does not prevent deleting backups since this requires

    cooperation from remote storage servers.  Use hb config options

    admin-passphrase, enable-commands and disable-commands, and hb dest

    load to mitigate this risk.

  - backups created with HB#655 or lower (before Sep 10, 2012) do not

    gain extra protection because these early versions used convergent

    encryption keys rather than random session keys to encrypt data.

- get: when cache-size-limit is set, the get command has to download

  arc files to do a restore.  A large restore might require more disk

  space than is available in the backup directory.  For example, the

  backup directory might be on a small SSD with cache-size-limit set

  to 10G, but the backup itself could be many terabytes, and restoring

  it could require downloading hundreds of GB of arc files that won't

  fit in the backup directory.

  A new get option --cache specifies another directory to use as temp

  space for doing the restore, and can be on a different disk with

  more space than the backup directory.  If the directory doesn't

  exist, it is created and will be deleted after the restore.  If it

  already exists and contains files from the same backup, they will be

  used again without needing another download, so create the directory

  beforehand if you want to keep the cache.  Thanks Jacob!

- init, export: -p ask only asked for the passphrase once, which could

  lead to problems if there was a typo.  Now it asks twice and

  verifies they match.  Thanks Tobias!
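
  The ask-twice check is the usual pattern sketched below.  The
  function name and prompt text are hypothetical, and HB's own prompts
  and error handling may differ.

```python
import getpass


def ask_passphrase_twice(prompt=getpass.getpass):
    """Ask for a passphrase twice and return it only if both match."""
    first = prompt("Passphrase: ")
    second = prompt("Passphrase again: ")
    if first != second:
        raise ValueError("passphrases do not match")
    return first
```

  Passing the prompt function in makes the check easy to test without
  a terminal.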

- clear: if dest.conf is loaded into the database it survives clear

- backup: after the last file has been saved (inex.conf), an error

  could occur when trying to cd back to the starting directory on

  Ubuntu 16.04:

    OSError: [Errno 2] No such file or directory: '(unreachable)/

  The cd was removed since it isn't necessary.  Thanks Auke!

#1963 Sep 23, 2017 - expires Jan 15, 2018



* backup: recognizes filesystem restores & migration

* S3: new "secure" keyword to enable SSL

* ls: use . to list current directory

* get: accept ./filename or filename

* rm: accept ./filename

* log: can't log the clear command

* log: fix hang when question asked


- backup: previously when a filesystem was copied, eg to a larger

  disk, SSD, or new system, HashBackup did a good job of not creating

  new backup data if dedup was enabled, but the database size could

  increase quite a bit.  Now this is handled more efficiently if

  pathnames stay the same and file attributes are copied, for example:

  - filesystem is copied with cp -rp

  - filesystem is copied with rsync -a

  - filesystem is copied with a cloning utility

  - filesystem or directory is restored with HashBackup

  Thanks Daniele!

- S3: HB's S3 interface has not used SSL because the S3 protocol is

  resistant to attacks and the data is encrypted.  But some customers

  want to use SSL anyway, so the "secure" keyword has been added as an

  option to S3 destinations.  It takes a true/false value or if used

  without a value it enables SSL.  Thanks Alex!

- ls: handles leading . in the pathname to list:

    hb ls .      list the current directory, like ls `pwd`
    hb ls ./     same as ls .
    hb ls './*'  list the current directory recursively (note quotes!)
    hb ls ./xyz  list `pwd`/xyz
    hb ls xyz    list any pathname with xyz as component

  Thanks Kriston!

- get: using ./filename or a simple filename adds the current

  directory to construct a full pathname, ie, `pwd`/filename
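
  In other words, get resolves its argument roughly like this sketch.
  full_restore_path is a hypothetical name and the use of normpath is
  an assumption; only the `pwd`/filename result comes from the text.

```python
import os
import posixpath


def full_restore_path(arg, cwd=None):
    """Turn ./filename or a simple filename into `pwd`/filename."""
    if posixpath.isabs(arg):
        return arg  # already a full pathname
    cwd = cwd or os.getcwd()
    return posixpath.normpath(posixpath.join(cwd, arg))
```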

- rm: using ./filename means `pwd`/filename, like get.  A simple

  filename is not supported for rm because removing files from the

  backup is not reversible so there is more danger of a mistake.

- log: a traceback occurred when used with clear because clear deletes

  the logs.  Now it's an error to use hb log clear.  From email


- log: the log command is designed for use with cron jobs but can be

  used when typing HB commands on the keyboard.  With a keyboard, if a

  question was asked, eg when clear asks "Are you sure?", the question

  was never displayed and HB seemed hung.  This works now, though in

  the log file, the question is combined with the next line of output

  and the response isn't logged.  Not perfect, but better.

#1955 Sep 6, 2017 - expires Jan 15, 2018



* backup: a debug print was left on in #1954, and commits were

  happening too frequently, slowing down backups.  Sorry about that!

#1954 Sep 6, 2017 - expires Jan 15, 2018



* database upgrade to dbrev 29

* backup, stats: display correct statistics with interrupted backups

* dest clear: cleared destination containing only copy

* delete zero-length arc files in local backup directory


- this release does an automatic database upgrade when any HB command

  is used, to correct statistics for incomplete backups.  The upgrade

  may take a few minutes, depending on the number of incomplete

  backups and the number of files they contain.

- backup, stats: if a backup was interrupted, it was not included in

  some statistics, but was included in others.  This could lead to

  confusing and misleading numbers from hb stats.  Thanks Jacob!

- dest clear is supposed to refuse to clear a destination if it

  contains the only copy of a file.  But it was clearing it anyway

  because of a subtle bug that's now fixed.  Thanks Jacob!

- if a destination is supposed to have a file but it "goes missing",

  ie, is deleted outside HB, then when HB tries to fetch the file it

  creates a zero-length file in the backup directory.  But a

  subsequent run of HB got confused by this zero-length file.  Now it

  is deleted before it causes confusion.  Thanks Jacob!

#1951 Sep 2, 2017 - expires Jan 15, 2018


* error recovery / retry changes
* upgrade: --force required for non-interactive upgrade
* S3: was ignoring the port keyword
* S3: new dest.conf keyword "subdomain"


* the retry keyword in dest.conf controls how often HB retries errors.
  It is up to 3 integers: # of retries, initial delay, delay factor.
  This is called "exponential backoff", where each retry waits longer
  than the previous attempt.  The defaults were 2,5,2 so:

  - try the first time
  - wait 5 seconds
  - first retry
  - wait 10 seconds
  - 2nd and final retry (15 seconds total)

  Most customers want their backup to "just work", and since backup
  is usually run at night, would prefer it ran a little longer when a
  storage service is having problems rather than bomb out after only
  15 seconds of downtime / retries.

  The second problem is that exponential backoff works fine with small
  numbers, but causes ridiculous delays fairly soon.  For example,
  after 14 retries, the retry delay would be 1 day, then 2 days, etc.

  There are two changes to fix this:

  1. the retry default has been changed to 8,5,2 so there are 8
     retries in about 20 minutes.

  2. retry never delays more than 20 minutes, so if you use retry 10
     in dest.conf, the first 8 retries will take 20 minutes, then the
     next 2 will each delay 20 minutes.  The effect is that after the
     8th retry, HB will retry every 20 minutes.

  These changes make it easier to figure out the total retry delay.
  If you want to keep retrying for 5 hours, it would be:

  - 10 retries for the first hour
  - 3 retries for every hour after that
  - so retry 22 would be used in dest.conf
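
  The retry arithmetic above can be checked with a short sketch.  It
  assumes the three retry values mean (retries, initial delay, delay
  factor) as described, with each delay capped at 20 minutes;
  retry_delays is a hypothetical helper, not HB's code.

```python
def retry_delays(retries=8, initial=5, factor=2, cap=20 * 60):
    """Delay in seconds before each retry, with each delay capped."""
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


old = retry_delays(retries=2)
print(old, sum(old))                            # [5, 10] 15
print(sum(retry_delays(retries=8)) / 60)        # 21.25 minutes
print(retry_delays(retries=10)[-2:])            # [1200, 1200]
print(sum(retry_delays(retries=22)) / 3600)     # just over 5 hours

# Without the cap, the delay after 14 retries is close to a full day.
print(retry_delays(retries=15, cap=float("inf"))[-1] / 3600)
```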

* upgrade: before downloading and installing a new version, upgrade
  asks for confirmation.  When run from a script or cron job where
  there is no keyboard, upgrade could not get confirmation and halted
  with an error and traceback.  Now it requires --force in this
  situation to avoid the traceback.
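
  A common way to handle this is to check whether stdin is a
  terminal before asking.  The sketch below shows that pattern; the
  function name, prompt, and message are hypothetical and HB's
  actual behavior may differ in detail.

```python
import sys


def confirm_upgrade(force=False, stdin=sys.stdin, ask=input):
    """Return True if the upgrade should proceed."""
    if force:
        return True  # --force: no confirmation needed
    if not stdin.isatty():
        # No keyboard (script or cron job): fail cleanly, no traceback.
        raise SystemExit("upgrade: cannot ask for confirmation; use --force")
    return ask("Upgrade? ").strip().lower().startswith("y")
```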

* S3: the port keyword on an S3 destination was not being used, so
  attempts to use an S3-compatible service like Minio running on a
  port other than 80 would result in an error:

  unable to start: [gaierror] [Errno 8] nodename nor servname provided, or not known

* S3: Amazon S3 uses bucket names as part of the host name, but
  S3-compatibles often don't support that since it requires DNS
  support.  A new true/false keyword "subdomain" can be added to
  control whether bucket names are added to the host name (subdomain
  true, the default) or the bucket name is added to the URL (subdomain
  false).  Thanks Alex!
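
  The difference between the two settings is where the bucket name
  ends up in the request URL.  A sketch, with a hypothetical
  function, host names, and bucket name:

```python
def s3_object_url(host, bucket, key, subdomain=True, port=None):
    """Build an object URL with the bucket in the host name (subdomain
    true) or at the start of the URL path (subdomain false)."""
    netloc = f"{bucket}.{host}" if subdomain else host
    if port is not None:
        netloc = f"{netloc}:{port}"
    path = f"/{key}" if subdomain else f"/{bucket}/{key}"
    return f"http://{netloc}{path}"


# Amazon S3 style: bucket name is part of the host name.
print(s3_object_url("s3.amazonaws.com", "mybucket", "arc.0.0"))
# S3-compatible on a local port: bucket name is part of the URL.
print(s3_object_url("localhost", "mybucket", "arc.0.0",
                    subdomain=False, port=9000))
```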

#1946 Aug 28, 2017 - expires Jan 15, 2018


* expiration date bumped to January 15, 2018
* update & document the log command
* backup: now has a SIGTERM handler
* new config options pack-combine-min/max


* the log command has been updated and documented.  This command is
  used as a prefix to other commands in cron jobs to timestamp and log
  HB output.  For example: hb log backup -c backupdir ...  It also
  will summarize and archive log files.  See the Log web page under
  Commands for details.

* backup: if kill -TERM (or kill -15) is used to kill a backup, it
  will finish the current file and then end normally.  It makes a
  checkpoint where it stopped and if --maxtime is used on the next
  backup with the same pathnames, it will restart where it left off.
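
  The "finish the current file, then stop" behavior is a standard
  signal-handling pattern: the handler only sets a flag, and the main
  loop checks it between files.  A sketch with hypothetical names
  (this is not HB's code):

```python
import signal

stop_requested = False


def handle_sigterm(signum, frame):
    """Only record the request; the backup loop decides when to stop."""
    global stop_requested
    stop_requested = True


def install_handler():
    """Call once at startup, from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def backup_files(paths, save_file):
    """Save files in order.  Returns the index to resume from, which
    equals len(paths) when everything was saved."""
    for i, path in enumerate(paths):
        save_file(path)          # the current file always finishes
        if stop_requested:
            return i + 1         # checkpoint: the next file to save
    return len(paths)
```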

* small arc files are combined into larger arc files by the retain and
  rm commands.  The settings were fixed at 1MB and 5MB, meaning arc
  files under 1MB were combined until they were at least 5MB.  These
  two settings are now config options, pack-combine-min (default 1MB)
  and pack-combine-max (default 5MB).  For the more daring, these can
  be used to repack an entire backup into larger arc files.
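
  A rough sketch of the combining rule, assuming files are taken in
  order and a leftover group of two or more is still combined.  The
  function name and the leftover handling are assumptions; only the
  1MB and 5MB defaults come from the text.

```python
MB = 1024 * 1024


def combine_small_arcs(sizes, combine_min=1 * MB, combine_max=5 * MB):
    """Group arc files smaller than combine_min into batches that
    reach at least combine_max.  Returns a list of batches of sizes."""
    batches, current, total = [], [], 0
    for size in sizes:
        if size >= combine_min:
            continue              # big enough: left alone
        current.append(size)
        total += size
        if total >= combine_max:
            batches.append(current)
            current, total = [], 0
    if len(current) > 1:
        batches.append(current)   # assumed: combine the leftovers too
    return batches


sizes = [512 * 1024] * 12 + [2 * MB]
print([len(batch) for batch in combine_small_arcs(sizes)])
```

  Raising both values is what would repack a backup into larger arc
  files.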

#1940 Aug 23, 2017 - expires Oct 15, 2017


* destination info is on the web site
* optimize hb.db.N transfers
* collapse // in pathnames
* dest.conf onfail ignore works better


* destination info is on the web site under Destinations and has
  been removed from doc/dest.conf.examples

* if a destination is turned off or fails, the next backup would
  sometimes send a complete set of hb.db.N files to all destinations.
  Now this update is much smaller.  This is useful for some users who
  regularly turn off or have missing destinations, for example,
  rotating 2 USB backup drives, or backing up to USB during the day
  and enabling offsite backup only at night.  Thanks Jacob!

* Unix allows // in pathnames, treating it as /.  Now all HB commands
  do this too.  Thanks Lukasz!
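
  The normalization amounts to the sketch below.  A regular
  expression is used because Python's posixpath.normpath keeps a
  leading // intact; collapse_slashes is a hypothetical name, not an
  HB function.

```python
import re


def collapse_slashes(path):
    """Treat any run of slashes as a single /."""
    return re.sub(r"/{2,}", "/", path)


print(collapse_slashes("/home//user///docs"))
```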

* On a backup with multiple destinations, one of which was not
  available, a get command would fail with this error even though
  other destinations contained the file(s) needed:

    Unable to download archive arc.0.0: Exception('destinations halted',)

  Now when onfail ignore is used on destinations that may not be
  available, like external USB drives, the restore will complete using
  the other destinations.  Thanks Jacob!

#1926 Jul 5, 2017 - expires Oct 15, 2017


* expiration date bumped to October 15, 2017
* database upgrade to dbrev 28
* IMPORTANT: potential sync problem
* WebDAV: faster performance
* backup: handle command line directory errors
* get: easier to restore deleted files
* B2: set default lifecycle on bucket
* B2: improve use of B2's API to reduce costs
* selftest: fix "file not on destination" error with split files
* get: fix rare "getblock: hash mismatch blockid n in arc.v.n"
* rm: fix rare unused path bug
* rm: require --force to remove the latest backup
* rm: 5x faster removing a version
* rm: 20% faster removing a path
* rm: fail quickly when removing a path and not backup owner
* dest unload bug fix
* selftest: better file version checking


- this release does an automatic database upgrade when any HB command
  is used:

  - improves split file handling (dest.conf maxsize keyword)
  - corrects a small problem in the May 5th database upgrade
  - removes unused paths caused by an rm bug that's now fixed

  The upgrade may take a few minutes, depending on the number of
  files in the backup.

-      ***************  IMPORTANT  *******************

  If you had a remote destination, added a new destination after April
  15th, and have a limited local cache (cache-size-limit >= 0), please
  read the following.  Otherwise, it does not apply to you.

  Selective download has been a feature of B2, S3 & compatibles, FTP,
  WebDAV, Cloudfile, and Dir destinations since April 15th.  Internal
  testing has found an arc file corruption problem that can occur
  during destination to destination copies.  This is when you have
  destinations, don't have a local copy of the backup, add a new
  destination, and HB must copy data from an old destination to the
  new destination to populate it.  Here are the 5 necessary conditions
  for this problem to occur:

  0. new destination was added after April 15th

  1. cache-size-limit is >= 0, ie, not all arc files are local

  2. the source destination (copying from this) supports selective
     download (one of the destination types listed above).  The target
     destination (copying to this) can be any type.

  3. the sync occurred after April 15th

  4. the source destination had more than 1 worker (default is 2)

  5. during the sync, the source destination failed.  An interruption
     like Ctrl-C, killing HB, or a system crash is okay - no problem.
     It has to be a failure where the destination stops and HB displays
     an error message saying the destination stopped after several
     errors.

  If all these conditions are true, it's possible for a corrupt arc
  file to be created in the local backup directory when the source
  destination fails.  A serious side-effect is that the corrupt local
  arc file could be copied to the new target destination during
  destination to destination synchronization.  This means the new
  destination has a bad copy of the arc file.  The dest verify command
  cannot detect this because the file is there, it's the right size,
  and the checksum matches because the arc file was actually corrupted
  when downloaded from the source destination.

  If you have done a destination to destination sync with HB after
  April 15th and it satisfies all 5 of the conditions above, you
  should take one of these actions:

  - Option 1: run the new hb command checksync:

      hb checksync -c backupdir

    This will analyze your backup and try to determine if any arc
    files may have been affected by this sync bug.  This is the
    fastest option.  Checksync can be interrupted without harming the
    backup.  Checksync may recommend that you run selftest if it finds
    any bad arc files.  If you run checksync more than once, it will
    always report that the files need checking, even if you run
    selftest and everything is fine; all it is doing is selecting
    files based on timestamps.  The checksync command will be removed
    in 3 months since it is designed for this particular problem.

  - Option 2: if you still have both the source and target
    destinations configured, you can redo the copy operation by
    clearing the target destination with:

        hb dest -c backupdir clear <target dest name>

    The next backup will do another destination to destination sync.
    This is cheaper than running a full selftest -v4 because the data
    only has to be downloaded from the source destination, whereas
    with selftest -v4, data is downloaded from both the source and
    target destinations (and any other active destinations).

  - Option 3: the slowest but most thorough option is to run selftest
    -v4 --fix to verify your destination data is correct.  This will
    download all data from all destinations and verify every block in
    the backup.  For huge backups, you can use --sample (without
    --fix) to do a rough check of all arc files, but since not all
    data is checked there is some risk of undetected corruption on
    the new (target) sync destination.

  Example sync scenarios:

  1. Had a complete local backup (cache-size-limit is -1, the
  default), added a new destination, HB copied all the backup files to
  the new destination.  This is fine because it is not a destination
  to destination sync (cond #1).

  2. Had a cache-size-limit set (not all backup files are in the
  backup directory), had an SSH destination, and added an S3
  destination.  Files were synced from the SSH to S3 destination.
  This is fine because the source destination (SSH) does not support
  selective download (cond #2)

  3. Had cache-size-limit set, had an S3 destination configured, added
  a B2 destination and did the sync in March.  This is fine because
  the sync occurred before Apr 15th & selective download was not
  available before then. (cond #3)

  4. Same as previous, but the initial sync to the new destination
  occurred after Apr 15th. This could potentially have caused a
  corrupted arc file on the target destination, but only if:

  - more than 1 worker was configured on the source destination; this
    is likely (cond #4)

  - during the sync, the source destination failed and stopped; this
    is not as likely (cond #5)

  If you have questions or need advice about whether your backup might
  be affected, please send an email.

- WebDAV operations are faster and more efficient:

  on 4shared:
  - old: upload 310 small arc files with 2 workers = 92s
  - new: upload 310 small arc files with 2 workers = 49s (47% faster)

  on OpenDrive:
  - old: upload 1000 small arc files with 10 workers = 4m 8s
  - new: upload 1000 small arc files with 10 workers = 3m 11s (23% faster)

  on OpenDrive:
  - old: delete 1000 small arc files with 10 workers = 1m 39s
  - new: delete 1000 small arc files with 10 workers = 49s (50% faster)

- backup: if an error occurred either reading or opening a directory
  listed on the command line, backup would stop abruptly with an
  unhelpful traceback that didn't include the directory name.  Now it
  displays an error message with the directory name, skips backing up
  that directory, and when finished, exits with an error code.

- get: when restoring a deleted file without -r, ie, restore the
  latest version, get would say:

    Most recent backup version: 2104
    Restoring from current version
    Path deleted in version 2097, last saved in version 2079: pathname

  Another get with -r2079 was needed to restore the file.  Now get
  gives the option to restore the latest saved version:

    Most recent backup version: 2104
    Restoring from current version
    Path is not in current version; last saved in version 2079, deleted in version 2097: pathname
    Restore pathname from version 2079?

  When restoring a deleted file with -r but the wrong version is used,
  get now lists all versions of the file.  This is too confusing for a
  yes/no answer, so another get with -r is needed to restore the file:

    Restoring from version: 1809
    Path is not in version 1809: pathname
    Versions of this file:
      2016-07-04 09:02:36 in version 1781-1808
      2016-08-09 18:10:57 in version 1810
      2016-10-05 12:51:06 in version 1867-1904
      2017-04-11 15:06:36 in version 2054
      2017-05-07 16:42:51 in version 2079-2096
      2017-05-07 16:42:51 in version 2105
    Path not restored: pathname

- B2: HashBackup does not need B2's automatic file versioning and has
  always used B2 API calls to delete previous versions immediately.
  Now when HB creates a new bucket, it sets the lifecycle to "keep 1
  version", ie, disable versioning.  It will also set this lifecycle
  if an existing bucket has the default lifecycle of "keep all
  versions".  HashBackup still deletes previous versions immediately
  because it's usually cheaper than paying storage costs until B2's
  lifecycle rules delete old versions.

- B2: when maxsize is used in dest.conf, files > maxsize are split by
  HashBackup before uploading.  Since the default for maxsize is 5GB,
  splitting doesn't happen often, but when it does, these split files
  caused one extra B2 API call for every upload and two extra calls
  for every download or remove.

- selftest -v4 on a destination that supports selective download (B2,
  S3, WebDAV, FTP, Dir, CF) would fail with an error "file not on
  destination: arc.v.n" if the file was split on upload because it
  exceeded the destination's maxsize.  The file is fine: this was a
  selftest download bug.

- get: this set of conditions:
  - backup with a cache size limit >= 0
  - not all arc files are present locally
  - change cache-size-limit to -1
  - try to restore a file with the get command
  - the arc file needed is not present locally
  - the arc file was split because of maxsize
  - the arc file has deleted data ("holes")
  - destination supports selective download
  - more than 1 destination worker
  could cause this error:
    getblock: hash mismatch blockid N in arc.v.n

  The backup is fine, but selective download doesn't work with split
  files and caused this misleading error message.  It seems unlikely
  to occur, yet an email traceback did come in with this error and it
  was reproduced.  It's fixed now.

- rm: in unusual cases, rm could leave an unused path in the backup.
  It didn't hurt anything other than causing a selftest warning.

- rm: when -r is used to remove an entire version, it has two
  behaviors depending on --force:

  -- with --force, rm deletes all files in the version

  -- without --force, rm deletes files in the version if there is a
     newer copy in the next backup version.  If there isn't, files are
     moved into the next version.

  However, if the most recent version was removed with -r, rm would
  delete all files as if --force was used.  Now it gives an error and
  requires --force.

- rm: when removing a version without --force, rm has to check every
  user file to see if it has been superseded.  This check is now 5x
  faster.

- rm: removing a path is 20% faster

- rm: when removing a path, the user running rm must own all of the
  backups containing the path.  Previously rm would start deleting
  paths, detect an ownership error, then revert the deletes and fail.
  Now rm verifies ownership first so it can fail quickly.

- dest unload was writing dest.conf as it should but was not removing
  dest.conf from the database.  This bug was introduced Apr 29.

- selftest: the database upgrade on May 5th had a small bug related to
  file versioning that selftest didn't notice.  It didn't affect the
  backup or cause data loss, but now selftest will catch this problem.

#1900 May 28, 2017 - expires Jul 15, 2017


* ls: traceback on ctrl-c
* selftest: add new error correction
* clear: could lose config on ctrl-c
* clear: destination mismatch on ctrl-c
* dest sync: performance fix


* ls: ctrl-c usually caused a traceback with the message "unable to

* selftest: the error "logid X removed in version V but saved in that
  version as logid Y" can now be corrected with --fix.  No backup data
  is lost.

* clear: ctrl-c at the right time could reset the config, even without
  the --reset option

* clear: ctrl-c at the right time could cause "destination mismatch"
  errors on the next command

* dest sync: a recent change could cause remote-to-remote
  synchronization to be slower when the source is a B2, S3 or
  compatible, WebDAV, FTP, Rackspace Cloud Files, or Dir destination
  that supports selective downloads.

#1895 May 9, 2017 - expires Jul 15, 2017


* database upgrade to dbrev 27
* backup: was creating "multiple C records" selftest error
* selftest: more accurate progress percentages in "Checking files"
* selftest: fix traceback on empty backup
* selftest: fix "sequence errors" introduced in #1890 a few days ago


- this release does an automatic database upgrade when any HB command
  is used, to correct the "multiple C records" problem introduced in
  #1890

- backup: when a hard-linked file was previously backed up and then
  changed, it could cause a selftest error "multiple C records".  This
  bug was introduced in #1886 in the May 3rd release.  The database
  upgrade in this release corrects the problem.  From internal testing.

- selftest: in the "Checking files" section, the progress percentages
  were off, sometimes by a lot, especially on older backups where a
  lot of files had been deleted over time.  From internal testing.

- selftest: if all files were removed from a backup with hb rm /,
  selftest would fail with a traceback: 
    UnboundLocalError: local variable 'lastblockid' referenced before assignment
  From internal testing.

- selftest: a bug introduced a few days ago (the 30% speedup) could
  cause sequence errors on sparse files:

    Checking refs I
    Error: logid 49 seq 1025 blockid 30 expected seq -1023
    1 files have errors

  There was nothing wrong with the backup.

#1891 May 5, 2017 - expires Jul 15, 2017


* backup: fix "Database error: no such column: paths.pathid"

#1890 May 3, 2017 - expires Jul 15, 2017


* database upgrade to dbrev 26
* "orphaned files" bug corrected
* selftest: 30% faster
* get: fix traceback on --no-ino backup when cache-size-limit set


- this release does an automatic database upgrade when any HB command
  is used, to correct "orphaned files". (See next bug)

- in an unusual situation, files could become "orphaned" and never
  removed by retain.  There were around 1800 files like this in the HB
  build server backup containing 2.2M files, so less than 0.1%.  This
  bug didn't affect HB operations, other than keeping some very old
  files in the backup that retain should have removed long ago.  The
  bug causing this has been fixed and retain will now be able to
  remove them, so you might see more deleted files than usual in the
  first retain with this release.  From internal testing.

- selftest is over 30% faster with backups that have a lot of block
  references, for example, VM image backups with a 4K block size or
  any backup of large files.  The HashBackup build server backup has
  137M block references and selftest now runs in 7.5 minutes vs 11
  minutes previously.

- get: if cache-size-limit is set and a backup was created with
  --no-ino, restoring could cause a traceback.  Thanks Jacob!

    Traceback (most recent call last):
    File "/", line 142, in <module>
    File "/", line 1172, in main
    File "/", line 888, in plan
    File "/", line 948, in prefetch
    UnboundLocalError: local variable 'hlkey' referenced before assignment

#1883 Apr 29, 2017 - expires Jul 15, 2017


* config: don't display admin-passphrase hash code
* fix typo in CHANGELOG: no-backup-ext -> no-backup-tag
* selftest: --sample was only working with --inc
* selftest: bug fix "has data blocks but null file hash"


- config: the admin-passphrase is a hash code and so not normally
  displayed with the config command.  But if it was listed on the
  command line, the binary characters displayed could mess up the
  terminal settings.  Now it displays (hidden).

- selftest: a new test was giving an incorrect error "logid x has data
  blocks but null file hash" on partially backed up files

#1880 Apr 29, 2017 - expires Jul 15, 2017


* database upgrade to dbrev 25
* no selective download for older arc file formats
* config: enhance admin-passphrase security
* dest verify: 3-20x faster on B2/S3/GS and 100x cheaper
* selftest: new --sample option for random arc testing
* selftest: handle weird options combinations better
* mount: fix traceback when downloading
* backup: fix Mem: display at end of backup (Linux, BSD)
* backup: unstable inodes warning is now a fatal error
* backup: better handling of --no-ino for get command
* backup: unusual no-backup-tag problem fixed
* selftest: fix rare "hash should be null" error


- this release does an automatic database upgrade when any HB command
  is used: directories use a few bytes less storage.  The update makes
  three passes over the database, so the update could take a while for
  very large backups.

- HashBackup has had four arc file formats since its initial release
  in 2009 and is still compatible with all of these.  But the new
  selective download feature (downloading parts of an arc file) only
  works with the latest arc file format, so a test was added to make
  sure HB doesn't try to do selective downloads on old-style arc files
  created before 2013; it downloads the whole file instead.

- the admin-passphrase config option can be used to restrict access to
  certain commands, including the config command, from users that have
  access to the backup directory.  To increase local security, it is
  now stretched with pbkdf2 and a random salt.
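
  As an illustration only, stretching a passphrase with PBKDF2 and a
  random salt might look like the sketch below.  The hash function,
  iteration count, salt size, and function names are assumptions; the
  changelog does not say which values HB uses.

```python
import hashlib
import hmac
import os

# Assumed parameters for illustration only; the changelog does not state
# the hash function, iteration count, or salt size HashBackup uses.
ITERATIONS = 200_000
SALT_BYTES = 16


def stretch_passphrase(passphrase, salt=None):
    """Return (salt, derived_key) for a passphrase using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per passphrase
    key = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_passphrase(passphrase, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = stretch_passphrase(passphrase, salt)
    return hmac.compare_digest(key, expected_key)
```

  Only the salt and derived key need to be stored; a guess has to be
  run through the same number of iterations before it can be compared.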

- the dest verify command quickly checks remote destinations to ensure
  all files HashBackup has sent are actually there, are the right
  size, and have the correct file hash - without downloading any file
  data.  Now, for B2, S3, and S3 compatibles, dest verify is 3-20x
  faster and up to 100x cheaper for large backups.  It verifies
  500-1000 arc / hb.db.N files per second at half a cent per 1M files.

- selftest has a new option, --sample N, indicating N blocks should be
  tested in arc files instead of testing every block.  --sample is
  used with -v3 or -v4 arc file testing and gives a higher degree of
  confidence than the "dest verify" command that arc files are
  correct, without having to download entire arc files.  --sample can
  only be used with destinations that support selective downloading
  since it needs to download the individual blocks being tested.

  --sample is very useful with large backups to cut testing time and
  download fees.  It can be used with or without --inc.  When used
  without --inc, --sample tests N samples from all arc files.  With
  --inc, --sample does sampled testing over a period of time and
  respects the --inc download limit.  Since it is only sampling, more
  arc files can be tested than before with the same download limit.

  --sample 1 will test 1 random block from each arc file

  --sample 2 or higher will test N-1 random blocks plus the last
    block in the arc file; this is to catch truncated arc files

  selftest may test fewer than N blocks in very small arc files.

  Any combination of -v3/4, --sample, -r, and --inc options can be
  used and they are tracked separately.  So for example, you can use
  --inc 1d/30d --sample 3 -v4 after every backup to do a random sample
  of 3 blocks from every arc file every 30 days, and on the weekend
  use --inc 1w/3m -v4 to do a full verify of every arc file every 3
  months.  Sampling is not implemented for -v5.
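
  The --sample block selection rules above can be sketched roughly as
  follows.  This is not HB's code: the function name and the idea of
  addressing blocks by index within one arc file are assumptions made
  for the example.

```python
import random


def pick_sample_blocks(block_count, n, rng=random):
    """Return sorted indexes of the blocks to test in one arc file.

    block_count: number of blocks in the arc file (hypothetical input)
    n: the value given to --sample
    """
    if block_count <= 0 or n <= 0:
        return []
    if n == 1:
        # --sample 1: one random block
        return [rng.randrange(block_count)]
    last = block_count - 1
    # --sample 2 or higher: N-1 random blocks plus the last block, which
    # catches truncated arc files.  Small files yield fewer than N blocks.
    k = min(n - 1, last)
    picked = rng.sample(range(last), k)
    return sorted(picked + [last])
```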

- selftest: specific arc files can be tested by adding arc filenames,
  arc.5.35 for example, to the command line.  But if -v4 was used with
  a list of arc filenames, selftest tested everything; you had to
  leave off -v4 to test specific arc files.  Now, only the arc files
  listed are tested.  This is useful if there is one bad arc file that
  needs to be repaired using copies on other destinations, but the
  backup is too large to download everything with a full -v4 selftest.

- backup: at the end of the backup, a Mem: line shows the maximum
  memory (RAM) used during the backup.  On OSX it was correct, but on
  Linux and BSD it was 1024x too small, ie, showing 135KB meant 135MB.
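
  A 1024x difference between platforms is what you get from the
  ru_maxrss field of getrusage(), which is in bytes on OSX but in
  kilobytes on Linux and BSD.  The changelog doesn't say how HB
  measures memory, so the sketch below is only an assumption about
  the kind of unit conversion involved.

```python
import resource
import sys


def maxrss_bytes(raw, platform):
    """Convert a ru_maxrss value to bytes.

    macOS reports ru_maxrss in bytes; Linux and the BSDs report kilobytes.
    """
    if platform == "darwin":
        return raw
    return raw * 1024


def peak_memory_bytes():
    """Peak resident memory of the current process, in bytes."""
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return maxrss_bytes(raw, sys.platform)
```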

- mount: fixed a traceback found in testing:
    Getting arc.0.0
    TypeError('splitspans() takes exactly 2 arguments (1 given)

- backup: some filesystems like CIFS (SMB, Samba), FUSE (sshfs,
  UnRAID) do not have stable inode numbers like a normal Unix
  filesystem.  Inode numbers are used by HashBackup to detect hard
  links.  HB previously issued an unstable inode warning on
  incremental backups.  This is now a fatal error, advising to clear
  the backup and use --no-ino.  This prevents accumulating a lot of
  backup history with random inode numbers.  If inode numbers are
  unstable and --no-ino is not used, it can cause incorrect hard
  linking at restore time.  Thanks Steve!

  IMPORTANT: if you have a backup containing unstable inode numbers,
  --no-ino must be used with "get" to prevent incorrect hard linking.
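
  To see why unstable inode numbers matter, here is a minimal sketch
  of inode-based hard link detection.  It is not HB's implementation,
  and the function name is made up for the example.  Two paths are
  treated as the same file when they share a device and inode number,
  so a filesystem that hands out random inode numbers can make
  unrelated files look linked.

```python
import os
from collections import defaultdict


def find_hard_links(paths):
    """Group paths that share the same (device, inode) pair."""
    groups = defaultdict(list)
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink > 1:  # only multiply-linked files can be hard links
            groups[(st.st_dev, st.st_ino)].append(path)
    # keep only groups where more than one of the given paths matched
    return [sorted(g) for g in groups.values() if len(g) > 1]
```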

- backup: when --no-ino is used, backup saves data so that restores
  are correct even if --no-ino is not used with get.  

  IMPORTANT: this only works for future backup data.  If you already
  have a backup of unstable inodes created without --no-ino, you can
  either clear it and start over, or use the --no-ino option on every
  future backup and restore.

- backup: no-backup-tag is a list of filenames that, if present in a
  directory, indicate the directory should not be backed up.  If
  directory p1 was tagged as "don't backup" and path p1/p2 was given
  on the command line, path p1/p2 was saved on every backup whether it
  was changed or not and caused selftest errors:

    Error: logid 78670297 removed in version 1008 but saved in that
    version as logid 78686742: pathname [r1007]

  If you have this problem in your backup, use hb rm -rN pathname to
  remove the pathnames in error.  Whew - this backup has saved over
  78M files.  Thanks Daniele!

- selftest: if backup notices that a file changes while it is being
  saved, it sets a "partial" flag on the file to indicate that the
  file wasn't completely saved.  Depending on the backup timing, this
  race condition could cause selftest to show a bogus error "hash
  should be null".  Thanks Daniele!

#1858 Apr 15, 2017 - expires Jul 15, 2017


* security document review
* parallel multipart download for single files
* dir destinations support selective download
* get: add workaround --no-ino option
* default network timeouts changed
* S3: don't retrieve files stored in Glacier
* S3: remove destination type euca


- the HashBackup security document was reviewed and a few changes were
  made for clarity - nothing major.

- the selftest -v4 and mount commands may need to download arc files.
  To improve download speeds, these commands can now utilize multiple
  workers when downloading one arc file from a destination.  For
  example, a 100MB arc file may use 5 workers to simultaneously
  download 5 x 20MB sections.  This is only available on destinations
  supporting selective downloading: S3 and compatibles (Google
  Storage, DreamObjects, SoftLayer, etc), Backblaze B2, Rackspace
  Cloud Files, WebDAV, and FTP.  The get, recover, and selftest -v5
  commands might download files, but they already download multiple
  files in parallel.  Now all HashBackup downloads are parallelized.
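
  The 5 x 20MB example amounts to splitting one file into byte ranges,
  one per worker.  A rough sketch, with a hypothetical function name
  and no claim that HB sizes its sections this way:

```python
def split_ranges(size, workers):
    """Split a file of `size` bytes into up to `workers` contiguous
    (start, end) byte ranges, end inclusive as in HTTP Range requests."""
    if size <= 0 or workers <= 0:
        return []
    workers = min(workers, size)       # never more ranges than bytes
    base, extra = divmod(size, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # the first `extra` ranges carry one additional byte
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

  For a 100MB file and 5 workers this gives 5 ranges of 20MB each,
  which can be fetched at the same time and written at their offsets.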

- dir destinations store backup data on mounted storage, which might be
  a local directory or a remote directory.  Dir destinations now
  support selective download.  This is mainly for testing since Dir
  destinations can also use the dest.conf symlink option to avoid
  copying remote files.

- get: the new --no-ino option is used when restoring files that were
  backed up with --no-ino.  This is a temporary workaround to correct
  a few restore problems with unstable inodes, eg, when backing up
  CIFS filesystems.  When this option is used, no hard links are

- the default network communication timeout is now 5 minutes instead
  of 30 minutes.  This timeout is for making connections and doing
  small data transfers, not for file transfers.  The timeout period
  can be changed if necessary with the timeout keyword in dest.conf

- the network timeout for S3 destinations was previously 30 seconds,
  but that's a little short and it is now 5 minutes.

- S3: when an S3 bucket is configured to automatically transition
  files from S3 to Glacier, HashBackup previously issued "restore"
  commands to S3, causing the Glacier files to be temporarily stored
  on S3 where HB can get to them.  This was leftover from when HB
  supported Glacier.  But there is no restore pacing, so these
  restores from Glacier to S3 could be quite expensive.  To prevent a
  surprise AWS bill, HB no longer restores files from Glacier to S3.

- S3: the euca destination type (Eucalyptus Systems "Walrus" S3 clone)
  has been removed.  S3-compatible destinations use type S3 with host
  and port keywords.

#1850 Apr 7, 2017 - expires Jul 15, 2017


* selective downloads reduce data transfer fees
* remove access check on backup directory
* handle log filename timestamp collisions
* B2 keepalive default is now true


- download fees for storage companies are typically 10-15x higher than
  storage fees, so it's important to minimize downloaded data.

  HashBackup downloads files for:
  - selftest -v4 or -v5
  - recover
  - get (restore)
  - rm/retain to "pack" arc files (remove deleted data)
  - mount

  HB previously downloaded entire files.  But arc files often contain
  deleted data ("holes") created by rm and retain, especially if
  remote packing is disabled.  Now HB skips deleted data during
  downloads, saving data transfer fees.  The destination types that
  support selective download are: Amazon S3 and compatibles (Google
  Storage, DreamHost, Softlayer, etc), Backblaze B2, Rackspace Cloud
  Files, WebDAV, and FTP.

  Selective downloading is not yet fully optimized in this release, so
  it might be faster or slower than whole-file downloads depending on
  how much deleted data is in an arc file and the speed of your
  network connection.

  Selective downloading will usually be cheaper than whole-file
  downloads and is never more expensive.  You will see higher request
  fees, but even lower data transfer fees and lower overall costs.
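
  Skipping holes comes down to working out which byte ranges of an arc
  file still hold live data and requesting only those.  A minimal
  sketch, assuming holes are known as (start, end) byte spans; the
  function name and data layout are illustrative, not HB's:

```python
def live_ranges(size, holes):
    """Return (start, end) byte ranges, end exclusive, that remain after
    removing `holes` (a list of (start, end) deleted spans) from a file."""
    ranges = []
    pos = 0
    for h_start, h_end in sorted(holes):
        h_start = max(h_start, pos)  # tolerate overlapping holes
        if h_start > pos:
            ranges.append((pos, h_start))
        pos = max(pos, h_end)
    if pos < size:
        ranges.append((pos, size))
    return ranges
```

  Each returned range becomes one request, which is why request fees
  go up while the number of bytes transferred goes down.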

- previously, HB did a preliminary access check on the backup
  directory, to avoid having obscure error messages for insufficient
  permissions.  But commands that are usually read-only sometimes
  need write access in certain situations, making it hard to predict
  what kind of access to the backup directory will be needed.  Now HB
  doesn't do an access check before starting, so a command might fail
  if write access is needed but only read access is available.  Thanks

- the hb log command creates log file names with a timestamp.  If the
  same command is run twice in the same second, there was a filename
  collision and traceback.  Now if a collision occurs, HB delays and
  retries until a unique log file can be created.  Reported by
  traceback email.
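
  One way to get this "delay and retry" behavior is to create the file
  exclusively and try again when the name is taken.  The sketch below
  is an assumption about the approach, and the log name format, delay,
  and attempt count are hypothetical.

```python
import os
import time


def create_unique_log(logdir, command, delay=0.25, attempts=20):
    """Create a new timestamped log file and return its path, retrying
    after a short delay when the name is already taken."""
    for _ in range(attempts):
        # Hypothetical name format; the real HB log name layout may differ.
        stamp = time.strftime("%Y-%m-%d-%H%M%S")
        path = os.path.join(logdir, "%s-%s.log" % (stamp, command))
        try:
            # O_EXCL makes creation fail atomically if the file exists
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            time.sleep(delay)
            continue
        os.close(fd)
        return path
    raise RuntimeError("could not create a unique log file")
```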

- B2: the default for keepalive is true rather than false.  If
  dest.conf has keepalive false, that should be removed.  If removing
  it causes problems, please send an email.

#1846 Apr 1, 2017 - expires Jul 15, 2017


* if audit logging fails, don't execute command
* read-only commands don't require rwx access (usually)
* B2 bug fixes
* B2: don't load files into RAM when downloading


- if audit logging is enabled but fails for some reason, HB displayed
  a warning that the audit log failed but executed the command anyway.
  Now if audit logging fails, it's a fatal error and the command
  doesn't execute.

- some commands are mainly read-only: audit, get, ls, mount, versions;
  but HB was always checking for rwx access to the backup directory.
  Now it only checks for rx access for these commands.  However, they
  can still fail with a permission error if:

  * the previous command aborted unexpectedly and the database has to
    be rolled back

  * cache-size-limit is >= 0 and the command (get or mount) requires
    downloading arc files from a remote destination

  IMPORTANT WARNING: HashBackup does not do individual file permission
  checking, so anyone with read-only access to the backup directory
  has access to anything that was backed up, even if they don't have
  access to the same data in the live filesystem.
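
  The per-command check described above can be sketched with
  os.access().  The command list comes from this entry; the function
  names are hypothetical and this is not HB's code.

```python
import os

# Commands this changelog entry lists as mainly read-only
READ_ONLY_COMMANDS = {"audit", "get", "ls", "mount", "versions"}


def required_access(command):
    """Return the os.access() mode to check on the backup directory."""
    mode = os.R_OK | os.X_OK           # rx is enough to read the backup
    if command not in READ_ONLY_COMMANDS:
        mode |= os.W_OK                # everything else needs rwx
    return mode


def can_run(command, backupdir):
    """True if the current user has the access `command` normally needs."""
    return os.access(backupdir, required_access(command))
```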

- B2: log files said the download rate was X MB/s, when actually it
  was showing KB/s, ie, the number was 1000x too big

- B2: during downloads, files were loaded into RAM.  This was not
  intended and could lead to significant RAM usage, especially with
  large arc files and/or many worker threads.

#1844 Mar 19, 2017 - expires Jul 15, 2017


* IMPORTANT NOTICE: removing obsolete euca support
* expiration date bumped to July 15, 2017
* database upgrade to dbrev 24
* up to 3x faster to create hb.db.N
* hb.db.N files removed from local backup directory
* hb.db.N space savings on remotes
* Google Storage: note on storage classes
* rekey and export are 40% faster, and some fixes
* recover: faster, Glacier code removed, other changes
* backup: 5-6% faster on small files
* backup: add Mem statistic to show peak RAM usage
* backup: fix exception traceback
* backup: fix slow backup of sparse "hole only" files
* better handling of cache-size-limit 0
* S3: update Amazon S3 region list, adding 4 regions
* S3: update boto to latest version
* S3: delete incomplete multipart uploads
* S3: IBM Softlayer S3-compatible support (documentation)
* get: add .hberror to partially restored files
* dest verify/sync: always transmit dest.db


- IMPORTANT NOTICE: Eucalyptus Systems had one of the first
  S3-compatible object stores, "Walrus", and HashBackup supported an
  S3 destination type of "euca".  This was necessary because when
  initially launched, Walrus used a weird pathname to access buckets
  and HB didn't support host and port keywords for s3 destinations.

  Now, a host and port keyword can be added to the s3 destination
  type, making the euca destination type obsolete.  The euca
  destination type will be removed very soon.  If that's a problem,
  please send an email.

- this release does an automatic database upgrade when any HB command
  is used.  Some procedures have changed and the upgrade prevents
  access by earlier HB releases.

- every HashBackup command that modifies the database has to create an
  hb.db.N file to send changes to the remotes.  This is up to 3x
  faster in this version.

- hb.db.N files are no longer kept in the local backup directory,
  saving local disk space.  Running this release will remove local
  hb.db.N files.

- hb.db.N files use less remote storage space and HB tries to avoid
  the "delete penalty" for these files on Amazon Infrequent Access.
  This will shorten recover time and lower storage and download costs.

  HB only manages the storage class when the class keyword is used in
  dest.conf.  If the class keyword is not used, all objects are stored
  in the bucket's default storage class.  This will not be optimal on
  Google Nearline / Coldline because of the delete penalty.

  The first backup with this release will send one or more hb.db.N
  files and then will delete all of the old ones; don't panic - this
  is expected.  If you notice any changes that are not beneficial,
  please send an email with details (ls -l backupdir/hb.db* and hb
  dest ls).

- the Google Storage service supports 4 storage classes:

     multi-regional  regional  nearline  coldline

  Unfortunately, HashBackup isn't able to dynamically set the storage
  class with Google Storage like it does with Amazon S3, maybe because
  HB is using the S3 interface.  Instead, the storage class is
  determined by the default storage class on the bucket.  The downside
  is that Nearline and Coldline have 30/90 day "delete penalties" that
  HashBackup can't avoid like it often can for Amazon S3 by managing
  the storage class on individual files.

  The delete penalty can be fairly severe with Google Coldline
  storage.  With Coldline, you get a 65% discount from regional
  prices, or 30% discount from Nearline, but all files are charged for
  at least 90 days of storage.  If a file is added one day and deleted
  the next, you're charged 90x more than the one day of storage.
  Therefore Coldline is not recommended for use with HashBackup unless
  you retain all files for a minimum of 90 days.  It can be very
  difficult to compare pricing with delete penalties, and the only way
  to know for sure is to run parallel backups to separate buckets for
  a month or so to see which is more cost effective for your backup
  data's access patterns.
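
  The delete penalty arithmetic is simple: a file is billed for the
  longer of its actual lifetime and the minimum storage period.  A
  small sketch with hypothetical function names; no real prices are
  used.

```python
def billed_days(days_stored, minimum_days):
    """Days of storage charged when a storage class has a minimum duration."""
    return max(days_stored, minimum_days)


def penalty_factor(days_stored, minimum_days):
    """How many times more is charged than for the days actually stored."""
    return billed_days(days_stored, minimum_days) / days_stored
```

  With a 90 day minimum, a file kept 1 day has a factor of 90, a file
  kept 45 days has a factor of 2, and a file kept 90 days or longer
  has no penalty.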

- rekey (and export) is 40% faster to improve scalability

- rekey aborts if any destinations halt, to prevent a failed
  destination having files associated with the old key.

- rekey was not restoring the old key file on errors.

- recover: following a recover, the next backup would send a large db
  update that was unnecessary.

- recover: reconstructing the database is faster: around 35% faster
  for the HashBackup build server backup.

- recover: the Glacier download pacing code - half of the code in
  recover - was removed since Glacier is no longer supported.

- recover: add retry on dest.db download errors 

- recover: the -c option is now required.  If the backup directory
  doesn't exist, recover offers to create it.

- an empty key file caused a traceback but now displays an error

- recover warned about overwriting an existing database only when the
  -a option was used.  But recover always overwrites the database, so
  this warning should always be issued.

- recover: display a progress message while downloading arc files

- backup is 5-6% faster when the backup is primarily small files

- backup: a new Mem: statistic shows peak memory used.  This includes
  the two largest RAM uses: the dedup table and database cache.

- backup: if a sparse file hole was greater than 2GB, HB could fail
  with a traceback: Exception in shaq_loop.  Thanks Mark!

- backup: if a sparse file was only a hole with no data, it was backed
  up normally and could take a long time instead of just a few
  seconds.  Thanks David!

- the config option cache-size-limit controls the amount of arc data
  kept in the local backup directory.  Setting it to zero means you
  don't want any arc data stored locally, but that can cause backup
  performance problems.  The cache size is now raised to at least 2 *
  arc-size-limit (2 arc files) while HashBackup is running, then
  trimmed lower if necessary when it's finished.  This makes a cache
  limit of zero work much better.
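
  In other words, the limit used while a command runs is roughly the
  larger of the configured limit and two arc files.  A sketch, with a
  hypothetical function name and assuming a negative value means "no
  limit":

```python
def effective_cache_limit(cache_size_limit, arc_size_limit):
    """Cache limit used while a command runs: room for at least 2 arc files."""
    if cache_size_limit < 0:
        return cache_size_limit  # assumption: -1 means the cache is unlimited
    return max(cache_size_limit, 2 * arc_size_limit)
```

  With cache-size-limit 0 and a 100MB arc-size-limit, the working
  limit is 200MB; the cache is trimmed back toward the configured
  limit when the command finishes.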

- Amazon S3: the region list was updated, adding us-east-2, eu-west-2,
  ap-south-1, and ap-northeast-2

- S3: the boto library used by HashBackup to access S3 was updated to
  the latest version

- S3: for large files, HB uses multipart uploads: instead of uploading
  a 100MB file with 1 worker, it might be uploaded by 4 workers in
  25MB sections.  But if a multipart upload is interrupted, the
  partial uploads hang around indefinitely on S3 unless you have a
  "lifecycle policy" for them, and you are billed every month for
  storage costs.  These files don't show up in the online S3
  Management Console.  Now HB deletes any incomplete uploads every
  time it starts.  There may be quite a few of these the first time.

- S3: the IBM / SoftLayer Standard Cross Region storage service has an
  S3-compatible API that works with HashBackup.  This is just a
  documentation change, not a code change.  Thanks Jonathan for
  testing and sharing your dest.conf info!

- get: if a file is only partially restored (disk full for example),
  the .hberror extension is added to the filename to make it obvious
  that the file is incomplete

- dest verify/sync: if dest.db was missing on a remote, it was not
  transferred.  Now it is always transmitted on a dest sync or verify.

#1781 Feb 15, 2017 - expires Apr 15, 2017


* dest.conf bug fix (keyword requires a value)
* rclone workarounds
* dest verify with rclone is 10-15x faster
* install new script!


- dest.conf: if a required keyword did not have a value, a traceback
  would occur (TypeError: not all arguments converted during string
  formatting) rather than the correct error message, "keyword requires
  a value: (keyword)".  Reported via email by HB traceback reporter.

- rclone has a few issues that require workarounds in HB and have been
  filed on GitHub.

- hb dest verify does one ls command now instead of one per remote
  file and is 10-15x faster.  The output for rclone is more detailed
  now, with separate errors for "file not found" and "file size is
  wrong".  Checksums are not supported on shell destinations.
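
  Verifying from a single listing can be sketched as a comparison of
  two name-to-size maps: what HB expects and what the one ls returned.
  The function name and data shapes are assumptions for illustration.

```python
def verify_listing(expected, listing):
    """Compare expected {name: size} against one remote listing.

    Returns (missing, wrong_size): two sorted lists of file names.
    """
    missing = sorted(name for name in expected if name not in listing)
    wrong_size = sorted(
        name for name, size in expected.items()
        if name in listing and listing[name] != size
    )
    return missing, wrong_size
```

  Because the listing is fetched once, the cost is one remote call plus
  a local comparison instead of one remote call per file.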

- the updated script must be installed manually by copying
  it from doc/dest.conf.examples to where the run command in dest.conf
  needs it to be.

#1779 Feb 9, 2017 - expires Apr 15, 2017


* rclone destinations support "hb dest verify" command


- the hb "dest verify" command verifies the presence and size of files
  on a destination without downloading the file.  This is now
  supported by the rclone shell destination.  For this to work, the HB
  program has to be upgraded with hb upgrade *and* the
  script has to be updated manually.  The script is in
  doc/dest.conf.examples/ in the tar file on the HB website.

#1778 Feb 8, 2017 - expires Apr 15, 2017


* get: restore planning is 30% faster
* get: restores symlink modification time
* get: display progress while creating plan
* get: fix hang with multiple download workers
* get: fix hang after restore errors


- get: creating a restore plan, used when cache-size-limit is set, is
  30% faster

- get: reset symlink modification time to the value stored in the
  backup rather than the time it was restored.  Some older OSs cannot
  set symlink modification times; in that case, symlinks will have the
  time they were restored.  Thanks Roy!

- get: displays progress while creating a restore plan when
  cache-size-limit is set.  Thanks to Roy for the suggestion.

- get: when cache-size-limit is set and the restore size is greater
  than cache-size-limit, a race condition could cause the restore to
  hang if multiple download workers were active.  Thanks to Roy for
  exporting his 3TB backup for testing!

- get: if errors occurred while restoring files, it could cause a hang.
  Now it works, even with 50% injected random errors on a 3TB restore.

#1762 Jan 30, 2017 - expires Apr 15, 2017


* rclone: updated script
* get: re-enable multiple download threads


- the script wasn't working with the mount command.  It is
  located in doc/dest.conf.examples of the tar file on the HB website
  and has to be updated manually.

- get: previously, concurrent downloads were disabled to avoid
  splitting bandwidth resources across multiple files.  This works
  well for low-latency storage services, but is not so great for
  high-latency services where it takes a while to get a download
  started.  For now, concurrent downloads are re-enabled.  To do this
  right, HB needs to adjust downloads dynamically.

#1761 Jan 23, 2017 - expires Apr 15, 2017


* backup: ssh destination initialization fix
* rclone: updated script
* backup: compensate for getcwd() bug on Illumos in LX zone
* backup: fix a very rare race condition causing traceback


- backup: with some ssh servers, ssh destinations had trouble
  initializing, displaying 3 errors about DESTID and then disabling
  the destination.

- rclone: HashBackup can use rclone to communicate with storage
  services not directly supported by HB.  The script to
  enable this was updated to use the rclone copyto command.  This
  makes downloads more efficient because copyto doesn't do remote
  directory listings.  The script was also changed to do unconditional
  transfers because rclone sometimes thinks a remote file doesn't
  need updating when it actually does (same size file on a remote that
  doesn't support checksums, like Dropbox).  Because of this change,
  the --verify option had to be eliminated to avoid transfer loops
  related to "eventual consistency" on many remote storage services
  (files are not necessarily immediately available after an upload).

  The new script is in doc/dest.conf.examples and must be
  installed manually.  Rclone will be built-in to HashBackup in the
  next release to avoid having to do a manual script update: it will
  get updated with hb upgrade just like the rest of HB.

- backup: HashBackup will run in an LX zone on Illumos, a descendant
  of Solaris.  LX zones emulate Linux under Illumos.  When the dedup
  table is resized during a backup, getcwd() is called.  There is a
  bug in LX zones causing getcwd() to fail with "No such file or
  directory" and backup can't finish.  As a workaround, the getcwd()
  call was removed.

- backup: in very rare circumstances that happened to be triggered by
  the previous Illumos bug, a race condition combined with a nested
  exception could cause a traceback:

    Exception in thread shaq_loop: unsupported operand type(s) for +:
        'int' and 'NoneType'
    Exception in thread shaq_loop: an integer is required

  It took a week to figure out why this was happening... Ugh!

#1751 Dec 30, 2016 - expires Apr 15, 2017


* backup: sometimes created oversize arc files
* backup: multi-thread performance improved 10-15%
* b2: increase internal retries
* rare selftest bug fix


- backup: multi-threaded backup of a series of medium-sized files,
  especially if already compressed, was sometimes ignoring the signal
  to start a new arc file.  This could create arc files much larger
  than arc-size-limit, like 200MB-8GB instead of 100MB.  This bug
  started in August.  Now arc sizes should be much more controlled.

- backup: multi-threaded performance has improved 10-15% for some

- b2: the Backblaze B2 driver has a small internal retry loop in
  addition to the outer retry controlled by the retry dest.conf
  keyword.  The internal retry loop now tries 7 times instead of 3.

- selftest: in very rare cases, selftest could display an error:
     Error: block xxx arcdel yyy blen zzz
  This was a bug in selftest -- the backup is fine.  Thanks Evan!

#1747 Dec 23, 2016 - expires Apr 15, 2017


* B2: fix connection reset by peer, again
* B2: add Content-Length header on B2


- B2: fix "connection reset by peer" errors, for real this time.  The
  file size limit on B2 is 5,000,000,000 bytes, not 5GiB.  This error
  only occurs on large initial backups, like 6TB.
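
  The gap between the two limits is easy to check.  The constant and
  function names below are made up for the example; the 5,000,000,000
  byte limit is the one stated above.

```python
B2_MAX_FILE_BYTES = 5_000_000_000   # limit stated in this changelog entry
FIVE_GIB = 5 * 1024 ** 3            # 5 GiB expressed in bytes


def within_b2_file_limit(size):
    """True if a file of `size` bytes is within the stated B2 limit."""
    return size <= B2_MAX_FILE_BYTES
```

  A file between the two values, for example one of exactly 5 GiB,
  passes a 5GiB check but exceeds the real limit by about 369MB.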

- B2: the Content-Length header is documented as required, so now
  HashBackup sends it (even though it seems to work fine without it).

#1742 Nov 30, 2016 - expires Apr 15, 2017


* bump expiration date for backup command to April 15, 2017
* cacerts.crt: fix problem with B2 on BSD


- b2: beginning with #1715 around Nov 7th, HashBackup could not
  connect to B2 from BSD systems because of this error:

    [SSLError] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)

  In the backup directory, cacerts.crt is the root certificate file
  HashBackup uses to verify SSL connections.  On Nov 7th it was
  updated from and this update broke SSL connections to B2,
  but only on BSD; Linux and MacOS worked fine.  Apparently a limit on
  the number of certificates was exceeded.  Thanks Mark for reporting
  the problem.

#1738 Nov 27, 2016 - expires Jan 15, 2017


* backup: 10-15% faster (disable compression verification)
* backup: fix traceback when all backup data was removed
* rm: don't keep backup file when compressing database
* backup: --maxtime accuracy improved
* backup: --maxtime sometimes didn't back up anything


- backup: when the new compression code was launched in August, all
  data was verified to decompress correctly before being written to
  the backup.  It has been nearly 4 months without an error traceback,
  so verification has been turned off for a 10-15% performance gain.

- backup: if all backup data was removed with rm, a traceback could
  occur on the next backup (from internal testing):
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

- rm: when a lot of data is removed from a backup, rm compresses the
  database.  This was rewritten a couple of months ago and as a
  precaution, a backup of the original database was saved as
  hb.db.orig.  This file can be fairly large and no problems have been
  reported, so in this release, the backup file is kept during
  compression and then deleted.

- backup: --maxtime sets a time limit for the backup.  It's not very
  accurate for technical reasons and tends to run over the limit.  In
  this release the overrun is less than in previous releases.

- backup: when inex.conf was added to every backup a few months ago,
  it broke the --maxtime restart feature in some cases.  One backup
  would work, the next would be empty, the next would work (but wasn't
  a true restart), etc.

#1735 Nov 22, 2016 - expires Jan 15, 2017


* backup: don't create huge arc files with -p0
* backup: suppress some block size change messages


- backup: with -p0, backup would sometimes create huge arc files.  One
  arc file in a customer's backup was 21GB, even though arc-size-limit
  was 100MB.

  The backup is fine, but if you have this situation it is recommended
  that you either remove backups in reverse order starting with the
  last (most recent) backup and going back to the first backup where
  these huge arc files occur; or start a new backup and keep this
  backup until you no longer need it for retention purposes.

  If necessary, you can continue to use an existing backup with huge
  arc files, but it will be inefficient because HashBackup will not
  want to pack the huge arc files until 50% of the data is removed by
  retain.  Apologies for this one, and thanks Lourens for finding it!

- backup: if a backup block size change is normal, don't display the
  warning message

#1733 Nov 20, 2016 - expires Jan 15, 2017


* ls -a bug: displays a -> b -> c -> d for symlinks
* dest verify: bug fix for rsync destinations
* backup: --maxtime bug fix
* backup: detect block size change


- ls: with the -a option, a symlink or LV snapshot backup with
  multiple versions displayed an additional -> symbol on each version.
  Thanks Emanuele!

- dest verify: some rsync servers caused an error "unexpected rsync
  output" during a dest verify command.  Thanks Soren!

- backup: if --maxtime was specified and the time ran out, the backup
  stopped (correct) but lots of pending uploads continued to be
  transmitted (incorrect).  Now backup completes uploads that are in
  progress and stops much quicker.  Thanks Robert!

- backup: if the backup block size changes from the previous backup, a
  full backup warning message is displayed for affected files.

#1725 Nov 12, 2016 - expires Jan 15, 2017


* dest verify: more conservative about failures
* dest verify: accept WebDAV status 204
* WebDAV: more secure authentication over http
* selftest bug: sometimes did not detect incorrect file references


- if dest verify encountered any kind of error while trying to verify
  a file, it marked the file as "not transmitted", forcing it to be
  sent again.  This isn't correct behavior if a destination was only
  temporarily unavailable but did have the files.  Now verify will
  only mark files for retransmission when it gets a response from the
  remote indicating files are not there or are the wrong size.
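
  The new rule can be pictured as a small decision function.  This is
  a sketch only: the VerifyResult categories and the function name are
  hypothetical and do not come from HashBackup's code.

```python
from enum import Enum

class VerifyResult(Enum):
    OK = "ok"                    # remote confirmed the file and its size
    MISSING = "missing"          # remote answered: file is not there
    WRONG_SIZE = "wrong_size"    # remote answered with a different size
    ERROR = "error"              # no usable answer (timeout, outage, ...)

def should_retransmit(result: VerifyResult) -> bool:
    """Resend only when the remote gave a definite negative answer."""
    return result in (VerifyResult.MISSING, VerifyResult.WRONG_SIZE)
```

  With this rule, a temporary outage (ERROR) leaves the file's status
  untouched instead of forcing an upload.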

- WebDAV: some WebDAV servers (4shared) return a status of 204 when
  checking if a file exists.  From internal testing.

- WebDAV: when the secure keyword is omitted, try digest
  authentication before regular authentication.

#1715 Nov 6, 2016 - expires Jan 15, 2017


* selftest --fix: ask to remove missing arc files
* dest verify: better verification on rsync destinations


- selftest: if an arc file goes missing (for example, it is deleted by
  accident), selftest printed an error that the file did not exist
  locally nor on any destination.  Now, if --fix is used and selftest
  is running interactively, it will ask whether to remove the missing
  arc file from the backup.  This will remove all blocks in the arc
  file, then all files referencing those blocks.

- dest verify can now verify the file size on rsync destinations even
  when cache-size-limit is set and an arc file isn't present locally.
  Previously it could only verify that the remote file existed.

#1709 Nov 3, 2016 - expires Jan 15, 2017


* database upgrade to dbrev 23
* backup, rm, retain: generating database updates is 45% faster
* b2: better error on file not found
* b2: better handling for Dir changes in dest.conf
* ssh: better error handling when fetching DESTID
* ssh: cleanup .stdin and .stdout temp files
* dest verify: don't verify files on inactive destinations
* dest verify: trouble with rsync and cache-size-limit
* HB could halt on errors while removing temp files


- this release will do an automatic database upgrade to dbrev 23 when
  any HB command is used.  Once the database has been upgraded,
  previous versions of HB cannot be used with it.  This database
  upgrade has two purposes:
  1) delete zero-length arc files created by a bug in dest verify for
     rsync destinations with cache-size-limit set (see below)

  2) prevent old releases from doing a recover since they would not
     understand the new remote database format (see next change)

- after every command that modifies the backup, HB has to send a
  database update to all remotes.  When there are lots of changes,
  creating this database update is up to 45% faster.

- b2: if a file could not be found when downloading, the filename
  wasn't included in the error message.

- b2: if the Dir keyword is changed on a destination, the next access
  causes a warning about ID mismatches.  Running dest setid fixes the
  error.  But if backups already exist in the old Dir and new backups
  are then made to the new Dir, the backup files are split across 2
  directories on B2.  This is very confusing.  Now, if you try to
  download files in this state, HB will display a warning that the
  file is stored in the wrong directory.  It will still retrieve the
  file in case it's the only copy.  Thanks Robert!

  Related to this, dest verify would verify all files, even if stored
  in an unexpected directory.  Now it will complain about files that
  are in the wrong directory and upload them to the right directory if
  there is another copy available.  Thanks Robert!

- ssh: the DESTID file uniquely identifies remote storage areas.  If
  an ssh protocol error occurred while fetching the DESTID file, HB
  would complain that the destination ID did not match - a misleading
  error message.  Now the ssh protocol error message is displayed
  instead.  Thanks Robert!

- ssh: temporary files ending in .stdin and .stdout are created by the
  ssh destination.  Usually they are deleted, but if something goes
  wrong, they hang around.  Now they have a .tmp suffix so HB will
  delete them on the next command rather than letting them accumulate.

- dest verify was trying to verify files stored on inactive
  destinations, causing a traceback.  Internal testing.

- dest verify on rsync destinations with cache-size-limit set >= 0
  would create zero-length arc files in the main backup directory for
  any arc files that were only remote, then give a "no such file or
  directory" error.  It didn't hurt anything, but also didn't verify
  the remote file and left empty arc files in the backup directory.
  Thanks Marcin!

- if HB had a problem cleaning up temporary files, it could abort with
  a traceback:
   OSError: [Errno 1] Operation not permitted: 'backupdir/hb-xxx.tmp'
  Now it displays an error message and continues.
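
  A minimal sketch of this kind of tolerant cleanup, assuming temp
  files are simply the *.tmp entries in the backup directory.  The
  function name is hypothetical, and this is not HashBackup's code.

```python
import glob
import os
import sys

def cleanup_tmp(backupdir: str) -> int:
    """Remove leftover *.tmp files; report failures instead of aborting."""
    removed = 0
    for path in glob.glob(os.path.join(backupdir, "*.tmp")):
        try:
            os.remove(path)
            removed += 1
        except OSError as exc:
            # Keep going: one stubborn temp file should not stop the command.
            print(f"Error removing {path}: {exc}", file=sys.stderr)
    return removed
```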

#1698 Oct 23, 2016 - expires Jan 15, 2017


* new subcommand "verify" for hb dest
* recover: fix errors downloading segmented files
* fix: sometimes not deleting .tmp files
* b2: fix "almost dashes" in bucket names


- HashBackup assumes that once it stores a file on a remote
  destination, the file stays there until HashBackup deletes it.  This
  is usually true, but it can happen that files are manually deleted
  on a destination, and HashBackup gets no notice of this.

  Selftest -v4 verifies backup data by downloading it from all
  destinations, decrypting, decompressing, and verifying the hash of
  every block, and has an incremental option for huge backups that
  cannot be verified in a single run.  It's a very thorough check, but
  also very time-consuming.

  In this release, a new hb dest subcommand, "verify", will do a much
  quicker test of remote backup data without downloading it:

  $ hb dest -c backupdir verify

  This "best-effort" validation of remote files includes some or all
  of these checks:

  * check that all files actually exist on every destination
  * check that file sizes are correct
  * verify the checksum for destinations that support it

  Some remote storage like S3, Google Storage, and B2 are able to do
  all of these checks, while other destination types may only perform
  one or two checks.  All destinations support the new verify command
  except the "shell" destination (for user scripting).

  If a file fails to verify on a destination and there are copies
  elsewhere, it is marked missing on that destination and will be
  re-uploaded during the verify command.
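
  The three checks can be sketched for the simplest case, a
  destination that is just a local directory.  Remote services would
  presumably compare the size and checksum they report rather than
  read the file.  The function name and the choice of SHA1 are
  assumptions for illustration.

```python
import hashlib
import os

def verify_file(path, expected_size, expected_sha1=None):
    """Return a list of problems found; an empty list means the file passed."""
    if not os.path.isfile(path):
        return ["missing"]
    problems = []
    if os.path.getsize(path) != expected_size:
        problems.append("wrong size")
    if expected_sha1 is not None:
        digest = hashlib.sha1()
        with open(path, "rb") as f:
            # Hash in 1 MiB chunks so large arc files are not read at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_sha1:
            problems.append("checksum mismatch")
    return problems
```

  Passing expected_sha1=None models a destination that can only
  check existence and size.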

- the dest.conf keyword "maxsize" is used to limit the size of files
  uploaded to remotes.  If a file to upload is larger than maxsize, HB
  splits it into parts before uploading them.  This is different than
  multipart or "large file" uploads - features supported by some

  Running recover to retrieve files that had been split would
  sometimes cause bizarre errors like checksum errors (S3/Google
  Storage), hash mismatches (B2), file size mismatches, arc version
  errors, and probably others.  The remote backup data was fine.

  This recover bug has been fixed.  The only time it would be seen in
  practice is for a large backup that created an hb.db.0 file larger
  than 5GB, meaning hb.db is probably bigger than 10GB.  This would be
  a backup of 30M+ files.  The error would only occur in the recover
  command.  From internal testing.
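
  As a rough picture of the splitting that maxsize triggers, the
  sketch below computes part sizes so that no part exceeds the limit.
  The function name is hypothetical; how HB names and stores the
  parts is not shown here.

```python
def part_sizes(file_size: int, maxsize: int) -> list:
    """Sizes of the parts needed so that no part exceeds maxsize."""
    if maxsize <= 0:
        raise ValueError("maxsize must be positive")
    full, rest = divmod(file_size, maxsize)
    return [maxsize] * full + ([rest] if rest else [])

# A 12 GB file with a 5 GB maxsize needs three parts.
GB = 1000**3
print(part_sizes(12 * GB, 5 * GB))
```

  Recover has to fetch every part and join them in order before the
  file can be used.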

- sometimes HB didn't remove old .tmp files when it started

- b2: when the Backblaze B2 website displays bucket names it displays
  funky "dash-like" characters. Visually they look fine, but if cut
  and pasted into a dest.conf file, the dash is not a real dash and
  HashBackup complained that the bucket name is invalid.  Now HB
  changes the funky dash into a real dash.  A bug was filed with
  Backblaze.  From email traceback.
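
  A sketch of this kind of normalization.  The changelog does not say
  which character the website emits, so the set of look-alikes below
  is an assumption, as is the function name.

```python
# Dash look-alikes that a web page might render in place of "-".
# This set is an assumption; the changelog does not name the character.
DASH_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2212"

def normalize_dashes(name: str) -> str:
    """Replace dash look-alikes with an ASCII hyphen-minus."""
    return name.translate({ord(ch): "-" for ch in DASH_LIKE})

print(normalize_dashes("my\u2011backup\u2011bucket"))  # my-backup-bucket
```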

#1676 Oct 15, 2016 - expires Jan 15, 2017

- s3: fix traceback "No module named _strptime" caused by removing
  Glacier code

#1675 Oct 12, 2016 - expires Jan 15, 2017


* IMPORTANT SECURITY CHANGE: ls permission checks removed
* IMPORTANT SECURITY CHANGE: get permission checks removed
* ls: -d replaces --alldirs, -1 (one) replaces --onerev
* ls: only 1 pathname pattern on command line
* ls: pathname matching changed: no automatic wildcards
* ls: up to 27% faster for patterns matching many files
* ls: pathname patterns beginning with / are faster
* ls: -d (was --alldirs) always sets -a
* ls: find personality disorder
* selftest: faster and uses less memory
* selftest: -v4 transferring same arc file
* selftest: --inc could sometimes check all archives
* Glacier removed from HashBackup


- IMPORTANT SECURITY CHANGE: ls tried to emulate Unix permission
  checking before listing files in the backup.  There were edge cases
  where it was either too permissive or too restrictive, it didn't
  handle ACLs at all, and it's possible that permission checking
  varies slightly from one platform to another.  Rather than having a
  false sense of security, ALL of the permission checking in ls has
  been removed.  Now, anyone with read access to your HashBackup
  backup directory can list the metadata (permissions, filenames, file
  sizes, ...) for all files in the backup.  To secure your backup,
  secure your HashBackup directory with Unix permissions or ACLs.
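
  One way to do this with Unix permissions is to restrict the backup
  directory to its owner.  The sketch below uses only the Python
  standard library; the function names are illustrative, and it
  changes the directory itself, not the files inside it.

```python
import os
import stat

def restrict_to_owner(backupdir: str) -> None:
    """Give the owner full access and remove all group/other access."""
    os.chmod(backupdir, stat.S_IRWXU)   # same as mode 0700

def is_owner_only(backupdir: str) -> bool:
    """True if group and others have no permission bits set."""
    mode = stat.S_IMODE(os.stat(backupdir).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

  Other users can then no longer enter the directory, so they cannot
  reach the files in it by path.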

- IMPORTANT SECURITY CHANGE: get tried to emulate Unix permissions,
  like ls, and that has also been removed in this version (see above).
  Now, anyone with read access to your HashBackup backup directory can
  restore any file in the backup, even if they do not have read
  permission to the file in the live filesystem.  To secure your
  backup, secure your HashBackup directory with Unix permissions or
  ACLs.

- ls: the -d and -1 (one) options are replacing the --alldirs and
  --onerev options.  This could break scripts, but it's unlikely the
  hb ls command is used with a script and seems a low risk.

- ls: ls accepted more than 1 pathname on the command line.  ls always
  requires that wildcards be quoted, to prevent Unix from expanding
  them before ls executes, and it's extremely confusing if an unquoted
  wildcard is used with ls.  Now, ls only accepts 1 pathname on the
  command line - probably the most common usage - and if an unquoted
  wildcard is used and the Unix shell expands it, ls can display an
  error message about unrecognized arguments.

- ls: ls added a * wildcard to the beginning and end of the match
  pattern on the command line.  This caused ls to match any path in
  the backup containing the match pattern.  For example, the pattern
  def would match /def, /abcdef, /defghi, and /abcdefghi.  This made
  it impossible to match a specific filename.  Now, ls does not add
  wildcards to the pattern, so pattern def matches /def, /abc/def, or
  /abc/def/ghi, but not /abc/defxxx.  If you want the old behavior,
  use the pattern '*def*' instead.  Remember that all wildcard
  patterns must be quoted with ls to prevent the Unix shell from
  expanding them; this hasn't changed.
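
  The difference between the old and new matching can be modeled
  roughly as follows.  This is only a model built from the examples
  above, for a plain name without wildcards; it is not how ls is
  implemented, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase

def old_match(path: str, pattern: str) -> bool:
    """Old behaviour: pattern is implicitly wrapped in * wildcards."""
    return fnmatchcase(path, "*" + pattern + "*")

def new_match(path: str, pattern: str) -> bool:
    """Rough model of the new behaviour for a plain name without
    wildcards: it must equal a whole path component."""
    return pattern in path.split("/")

paths = ["/def", "/abcdef", "/abc/def", "/abc/def/ghi", "/abc/defxxx"]
for p in paths:
    print(f"{p:14} old={old_match(p, 'def')!s:5} new={new_match(p, 'def')}")
```

  In this model, old_match(path, 'def') gives the same answers as
  matching the quoted pattern '*def*'.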

- ls: ls is up to 27% faster if the pathname pattern matches a lot of
  files in the backup.

- ls: with a very long pathname (more than 8 pathname components), ls
  would sometimes do a sequential search.  Now if the pathname begins
  with /, ls is much less likely to do a sequential search.

- ls: -d (was --alldirs) means to show all versions of directory
  entries.  This requires -a.  An error was displayed if -a wasn't
  used, but now -a is implied.

- ls: ls behaved more like the Unix find command, in that it listed
  all files underneath directories.  This is useful when you're
  looking for a file but don't know where it is in the filesystem, but
  it was impossible to list only the first-level files of a directory.
  Now ls behaves more like the Unix ls command and only lists 1 level
  of a directory by default.  If you want to see all files under a
  directory, add /* to the pathname pattern and be sure to quote it!

- selftest: 10% faster and peak memory use reduced by half

- selftest: with -v4, selftest could sometimes pack an arc file and
  send it to destinations on every run.

- selftest: --inc could sometimes check all archives.  From internal
  testing.

- For the past 10 months there have been notices about withdrawing
  Glacier support, with a transition plan to other storage services
  such as Backblaze B2, Google Nearline, and Amazon S3 Infrequent
  Access, all priced similarly to Glacier but with simple retrieval
  policies and costs.  Glacier is removed in this release.

#1637 Sep 20, 2016 - expires Jan 15, 2017


* bumped expiration date to Jan 15th 2017
* Glacier no longer supported
* scaling improvements for small and large backups
* rm: compressing database is 33% faster
* get: show filesystem free space on write error
* get: abort restore and set exit code on write error vs hanging
* get: fix error setting flags on dangling symlinks (Linux)
* B2: update domain name to avoid redirects
* selftest: bug fix with --inc on empty backup
* S3: display correct error message when host/port are wrong


- as mentioned since January 2016, Amazon Glacier is no longer
  supported in HashBackup.  The code is still there but will be
  removed soon.  If you still have HashBackup data stored in Glacier,
  save this release so you can access your data in the future.

  For details see:

- one of HashBackup's caches was a fixed size that was reasonable for
  backups of several million files.  For larger backups, HB relied on
  OS filesystem buffering.  Usually that's fine, but sometimes it's
  not if the system is under memory pressure.  Now this cache scales
  with the size of the backup: smaller backups use less memory than
  before and very large backups use more.

  Example: a customer with a backup of 12M files totalling 7TB had
  never run retain in 6 years of daily backups.  The hb.db file was
  around 9GB.  On a 16GB system, the previous version of HB retain
  took 131 minutes to remove 7M files from the backup.  The new
  version takes 94 minutes - a 28% improvement - and uses about 1GB of
  memory. 30 minutes of the time in both cases is to compress the
  database because so many files were removed (this is now reduced to
  20 minutes thanks to the next change).  The next retain removed no
  files, took 5 minutes, and used 287MB of memory because the database
  had shrunk from 9GB to 4GB.

- rm: if enough data is removed from a backup, HashBackup will
  compress the hb.db file.  This is now 33% faster.

- get: if a write error occurred while restoring files, the error was:
  "Tried to write X bytes, could only write Y bytes".  Now it also
  displays how much free space the filesystem has since a full disk is
  usually the real problem.

- get: after displaying a disk write error message, the get command
  would hang.  Now it aborts the restore and sets the exit code.

- get: on OSX and BSD, symbolic links can have flags.  When restored,
  flags were being set on the target rather than the symlink.  Related
  to this, if the target did not exist (a "dangling" symlink), a
  restore error occurred: NameError: global name 'FS_SECRM_FL' is not
  defined. The restore continued but flags were not set for the
  symlink.  From internal testing.

- B2: changed the domain name for B2 access to avoid redirects

- selftest: if a backup had no data stored, for example, only
  /dev/null or empty files were backed up or rm / was executed,
  selftest -v4 --inc failed with a traceback.  Reported by email

- S3: when the host and/or port keywords are used but the host is not
  running an S3 service on that port, HB would fail with a traceback:

    Traceback (most recent call last):
      File "/", line 100, in <module>
      File "/", line 2651, in main
      File "/", line 331, in init
      File "/", line 253, in initdest
      File "/", line 186, in startdest
      File "/", line 251, in init1
    AttributeError: 'gaierror' object has no attribute 'status'
    AttributeError: 'error' object has no attribute 'status'

  Now it fails with the correct error:
    [gaierror] [Errno 8] nodename nor servname provided, or not known
  Reported by email traceback.

#1619 Aug 8, 2016 - expires Oct 15, 2016


* database upgrade to dbrev 22
* backup: trim executables in backup directory
* rm & retain combine small arc files
* backup: small is beautiful
* backup: -Z option compression levels
* backup: number of CPU cores used
* backup: compression verification enabled
* backup: display compression efficiency
* backup and restore benchmarks
* selftest 15% faster
* ls is 9x faster
* backup: check disk free space
* B2: 5GB maxsize default fixes connection resets on large backups
* minor fixes


- to support new features, this release will do an automatic database
  upgrade to dbrev 22 when any HB command is used.  Once the database
  has been upgraded, previous versions of HB cannot be used.  Since
  this release has quite a few changes, you may want to try the new
  release with a small backup first before using it with large
  production backups.

- backup: the HashBackup executable is copied to the backup directory as
  before, but now only the latest 3 versions are kept.  Older versions
  are automatically removed during the next backup.

- over time, a "forever incremental" backup strategy can lead to small
  arc (backup) files as older data blocks are replaced by data from
  newer backups and old arc files are packed to remove empty space.

  In this release, rm & retain combine small arc files into larger arc
  files.  This can drastically reduce the number of arc files in the
  backup directory, especially for backups with many versions.  For
  example, HashBackup's 7-year-old build server backup had 1841
  versions and around 1890 arc files.  Combining eliminated 2/3rds of
  the arc files, leaving 690, and also gave a 4-5% restore performance
  increase when restoring a very large directory.

  For now the parameters on this feature are fixed.  To combine arcs:
  - arc files must be local, OR
  - pack-remote-archives must be true for remote arc files
  - arc files must be at least pack-age-days old
  - some small arc files cannot be combined for technical reasons
  - the rm and retain commands both do arc combining

- most of the changes in this release are compression enhancements.

  Deduplication is a great tool for reducing backup sizes, especially
  for certain kinds of files like VM images, log and other
  "append-mostly" files, SQL database dumps, general "office" docs,
  and files moving around in the filesystem.  But compression plays a
  big part to save backup space.

  The goal of this release is to get better compression when possible,
  without losing performance.  In most cases, this release achieves
  better compression AND better performance.

- backup: -Z0 to -Z9 are hints to HashBackup of how much compression
  to apply.  For this release:

    -Z0         = no compression (usually slower)
    -Z1, -Z2    = fastest compression
    -Z2 to -Z9  = better compression

  In this release, -Z6 is the default and -Z7-Z9 are identical to -Z6,
  so the only use for -Z is to use lower levels to get slightly faster
  but less compression.  Of course the new version is compatible with
  and can restore all existing backups.  Most backups will get a speed
  boost and/or compress better without any command line changes.

- Previously, with no -p option, HB would use:

     one core on a single-core system (just the main thread)
     two cores on a dual-core system
     two cores for -Z7 or less (required -p to use more)
     all cores for -Z8 or -Z9

  Now, with no -p option, HB will use:

     one core on a single-core system (just the main thread)
     up to 4 cores on a multi-core system (requires -p to use more)

  It probably does not make sense to use -p4 or higher with the new
  compression technologies because they are much faster and can
  usually only keep 3 cores busy.  You can use the new %CPU statistic
  to see if more cores will increase performance.  For example, if
  %CPU is over 210% with -p2, try -p3.  If %CPU is 220% with -p3 (less
  than 300%), it likely means that more cores (-p4) is not going to
  increase performance because HB is not keeping 3 cores busy.
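
  The rule of thumb above can be written as a one-line check.  This
  is a simplification of the advice, not something HB computes, and
  the function name is made up for the example.

```python
def more_cores_may_help(cpu_percent: float, p: int) -> bool:
    """True if the %CPU figure suggests all p worker cores were busy."""
    return cpu_percent > 100 * p

print(more_cores_may_help(210, 2))  # True: try -p3
print(more_cores_may_help(220, 3))  # False: 3 cores are not kept busy
```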

- some of the compression technologies in HashBackup are rather new.
  To ensure that all data can be restored, compression verification is
  enabled.  During backups, HB will expand compressed data and verify
  it against the original before saving it to your backup.  This is
  only for the newer compression technologies (lz4, zstd, brotli,
  lzma).  It's a temporary safety measure that will slow down backups
  a bit for now, but will be removed in the future.

- to help evaluate performance, backup displays a new statistic:
  Efficiency.  It is "MB reduced per CPU second", so higher numbers
  mean better efficiency.  This makes it very easy to compare backup
  options by showing how much extra CPU time is being spent to reduce
  backup data size.
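
  Read literally, "MB reduced per CPU second" could be computed as in
  the sketch below.  The exact formula HB uses is not given here, so
  treat this as an assumption; the input values are hypothetical.

```python
def efficiency(input_mb: float, backup_mb: float, cpu_seconds: float) -> float:
    """MB of size reduction bought per second of CPU time."""
    return (input_mb - backup_mb) / cpu_seconds

# Hypothetical runs over the same 2000 MB of input data.
fast = efficiency(2000, 1000, 50)    # 20.0 MB per CPU second
small = efficiency(2000, 900, 110)   # 10.0 MB per CPU second
print(fast, small)
```

  In this made-up comparison the second run produces a smaller
  backup, but each CPU second buys half as much reduction.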

  The efficiency rating measures overall HashBackup efficiency, so it
  can also be used to compare compression, block sizes and dedup
  options.  It's best to change one option at a time when comparing
  different efficiencies, to understand how that option affects

- no discussion of compression would be complete without benchmarks.
  Here are a few, comparing the old and new version of HashBackup with
  different kinds of backups.  All of these tests were run on a 2010
  Macbook Pro (Intel Core2Duo, 2.5GHz) with a solid state drive.

  Test #1: backup /Applications (212K files) with dedup, default -Z option

    Results: backup is 20% faster, uses 25% less CPU, and is 7% smaller

             old HB:  6m 8s real time
                      6m 55s CPU time
                      2.257568 GB backup size

             new HB:  4m 56s real time (20% faster)
                      5m 11s CPU time  (25% less)
                      2.089324 GB backup size  (7% smaller)

  Test #2: backup 2GB Ubuntu VM image with dedup -B4k (block size)

    Results: backups are 38-86% faster, use less CPU, and are 9-12% smaller
            (% savings of new version vs old version shown in parentheses)

                        Time        CPU           Size
    old HB, -Z1          73s        126s         955 MB
    old HB, default -Z   78s        135s         943 MB
    old HB, -Z9         339s        665s         976 MB

    new HB, -Z1          45s (38%)   70s (44%)   1.0 GB (-5%)
    new HB, default -Z   47s (40%)   76s (44%)   858 MB (9%)
    new HB vs old -Z9    47s (86%)   76s (88%)   858 MB (12%)

  Test #3: backup 2GB Ubuntu VM image with fastest options: dedup, -B1M, -Z1

    Results: backups are twice as fast and up to 3% smaller

                        Time        CPU           Size
    old HB, -B1M -Z1     49s        88s          935 MB
    new HB, -B1M -Z1     23s        36s          1.1 GB
    new HB, -B1M no -Z   26s        45s          903 MB

  Restore tests for the backups above:

  Test #1: restore /Applications (212K files), default -Z:

       old:  3m 29s real time
       new:  3m 11s real time (13% faster to restore)

  Test #2: restore 2GB VM image, default -Z:

       old:  1m 5s real time
       new:  0m 54s real time (17% faster to restore)

  Test #3: restore 2GB VM image, fastest options: -B1M -Z1

       old:  0m 33s real time
       new:  0m 26s real time (21% faster to restore)
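
  The percentages in parentheses in the Test #2 backup table appear
  to be relative savings of the new version over the old one.  The
  sketch below recomputes the Time column on that assumption; the
  function name is illustrative.

```python
def pct_saved(old: float, new: float) -> int:
    """Relative saving of new versus old, rounded to a whole percent."""
    return round((old - new) / old * 100)

# Wall-clock seconds from the Test #2 backup table.
print(pct_saved(73, 45))    # -Z1
print(pct_saved(78, 47))    # default -Z
print(pct_saved(339, 47))   # new default vs old -Z9
```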

- selftest: the basic selftest check is 15% faster on a backup with
  13M blocks and 110M block references - lots of blocks and lots of
  dedup.  Making selftest more efficient is not about speed; it's an
  ongoing process to scale HashBackup to handle ever-larger backups.

- ls: the basic ls command to list all pathnames in a backup is 9x
  faster.

- backup: to avoid filling the backup directory disk, backup will
  abort if there is not enough space for 2 arc files of size
  arc-size-limit (hb config option).
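
  A sketch of such a check using the Python standard library.  The
  function names are hypothetical, and whether HB compares with >= or
  > is an assumption.

```python
import shutil

def enough_space(free_bytes: int, arc_size_limit: int) -> bool:
    """True if there is room for two arc files of arc_size_limit bytes."""
    return free_bytes >= 2 * arc_size_limit

def check_backup_dir(backupdir: str, arc_size_limit: int) -> None:
    """Raise if the filesystem holding backupdir is too full to continue."""
    free = shutil.disk_usage(backupdir).free
    if not enough_space(free, arc_size_limit):
        raise RuntimeError(
            f"only {free:,} bytes free; need {2 * arc_size_limit:,}")
```

  With arc-size-limit at 100MB, this would require at least 200MB
  free in the backup directory's filesystem.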

- B2 has a 5GB upload limit and connection resets occur when exceeded.
  Large backups can generate an hb.db.0 file that is bigger than 5GB.
  To prevent connection reset errors, the default for maxsize (a
  dest.conf keyword) is now 5GB.  This forces HashBackup to segment
  very large files.

- selftest: selftest displays a progress indicator when it is run
  interactively.  In certain circumstances with -v3 and -v4, the
  percentage would go over 100%.  From internal testing.

- stats: because of a previous bug in rm, the stats command could
  display a negative number of files.  This is fixed.  Thanks Soren!

- rm: if a problem occurred while removing a data block, rm noted the
  error and continued but it could cause a selftest error later.  From
  internal testing.

- backup: the inex.conf file is now included (encrypted) in every
  backup.  Previously, after a recover command, the inex.conf file was
  missing.  The next backup would create a new default inex, but any
  site changes would be lost.  Recover does not automatically restore
  inex.conf, but displays a get command to do it if it's appropriate.

- get: if cache-size-limit was set >= 0, one CPU could get stuck in a
  loop.  It didn't affect the restore, but it did run slower.  This
  would have been especially noticeable on 1 or 2 core systems.  From
  internal testing.

- get: another case of the race condition (multi-thread bug) that was
  fixed in #1481, was fixed in this release.  If an error occurred
  during the restore, the get command might do any of these:
  - give a Bad file descriptor error
  - give a size mismatch error if restoring multiple files
  - hang
  This rare bug was reported by HB's email traceback on July 18th,
  though the bug has been present since February.  Thanks HB!

- selftest: after a full -v3/4 selftest, incremental selftest (--inc)
  always did a full selftest, downloading every arc file.  From
  internal testing.

#1538 Jul 2, 2016 - expires Oct 15, 2016


* get: restore planner traceback on sparse files
* rm/retain: caused incorrect stats output
* rm/retain: error message fix


- get: if cache-size-limit is set, the restore planner could fail with
  an error when a sparse file was being restored:
    TypeError: unsupported operand type(s) for >>: 'NoneType' and 'int'

- rm/stats: when rm removed a file, it was not adjusting a counter
  correctly, causing the stats command to sometimes display a negative
  number of files in the backup.  This fixes the cause, but the stats
  display won't be fixed until the database upgrade of the compression

- rm: if -v 2 was used with retain, it caused a traceback.  Retain -v
  doesn't accept an integer after it, so the 2 was interpreted as a
  pathname, and pathnames for retain have to start with slash.  So the
  error should have been "All pathnames must start with /: 2".  But
  because of a bug in the error handler, it caused a traceback.

#1532 Jun 22, 2016 - expires Oct 15, 2016


* bump backup expiration date to October 15th
* rm/retain: prevent unused pathname selftest warning


- the compression enhancements are going well, but since it is near
  the end of a quarter, they're being released after July 15th instead
  of before.  To try them out, upgrade again after July 15th.

- rm/retain could sometimes cause an unused pathname selftest warning.
  If you are getting unused pathname warnings from selftest, run
  selftest --fix to correct them.

#1528 Jun 3, 2016 - expires Jul 15, 2016


* new backup option --no-ino


- backup: some filesystems don't have stable inode numbers: FUSE
  (sshfs, UnRaid), Samba/CIFS, and probably others.  HashBackup checks
  for inode number changes during backup, but because these
  filesystems have random inode numbers, it can cause unpredictable
  full backups.

  The new option --no-ino can be used with these filesystems to bypass
  the inode number check.  A new warning is displayed periodically
  when this option might be necessary:

    Warning: many unstable inode numbers; use --no-ino to avoid a full backup

  A negative side-effect of using --no-ino is that hard links cannot
  be detected because the definition of a hard link is two files with
  the same inode number.
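
  As a rough illustration of why inode numbers matter for hard links,
  the Python sketch below treats two paths as the same file when their
  device and inode numbers match.  The names same_file and demo are
  hypothetical and this is not HB's code; it only shows the kind of
  check that stops being reliable when a filesystem reports unstable
  inode numbers.

```python
import os
import tempfile


def same_file(path_a, path_b):
    """Return True if both paths refer to the same inode on the same device."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    """Create a file, a hard link to it, and an unrelated copy, then compare."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        link = os.path.join(tmp, "link")
        copy = os.path.join(tmp, "copy")
        for path in (original, copy):
            with open(path, "w") as f:
                f.write("same contents\n")
        os.link(original, link)  # second name for the same inode
        return same_file(original, link), same_file(original, copy)
```

  On a filesystem with stable inode numbers, the hard link compares
  equal and the separate copy does not, even though the contents match.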

#1525 May 25, 2016 - expires Jul 15, 2016


* new command "rename"
* backup compresses wav files
* S3 improved multipart error handling


- a new command, "rename", can be used to change the pathnames of
  files and directories in a backup.  This can be useful when
  filesystem paths change for some reason, and you want pathnames in
  an existing backup to match so that the renamed files aren't all
  backed up again.  If dedup is activated, this is not a big concern.
  But sometimes a renamed path contains so much data that you want to
  avoid reading it all again just to find out it hasn't changed.
  See hb help rename or the rename web page for more details.

- backup: wav files are no longer in the list of uncompressible file
  extensions since they can sometimes be compressed 10-20%, though
  other times only a few percent, depending on the file content.

- S3: when multipart upload is enabled (it is by default), HashBackup
  will do retries on each part, then if that fails, will do retries on
  the whole file.  This makes multipart error recovery more efficient
  and fixes a problem where an S3 timeout or broken pipe (S3 closed
  the connection) would cause backup to abort.  If configured for the
  default 3 attempts, HB will now try 9 times before giving up.
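
  The retry arithmetic can be sketched as two nested loops.  This is
  a minimal Python illustration, not HB's S3 code: upload_multipart
  and send_part are hypothetical names, and it assumes the 9 tries
  come from 3 per-part attempts inside each of 3 whole-file attempts.

```python
def upload_multipart(parts, send_part, attempts=3):
    """Upload all parts; retry each part, then retry the whole file.

    parts: list of byte strings.
    send_part: callable that raises OSError when a send fails.
    Returns the number of send_part calls made on success.
    """
    calls = 0
    for file_try in range(attempts):              # whole-file retries
        try:
            for part in parts:
                for part_try in range(attempts):  # per-part retries
                    calls += 1
                    try:
                        send_part(part)
                        break                     # this part succeeded
                    except OSError:
                        if part_try == attempts - 1:
                            raise                 # part retries exhausted
            return calls                          # every part was sent
        except OSError:
            if file_try == attempts - 1:
                raise                             # whole-file retries exhausted
    return calls
```

  With attempts=3, a part that always fails is tried 9 times before
  the error is passed to the caller.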

#1521 May 17, 2016 - expires Jul 15, 2016


* database upgrade to dbrev 20
* creation date is saved and restored on Mac and BSD
* new config option hfs-compress for Mac/OSX
* get: bug fix restoring hidden files on Mac/OSX
* Backup Bouncer test results posted on web site


- this release will do an automatic database upgrade to dbrev 20 on
  any HB command, to support saving and restoring creation date.

- on BSD and OSX, files and directories have a date created timestamp
  that HB didn't save or restore.  Now it does.  On Linux, some
  filesystems have a creation timestamp but there is no standard OS
  interface to get or set it, so it is still not supported by HB.
  Note: "ctime" is not date created, but is the last time the inode
  was changed (changed time, not creation time).
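
  The difference between the two timestamps can be seen from Python's
  os.stat.  This is only an illustration (file_times is a hypothetical
  name, not part of HB): st_ctime is the inode change time on Unix,
  while st_birthtime exists only where the platform provides a
  creation time, such as Mac and BSD.

```python
import os


def file_times(path):
    """Return (changed, created) timestamps for path.

    changed is st_ctime, the last inode change time on Unix.
    created is st_birthtime where the platform provides it, else None.
    """
    st = os.stat(path)
    created = getattr(st, "st_birthtime", None)  # absent on most Linux builds
    return st.st_ctime, created
```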

- Mac OSX uses the HFS+ filesystem.  HFS has the ability to store
  files compressed.  For example, /Applications/Address Book contains
  many compressed files.  HashBackup has always restored compressed
  files as uncompressed.  It works fine, but previously-compressed
  files would use more disk space.  Now, a new True/False config
  option "hfs-compress" can be set True to re-compress these files on
  restore.  hfs-compress defaults to False because unfortunately,
  restoring compressed files takes 10x longer.  The only standard
  software supplied by Apple to compress a file is the external ditto
  program, and running that for each compressed file is slow.  If you
  don't care about the slowdown and want compressed files to stay
  compressed on restore, set hfs-compress to True with hb config.

- get: on some filesystems, files and directories can be marked
  invisible.  HB was saving this but not restoring it.  Now it does.
  Found by Backup Bouncer.  Thanks Roy!

- Backup Bouncer is a backup test program that checks a lot of things
  on a Mac HFS+ filesystem.  With the two changes above, HashBackup
  passes the Backup Bouncer test.  The one caveat is that BB sets the
  nodump bit on files, so HB doesn't save them (that's what the nodump
  bit means), but BB expects the backup program to save them anyway.
  A full Backup Bouncer report is posted on the website.

#1514 May 11, 2016 - expires Jul 15, 2016


* new dest.conf keyword "onfail"
* Backblaze B2: Dir / bug
* Backblaze B2: changes for self-certify
* Backblaze B2: sanitize debug log
* Backblaze B2: add time & thread number to logs


- a new dest.conf keyword, onfail, adds flexible handling of
  destination failures, requested by several customers.

  If the onfail keyword is not used on a destination and it fails,
  HB will continue the backup as it does today.  What is different is
  that the destination failure will count as a backup error and the
  exit code will be non-zero.  HB should have done this all along.

  If "onfail stop" is used, HB will stop immediately if the
  destination fails.

  With "onfail ignore", HB will behave exactly as it does today: the
  backup will continue, no backup error is reported, and the exit
  status is not affected.  This will be necessary, for example, if you
  back up to 2 USB drives in rotation.  If you have cache-size-limit
  set >= 0 and the cache fills up because a destination has failed,
  backup will have to stop, just like today, even with onfail ignore.
  To be exactly compatible with previous releases, add onfail ignore.
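
  The three cases can be summarized as a small decision table.  The
  Python sketch below is illustrative only: the function name is
  hypothetical, and treating "onfail stop" as a backup error is an
  assumption, since the text above only says the backup stops.

```python
def handle_destination_failure(onfail=None):
    """Return (keep_running, counts_as_error) for a failed destination.

    onfail is the dest.conf keyword value: None (not set), "stop", or "ignore".
    """
    if onfail is None:
        return True, True      # continue, but the exit code becomes non-zero
    if onfail == "stop":
        return False, True     # stop immediately (error status is assumed)
    if onfail == "ignore":
        return True, False     # continue, exit status unaffected
    raise ValueError("unknown onfail value: %r" % (onfail,))
```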

- Backblaze B2: using Dir / in dest.conf causes an error:
    http status 400 (File names must not start with '/') listing file: /DESTID
  Thanks Niels!

- Backblaze B2: add Backblaze-recommended error handling for status
  403 (account limit), 429 (too many requests), Retry-After, broken
  pipe, and timeouts, so HashBackup can be self-certified.

- Backblaze B2: when the debug keyword is used, a B2 traffic log is
  written to the backup directory.  Previously the log contained B2
  credentials used to sign in to B2.  Now credentials are replaced by
  'xxx' so that the log can be shared for troubleshooting purposes.
  Only Authorization and authorizationToken are replaced; accountId,
  fileName, fileId, bucketId, and uploadUrl are not replaced since
  they are not usable without authorization credentials and would make
  debugging more difficult.
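
  A minimal Python sketch of this kind of log scrubbing follows.  The
  log line shapes are assumptions made for illustration (an HTTP
  header line and a JSON field); HB's actual log format and code are
  not shown in these notes, and sanitize is a hypothetical name.

```python
import re

# Assumed log shapes (not taken from HB): an HTTP header line such as
#   Authorization: <value>
# and a JSON field such as
#   "authorizationToken": "<value>"
_HEADER = re.compile(r"(?im)^([ \t]*Authorization:[ \t]*).+$")
_JSON_FIELD = re.compile(r'("authorizationToken"\s*:\s*")[^"]*(")')


def sanitize(log_text):
    """Replace credential values with 'xxx', leaving other fields untouched."""
    log_text = _HEADER.sub(r"\1xxx", log_text)
    return _JSON_FIELD.sub(r"\1xxx\2", log_text)
```

  Fields such as accountId and fileName pass through unchanged, which
  matches the intent of keeping the log useful for debugging.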

- Backblaze B2: log lines begin with a time and thread number to help
  trace multiple workers' activity.  It's still recommended to set
  workers to 1 for debugging.

#1504 Apr 21, 2016 - expires Jul 15, 2016


* init bug fix


- init: hb init -c /xyz/backupdir would cause a traceback if the
  directory /xyz did not exist.  Now it says parent directory doesn't
  exist and exits cleanly.  From traceback email.

#1502 Apr 18, 2016 - expires Jul 15, 2016


* backup: sparse file bug fix


- backup: backing up large sparse files did not always work correctly.
  It worked with -p0 (single-threaded) but not multi-threaded.
  Sometimes a warning message about a size mismatch would occur during
  backup.  A selftest -v5 displayed a hash mismatch error.

  This release will do a database upgrade to mark all sparse files as
  partial backups.  The next backup will re-save the sparse files.
  After a good copy is saved, the next retain will remove all of the
  partial files.

#1499 Apr 12, 2016 - expires Jul 15, 2016


* add --verify option to
* clear command bug fix


- in #1498, an ls command was added after every rclone transfer to
  verify the send worked.  However, some storage systems like Amazon
  Cloud Drive may have a slight delay after an upload before the file
  is available.  When this happens, ls returns an error and HB retries
  the send; by then the file is usually there, so it is not sent
  again.  But a bogus error was displayed.

  Now, the ls after send is only done if --verify is used on the run
  command line in dest.conf.  This is highly recommended if you have
  cache-size-limit set: if something goes wrong, perhaps because of a
  bug in HB, rclone, or the storage system, and HB believes the file
  was transferred when it was not, HB will delete the local copy.
  Since there is no remote copy either, it breaks the backup.  A
  selftest -v4 -fix will correct this, but it will also remove files
  from the backup.

- apologies to the rclone developer: the 256 exit code was confusion
  about how Python worked, not an rclone problem.  Sorry!
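
  These notes don't say which Python call was involved.  One common
  way to see 256 is os.system() on Unix, which returns a wait status
  rather than an exit code, so a program that exits with code 1 shows
  up as 256.  The sketch below decodes such a status by hand; the
  function name is hypothetical and the os.system() explanation is an
  assumption about what happened here.

```python
def exit_code_from_wait_status(status):
    """Convert a Unix 16-bit wait status to a 0-255 exit code.

    The high byte holds the exit code; the low 7 bits hold a signal
    number if the process was killed by a signal.
    """
    signal_number = status & 0x7F
    if signal_number:
        return 128 + signal_number   # shell-style convention for signals
    return (status >> 8) & 0xFF
```

  The standard library offers os.WEXITSTATUS() for the same decoding
  of a normal exit.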

- clear: if the clear command was interrupted, files on remotes could
  sometimes not be deleted by running clear again, because HB thought
  the files were already deleted.

#1498 Apr 11, 2016 - expires Jul 15, 2016


* and dest.conf.rclone


- the rclone build was missing the script and example dest.conf.rclone

- rclone sometimes returns exit codes 256 for error conditions, but
  exit codes have to be 0-255.  This version of checks and
  adjusts the exit code, and also does an ls command after every send
  to verify that the files are actually on the remote.

#1497 Apr 8, 2016 - expires Jul 15, 2016


* hb get progress stats bug


- get: for -v0/1, progress stats should not be displayed because
  filenames are not printed

#1496 Apr 6, 2016 - expires Jul 15, 2016


* Add shell destination and dest.conf.rclone


- Rclone is a free program that syncs directories to several cloud
  storage services:

    Google Drive
    Amazon S3
    Openstack Swift / Rackspace cloud files / Memset Memstore
    Google Cloud Storage
    Amazon Cloud Drive
    Microsoft One Drive
    Backblaze B2
    Yandex Disk

  HB has native support for many of these services, and that should be
  used when possible.  With this update, Rclone can be used to access
  services that HB doesn't support natively, such as Amazon Cloud
  Drive.  See doc/dest.conf.examples/dest.conf.rclone and
  for more details.  Thanks Ziv for the suggestion of using Rclone!

  IMPORTANT: Only Hubic and Backblaze were tested to develop the
  script.  ACD has been tested by a user, but you are "on your own"
  when using HB with Rclone.  Questions are fine but support is
  limited for now, so this method is not recommended for production
  or critical backups.

  This release doesn't have any HB program changes; it just includes
  a new dest.conf example and the script.

#1493 Apr 1, 2016 - expires Jul 15, 2016


* Unix support has been removed & HB now runs only on Windows
* selftest could hang when cache-size-limit is set >= 0


- developing HashBackup on 5 different Unix platforms has become too
  much of a challenge, so going forward, HB will be released on these
  12 Windows platforms instead: Windows XP, Windows XP Pro, Windows
  Vista, Windows 7, Windows 8, Windows 8.1, Windows 10, Windows Server
  2008, Windows Server 2008 R2, Windows Server 2012, Windows Server
  2012 R2, and Windows Server 2016.  This will greatly simplify HB
  development.  The change takes effect today, April Fool's Day.

- selftest wasn't following cache protocols correctly, so if
  cache-size-limit was set and arc files needed to be downloaded,
  selftest could hang because it thought the cache was full

#1492 Mar 29, 2016 - expires Jul 15, 2016


* dir destination displays detailed info when fetching files if
  dest.conf debug keyword is non-zero to debug a customer site issue

#1491 Mar 24, 2016 - expires Jul 15, 2016


* a debug print was accidentally left in and is now removed

#1490 Mar 23, 2016 - expires Jul 15, 2016


* backup: handle read errors better
* backup: handle file size changes better
* init: add warning for -k env and -k ask


- backup: read errors are unusual these days, but they do happen.
  Backup didn't handle them well: it printed an I/O exception
  traceback and hung.  Now it shows an I/O error message with the
  pathname, stops backing up the file, marks it as a partial backup,
  and continues the backup.  From emailed exception report.

- backup: if a file changes size during backup, backup printed a
  warning message.  Now it re-checks the file to get the current size
  and if it matches what was saved, no warning is printed.  If it
  doesn't match what was saved, the file will be saved again on the
  next backup.  From internal testing.

- init: display a warning if -k ask or -k env are used, since these
  are for the -p option, not -k.  From internal testing.

#1487 Mar 13, 2016 - expires Jul 15, 2016


* #1483: bump backup expiration date for stable upgrade
* #1484: --maxtime fix and enhancement
* #1485: selftest --inc was sometimes incomplete
* #1487: off by 1 in selftest --inc


- backup: when --maxtime is used, backup creates a restart checkpoint
  if the time limit is exceeded.  If the next backup did not use
  --maxtime, it could still use the checkpoint data (a bug), and would
  set the checkpoint.  Now, only backups with --maxtime look at or
  set the checkpoint.  This allows running a nightly cron backup
  with --maxtime, and running other ad-hoc backups without --maxtime
  will not reset the checkpoint.  From internal testing.

- backup: related to --maxtime, if a backup aborted, even if --maxtime
  was never used, the next backup might not save anything. It was
  incorrectly restarting a checkpoint that wasn't set up properly.
  Thanks to Daniele at EURAC.

- selftest: with the --inc option (incremental selftest), depending on
  the options used and the size of arc files, selftest might not always
  cycle through all of the archives before cycling back to the first.
  This bug was introduced in #1454.  From internal testing.

- selftest: because of round-off error, selftest --inc would sometimes
  require an extra run to test all archives.  From internal testing.

#1481 Feb 20, 2016 - expires Apr 15, 2016


* minor fixes


- backup: sparse file mapping is 30% faster on Linux

- backup: since #1473, a fifo backup could get an Illegal seek error

- backup: when a small fifo or raw device was backed up, the Saved:
  number displayed at the end could be negative

- get: fixed a bug in the new restore code that caused this:
    File size mismatch, should be 288, is 0: <pathname>
    Exception in thread writeq_loop: [Errno 9] Bad file descriptor
    File "/", line 85, in start_thread
    File "/", line 168, in writeq_loop
  Thanks to Juha Pyy for reporting this.

#1479 Feb 14, 2016 - expires Apr 15, 2016


* get: performance improved
* recrypt and sha256 commands removed
* upgrade: failed if a transaction was pending
* backup: removed error messages on sparse files
* backup: sparse file bug with huge files (Linux)


- get: in #1473, the get command for all restores was about 2x slower
  because of the new sparse file handling.  Now it's back to its
  previous performance levels for most situations, and restoring files
  with a small block size (VM images) is about 30% faster than ever.
  Restores using bzip2 compression (-Z8 or -Z9) are still slower than
  before #1473.  Compression improvements are coming...

- the recrypt and sha256 commands have been removed.  HB no longer
  uses the SHA256 hash.  The recrypt command was seldom used and while
  running, left the backup in a precarious "half old key, half new
  key" state.  The rekey command, used to change the backup key, was
  not removed.

- if the previous HB command aborted and left an open transaction,
  upgrade would fail with a traceback:
    Exception: Can't upgrade database with a transaction active
  Thanks to Frank Riley.

- backup: if an OS doesn't support hole-skipping, error messages are
  no longer displayed.  The file is still backed up normally, just
  without skipping the holes.  On a restore, the holes of a sparse
  file are always created.  The Sparse: line at the end of the backup
  tells how many hole bytes were skipped (none if no Sparse: line).

- backup: some Linux filesystems return a partial sparse map under
  some circumstances, causing a bad sparse file backup.  There are
  now checks to detect this, but if you made critical sparse file
  backups with #1473, it's probably best to redo them.  Seems to
  happen mostly with very large (>4GB) files.  From internal testing.

#1473 Feb 10, 2016 - expires Apr 15, 2016


* database upgrade to dbrev 18
* backup: fast hole-skipping for sparse files
* get/selftest: could stall when cache-size-limit set
* export: clears file and block hashes
* backup: fix invalid argument error on VMware vmfs filesystems
* other minor changes


- this rev will do an automatic database upgrade to dbrev 18 when any
  HB command is used.  The database isn't actually modified, other
  than being stamped dbrev 18 to prevent older versions of HB from
  accessing the database.  There are new data structures created in
  this release for sparse files that older versions wouldn't

- backup: backup can skip "holes" (unallocated disk space) in sparse
  files rather than backing them up.  This is OS version dependent, as
  well as filesystem dependent, so it will work when it can and
  fall back to a regular backup when it can't.  Sparse files are mostly
  used for "thin provisioned" VM disk image files.

  Restoring large sparse files can be a bit slow and tedious because
  it may require many lseek() calls to re-create the original holes.
  Sparse files saved with this release will restore faster than with
  older versions of HB if the holes are large.

  Here's an example of backing up and restoring a 10GB sparse file:

    $ echo abc|dd of=sparsefile bs=1M seek=10000
    0+1 records in
    0+1 records out
    4 bytes (4 B) copied, 0.000209 seconds, 19.1 kB/s

    $ hb backup -c hb -D1g -B4k sparsefile
    HashBackup build #1463 Copyright 2009-2016 HashBackup, LLC
    Backup directory: /home/jim/hb
    Copied HB program to /home/jim/hb/hb#1463
    This is backup version: 0
    Dedup enabled, 0% of current, 0% of max

    Time: 1.0s
    Checked: 5 paths, 10485760004 bytes, 10 GB
    Saved: 5 paths, 4 bytes, 4 B
    Excluded: 0
    Sparse: 10485760000, 10 GB
    Dupbytes: 0
    Space: 64 B, 139 KB total
    No errors

    $ mv sparsefile sparsefile.bak

    $ time hb get -c hb `pwd`/sparsefile
    HashBackup build #1463 Copyright 2009-2016 HashBackup, LLC
    Backup directory: /home/jim/hb
    Most recent backup version: 0
    Restoring most recent version

    Restoring sparsefile to /home/jim
    Restored /home/jim/sparsefile to /home/jim/sparsefile
    No errors

    real    0m0.805s
    user    0m0.150s
    sys     0m0.100s

    $ ls -ls sparsefile*
    20 -rw-rw-r-- 1 jim jim 10485760004 Feb  8 18:03 sparsefile
    20 -rw-rw-r-- 1 jim jim 10485760004 Feb  8 18:03 sparsefile.bak

  Any -B block size (or none) can be used with sparse files.
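
  For readers curious how hole-skipping can work at the OS level, the
  Python sketch below maps the data regions of a file with
  lseek(SEEK_DATA) and lseek(SEEK_HOLE), and falls back to treating
  the whole file as data when the platform or filesystem doesn't
  support them.  This is an illustration under those assumptions, not
  HB's implementation; data_segments is a hypothetical name.

```python
import errno
import os


def data_segments(path):
    """Return a list of (offset, length) pairs covering the non-hole data."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        segments = []
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break              # only a hole remains up to end of file
                return [(0, size)]     # unsupported here: read everything
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end - start))
            offset = end
        return segments
    except (AttributeError, OSError):
        return [(0, size)]             # no SEEK_DATA/SEEK_HOLE on this platform
    finally:
        os.close(fd)
```

  On a filesystem that reports holes, a file like the sparsefile
  above yields one short segment at the end; elsewhere the function
  returns a single segment covering the entire file.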

- get/selftest: since #1454, if cache-size-limit was >= 0, get and
  selftest could stall in some situations.  Thanks to Max Norton at
  Aria Networks for reporting and helping with this.

- export: sets all block and file hashes in the exported database to
  spaces.  The purpose of export is to debug hard-to-reproduce HB
  problems.  These hashes aren't useful for debugging and the goal of
  export is to remove as much sensitive information as possible from
  the exported data while still being able to use it for bug hunting.

- backup: on Linux, VMWare vmfs filesystems could give an invalid
  argument error when trying to open files.  Thanks to Ron Joffe.

- init: display an error message if unable to set owner-only
  permissions for the backup directory rather than halting with a

#1460 Feb 7, 2016 - expires Apr 15, 2016

- B2: if the copy-executable config option was set to True, HB would
  try to copy the program to the B2 storage service.  But because of
  the # in the filename, it would cause a broken pipe error.

- compare: for hard links, compare -f (verify file hashes) might show
  a file's data had changed when it really had not.  From internal
  testing.

- compare: always ignore link and size changes for directories since
  that doesn't mean much.  On Linux, if you create 10K files in a
  directory then delete them all, the directory will have a large size
  even though it is still empty. From internal testing.

#1456 Jan 24, 2016 - expires Apr 15, 2016

- backup: if linux-backup-attrs is set to True with hb config (the
  default is False), an error could occur:
    NameError: global name 'pathname' is not defined

#1454 Jan 21, 2016 - expires Apr 15, 2016


* IMPORTANT: HB Glacier support ending mid-2016
* database upgrade (again!) to dbrev 17
* backup: incremental backups improved 5-10%
* backup: raw and VM image backups 10% faster
* S3: support Infrequent Access storage class
* S3: fix sporadic broken pipe errors
* selftest --inc fix with small backups
* backup: file hash changed to SHA1
* get: bug in raw and VM restores when cache-size-limit set
* Backblaze B2 improvements
* get: didn't always respect cache limits on restore
* misc minor changes


- backup: sometime mid-2016, HashBackup will no longer support Glacier
  as a destination.  There are several good alternatives:

  * Amazon S3 Infrequent Access (1.2 cents/GB/mo)
  * Google Nearline Storage (1 cent/GB/mo)
  * Backblaze B2 (.5 cents/GB/mo)

  An email explaining why Glacier support is ending and how to migrate
  from Glacier to another service has been sent to any HashBackup
  customers that have written in.  This information is also
  available in the doc/dest.conf.examples/dest.conf.glac file from the
  Download section of the HashBackup site (expand the tar file).

- NOTE: this rev will do an automatic database upgrade to dbrev 17
  when any HB command is used. This upgrade modifies a database index
  and doesn't take too long.

  Because there have been several recent DB changes, HB may do
  consecutive database upgrades if your version is before #1405.
  Don't worry, that was designed in from the beginning and isn't
  anything new.  It looks like this (versions command):

    Current database rev: 14
    Upgrading database to rev: 16
    Copying /testhb/hb.db to /testhb/hb.db.orig before upgrade
    Copying /testhb/dest.db to /testhb/dest.db.orig before upgrade
      Upgrade to rev 15...
    Alter database
    Rename hb.NNNN programs to hb#NNNN in backup directory
    Remove dedup table; backup will rebuild it
      Upgrade to rev 16...
    Alter database
    Database upgraded to rev 16
    Showing recent versions
      0 501(jim) 2016-01-01 13:49:58 - 2016-01-01 13:49:58 #1335 

  In the unlikely event of a problem with the upgrade, the original
  database is restored.  That looks like this:

    Unable to upgrade your database to rev 16
    Restored database
    Restored dest.db
    See traceback below or in stderr redirected file
    Traceback (most recent call last):
      File "", line 208, in <module>
      File "", line 144, in main
      File "", line 173, in opendb
      File "", line 434, in upgradedb
    Exception: some error message
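
  The consecutive-upgrade behavior can be pictured as a loop that
  applies one step per database revision and restores the saved copy
  if any step fails.  The Python sketch below uses a plain dict as a
  stand-in for the database; the names are hypothetical and it is not
  HB's upgrade code.

```python
import copy


def upgrade(db, steps, target_rev):
    """Apply one upgrade step per revision until db["rev"] == target_rev.

    db: dict with a "rev" key (a stand-in for the real database).
    steps: dict mapping revision N to a function that upgrades N-1 to N.
    On any failure the original contents are restored and the error re-raised.
    """
    original = copy.deepcopy(db)        # like copying hb.db to hb.db.orig
    try:
        while db["rev"] < target_rev:
            next_rev = db["rev"] + 1
            steps[next_rev](db)         # e.g. "Upgrade to rev 15..."
            db["rev"] = next_rev
    except Exception:
        db.clear()
        db.update(original)             # like "Restored database"
        raise
    return db
```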

- backup: incremental backups spend most of the time scanning the
  filesystem for modified files.  This scan is now 5-10% more
  efficient.  Because the scan is mostly IO-bound and very "seeky",
  this could show up as either lower CPU usage or faster backups,
  especially on large backups with a lot of history and directories.

- backup: real and simulated raw device and VM image (.vmdk) backups
  are about 10% faster

- a new S3 dest.conf keyword, "class", can be set to either standard
  or ia.  When set to ia, backup files will be stored in Amazon S3's
  Infrequent Access storage class *IF* it will be cheaper than the
  standard S3 storage class.  For small files and files that are
  expected to be stored less than 13 days, standard storage turns out
  to be cheaper than IA because:

  -- IA charges for 128K if files are smaller than 128K
  -- IA charges for 30 days if files are deleted before 30 days

  Amazon also charges 1 cent/GB extra to download from IA.
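
  The 13-day figure can be checked with a little arithmetic.  The
  sketch below is a hedged illustration: the IA rate of 1.25
  cents/GB/mo appears elsewhere in these notes, but the standard S3
  rate of about 3 cents/GB/mo is an assumption not stated here, and
  the function name is hypothetical.

```python
def ia_break_even_days(ia_cents_per_gb_month, standard_cents_per_gb_month,
                       ia_minimum_days=30):
    """Days of storage below which standard storage costs less than IA.

    IA bills at least ia_minimum_days, so its minimum cost per GB is fixed;
    standard storage is billed pro rata by the day (30-day month assumed).
    """
    ia_minimum_cost = ia_cents_per_gb_month * ia_minimum_days / 30.0
    standard_cost_per_day = standard_cents_per_gb_month / 30.0
    return ia_minimum_cost / standard_cost_per_day


# Assumed rates: IA 1.25 cents/GB/mo, standard 3.0 cents/GB/mo.
break_even = ia_break_even_days(1.25, 3.0)
```

  Under these assumed rates the break-even is about 12.5 days, which
  is consistent with the 13 days quoted above.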

  Amazon S3 Infrequent Access (1.2 cents/GB/mo), Google Storage
  Nearline (1 cent/GB/mo), and Backblaze B2 (.5 cents/GB/mo) are all
  good options for migrating off Amazon Glacier (.7 cents/GB/mo),
  since HashBackup support for Glacier will be ending in mid-2016.

  The class keyword only works with type s3 destinations.  Other
  S3-compatible destinations like gs (Google Storage) set the storage
  class on the bucket rather than on individual files, so use their
  website to set the bucket storage class.

- S3: Ben Emmons reported a sporadic problem with broken pipe errors
  on S3 for buckets in any region other than us-east-1 (aka US).  Ben
  did some research to explain the cause: HB was sending all requests
  through the US standard region, which is not recommended.  Now the
  location keyword is used to communicate directly with the correct
  region.  If location is anything other than US or us-east-1, it must
  match the region the bucket was created in.  Thanks Ben!

- selftest: with --inc (incremental selftest), small backups were
  tested too often.  Selftest always wanted to test at least 1 arc
  file every run, so if the backup had only 1 arc file, that file
  would be tested every time, even if the goal was 1d/30d ("selftest
  runs every day, verify files every 30 days").  Now selftest will test
  the one file (or small backup) once every 30 days.

- backup: during a backup, HB splits a file into blocks, hashes each
  block with the SHA1 cryptographic hash, and uses these hashes to
  find duplicate data.  The block hash is verified during restores to
  ensure that each block's data has remained the same through all of
  its travels through HB itself and to and from remote destinations.

  As an extra safeguard, HB also stores one hash for each file backed
  up.  This ensures that after a restore, the correct blocks were
  restored, in the correct order, and provides extra reassurance that
  no hash collisions occurred during dedup.  (Hash collisions are
  nearly impossible with SHA1, but they still get a lot of attention.)

  Very early on, the SHA256 hash was chosen for the whole file hash.
  In hindsight, this was a poor decision, because it is the slowest of
  all the SHA family of hashes - even slower than SHA512 - and maxes
  out at around 100MB/s on common computers.

  Going forward, the SHA1 hash will be used for the whole file hash.
  SHA1 is about 3x faster than SHA256 and still provides the extra
  layer of error checking to ensure that restored files are identical
  to when they were saved.  All HB commands have been modified to
  handle both the old and new file hashes.
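
  The two layers of checking can be sketched in a few lines of
  Python.  This is a simplified illustration, not HB's code: it uses
  fixed-size blocks held in memory, and the function names are
  hypothetical.

```python
import hashlib


def backup(data, block_size=4096):
    """Split data into blocks; return (blocks, block_hashes, file_hash)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    block_hashes = [hashlib.sha1(b).hexdigest() for b in blocks]
    file_hash = hashlib.sha1(data).hexdigest()
    return blocks, block_hashes, file_hash


def restore(blocks, block_hashes, file_hash):
    """Reassemble blocks, checking each block hash and then the file hash."""
    whole = hashlib.sha1()
    for block, expected in zip(blocks, block_hashes):
        if hashlib.sha1(block).hexdigest() != expected:
            raise ValueError("block hash mismatch")
        whole.update(block)
    if whole.hexdigest() != file_hash:
        raise ValueError("file hash mismatch")
    return b"".join(blocks)
```

  If two intact blocks are restored in the wrong order, every block
  hash still verifies, but the whole-file hash catches the error.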

- get: if cache-size-limit is set >= 0, it means some backup data is
  not stored locally.  So get (and selftest -v5) create restore plans
  to figure out the best way to retrieve arc files to use the least
  amount of disk space and not download any arc file more than once.
  But if a raw device was restored, the plan would say "1 item, 0
  bytes", and then during the restore, would fetch the arc files one
  by one as needed, and not delete any until after the restore
  finished.  In other words, there was no plan.  This has been fixed.
  A typical plan will look like this:

    Planning cache...done
      Archives: 4
      Blocks: 46
      Download size: 44 MB
      Peak cache size: 11 MB
      Disk free space: 75 GB, 30%
      Items: 1
      Data bytes: 52 MB

  The important part is Peak cache size, since it tells how much disk
  space will be needed in the backup directory to do the restore.

- Backblaze B2 improvements:
  * HB did not handle bucket names with upper case letters
  * HB would sometimes get http 401 status errors because of a B2 bug
  * If HB cannot create a bucket, it displays the reason why, from B2
  * In general, for most errors HB displays a better message from B2
  * If a bucket ID is mistakenly used in dest.conf, it works now
  * Documentation has been improved a bit
  Thanks Thorsten for pointing these out!

- get: when cache-size-limit is >= 0, get was not respecting the cache
  size limit.  A restore of a 500GB VM image said it would download
  230GB and need 14GB of space in the backup directory.  But if the
  data could be loaded from the destination faster than the restore,
  the backup directory would go over the 14GB limit and cause a disk
  full error.

- backup: in certain circumstances, a file could be skipped instead of
  backed up, but the next backup would catch it

- get: a null pathname on the command line ('') caused a traceback

- mount: if the FUSE library is not installed, mount raised an
  EnvironmentError, which is not so user friendly.  Now, mount
  displays 5-10 lines of information about what FUSE is, where it's
  located, and tips on how it can be installed for the mount command

- selftest: added a new test for -v2 and above

#1405 Dec 20, 2015 - expires Apr 15, 2016


* expiration date bumped to April 15th
* database upgrade to dbrev 15
* backup: dedup uses less memory & is faster
* backup: fast restarts for --maxtime
* backup: simulated backups 25% faster
* rm: new pack config options
* using pack-percent-free to control download costs
* backup: default arc-size-limit is 100MB vs 1GB
* selftest: add size limit to --inc
* add -v2 to compare command
* HB reports exceptions via email
* get: raw device restore bug fixes
* misc minor changes


- NOTE: this rev will do an automatic database upgrade to dbrev 15
  when any HB command is used. This upgrade:
  - modifies a database index
  - renames backup directory programs hb.NNNN to hb#NNNN
  - removes "hb" (very old program version) from backup directory
  - deletes the dedup table; next backup will re-create it

- backup: uses half as much RAM to dedup the same number of blocks.
  The first backup with this version will rebuild the dedup table, so
  may take longer.

- backup: single-core variable block dedup is 5% faster (-D -p0, no -B).
  Initial multi-core disk image backups (.vmdk, raw, etc) are 12% faster.

- backup: --maxtime and --maxwait were added in #862 to control the
  backup time for huge backups.  See the #862 changelog for details.
  The initial backup for a multi-TB filesystem with millions of files
  could require several days.  Using --maxtime 6h lets HB back up for 6
  hours every night until all data is saved.  Then incrementals tend
  to be much faster and have no trouble finishing within the backup

  The enhancement in this version is that restarts after the time
  limit are much faster, allowing HB to completely skip huge portions
  of the filesystem already backed up.  This only occurs when
  --maxtime is used, although if you want this shortcut restart
  behavior all the time, use --maxtime 1y.

- backup: simulated backups are up to 25% faster: 155 seconds now vs
  210 seconds with #1371 on a 4.5GB VM image (.vmdk files)

- rm: two new config options control archive packing:

  pack-age-days: specifies the minimum archive age in days.

  Many storage services are adding delete penalties for files removed
  before N days.  For Google Nearline and Amazon Infrequent Access
  storage, it is 30 days.  HB needs to be aware of this because
  otherwise, it could pack an archive several times in a month if
  enough data was removed, which would cost more than leaving the data
  alone.  The new HB default for this is 30 days.  To disable this
  option and preserve existing behavior, set it to 0.

  pack-bytes-free: specifies the minimum # of free bytes.

  This option prevents repeated packing of small files.  For example,
  if an arc file is 50K and 25K is deleted, it would be packed if
  pack-percent-free is 50 (the default).  But, it's not worth the
  trouble for such a small savings. The default for this option is
  1MB.  To preserve existing behavior, set it to 4K, the minimum.
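
  How the three limits might combine is sketched below in Python.
  The option names and defaults come from the text above, but the
  exact comparisons (>= versus >), the 1MB = 1024*1024 reading, and
  the function name should_pack are assumptions; this is not HB's
  packing code.

```python
def should_pack(arc_bytes, free_bytes, age_days,
                pack_percent_free=50, pack_bytes_free=1024 * 1024,
                pack_age_days=30):
    """Return True if an arc file qualifies for packing under all limits."""
    if age_days < pack_age_days:
        return False                   # too young: avoid delete penalties
    if free_bytes < pack_bytes_free:
        return False                   # savings too small to be worthwhile
    percent_free = 100.0 * free_bytes / arc_bytes
    return percent_free >= pack_percent_free
```

  With the defaults, the 50K arc file with 25K deleted is left alone;
  setting pack_bytes_free to 4096 makes it eligible again.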

- IMPORTANT NOTE for the pack-percent-free config option: most storage
  services are charging high download rates compared to storage rates.
  For example, it costs 8x more to download a file from S3 Infrequent
  Access (10 cents/GB) than to store the file for a month (1.25
  cents/GB).  Stated another way, it costs the same to store a file
  for 8 months or download it just once.  If pack-remote-archives is
  set to True (the default is False), and cache-size-limit is >= 0
  (not all archives are stored locally), consider bumping
  pack-percent-free much higher than 50 to limit packing downloads.

  If cache-size-limit is -1, meaning a local copy is kept of all
  archives files, packing does not require a download so this config
  option is not as important for cost control.

- backup: the default arc-size-limit for new backups is now 100MB
  instead of 1GB.  If you want to change the arc size for existing
  backups, use: hb config -c backupdir arc-size-limit 100mb.  This
  will increase the number of arc files used, but there are very
  large sites running in this configuration with 40-50K arc files,
  without problems.  Smaller archives make it more likely that HB can
  manage remote storage with delete commands instead of downloading,
  packing, and uploading.

- selftest: the --inc freq/goal option is used to do incremental
  selftests, where a portion of the backup is checked every day.  The
  freq and goal specify the percentage of the backup space to be
  checked (freq/goal).

  Many storage services have a free allowance for downloads, for
  example, Backblaze B2 allows 1GB/day, and charges after that.  To
  ensure incremental selftest doesn't go over the free allowance, a
  new limit option can be added. For example, --inc 1d/30d,500m means:

  * selftest is run every day by cron (or manually, etc)
  * check the whole backup every 30 days
  * the percentage is therefore 1/30, or 3.3% each run
  * but, don't check more than 500MB of archive space in one run

  Selftest may still go over the limit if a single arc file is bigger
  than the limit.  When the limit is triggered, selftest will display
  a message.  In this case, your goal cannot be met, so a selftest of
  the complete backup will take longer than your specified goal.
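
  The effect of the limit can be estimated with a short Python
  sketch.  The 60,000 MB backup size is a made-up figure, the function
  name is hypothetical, and the sketch ignores the fact that selftest
  works in whole arc files:

```python
import math


def selftest_plan(backup_mb, freq_days, goal_days, limit_mb=None):
    """Return (MB checked per run, runs needed for a full pass)."""
    per_run = backup_mb * freq_days / goal_days
    if limit_mb is not None:
        per_run = min(per_run, limit_mb)
    runs = math.ceil(backup_mb / per_run)
    return per_run, runs


# --inc 1d/30d on a 60,000 MB backup: 2,000 MB per run, 30 runs
print(selftest_plan(60000, 1, 30))        # (2000.0, 30)
# --inc 1d/30d,500m: capped at 500 MB per run, so 120 runs are needed
print(selftest_plan(60000, 1, 30, 500))   # (500, 120)
```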

- the compare command compares a backup with a live filesystem and
  indicates new, changed, and deleted files.  For changed files, the
  compare command shows which attributes changed.  The new -v2 option
  shows the backup and filesystem values for each changed attribute.

- HB reports all unhandled exceptions (tracebacks) via email.  This
  email includes the HB version number, command line, traceback, and a
  short system description (Linux, Mac OSX, or BSD).

- get: raw device restore fixed

- get: show progress for large files only if displaying output

- selftest: before, would go one arc file over the limit with --inc
  instead of staying under the limit.  For GB-sized arc files, it
  makes a difference.

- backup: a simulated backup could end with a traceback:
    AttributeError: Arc instance has no attribute 'iobuf'

- selftest: an incorrect error was displayed:
    Error: for logid 1786855, hlogid 1786854 is invalid: /bin/ln [r836]
  This was a bug in selftest, not a problem with the database.

- backup: OSX system files were sometimes incorrectly tagged sparse

- ls: add a note for sparse and raw (device) files with the -l option

- backup: gave a warning about slow backups & restores for -Z5 and
  higher, but should have only been for -Z8 and higher.

#1371 Nov 28, 2015 - expires Jan 15, 2016


* rate keyword is supported on Backblaze B2 and WebDAV
* new WebDAV keyword: subdir
* better WebDAV documentation


- the rate keyword in dest.conf is now supported for Backblaze B2 and
  WebDAV destinations.  See doc/dest.conf.examples/README for more
  details about the rate keyword.  Basically it is an upload rate
  limit in bytes per second and allows suffixes like 512k to mean
  512K (512 x 1024 = 524288) bytes per second.
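
  A minimal Python sketch of this kind of suffix handling follows.
  Only the "k" suffix is confirmed by the text above; the "m" and "g"
  suffixes and the function name are assumptions for illustration:

```python
def parse_rate(text):
    """Convert a rate such as '512k' into bytes per second."""
    # 'k' = 1024 comes from the text above; 'm' and 'g' are assumed
    # to follow the same 1024-based pattern.
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


print(parse_rate("512k"))  # 524288
```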

- "subdir" is a new keyword added to WebDAV destinations.  This allows
  storing multiple backups in the same WebDAV area.  It was possible
  to do this before if the subdirectories were created before using
  HB.  Using the subdir keyword, HB will create these directories.

- the example WebDAV file, doc/dest.conf.examples/dest.conf.dav, has
  more explanations about how to use HB with WebDAV.  WebDAV servers
  are often configured differently and can be picky about their setup.

#1365 Nov 26, 2015 - expires Jan 15, 2016


* backup bug fix: hang


- backup: a bug was introduced in the Nov 22 release, #1363, that
  caused backup to hang after creating only a few arc files.  This bug
  was not related to a particular destination type.

#1364 Nov 25, 2015 - expires Jan 15, 2016


* Glacier bug fix


- in #1363, a change was made that caused Glacier destinations to fail
  with this traceback:

  dest glac: [AttributeError] 'dict' object has no attribute 'type'

  Traceback (most recent call last):
    File "/", line 104, in <module>
    File "/", line 2047, in main
    File "/", line 302, in init
    File "/", line 248, in initdest
    File "/", line 177, in startdest
    File "/", line 229, in __init__
    File "/", line 123, in __init__
    File "/", line 90, in baseinit

#1363 Nov 22, 2015 - expires Jan 15, 2016


* add support for Backblaze B2
* enable SSL certificate verification for secure WebDAV
* ls and get permissions check bypassed for key.conf owner
* bug fixes


- Backblaze B2 is supported.  B2 is still in the invite-only beta
  stage, so please observe Backblaze beta guidelines.  See
  doc/dest.conf.examples/dest.conf.b2 for B2 dest.conf keywords.

- if the "secure" keyword is used with a WebDAV destination, the SSL
  certificate received from the server is verified.

- the HB ls and get commands will not check file permissions stored in
  the backup if the user running HB is the owner of key.conf.

  For example, if root does nightly backups and a user is given
  sufficient OS permissions to access the backup files, hb get checks
  permissions saved with backed-up files to see if the user has read
  access to the files being restored.  If not, hb get issues a "No
  read permission" error and will not do the restore.

  But in a disaster recovery situation with shared hosting, userids may
  change, the userid used for backup and/or owning the HB backup files
  may not be the same as the userid doing the restores, and the person
  running HB may not have any control over userids in a shared hosting
  situation.  Now, if the userid running hb is the owner of key.conf,
  get and ls will proceed.
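
  The ownership test described above could look something like this
  Python sketch.  It is an illustration for Unix-like systems, not
  HB's actual code, and the function name is hypothetical:

```python
import os
import tempfile


def skip_permission_check(key_conf_path):
    """Return True if the user running this process owns key.conf."""
    return os.stat(key_conf_path).st_uid == os.getuid()


# A file we just created is owned by us, so the check is bypassed
with tempfile.NamedTemporaryFile() as f:
    print(skip_permission_check(f.name))  # True
```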

- dest clear: if there were files flagged for removal, dest clear
  could complain about them being the only copy and refuse to delete
  them.

#1340 Stable - Oct 22, 2015 - expires Jan 15, 2016


* backup skips empty directories on the command line
* bug fixes


- backup: backup now skips empty directories listed on the command
  line.  This is useful when backing up a mounted file system, eg NFS,
  that isn't currently mounted.  Previously backup would mark all
  files as deleted if the file system wasn't mounted.
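
  The idea can be sketched in Python as follows.  This is only an
  illustration of skipping empty command-line directories; the
  function name is hypothetical and HB's real logic may differ:

```python
import os
import tempfile


def paths_to_back_up(paths):
    """Drop directories with no entries, e.g. an unmounted mount point."""
    keep = []
    for p in paths:
        if os.path.isdir(p) and not os.listdir(p):
            continue  # empty directory: skip it instead of backing it up
        keep.append(p)
    return keep


with tempfile.TemporaryDirectory() as empty, \
        tempfile.TemporaryDirectory() as full:
    open(os.path.join(full, "file.txt"), "w").close()
    print(paths_to_back_up([empty, full]) == [full])  # True
```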

- retain: -m used by itself caused a traceback.  Now, a message is
  displayed that the -x option is required if -s and -t are omitted.

- rm: if the highest backup version is removed with -rN, it could
  create a confusing situation for files that were previously backed
  up, deleted in rev N, then backed up again, for example:

    $ hb ls -c hb -a
    HashBackup build #1335 Copyright 2009-2015 HashBackup, LLC
    Backup directory: xxx
    Most recent backup version: 1
    Showing all versions
      0 /  (parent, partial)
      0 /Users  (parent, partial)
      0 /Users/jon
      0 /Users/jon/x  (deleted in version 1)
      1 /Users/jon/x

  Now when the highest version is removed, any files marked deleted in
  that version will be undeleted, as if the deleted backup never
  happened.

- selftest: two new tests for the confusing rm situation above

#1335 Stable - Oct 18, 2015 - expires Jan 15, 2016


* release schedule change
* expiration date bumped to Jan 15th
* bug fixes


- to reflect its more stable status and to avoid a release update
  during December holidays, HashBackup's release schedule is changing.
  The new expiration schedule for the backup feature is:

     January 15th -- April 15th -- July 15th -- October 15th

- a user reported this traceback.  The cause is now fixed.  If you
  have this problem with a backup, delete the hb.sig file.

    Traceback (most recent call last):
      File "/", line 104, in <module>
      File "/", line 2191, in main
      File "/", line 371, in put
      File "/", line 833, in genincdb
    ZeroDivisionError: integer division or modulo by zero

- in unusual circumstances, HB may create a partial backup of a file,
  for example, when there is an I/O error backing it up, or when
  selftest has to truncate a file because it detects an error.  These
  partial files were not handled correctly by retain, sometimes
  causing good files to be removed while the partial file was
  retained.  Now, retain will delete partially backed up files if
  there is a later, complete backup of the file.
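
  The new rule can be illustrated with a small Python sketch.  The
  data layout (a list of version numbers with a partial flag) and the
  function name are assumptions for this example only:

```python
def partials_to_delete(versions):
    """versions: list of (version_number, is_partial) for one file.

    Returns the partial versions that have a later, complete backup.
    """
    complete = [v for v, partial in versions if not partial]
    if not complete:
        return []
    latest_complete = max(complete)
    return [v for v, partial in versions
            if partial and v < latest_complete]


print(partials_to_delete([(0, False), (1, True), (2, False)]))  # [1]
print(partials_to_delete([(0, False), (1, True)]))              # []
```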

#1332 - Aug 11, 2015 - beta expires Dec 15, 2015


* expiration date bumped to Dec 15th

NOTE: Release #1330 has proven to be very stable over the summer.

The next release of HashBackup will have many changes.  Rather than
releasing these changes now, forcing everyone to accept all of them
before #1330 expires, the next major release is being held back until
after Sep 15th.

If you prefer to have a very stable version, update to #1332 before
Sep 15th.  The only change is a bump in the expiration date.  Update
after Sep 15th to get the latest features of the new release.

#1330 - May 19, 2015 - beta expires Sep 15, 2015


* expiration date bumped to Sep 15th
* get: fix progress percentage
* eof when answering yes/no questions
* cache-size-limit honored when syncing new destination
* fix unhelpful dest.conf error message
* imap destination supports timeouts


- get: during restore of a compressed file, the progress display did
  not go to 100%

- if an EOF occurs when hb asks a yes/no question, hb aborts instead
  of asking the question again.  This was a problem if a keyboard is
  unavailable, eg, hb running in the background.

- cache-size-limit (config keyword) is honored when syncing files to a
  new destination, to avoid filling the local backup directory.

- fixed unhelpful error message if an integer is expected for a
  keyword in dest.conf and something else is used

- imap destinations previously did not support the timeout keyword,
  and could hang for long periods of time.  Now, the default timeout
  is 5 minutes and can be changed with the timeout keyword in
  dest.conf.  If a timeout occurs, hb will use its normal retry loop
  to recover.

#1321 - Apr 20, 2015 - beta expires Jun 15, 2015


* selftest: logid traceback fix
* backup/selftest: hash datatype fix for #1316
* selftest: partial selftest fix


- selftest: a recent change verifies that hashes have the correct
  datatype in the hb database, but also could cause this error:

    Traceback (most recent call last):
      File "/", line 178, in <module>
      File "/", line 1166, in main
    UnboundLocalError: local variable 'logid' referenced before assignment

- backup: in #1316, variable-block dedup backups (-D without -B)
  created the wrong datatype for hashes, causing the previous error.
  This is now corrected, but you should run selftest -v2 --fix to
  check your backup.  It may display errors like this:

    Checking blocks I
    Error: block 256, hash is type str
    Corrected hash type
    Checked  256 blocks I

  Selftest -v2 (without --fix) may also display errors like this:

    Error: dedup blockid 3 hash mismatch (not critical)

  Running selftest -v2 --fix corrects these.

- selftest: a recent change enabled selftest to pack arc files, but a
  partial selftest could cause this traceback:

    Traceback (most recent call last):
      File "/", line 178, in <module>
      File "/", line 1198, in main
    ValueError: need more than 2 values to unpack

#1316 - Apr 12, 2015 - beta expires Jun 15, 2015


* April is HB test month
* selftest: -v4 --fix is now safe to use
* selftest: -v4 packs arc files
* selftest: suppress arc size 0 error
* selftest: don't require a 2nd -v2 pass
* backup: fix simulated backup error on empty backup
* get: fix hash mismatches on sparse files
* increase database timeout from 15 to 300 seconds
* get: fix "Unhandled exception" random error
* backup: large dedup table resize could fail


- for all of April, the focus will be on testing, test scripts, and
  bug fixes.  After April, two days per week will be devoted to
  testing to ensure HB's quality remains high.

- selftest: -v4 --fix could cause arc size errors if it was
  interrupted.  This is fixed and -v4 --fix is okay to use now.  If
  anyone had this problem, selftest --fix will now correct it.

- selftest: with -v4, if archives have to be downloaded to be checked,
  they will also be packed and uploaded if their free space is greater
  than pack-percent-free.  This happens even if pack-remote-archives
  is False, because during selftest, the arc files are already local.
  To completely eliminate packing (probably not a good idea) set
  pack-percent-free to 100.  Then empty arc files are deleted but
  none are packed.

- selftest: if destinations were not in sync, selftest might display
  this message that isn't actually an error:

    Error: arc.637.0 wrong size on atmos destination: should be 134223056, is 0

- selftest: if -v4 --fix encountered a bad block, deleted it, and the
  file containing it was the last file in an archive, selftest -v2
  would need to be run to correct this residual error:

    Error: arc.0.0 is not referenced
    Marked for deletion

  Now this is handled with one selftest.

- backup: with a simulated backup, if no backup data was generated
  because a zero-length file was saved or no files were modified, an
  error could occur after the backup when the empty arc file was

- get: sometimes failed with these errors, often with .dmg or other
  disk image / large files with repeated blocks.  It worked with -p0.

    Block hash mismatch, blockid 1727226: pathname
    File size mismatch, should be 17896386, is 14750658: pathname

- the database lock problem seems to be solved by increasing the
  database timeout.  The problem was random, but occurred more often
  on systems with:

  - a large buffer cache (lots of RAM)
  - disk I/O restrictions (VMs and shared hosts)
  - systems with lots of write activity
  - HB running under nice and/or ionice
  - XFS: it flushes every 30 seconds while ext4 flushes every 5, so
    more "dirty" buffers accumulate in the buffer cache with XFS

  These all contribute to long sync times.  HB databases use
  synchronous I/O for fault tolerance, and database commits require
  flushing the system buffer cache.  On some systems, this flush could
  take longer than 15 seconds - the previous database timeout.

- get: sometimes get would end with an error message even though the
  file restore was successful:

    Unhandled exception in thread started by <bound method dir.loop of <dirdest.dir instance at 0x100f26fc8>>
    Traceback (most recent call last):
      File "/", line 331, in loop

- backup: if the dedup table is >= 2GB, the next resize to 4+GB
  could fail.  On Linux, the error was:
    Error writing dedup table: wrote 2147479552 of 4147483640 bytes
  On OSX, the error was:
    OSError: [Errno 22] Invalid argument

#1303 - Mar 18, 2015 - beta expires Jun 15, 2015


* misc fixes


- stats: number of paths deleted was wrong

- backup: using similar directory names on the backup command line
  could cause errors:

    $ hb backup -c hb mnt/dir-2/db mnt/dir/backups
    HashBackup build #1298 Copyright 2009-2015 HashBackup, LLC
    Backup directory: /Users/jim/hb
    This is backup version: 0
    Dedup is not enabled
    Unable to stat file: No such file or directory: /Users/jim/mnt/dir/
    Unable to backup: No such file or directory: /Users/jim/mnt/dir/backups

#1298 - Mar 16, 2015 - beta expires Jun 15, 2015


* selftest: arc file healing bug
* clear deletes the log directory too


- selftest: arc file healing is where an arc file has a bad block and
  is reconstructed from good blocks found in another copy of the arc
  file.  This release corrects a bug found in internal testing.

#1293 - Mar 14, 2015 - beta expires Jun 15, 2015


* Python update
* S3/Glacier fixes
* default workers changed to 2 (was 3)
* clear preserves config by default, gets --reset option


- this version of HB is built with a later version of Python

- in #1256, a default timeout of 30 seconds was added for Amazon S3,
  because of long hangs at a (Linux) customer site.  But this change
  caused failures on Mac OSX:

    dest s3: error #1 of 3 in send arc.1.0: [error] [Errno 35] Resource
    temporarily unavailable

  The new version of Python fixes this problem.

- S3 and Glacier should work better in this release:
  - the Python upgrade fixes some communication problems
  - retries do not cause such long delays
  - both S3 and Glacier have 30-second I/O timeouts

- Glacier: in #1288, a software library HB uses could cause a
  traceback with large files:

    dest glacier: error #1 of 3 in send arc.1001.5:
    [UnicodeDecodeError] 'ascii' codec can't decode byte 0xd0 in
    position 21: ordinal not in range(128)

- the default number of workers for destinations is 2 instead of 3

- previously, the clear command reset all config keywords.  Now it
  preserves the config settings unless --reset is used.

#1288 - Mar 10, 2015 - beta expires Jun 15, 2015


* retain can operate on specific files & directories
* rm/retain: can pack simulated backups
* selftest: -v2 can be used with simulated backups
* fix hash should not be null selftest errors
* misc fixes


- retain previously operated on the entire backup.  Now, pathnames can
  be used with retain so that it operates on just these files or
  directories.  This is useful when you want to retain fewer copies of
  certain parts of the backup.

  For example, if you backup /home with several users, each user could
  have a different retention policy by running retain several times
  with /home/user1, then /home/user2, etc.

  As another example, you may have a log directory where you only want
  to keep the last 30 days, even though for the rest of the backup,
  you want to keep 1 year of backups.

  In addition to the specific retains, it's a good idea to continue
  running retain on the entire backup, without a pathname, unless you
  are sure that the specific retains will cover all parts of the
  backup.

- rm/retain: a simulated backup (config keyword simulated-backup =
  True) is used to model a backup with live data, to learn by
  experiment the best backup method and HB options to use, without
  requiring a lot of disk space.  One key option is packing archive
  files.  Packing always happens with local archive files when the
  free space exceeds the pack-percent-free config keyword.  It happens
  with remote archives only if pack-remote-archives is True.

  If pack-remote-archives was set on a simulated backup, it caused:
    Error opening archive: Archive does not exist: /hbdev/hb/arc.0.0

  Since it is important to be able to model remote archive packing, to
  see how it affects backup space utilization, this is now supported
  with simulated backups.  When the config keyword
  pack-remote-archives is True (default is False), rm and retain will
  pack the simulated arc files and hb stats will show this.

- selftest: -v2 can now be used with simulated backups.  Any higher
  level will cause an error and stop.

- rm/retain: when hard links were removed, it sometimes caused a
  selftest error "hash should not be null" on other files that were
  hard-linked to the removed file.  This release fixes that bug.  To
  correct existing "hash should not be null" errors in your backup,
  run selftest -v2 --fix until there are no errors.

- selftest: when a hard-linked file is truncated, mark all related
  hard links as partial so they are saved on the next backup

#1284 - Mar 4, 2015 - beta expires Jun 15, 2015


* improve speed of fifo backups on Linux
* Glacier bug fix


- backup: piped backups (named pipes / fifos) are faster on Linux

- Glacier: a recent upgrade to a new version of the Amazon library
  supporting Glacier could cause a traceback:
    IOError: [Errno 2] No such file or directory: 'endpoints.json'

#1275 - Mar 2, 2015 - beta expires Jun 15, 2015


* selftest: correct "hash should not be null" errors
* log errors: display _RUN logs
* selftest: don't print progress if sending output to a file
* backup: backup from named pipes (fifo) on the command line
* Glacier: always tried last for retrieval


- selftest: there is a bug in HB's hard link handling that can cause
  "hash should not be null" errors.  The cause of the problem has not
  yet been fixed, but now selftest -v2 --fix will truncate the file to
  eliminate the errors.  Then backup will save the file again.  Still
  working on finding & fixing the hard link bug.

- log errors: logs ending in _RUN are now included in the error
  summary.  Sometimes if HB can't get started properly, empty _RUN
  logs are created.  Lots of these is a sign that "something bad" is
  happening.

- selftest: in a recent change, selftest prints progress percentages
  in some long sections.  Don't print these if output is being sent
  to a file.

- backup: if a named pipe, aka fifo, is listed on the command line,
  the fifo is opened and all data is saved.  This can be used for
  example with a database dumper, to back up a database dump without
  having to create a huge dump file first.  Here is a simple example:

    mkfifo fi
    cat somefile > fi & hb backup -c backupdir fi

  Instead of using cat, any program can be used, and its output will
  be backed up.  It is not possible to put hb in a simple pipeline
  without a fifo, because it would not have a filename to associate
  with the data saved, so this does not work:
    cat somefile | hb backup -c backupdir      Nope!

  Backing up from named pipes is about half the speed of regular
  files, because pipes usually have small buffers.

  IMPORTANT: it is easy to have multiple processes writing to a fifo
  at the same time by mistake (I did it during testing.)  When that
  happens, the fifo is getting data from two places and the backup is
  a mixture of the two.  Or, said another way, it is trash.

- Glacier: when downloading a file, HB tried destinations in the order
  they are listed in dest.conf.  Now, Glacier is always a last resort
  even if it is listed first in dest.conf, to avoid 4-hour restore
  delays and potentially high retrieval charges.
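
  One simple way to get this ordering is a stable sort that moves
  Glacier to the end and leaves everything else in dest.conf order.
  The Python sketch below is illustrative only; the destination names
  and type strings are made up:

```python
def retrieval_order(dests):
    """dests: list of (name, type) pairs in dest.conf order.

    sorted() is stable, so non-Glacier destinations keep their order.
    """
    return sorted(dests, key=lambda d: d[1] == "glacier")


print(retrieval_order([("glac", "glacier"), ("s3dest", "s3"), ("nas", "dir")]))
# [('s3dest', 's3'), ('nas', 'dir'), ('glac', 'glacier')]
```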

#1269 - Feb 27, 2015 - beta expires Jun 15, 2015


* selftest bug fix
* s3 hang fix
* more fun with Database is locked


- selftest: with non-incremental selftest, this traceback occurred:
    Traceback (most recent call last):
      File "/", line 178, in <module>
      File "/", line 1612, in main
    UnboundLocalError: local variable 'maxvernum' referenced before assignment

- s3 destination was causing hb to hang at the end because of a close
  added to help with the database lock problem

- yes, the dreaded (and random-ish) "Database is locked" error is
  still lurking.  Since it is not easily reproduced except at customer
  sites, here is another change that may help.  Or not.

#1266 - Feb 26, 2015 - beta expires Jun 15, 2015


* selftest: new --inc option (incremental selftest)
* default selftest level is -v2
* selftest: doesn't ask "Continue?" for -v5
* bug fixes


- selftest: a new option, --inc, can be used to do more thorough
  testing of a backup over a period of time.  For example, there are
  hb backups with over 30K arc files, none stored locally, so it's not
  reasonable to do a full selftest all at once.  Even with smaller
  backups, there may not be enough time in the backup window to do a
  full selftest.

  The --inc option is: --inc f/g, where f is the frequency selftest is
  typically run, and g is the goal for completing the selftest.  For
  example, --inc 1d/30d means selftest is run every day and a complete
  selftest should occur every 30 days.  Or, --inc 1h/1q means a
  selftest runs every hour, and a complete selftest should occur every
  quarter (90 days). 

  NOTE: --inc does not cause a selftest to be run on a schedule.  Use
  cron or some other scheduling tool to initiate selftest.

  --inc can be used with -v3, -v4, or -v5, and there is a separate
  checkpoint for each level.  This allows incremental local arc file
  tests (-v3), local + remote arc tests (-v4), or file hash
  verification (-v5) to have different schedules.

  The --inc option can be combined with -r to incrementally check a

  selftest --inc reset clears all checkpoints.
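
  To show how an f/g value maps to the portion of the backup checked
  per run, here is a minimal Python sketch.  It handles only the h, d,
  and q (90 days) suffixes used in the examples above; the function
  name is hypothetical and HB may accept other units:

```python
from fractions import Fraction

# Hours per unit, limited to the suffixes shown in the examples above
HOURS = {"h": 1, "d": 24, "q": 90 * 24}


def to_hours(period):
    """Convert a period such as '30d' into hours."""
    return int(period[:-1]) * HOURS[period[-1]]


def inc_fraction(spec):
    """Return the fraction of the backup checked per run for 'f/g'."""
    freq, goal = spec.split("/")
    return Fraction(to_hours(freq), to_hours(goal))


print(inc_fraction("1d/30d"))  # 1/30
print(inc_fraction("1h/1q"))   # 1/2160
```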

- selftest: previously, the default selftest level could be almost
  anything, depending on whether arc files are stored locally and
  other factors.  Now, the default level is -v2, or -v1 for simulated
  backups.  A note is displayed that higher levels will do more
  thorough checking.  Higher levels of checking may involve
  downloading lots of arc files, so it seems reasonable to request
  that rather than having it be a default action.

  For thorough backup checking, the recommended level for selftest is
  -v4, to check all arc file data.  -v5 can be useful, to verify user
  file sha256 hashes, but it does not test every arc file like -v4.

- selftest: in #1263, -rN -v4 failed with:

    Traceback (most recent call last):
      File "/", line 178, in <module>
      File "/", line 897, in main
    ValueError: need more than 0 values to unpack

#1263 - Feb 24, 2015 - beta expires Jun 15, 2015


* selftest can handle missing files
* bug fixes


- selftest: if an expected file was missing at a destination, selftest
  would fail or hang indefinitely waiting for it to be downloaded.
  Now, the missing file will generate errors for each block, but if
  there are other copies of the arc file at other destinations,
  selftest will upload a copy to the destination where it was missing.

- selftest: related to above, if the local copy of an arc file is
  missing, and it should be present because cache-size-limit is -1,
  selftest will leave a copy in the backup directory if there are
  destinations that have the file.  If an arc file is truly missing
  and not present locally or at any destination, all of its blocks are
  deleted and the files using those blocks are truncated when --fix is
  used.

- selftest: with the recent selftest changes, a race condition could
  cause this error:

    Checking blocks I
    Downloading arc.0.0
    Checking arc.0.0
    Exception in thread _writethread: [Errno 9] Bad file descriptor
      File "/", line 68, in start_thread
      File "/", line 65, in _writethread

- selftest: don't print 'Downloading arc.v.n' when there are no remote
  copies, and show the list of destinations when remote copies are

- selftest: add a note to run selftest -v2 --fix if there were

#1256 - Feb 23, 2015 - beta expires Jun 15, 2015


* a new command, log, runs another HB command with logging
  This command is experimental; feedback is welcome.
* selftest: levels -v3 & -v4 check remote arc files
* selftest: new -r (version) option
* selftest: progress display
* selftest: option to test specific arc files or pathnames
* selftest: healing arc files using multiple destinations
* debug keyword for Amazon S3, default S3 timeout
* bug fixes


- a new command, log, makes it easier to run HB commands with a log
  file.  Any HB command can be run with logging.  The usage is:

    # hb log backup -c backupdir -D1g -B1m /Users/jim /etc

  Example in a cron file:

    @hourly hb log backup -c xyz -D1g /; hb log retain -c xyz -s30d12m
    @weekly hb log errors -c xyz

  The hb log errors command displays log files from all commands that
  failed, zips all files to a monthly .zip file, and prints a summary
  of successful and failed commands.  This info is written to stderr
  and the exit code is always 1, so putting hb log errors in a cron
  file will cause the output to be emailed.

  When run from a regular terminal, command output is sent to the
  terminal and to the log file.  When run from a cron job, or anytime
  stdout is redirected, output is only sent to the log file.  Regular
  output (stdout) and error output (stderr) are interleaved, and every
  line is timestamped in the log file:

    2015-02-17 Tue 21:48:16| HashBackup build #1236 Copyright 2009-2015 HashBackup, LLC
    2015-02-17 Tue 21:48:16| Backup directory: /Users/jim/hb
    2015-02-17 Tue 21:48:16| Copied HB program to /Users/jim/hb/hb.1236
    2015-02-17 Tue 21:48:16| This is backup version: 0
    2015-02-17 Tue 21:48:16| Dedup is not enabled
    2015-02-17 Tue 21:48:16| /Users/jim/bcopy
    2015-02-17 Tue 21:48:16| Writing archive 0.0
    2015-02-17 Tue 21:48:16|
    2015-02-17 Tue 21:48:16| Time: 0.5s
    2015-02-17 Tue 21:48:16| Checked: 4 paths, 5542 bytes, 5.5 KB
    2015-02-17 Tue 21:48:16| Saved: 4 paths, 5542 bytes, 5.5 KB
    2015-02-17 Tue 21:48:16| Excluded: 0
    2015-02-17 Tue 21:48:16| Dupbytes: 0
    2015-02-17 Tue 21:48:16| Compression: 91%, 11.2:1
    2015-02-17 Tue 21:48:16| Space: 496 B, 139 KB total
    2015-02-17 Tue 21:48:16| No errors
    2015-02-17 Tue 21:48:16| Exit 0: Success

  Logs are written to the logs directory under the backup directory.
  The log filename is a timestamp and the command name.  While a
  command is running, the log filename will have RUN at the end.  If a
  command is suddenly aborted (kill -9), the RUN suffix will stay
  there.  If the command fails, RUN is changed to FAIL.  If the
  command succeeds, there is no status at the end.
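
  The timestamp prefix in the sample log above can be reproduced with
  a short Python sketch.  The format string is inferred from the
  sample output and the function name is hypothetical; the weekday
  abbreviation (%a) depends on the locale:

```python
from datetime import datetime


def log_line(message, now=None):
    """Prefix a message with a timestamp like the sample log above."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d %a %H:%M:%S") + "| " + message


print(log_line("Exit 0: Success", datetime(2015, 2, 17, 21, 48, 16)))
# 2015-02-17 Tue 21:48:16| Exit 0: Success
```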

- selftest has these -v options that have not changed:
    -v0: check database is readable
    -v1: check database integrity
    -v2: check database high-level structure
    -v5: check user file hashes (sha256)
    (No -v option means hb selects level based on several factors)

  The -v3 and -v4 options have changed:
    -v3: check all local archives
    -v4: check all local and all remote archives

  With -v3 or below, no data is downloaded from remote destinations.

  With -v4, arc files are downloaded from ALL destinations now,
  whereas previously, HB downloaded each arc file from only 1
  destination - usually the first destination listed - and then, only
  if necessary.  If you ran with cache-size-limit set to -1, selftest
  did not verify the remote arc files.

  The result is, -v4 is much more thorough than it used to be, and may
  take longer to run if you have multiple destinations.

  Another improvement is that -v4 uses local disk space equal to the
  number of destinations times one arc file size, so it can be run on
  huge backups that previously would have required a lot of local disk
  space.  A cache plan is no longer needed for -v4, but it is still
  used with -v5.

  IMPORTANT NOTE: selftest can't download and check arc files stored
  at Amazon Glacier.  The 4-hour retrieval delay is unmanageable for

- selftest has a new option, -r, to indicate the version you want to
  test.  Selftest always runs all of the usual database checks.  With
  -r, the -v3 and -v4 options only test the arc files in that version.
  For -v5, the -r option restricts full file verification to the user
  files in that version.

  The default is to test all versions, as before.  To test the most
  recent version, use -r-1.

- selftest has a progress display in the block & ref tests, since
  these can be quite long and may appear "stuck"

- selftest can test specific arc files or pathnames by listing them on
  the command line.  If arc files are listed, all copies are tested as
  if -v4 was used.  If pathnames are listed, their sha256 file hash is
  verified as if -v5 was used.  If any pathname is a directory, all
  files underneath the directory are also checked.  For listed
  pathnames, all versions are tested.  A warning is printed if
  specific arc names are combined with -v4, or pathnames are combined
  with -v5; neither of these makes sense: -v4 tests *all* arc files and
  -v5 tests *all* pathnames.

- selftest: with -v4 and multiple destinations, arc files can be
  corrected.  All good blocks are merged into a new arc file that
  replaces any arc files with problems.  selftest does not yet handle
  completely missing remote files and will usually halt or hang.

- s3: if the debug keyword is added to an S3 destination, a log file
  destname.log is created in the backup directory.  Add debug 99 to
  dest.conf.  This will also usually cause HB to fail when exceptions
  occur rather than catching them and doing retries.

- s3: a 30-second timeout has been added to help prevent hangs on

- recover: failed with a traceback:
    Traceback (most recent call last):
      File "/", line 156, in <module>
      File "/", line 191, in main
      File "/", line 143, in init
    AttributeError: 'NoneType' object has no attribute 'get'
  This was related to adding simulated backups.

- in very unusual circumstances, this error could occur because an arc
  file was not closed:
    Error: unable to get block 716302: Error opening archive: Too many
    open files: (pathname)

- HB would randomly halt with a Database is locked problem.  This
  update includes a fix for one situation where this could occur.

#1235 - Feb 16, 2015 - beta expires Jun 15, 2015


* debug 99 keyword on a destination will display a detailed traceback
  if an error occurs during startup
* Glacier: fix typo in yesterday's release for this error:
    dest glac: error #1 of 3 in send arc.0.0: [NameError] global name 'location' is not defined

#1231 - Feb 15, 2015 - beta expires Jun 15, 2015


* config: new simulated-backup keyword
* backup: display file being saved on ctrl-c
* backup: save entire file if size increases
* selftest: --fix enhancements
* backup: permissions, filename case, Apples
* backup: new config keyword, backup-linux-attrs
* rm/retain: pre-2013 arc files are now packed
* rm/retain: display the amount of backup space removed
* rm/retain: respect cache-size-limit when packing arcs
* retain: add a default -x if none is specified
  NOTE: this may remove lots of deleted files from your backup that
        should have been removed earlier, but weren't
* export: include dest.db in export.tar
* pre-2014 database upgrade code removed
* bug fixes


- config: a new keyword, simulated-backup, can be set to True before
  the initial backup.  When set True, no arc files are created by the
  backup command.  This allows modeling backup options such as -B
  (block size) and -D (dedup table size), even for very large
  backups, without using a lot of disk space.  Simulated backups also
  run faster because there is less I/O.  Incremental backups work
  correctly, and the stats command can be used to view statistics,
  space used, etc.

  Summary of differences for simulated-backup:
  - must be set before the initial backup
  - cannot be changed after the initial backup
  - no arc files are created (not a real backup)
  - no files are sent to remote destinations
  - selftest is limited to -v1
  - get & mount will fail with "No archive" errors

- backup: if interrupted with a ctrl-c, backup will display the file
  it was saving.  This can be useful with -v0/-v1 since file names are
  not printed

- backup: if a file gets bigger during the backup, save whatever data
  is there.  Previously, backup saved the file up to its size at the
  start of the backup, plus sometimes 1 extra block.
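
  A minimal sketch of the "save whatever data is there" approach,
  assuming a plain read loop (read_blocks is a hypothetical helper,
  not HB's code): reading until end of file, rather than up to the
  size recorded at the start, picks up data appended during the
  backup.

```python
import io

def read_blocks(f, blocksize):
    """Yield blocks until end of file, ignoring the size seen at start."""
    while True:
        block = f.read(blocksize)
        if not block:          # b"" means end of file was reached
            break
        yield block

# Demo on an in-memory file: 7 bytes read in blocks of 3.
blocks = list(read_blocks(io.BytesIO(b"abcdefg"), 3))
```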

- selftest: --fix will truncate files that contain a bad block

- backup: (Mac OSX) HB is a case-sensitive program, so /users is not
  the same as /Users.  Most Unix filesystems are also case-sensitive.
  HFS, the Mac OSX filesystem, is usually case-insensitive, so /users
  is the same as /Users.  If HB did nothing, then backing up /users
  and /Users would result in saving the same directory under 2
  different names.  It's not a problem, but it's confusing.

  To solve this, HB tries to figure out the correct "case" of all
  pathnames on the command line; then everything works correctly.  To
  do this, HB needs read access to the parent directory of all
  pathnames on the command line, so to map /users to /Users, HB needs
  read access to / (root).  Usually this is fine, but on some systems,
  HB may not have read access to a parent directory, and backup would
  fail with a traceback.  Now it will display a new error message
  "Cannot verify filename case", and use the filename as-is.

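  The filename case lookup described above could look roughly like
  this.  It is a sketch under assumptions, not HB's implementation:
  correct_case is a hypothetical name, and the fallback simply keeps
  a component as given when its parent directory cannot be read.

```python
import os

def correct_case(path):
    """Return path with each component's case taken from the filesystem.

    Falls back to the name as given when the parent directory cannot be
    read or no single entry matches.
    """
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    result = drive + os.sep
    for name in [p for p in rest.split(os.sep) if p]:
        try:
            entries = os.listdir(result)   # needs read access to the parent
        except OSError:
            entries = []                   # no access: keep the name as-is
        if name not in entries:
            matches = [e for e in entries if e.lower() == name.lower()]
            if len(matches) == 1:
                name = matches[0]
        result = os.path.join(result, name)
    return result
```
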
- backup: a new True/False config keyword, backup-linux-attrs,
  controls whether HB backs up Linux file attributes.  Linux file
  attributes are set with the chattr command and displayed with
  lsattr.  They are little used and poorly implemented on Linux,
  requiring an open file descriptor and an ioctl call.  This can cause
  permission problems, especially in shared hosting environments.  Now
  HB will only read and store file attributes on Linux if
  backup-linux-attrs is set True with the hb config command.  The
  default is False.

  NOTE: File attributes are not the same as extended attributes, also
  called xattrs.  Extended attributes are always backed up if present.
  Xattrs are handled by the attr, getfattr, and setfattr commands on
  Linux, as well as the Linux ACL commands.

- rm/retain: previously, HB would not pack version 0 or 1 arc files,
  created before Sep 10, 2012.  Now it will.  To see if your arc files
  need packing, check these lines in hb stats:
               48 GB archive space
               36 GB active archive bytes - 75.08%
  After packing old archives on this backup, hb stats says:
               39 GB archive space
               36 GB active archive bytes - 93.06%
  If cache-size-limit is -1 (the default), copies of arc files are
  kept locally and packing is enabled.  If cache-size-limit is set,
  not all arc files are kept locally; they are only downloaded, packed,
  and uploaded if the config keyword pack-remote-archives is True.

- rm/retain: when arc files are deleted or compressed, rm & retain
  display the amount of backup space saved.

- rm/retain: when cache-size-limit is set and pack-remote-archives is
  True, rm and retain now try to respect the cache size limit while
  compressing archives.  This keeps HB from using too much local disk
  space.  Packing might run slower if it has to wait for packed
  archives to be transmitted, to respect cache-size-limit.

- retain: without the -x option, files that had been deleted from the
  filesystem were staying in the backup forever.  This was fixed for
  -t (see below) but is still a problem for -s.  Now, a default -x is
  always used.  The default -x is the same as -t or the last time
  period of -s.  So for -s30d12m, the default would be -x12m, ie, keep
  history of deleted files for 12 months after they are deleted.
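
  The default -x rule can be illustrated with a short sketch.  The
  count+unit parsing and the name default_x are assumptions made for
  illustration; the notes above do not say how -t and -s combine if
  both are given, so this sketch simply prefers -s.

```python
import re

def default_x(t=None, s=None):
    """Return the default -x value for a -t or -s retention argument.

    t: value of -t, e.g. "30d"
    s: value of -s, e.g. "30d12m"; its last time period is used
    """
    if s:
        periods = re.findall(r"\d+[a-z]", s)   # ["30d", "12m"]
        if periods:
            return periods[-1]
    return t
```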

- export: the dest.db file and a stub dest.conf file with just
  destination names are now included in export.tar to help debug
  problems that involve remote destinations.  dest.db contains HB
  filenames (arc files, etc) and the names of destinations containing
  them.  It does not contain user data, keys, or passwords.

- retain: with -t retention, HB would not remove a deleted file if it
  was the only copy, no matter how old it was.  Now it is removed when
  it has been deleted longer than the -t retention time.  For example:
  - a file is saved January 1st
  - the retention period is -t30: keep 30 days of file history
  - the file is deleted January 31st
  - for 30 days from Jan 31st, the deleted file must be restorable
  - 31 days after it was deleted from the filesystem, it can be
    removed from the backup

- retain: the -x option allows removing history for deleted files
  sooner than it would normally be removed.  For example, with -t30d,
  the history of a deleted file is kept for 30 days after it is
  deleted.  With -t30d -x15d, history of active files is kept for 30
  days, but files that have been deleted (from the filesystem) have
  their history removed 15 days earlier.  That was the design intent. 

  What actually happened (ie, the bug) is that retain with -x15d was
  removing deleted files *saved* more than 15 days ago, so it deleted
  files too soon.  It should have been checking the date the file was
  deleted, not the date it was saved.
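
  The corrected check can be sketched like this (x_expired is a
  hypothetical name, and whole-day date arithmetic is an assumption):
  the clock starts at the deletion date, not the save date.

```python
from datetime import date, timedelta

def x_expired(deleted_on, today, x_days):
    """True if a deleted file's history may be removed under -x.

    Compares against the date the file was deleted from the
    filesystem, not the date it was last saved.
    """
    return today - deleted_on > timedelta(days=x_days)

# Saved Jan 1, deleted Jan 31, checked Feb 10 with -x15d: only 10 days
# have passed since deletion, so the history is kept.  A check based
# on the save date (40 days ago) would have removed it too soon.
keep = not x_expired(date(2015, 1, 31), date(2015, 2, 10), 15)
```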

- on Linux, when HB asked a yes/no question it sometimes would not
  display the question, yet waited for a response.  Related to a
  recent buffering change for stdout.

- on Glacier, HB was not ignoring "not found" errors when trying to
  remove files, causing these kinds of error messages:

    glac(glac): unable to remove arc.0.0 with archive id (long id)
    Expected 204, got (404, code=ResourceNotFoundException, message=The
    archive ID was not found: (long id)

  This can happen if part of a backup is saved in one region, the
  Location keyword (region) is changed in dest.conf without changing
  destname, then you back up in a different region.

- backup: when a backup has old, inactive destinations that were used
  in the past, HB would sometimes try to remove files from them.  This
  could cause errors like below with active Glacier destinations.  Now
  HB ignores inactive destinations.

    glac(glac): no archive id available to remove this file: arc.0.13

- selftest: don't display errors about files on inactive destinations

#1200 - Feb 3, 2015 - beta expires Jun 15, 2015


* update HB's database software


- #1200 is identical to #1199 except that HB's database software has
  been upgraded to a new release.  Everything should function the
  same, though performance may be slightly different.  In testing, a
  long selftest, which has lots of database operations, took about 7%
  less time than with the previous version.  If performance is much
  worse for any operation, please send an email.

#1199 - Feb 2, 2015 - beta expires Jun 15, 2015


* bump expiration date
* bug fix in rm


- rm: when a hard-linked file is removed, sometimes its data blocks
  were not being deleted.  This could cause a selftest error:

      Error: unknown logid referenced: 5 [r0]
      IndexError: list index out of range

  selftest --fix will correct this, but the actual cause was rm.

#1192 - Jan 16, 2015 - beta expires Mar 15, 2015


* automagically re-creating hb.db
* add hard-link count to ls -l display
* stats enhancements
* bug fixes


- when remote destinations are set up in dest.conf, HB creates
  incremental versions of hb.db (the main HB database) in the local
  backup directory.  These are sent to remote destinations and are
  used by hb recover to re-create hb.db if the local backup directory
  is lost.  In some cases, you may be working directly with a remote
  backup directory.  For example, the remote backup directory might be
  on an external USB drive and you bring it back to the office to do a
  complete restore when a disk dies.

  The "normal" way to do this would be to create a local recovery
  directory, add key.conf and dest.conf, and run recover.  This would
  download (or copy from the backup drive) all of the files from the
  remote backup directory and re-create hb.db.

  Now, you can put a key.conf file in the backup directory itself and
  the next HB command will re-create hb.db from the hb.db.N files.  It
  saves a copy step and allows you to copy the HB program and key.conf
  to a remote destination and run selftest, for example.  But be aware
  that you are operating directly on the backup and any modifications
  will likely corrupt it.

- hb ls -l displays the link count, like regular ls -l, right after
  the file mode.

- stats: the display precision on a few statistics was increased for
  multi-TB backups.  Some new statistics were added to hb stats, for
  example, an estimation of the backup space saved by dedup.

- backup: if a directory /abc/def exists and the path /abc/def-ghi
  (either a file or directory) is excluded in inex.conf, backup would
  usually save /abc/def-ghi anyway.  This bug could occur with many
  punctuation characters other than dash.

- audit: if a program was still running, audit would show Finished: as
  the current time.  Now it displays a blank space.

- stats: if a backup was in progress, stats would sometimes display
  an error

- selftest: it's possible for a file's size to change while it is
  being backed up.  Usually the file is growing, but it can also be
  truncated.  If a file is truncated to zero bytes between the time
  that HB reads the file size and begins the file backup, selftest was
  incorrectly reporting this error:

    Error: logid 39528488 does not reference blocks: (pathname)

#1180 - Jan 11, 2015 - beta expires Mar 15, 2015


* changes to copying HB executable
* an error backing up file system flags is not fatal
* selftest enhancements
* bug fixes


- every version of the HB program used in a backup is now copied to
  the local backup directory, named hb.N, where N is the build number
  (not to be confused with hb.db.N files, which are database-related).
  This local copy is always created now, whereas before it was only
  created if remote destinations were configured.  If the
  copy-executable config keyword is True, each version is also copied
  to remote destinations.  You can delete old copies of hb.N manually,
  and they will also be deleted from remotes.

  IMPORTANT: do not delete hb.db.N files by mistake!

- backup: if file system flags cannot be read for a file or directory,
  backup displays an error message as before, but continues as if the
  flags were zero rather than giving up on the file or directory.
  File system flags are set with the chattr (Linux) and chflags
  (BSD/OSX) commands and are not widely used.  Related to this change,
  backup unnecessarily opened directories for reading on BSD/OSX
  systems, which could cause errors if there was only x access to a
  parent directory.

- selftest: more new tests - yay!

- selftest: if cache-size-limit is set, selftest -v3 or higher is
  used, and you answer No to the Continue? question, the arc cache was
  not cleaned up (lots of extra arc files left there).

- stats: bombed with a traceback if the initial backup was still
  running, but had done at least 1 commit.  Now it doesn't bomb, but
  the statistics are still not quite accurate because stats uses
  completed backups for some of its numbers, but all backups for other

- the first backup with this new version may upload many archives.
  The sync code was not always uploading archives that were already
  present on the destination if the size changed because of packing
  following rm/retain in older versions of HB.  This mainly occurred
  if packing was interrupted or a destination was down during packing.
  One of the new tests in selftest revealed this.

- audit: if auditing is enabled and an HB command is running, audit
  would display the current date & time for the running HB process
  under "Finished".  Now it displays a blank.

#1164 - Jan 4, 2015 - beta expires Mar 15, 2015


* get/selftest: cache planning is 3-5x faster
* --logid option removed from selftest
* selftest --fix improved
* hb sha256 should not require -c


- get/selftest: creating a plan to manage the local arc cache (when
  cache-size-limit is set) is 3-5x faster

- prior to #1090 (June 2014), an unusual race condition during backup
  could cause this selftest error:

    Error: for logid 3241850, pathid 2923104 is invalid.

  This could also cause selftest to abort, with an error message to
  run selftest.  Doh!  Now, selftest --fix corrects these errors by
  manufacturing a pathname based on the prior pathname backed up.  For
  example, if /data/file1 was the file backed up before the missing
  path, the missing path would be called /data/file1#path-N#hberror,
  where N is a unique number.

- the hb command 'sha256' should not require a -c option.

#1159 - Dec 26, 2014 - beta expires Mar 15, 2015


* cache planning bug fix


- get/selftest: if cache-size-limit is set, archives are being packed,
  and rm/retain are used, get and selftest (with -v3 or higher) could
  fail with an error like this:

    Planning cache...
    Traceback (most recent call last):
      File "/", line 172, in <module>
      File "/", line 857, in main
      File "/", line 298, in plan
      File "/", line 169, in iterprefetch
      File "/", line 50, in getvernum
    Exception: Blockid not found: 919621

  This bug was introduced in #1107, when packed archives got new names
  instead of keeping the old name.

#1157 - Dec 22, 2014 - beta expires Mar 15, 2015


* new export command
* selftest improvements
* init/rekey fix for Synology NAS


- a new command, export, creates a tar file of the HB database for
  customer support.  This contains file metadata such as filenames,
  info listed by ls -l, and HB metadata, but no user file contents.

  The advantages of export are:
  - it rekeys the database -- your backup key is not disclosed
  - a passphrase can be set, so if your export is intercepted,
    even with the key, the passphrase is still required
  - it clears backup keys stored in the database
  - it clears destination info if stored in the database
  - export's tar file is much smaller than tar alone can create

- selftest: more tests added

- hb init sets the key file, key.conf, to be read-only to prevent
  accidental deletion.  If the backup directory is actually on a
  Synology NAS mounted over AFP (AppleTalk), an error occurs because a
  read-only file can't be renamed in this setup.  HB was changed to
  reset the permissions before the rename.
  NOTE: rekey still does not work in this configuration.  A better way
  to set up a NAS is to use a local backup directory with -c, set up a
  Dir destination to the NAS in dest.conf, and set cache-size-limit
  with hb config to avoid having a local copy of the backup.

#1146 - Dec 15, 2014 - beta expires Mar 15, 2015


* bump expiration date (this time for real!)
* selftest improvements
* redirecting HB output to a file works better
* backup, ZFS, ACLs
* rm/retain: bug fix
* dedup table >= 2GB bug fix


- backup: apparently the backup expiration date did not get bumped
  back in October, though it says it was in the change log.  Apologies
  to everyone for such a silly mistake.

- selftest code has been reorganized and streamlined in this version,
  to better accommodate all the new tests

- when HB output is redirected to files, for example:

    hb backup -c xxx / >hb.log 2>&1

  normal output was buffered differently than error output, so the log
  file didn't look right: error output and regular output weren't
  interleaved correctly.  This also caused weirdness if output was
  sent to syslog, because output was dumped all at once at the end of
  the program.  Now log files should look like terminal sessions.
  NOTE: this may be a bit slower if a lot of output is sent to a
  file on a remote file system.

- backup: HB does not support ACLs on ZFS, NFS4, and Windows, and
  issued 2 error messages:

    Unable to read ACL: Invalid argument: /
    Unable to read ACLs on this filesystem (zfs/nfsv4/Windows?)

  This caused the error count to always be at least 2, which made
  backup monitoring more difficult.  Now, backup will display only one
  message:

    Unable to read ACLs on this filesystem (zfs/nfsv4/Windows?): /

  and will not bump the error count on this error.

- retain with -x and -v would sometimes stop with an error:
    No pathname for pathid N; run selftest
  Selftest would complete without errors.  The bug was that retain had
  deleted a pathname because of -x, then wanted to display the
  pathname because of -v.  The simple fix was to display the pathname
  first, then delete it.

- backup: dedup tables >= 2GB could cause problems when written to
  disk, with a message (Linux) like:

    Error writing dedup table: wrote 2147479552 of 2147483640 bytes

  or on OSX, an Invalid argument error.  Now it works on both.
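
  The Linux message above shows a short write: a single write() call
  on Linux transfers at most 2147479552 bytes.  One common way to
  handle this is to loop until everything is written.  The sketch
  below shows that general technique; it is not necessarily how HB
  was fixed, and write_all and max_chunk are hypothetical names.

```python
import os

def write_all(fd, data, max_chunk=1 << 30):
    """Write all of data to fd, looping over short writes.

    A single write() may transfer fewer bytes than requested, so keep
    writing from where the last call stopped.  max_chunk is an
    arbitrary illustrative limit on each request.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        total += os.write(fd, view[total:total + max_chunk])
    return total
```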

#1133 - Nov 24, 2014 - beta expires Mar 15, 2015


* IMPORTANT: backup option -n is obsolete
* IMPORTANT: run selftest -v2
* selftest improvements
* rm/retain: bug fix


- backup: on July 29th, 2009, the -n option was added to HB backup so
  that if retain was running next, the entire HB database was not sent
  twice: once from backup, then again from retain.  For a long time
  now, HB has been sending much smaller incremental database updates
  instead of a complete dump, so the justification for -n is gone.  If
  -n is used with backup, and retain decides not to delete any files,
  the database does not get sent at all - clearly not the intent.  In
  this rev, -n issues a warning and is ignored.  Early in 2015, the -n
  option will be removed altogether and backups will fail if it is
  still used, because it can be a dangerous option.

- selftest: a few more tests have been added for -v2, for very large
  backups.  Selftest also had a bug where it would not check files
  that had extended attributes + ACL in common with more than 64K
  other files.

  IMPORTANT: all customers should run selftest -v2 to check their

- when packing arc files, rm/retain would sometimes do this if
  cache-size-limit is set:

    Getting arc.32.0
    Packing arc.32.0 as arc.32.1
    Getting arc.34.0
    Packing arc.34.0 as arc.34.1
    Getting arc.34.0
    dest rsync: stopping because of errors
    dest rsync: Traceback (most recent call last):
      File "/", line 310, in loop
      File "/", line 469, in getcmd
      File "/", line 443, in getparts
      File "/", line 67, in retry
      File "/", line 116, in getfile

    Unable to download archive arc.34.0: Exception('destinations halted',)

    Traceback (most recent call last):
      File "/", line 146, in <module>
      File "/", line 322, in main
      File "/", line 181, in finish
      File "/", line 92, in compress
      File "/", line 156, in __init__
      File "/", line 189, in _open
      File "/", line 713, in fetch
    NoArchive: Archive does not exist:

  This was a database bug.  A workaround was added to HB.

- rm/retain: with small VMs, rm/retain might have "out of memory"
  errors during startup with rsync destinations, because HB is unable
  to fork rsync while the dedup table is loaded.  It is recommended to
  use workers 1 in dest.conf in low memory environments with rsync

#1125 - Oct 29, 2014 - beta expires Mar 15, 2015


* bump expiration date
* new config keyword "remote-update"
* create hb.db.n in dest sync command
* sort filenames better in dest ls command
* rm & retain also sync files, like backup
* new selftest -v2 checking
* bug fixes / minor changes
* updated docs


- when packing arc files because of a rm or retain, HB stores new,
  packed archives on remote destinations, then after all new archives
  are stored, removes the obsolete archives.  This order is necessary
  to preserve the integrity of the remote backup, but it also uses
  more remote disk space.  For users who are tight on remote disk
  space or have filled a remote, it may be necessary to remove old
  data first, then add new data.  The new config keyword
  "remote-update" can be set to either "normal" (the default), or
  "unsafe" to control the order of operations.

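  The two orderings can be sketched as follows.  This only models
  the order of operations described above; plan_remote_update and
  the (action, filename) tuples are hypothetical, not HB internals.

```python
def plan_remote_update(new_arcs, old_arcs, remote_update="normal"):
    """Return the ordered list of (action, filename) steps for packing.

    "normal": send every new archive, then remove obsolete ones
              (safe, but needs more remote space for a while).
    "unsafe": remove obsolete archives first to free space, then send.
    """
    if remote_update not in ("normal", "unsafe"):
        raise ValueError("remote-update must be 'normal' or 'unsafe'")
    sends = [("send", name) for name in new_arcs]
    removes = [("remove", name) for name in old_arcs]
    return sends + removes if remote_update == "normal" else removes + sends
```
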
  IMPORTANT: if you set remote-update to unsafe and an operation is
  interrupted, the remote backup area may be in a temporary "bad"
  state, and doing a recover will fail.  The local backup directory is
  fine, and the next complete HB run should correct the remote backup

- if HB has been only creating a local backup, then a dest.conf is
  created, then hb dest -c xxx sync is done, hb.db.0 was not sent to
  the remote because it didn't exist.  The backup command normally
  creates these, but did not since there was no dest.conf.  The
  remote backup is not usable without hb.db.N files.  An interrupted
  backup would cause a somewhat similar situation.  Now, hb dest sync
  will create a new hb.db.N and send it to all remotes.

- the hb dest -c xxx ls command sorted files by destination and
  filename.  But filenames like arc.10.0 would sort before arc.2.0,
  which gets very confusing.  Now filenames are sorted as expected:
  all arc.0.N files, then arc.1.N, arc.2.N, ..., arc.10.N, etc.
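
  This kind of ordering is often called a natural sort.  A minimal
  sketch of one way to get it (natural_key is a hypothetical helper,
  not HB's code) is to compare runs of digits as integers:

```python
import re

def natural_key(name):
    """Sort key that compares runs of digits as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

names = ["arc.10.0", "arc.2.0", "arc.0.1", "arc.1.0", "arc.0.0"]
ordered = sorted(names, key=natural_key)
```

  A plain sorted(names) compares character by character, which is
  why arc.10.0 lands before arc.2.0.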

- rm/retain: if a disk full occurred on a remote while packing
  archives, the next HB command could get stuck on "Getting arc.X.Y"
  (the old archive that was packed) with an error that it didn't
  exist.  This bug was introduced in the Sep 10th release.  Running
  selftest -v2 --fix will correct this.

- rm/retain: previously rm & retain did not do a local->remote sync
  like backup does when it starts.  Now they do a sync when finished.

- selftest: a few new tests were added for -v2 and above.  -v2 is an
  important selftest level because it can be used on very large
  backups without having to download archives if they are not local.
  One new test makes sure that all arc files are either local or on at
  least 1 remote destination.  Note that HB does not know whether
  files are really on remotes, but only that it successfully sent them
  in the past and they should still be there.

- backup: previously, the backup command would remove partial or
  uncommitted archive files when it was interrupted, for example, with
  a ctrl-c.  This can cause race conditions and destination errors,
  because the files being removed may already be queued or open for
  transmit.  Now, backup will just exit, and extra archive files will
  be removed at the end of the next backup, rm, or retain.

- display a message when creating hb.db.N files.  It can take some
  time to create these files if there is a lot of backup history, even
  if only a single file is backed up, and it's not obvious what is
  happening during the delay.

- when HB syncs destinations, it does a better job of removing .tmp
  files from remotes (affects dir, ftp, and ssh destinations)

- when a specific destination is cleared with hb dest -c xxx clear
  ddd, HB checked to make sure it was not deleting the only copy of a
  file, but it would delete files if they were on other destinations.
  Now, HB will not delete any files from a destination if it contains
  the only copy of any file.
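
  The new rule amounts to an all-or-nothing check before clearing.
  In this sketch, can_clear and the locations mapping (filename to
  the set of destination names holding it) are hypothetical
  stand-ins for HB's own bookkeeping.

```python
def can_clear(dest, locations):
    """True if clearing dest would not remove the only copy of any file.

    locations maps each filename to the set of destination names that
    hold it.
    """
    return all(holders != {dest} for holders in locations.values())

locations = {"arc.0.0": {"s3", "nas"}, "arc.0.1": {"nas"}}
```

  Here "s3" can be cleared, but "nas" cannot, because arc.0.1 exists
  only there; under the new behavior nothing is deleted from "nas".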

- security doc updated: dest.db sent directly to remotes

#1107 - Sep 10, 2014 - beta expires Dec 15, 2014


* updated expiration date
* webdav: add digest authorization
* webdav: enhance error recovery
* rm/retain: fix "out of memory" errors in small VMs
* rm/retain: eliminate bad transient remote states if interrupted
* dest.db copied to remotes directly


- webdav: added digest authorization.  Currently HB uses Basic
  authentication first, then Digest authentication if Basic fails.
  This is not ideal, because it defeats the "security" enhancements of
  Digest authentication.  If security is a concern, use the secure
  keyword to enable ssl.

- webdav: some dav servers only allow use of credentials for a certain
  amount of time before throwing an error.  HB will now recover from
  these credential timeout errors, though it will still show up as a
  retry.  Ideally, these should be handled more gracefully by HB so
  they don't look like errors.

- rm/retain: release dedup table early to help prevent running out of
  virtual memory in small VM environments with multiple rsync workers

- rm/retain: "packing" is the recovery of deleted space within archive
  files and is performed automatically based on config settings during
  rm and retain.  Previously, HB would overwrite arc files during
  packing and then send them to remote destinations.  But between
  sending a packed archive and sending hb.db.n, the remote backup was
  in a transient, broken state.  If rm or retain was interrupted
  during this time, the remote backup would be broken (ie, a recover
  would fail) until the next backup, rm, or retain automatically
  corrected it.

  Now, packed archive files are created with new names and the old,
  unpacked archives are removed afterwards.  This prevents the bad
  transient state if HB is interrupted.  The downsides of this new
  method are that more space will be temporarily required on the
  remote side, equal to the size of all packed archives, and rsync
  will no longer be able to use its fast "delta transmission" to
  upload a packed archive.

  There were a couple of other cases of potentially bad transient
  states on remote destinations that have also been corrected.  An
  interrupted recrypt still leaves the remote backup in a bad state
  until the recrypt is finished, but this is documented in recrypt.

- previously, dest.db was compressed & signed before sending to
  remotes.  Since dest.db is always encrypted, there was little
  security benefit to this and it was confusing that a remote dest.db
  file was very different from a local dest.db file with the same
  name.  Now, dest.db is copied without modification and is directly
  usable as a dest.db file if you have the correct key.

  NOTE: the local and remote dest.db may not match if HB removes files
  after dest.db is copied.  This is normal and expected.

#1100 - Jun 29, 2014 - beta expires Sep 15, 2014


* add WebDAV support


- WebDAV remote destinations are now supported.  The destination type
  is either dav or webdav.  There is a new doc file in
  doc/dest.conf.examples to explain the options for DAV destinations.

#1090 - Jun 9, 2014 - beta expires Sep 15, 2014


* bump expiration date
* new recover option, -n, bypasses arc file downloads
* new symlink keyword for Dir destinations
* bug fixes


- recover: a new option, -n, makes restoring a few files faster in a
  disaster recovery situation.  If the local backup directory is lost,
  recover is used to download files from a remote destination.  But
  recover downloaded all archive files, unless arc-cache-limit was
  set.  For large backups, this could take a very long time.  Now,
  with the -n option, no archives are downloaded by recover.  The get
  command can then be used to restore the required files, and only the
  archives needed will be downloaded.  Later, recover can be used
  again without -n to recover all archive files.

- Dir destination: previously, HB would try to do a symlink to "fetch"
  files from a remote Dir destination.  This is useful when
  cache-size-limit is set and Dir destinations are actually remote
  drives, like Google Drive, Dropbox, etc., because instead of
  downloading the whole arc file from the remote service, HB can fetch
  only the blocks it needs for a get command.  But it can also be
  confusing if you don't realize what is happening.  So a new symlink
  keyword has been added.  If symlink is not present or is False, no
  symlinks will be used.  If symlink is present with no value or is
  True, symlink will be attempted.  If symlink fails, HB falls back to
  a regular copy.
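
  The symlink-then-copy fallback can be sketched like this.  The
  function name fetch and its return values are assumptions for
  illustration, not HB's API.

```python
import os
import shutil

def fetch(src, dst, symlink=False):
    """Make src available at dst; return "symlink" or "copy".

    With symlink=True a symlink is tried first; any OSError (for
    example, a filesystem without symlink support) falls back to a
    regular copy.
    """
    if symlink:
        try:
            os.symlink(src, dst)
            return "symlink"
        except OSError:
            pass
    shutil.copyfile(src, dst)
    return "copy"
```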

- recover: dest.db is always downloaded, even if it already exists in
  the recovery directory.  A customer reported problems with recover,
  but the real problem was that recover had been run earlier, a backup
  was run (deleting files from the remote directory), and recover was
  run a 2nd time in the recovery directory using a stale dest.db.
  This caused a hang problem when non-existent hb.db.* files were

- recover: hb.db is always rebuilt, even if it already exists in the
  recover directory.

- recover: hb.db is rebuilt from hb.db.N files.  If recover is
  interrupted and restarted, it now will not download hb.db.N files
  that already exist and are the correct size.
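
  The restart check is a simple existence and size comparison.  A
  minimal sketch, assuming the expected size is known in advance
  (needs_download is a hypothetical helper name):

```python
import os

def needs_download(path, expected_size):
    """True unless path already exists with exactly expected_size bytes."""
    try:
        return os.path.getsize(path) != expected_size
    except OSError:
        return True   # missing or unreadable: fetch it again
```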

- recover: previously, recover would rename existing files with a .old
  suffix.  Now, existing files that are going to be overwritten are
  deleted instead.

- recover: customer reported hash mismatches after a recover.  When
  recovering into a directory already containing arc files, verify that
  any existing arc files are the correct size, and if not, download the
  file again.

- customer reported selftest error (-v2 or higher):

    Error: for logid 3241850, pathid 2923104 is invalid

  The customer kept daily backup logs to help solve the problem.  A
  race condition triggers the bug:

    1. backup a directory d containing file x
    2. delete file d/x
    3. backup directory d again
    4. create file d/x again
    5. backup directory d again:
       5a. hb reads the directory to get a list of files
       5b. file d/x is deleted by another process
       5c. hb tries to backup file d/x and gets a stat error

  Afterwards, selftest will throw the error about an invalid path.
  This backup bug is now fixed.
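
  The window between steps 5a and 5c can be handled defensively, as
  in the sketch below.  It shows the general pattern only, not HB's
  actual fix; stat_listed is a hypothetical helper.

```python
import os

def stat_listed(dirpath, names):
    """Stat each listed name; return ({name: stat_result}, [vanished]).

    A file can be removed between listing the directory and stat'ing
    it, so a missing file is reported as vanished instead of being
    recorded as a broken entry.
    """
    found, vanished = {}, []
    for name in names:
        try:
            found[name] = os.lstat(os.path.join(dirpath, name))
        except FileNotFoundError:      # deleted after the listing (5b/5c)
            vanished.append(name)
    return found, vanished
```

  A typical call would be stat_listed(d, os.listdir(d)).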

- related to above error, if the same thing happens with a nested
  directory x instead of a file, it caused this selftest error:

    Traceback (most recent call last):
      File "/", line 151, in <module>
      File "/", line 619, in main
      File "/", line 251, in showpath
      File "/", line 112, in getpathname
    Exception: No pathname for pathid 5; run selftest

- Amazon Glacier: document that rate limiting is not supported

#1083 - Mar 12, 2014 - beta expires Jun 15, 2014


* update expiration date

#1079 - Dec 23, 2013 - beta expires Mar 15, 2014


* S3 example dest.conf update


- a note was added to the S3 example dest.conf for Google Storage,
  explaining that Developer Keys have to be generated to use the
  S3-compatible API with Google Storage.

#1078 - Dec 5, 2013 - beta expires Mar 15, 2014


* security doc update
* extend backup expiration date to March 15, 2014


- the security document was updated to mention loading dest.conf into
  hb.db to avoid having plaintext passwords in dest.conf, and describe
  modifications to the key generation procedure when the entropy pool
  is exhausted (Linux only).

#1075 - Oct 21, 2013 - beta expires Dec 15, 2013


* bug fixes


- customer reported error:
  -- rebooted during a backup
  -- this leaves hb.db-journal file (transaction log)
  -- run hb upgrade to get latest version of hb
  -- this version required a database upgrade
  -- database upgrade would not work because journal existed

     $ hb versions -c hb
     HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC

     Current database rev: 13
     Upgrading database to rev: 14
     Warning: unable to audit command: Can't upgrade database with a
     transaction active
     Backup directory: /Users/jim/hbdev-1035/hb

     Current database rev: 13
     Upgrading database to rev: 14

     Traceback (most recent call last):
       File "/", line 171, in <module>
       File "/", line 139, in main
       File "/", line 138, in opendb
       File "/", line 379, in upgradedb
     Exception: Can't upgrade database with a transaction active

  HB was changed to clean up the journal before a database upgrade.

- recover: with Glacier destinations, if a recover in progress is
  aborted and then restarted, HB tries to display the archives already
  in progress and when they started retrieval.  This print message
  caused an error on Linux, either an import error for _strptime or a
  seg fault.

#1071 - Aug 10, 2013 - beta expires Dec 15, 2013


* better messages
* bug fixes


- the help for init has been improved

#1070 - Aug 5, 2013 - beta expires Dec 15, 2013


* better messages
* bug fixes


- mount displays a better error message when a mountpoint is already
  in use, a better message when the backup has been mounted, and
  explains how to abort the mount using Ctrl-\

- stats command was printing:
    4:1 reduction ratio of backed up files for last %d backups
  instead of:
    4:1 reduction ratio of backed up files for last 5 backups
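
  Output like this typically appears when a format string is printed
  without its arguments applied.  Whether that was the cause here is
  an assumption; the sketch below only shows the corrected formatting,
  with a hypothetical function name:

```python
def ratio_message(ratio, nbackups):
    """Format the stats line with both values substituted."""
    template = "%d:1 reduction ratio of backed up files for last %d backups"
    return template % (ratio, nbackups)
```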

- beginning with #1032, a fatal exception was raised when a
  destination had trouble starting, for example, a Dir destination was
  unavailable because a removable USB disk wasn't inserted.  This
  fatal error was not intended, and the effect is that you couldn't
  have destinations that were temporarily missing.

- in #1035, a feature was added to detect overwriting a remote backup
  area by accident.  But when doing an initial backup, this caused
  error messages to be displayed, usually 3, because HB kept trying to
  download the DESTID file when it did not (and should not) exist.
  The feature still works, but now the confusing error messages are
  not displayed.

#1062 - Jul 20, 2013 - beta expires Dec 15, 2013


* expiration date updated
* bug fixes


- backup: in #1059, saving / and /mnt did not save /mnt if it was a
  separate filesystem and -X wasn't used.  /mnt should have been saved
  because it was specifically mentioned on the command line.  A
  similar thing could happen with excluded files that were on the
  command line.

#1059 - Jul 10, 2013 - beta expires Sep 15, 2013


* database upgrade to dbrev 14 (may take a while)
* stats command runs faster on huge backups
* new backup statistic: # of files checked
* better error messages
* bug fixes


- NOTE: this rev will do an automatic database upgrade to dbrev 14
  when any HB command is used.  Extra statistics are maintained to
  speed up the stats command so it scales better for huge backups with
  millions of files.  The existing database does have to be scanned
  during the upgrade to initialize these new statistics, so the
  upgrade could take some time to complete - about the same time as
  the old stats command took to run once

- hb stats runs faster for huge backups and the "industry dedup ratio"
  will be more accurate for new backups

- backup: prints the number of files and bytes checked in addition to
  the number actually saved (because they were modified).  These
  numbers now include saved directories.  When /abc/def is the only
  file backed up, what actually gets saved is /, /abc, and /abc/def.
  Backup will say 3 paths were checked and saved whereas before it
  said 1 file was saved.
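
  The /abc/def example can be reproduced with a short sketch that
  lists a path together with its parent directories.  The function
  name is hypothetical, and this is not how HB itself counts paths:

```python
import posixpath


def paths_saved(path):
    """Return an absolute path plus all of its parents, root first."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    result = [path]
    while True:
        parent = posixpath.dirname(path)
        if parent == path:          # reached the root
            break
        result.append(parent)
        path = parent
    return result[::-1]
```

  For /abc/def this yields 3 paths, matching the count that backup now
  reports.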

- mount: if an empty backup directory is mounted, it caused a
  traceback.  Now, an error message 'No backups yet!' is displayed.

- hb: if an invalid command is used, like hb xyz, hb could complain
  that the backup directory doesn't exist, when the real error is that
  the command is not recognized

- the error message displayed when the backup directory doesn't exist
  is more specific, advising to use the -c option if it wasn't used,
  or to use a different directory with the -c option.  It was
  confusing when the -c option was omitted by accident.

- when the backup database is newer than the hb program can handle, hb
  no longer recommends using the clear command.  Auditing and command
  restrictions (disable-commands config option) prevent using clear
  in this situation.

- backup: if a symlink to a mounted block device was used on the
  command line, backup would not check to see if the block device was
  mounted and display the appropriate warning.

- help command could cause a traceback:

    HashBackup build 1037 Copyright 2009-2013 HashBackup, LLC
    Traceback (most recent call last):
      File "/", line 51, in <module>
      File "/", line 556, in confdir
    misc.err: Backup directory doesn't exist, use hb init command: /root/hashbackup

- if a database upgrade fails with an error, the original database is
  re-installed.  But if the upgrade was aborted with Ctrl-c, the
  original database was not re-installed.

- in #1035, a new feature was added to prevent sending two backups to
  the same destination.  But if a destination is flaky (imap in this
  case), hb could report that the destination ID's did not match and
  you may be overwriting another backup, when the real problem is that
  the remote service had an issue and did not return the DESTID file.
  Error retries have been added to fix this.
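
  A generic retry loop of the kind described above might look like
  this.  The function name, the retry count, and the use of OSError
  for a temporary failure are all assumptions for illustration:

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise               # out of retries: report the real error
            time.sleep(delay)
```

  Retrying first means a destination ID mismatch is only reported when
  the DESTID file was actually fetched and compared.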

- when there is a destination ID mismatch, the local and remote IDs
  are displayed to help determine the problem

- compare: could cause a traceback on ZFS, because ZFS ACLs are not
  yet supported.  Now it displays a warning message like backup does.

- with Amazon Glacier, HB creates an associated bucket in S3.  But the
  location names for S3 and Glacier are sometimes not identical, and a
  traceback could occur:

    Traceback (most recent call last):
      File "/", line 76, in <module>
      File "/", line 1973, in main
      File "/", line 180, in init
      File "/", line 126, in initdest
      File "/", line 62, in startdest
      File "/", line 172, in init1
      File "/", line 185, in init1
      File "/boto/s3/", line 500, in create_bucket
    S3ResponseError: S3ResponseError: 400 Bad Request
    <?xml version="1.0" encoding="UTF-8"?>
    <Message>The specified location-constraint is not valid</Message>

  Now HB maps Glacier locations to corresponding S3 locations.

#1037 - May 30, 2013 - beta expires Sep 15, 2013

- when dest clear is used to delete files from a destination, delete
  the DESTID file too

- a bug fix in #1035 caused the upgrade command to fail with a traceback:

    # hb upgrade
    HashBackup build 1035 Copyright 2009-2013 HashBackup, LLC
    Traceback (most recent call last):
      File "/", line 51, in <module>
      File "/", line 556, in confdir
    misc.err: Backup directory doesn't exist, use hb init command: /root/hashbackup

  The upgrade command cannot be audited since it is not associated
  with a backup directory.  To work around this problem, create a
  ~/hashbackup backup directory if you don't already have one, as in
  this example, first showing a failed upgrade, then success:

    [jim@mb ~]$ hb upgrade
    HashBackup build 1035 Copyright 2009-2013 HashBackup, LLC
    Traceback (most recent call last):
      File "/", line 51, in <module>
      File "/", line 556, in confdir
    misc.err: Backup directory doesn't exist, use hb init command: /Users/jim/hashbackup

    [jim@mb ~]$ hb init -c ~/hashbackup
    HashBackup build 1035 Copyright 2009-2013 HashBackup, LLC
    Backup directory: /Users/jim/hashbackup
    Permissions set for owner access only
    Created key file /Users/jim/hashbackup/key.conf
    Key file set to read-only
    Setting include/exclude defaults: /Users/jim/hashbackup/inex.conf

    VERY IMPORTANT: your backup is encrypted and can only be accessed with
    the encryption key, stored in the file:
    You MUST make copies of this file and store them in a secure location,
    separate from your computer and backup data.  If your hard drive fails,
    you will need this key to restore your files.  If you setup any
    remote destinations in dest.conf, that file should be copied too.

    Backup directory initialized

    [jim@mb ~]$ hb upgrade
    HashBackup build 1035 Copyright 2009-2013 HashBackup, LLC
    You already have the latest version

#1035 - May 17, 2013 - beta expires Sep 15, 2013


* disable destinations to prevent overwriting an unrelated backup
* new dest setid command to "marry" a destination to a backup
* bug fix: audit not working if -c wasn't used on command line


- a new file, DESTID, is stored on each destination to prevent
  accidentally overwriting a remote backup area.  If this file does
  not match the backup, an error message is displayed and the
  destination is disabled (for this run of HB):

    dest s3: destination ID mismatch - you may be overwriting another
    backup! Verify destination is correct; use hb dest setid s3 to
    disable this warning.

  This error will not occur during normal operation, so if you see it,
  pay close attention to make sure you aren't overwriting a backup.
  The error will occur for example if:

  -- you do a backup with dest.conf, delete the local backup
     directory, and do another backup using the same dest.conf

  -- you configure 2 destinations with different names, but they both
     point to the same remote storage area

  -- you configure 2 different backups, maybe on different machines,
     to point to the same remote storage area

  HB may be slightly slower to start because the remote DESTID file
  has to be checked for every destination.  An error message might be
  displayed on your first backup since DESTID will not exist yet.
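
  The comparison can be sketched as a small decision function.  The
  names and return values are hypothetical, and treating a missing
  remote DESTID as a first backup is an assumption drawn from the note
  above:

```python
def check_destid(local_id, remote_id):
    """Classify a destination by comparing local and remote IDs.

    remote_id is None when the destination has no DESTID file yet.
    """
    if remote_id is None:
        return "create"     # first backup: nothing to compare against
    if remote_id == local_id:
        return "ok"
    return "mismatch"       # disable the destination for this run
```

  Each of the three cases listed above produces "mismatch", because
  the remote area was written by a backup with a different ID.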

- a new dest subcommand, setid, sets DESTID on one or more remote
  destinations to match the current backup.  Before doing this, make
  sure you are not overwriting an active backup.

- audit was not working when -c wasn't used, for example, when the
  HASHBACKUP_DIR environment (shell) variable was set or one of the
  default backup directories /var/hashbackup or ~/hashbackup was used.

- dest: display unrecognized subcommand in error message

#1027 - May 10, 2013 - beta expires Sep 15, 2013


* get merges restores into existing directories rather than replacing them
* new get option --delete deletes existing files not restored (like before)
* bug fix: dest clear


- get: a new option, --delete, deletes existing files in restored
  directories that are not in the backup.  This is similar to rsync's
  --delete option, allowing HB to "sync" a directory like rsync rather
  than just add to it.  With -v2 or higher (the default is -v2), the
  names of the deleted files and directories are printed.

- get: previously, get restored into a temporary file or directory,
  deleted the original file if it existed, and renamed the temp file.
  When restoring directories, often it makes more sense to merge the
  restored directory with an existing directory (that's what tar
  does).  Especially with ZFS, when restoring BSD jails, a user's home
  directory might contain several different filesystems.

  Now, get restores directly to the target file or directory.  If the
  target already exists and is a directory, get will merge the
  restored contents with the existing contents, overwriting any
  existing files.

- dest: clear command could cause a traceback:
    Traceback (most recent call last):
      File "/", line 99, in <module>
      File "/", line 162, in main
    TypeError: cannot concatenate 'str' and 'int' objects

#1020 - May 8, 2013 - beta expires Sep 15, 2013


* dest erase command is replaced by new dest unload command
* new dest clear command removes files from a destination
* new dest sync command syncs local backup to all remotes


- the dest subcommand erase has been eliminated.  To move dest.conf
  from the database to a text file, use the new unload subcommand (see
  next change)

- a new dest subcommand, unload, writes the dest.conf stored in the
  database to a text file and removes it from the database.  If no
  pathname is given, the file is written to dest.conf in the backup
  directory.  If the output file exists, HB prompts whether it's okay
  to remove it.  dest load and dest unload are now opposites.

- a new dest subcommand, clear, removes all files from destinations
  specified, usually because you are no longer using that destination
  and are going to remove it from dest.conf after deleting files
  stored there.  This command will not remove a file if it is the only
  copy available, ie, there is no local copy and no other remote copy.

- a new dest subcommand, sync, ensures that all remote destinations
  are in sync with the local backup directory.  This always happens at
  the start of each backup, but this dest command can force it at
  other times.  Previously, backing up a small or dummy file like
  /dev/null would force a sync.

- a new dest subcommand, ls, displays a listing of all files stored at
  each destination

#1019 - May 2, 2013 - beta expires Sep 15, 2013


* recover bug fix


- a change in #1015 made destination threads shut down before the
  main program.  The purpose of this was to avoid a race condition
  where a thread woke up while the main program was dying.  This is a
  hard condition to duplicate.  The symptom is that when the main
  program is exiting, something like this is displayed:

    No errors
    Unhandled exception in thread started by
    Error in sys.excepthook:

    Original exception was:
    Unhandled exception in thread started by
    Error in sys.excepthook:

    Original exception was:

  Since this just happened to me with #1018, the fix in #1015 didn't
  work.  And, the #1015 fix also made recover not fetch arc files,
  which obviously isn't a good thing.  This release backs out #1015
  and fixes the recover bug.

#1018 - May 1, 2013 - beta expires Sep 15, 2013


* get --todev accepts symlinks for block device targets
* get accepts symlinks for block devices to restore


- get: on Linux, disk partitions are often symbolic links to the
  actual block device.  When the symbolic link name is used for
  backup, HB also saves the actual block device contents as a separate
  path.  With get's --todev option, the symbolic link could not be
  used since it is not a block device.  Now, get will accept a
  symbolic link with --todev if the symlink is pointing to a block
  device.  A warning message is displayed showing the symlink target.
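
  A minimal sketch of such a check, using only the Python standard
  library.  The function name is hypothetical and this is not HB's
  actual code:

```python
import os
import stat


def resolves_to_block_device(path):
    """True if path is a block device or a symlink pointing to one."""
    try:
        mode = os.stat(path).st_mode    # os.stat follows symlinks
    except OSError:
        return False                    # missing path or broken symlink
    return stat.S_ISBLK(mode)
```

  os.path.realpath(path) can be used to obtain the symlink target for
  a warning message.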

- get: related to the above, if a symlink name is used on the get
  command line, and at backup time, this symlink pointed to a raw
  device, HB will restore the raw device.  If the symlink does not
  already exist or has a different value than it did at the time of
  the backup, HB will only restore the symlink.  If the get is
  repeated, HB will restore the block device.  A warning message is
  displayed about restoring a block device instead of the symlink.

#1012 - April 24, 2013 - beta expires Sep 15, 2013


* temporarily disable closing of idle ftp connections


- temporarily disabled ftp idle connection handling, because it
  sometimes causes spurious Python errors when the backup program
  stops.  Backups are fine; the errors are caused by ftp idle timers
  running while the main program is dying.

#1011 - April 21, 2013 - beta expires Sep 15, 2013


* new ftp destination keyword "restart"
* ftp restarts downloads too
* ftp keyword Dir is optional
* ftp enables keepalives
* new ftp destination keyword "idle"
* bug fixes


- ftp destinations have a new True/False keyword, "restart".  If not
  present, the default is True and ftp will try to restart failed
  uploads and downloads.  If the restart has trouble, HB will print an
  error message like:

     size mismatch after restart: file is 4976624 bytes, restarted at
     1048576 bytes, uploaded file is 3928048 bytes; disabling restart

  If you see this message, restarts should probably be disabled by
  adding "restart False" to your dest.conf.  Use debug 1 to see the
  ftp conversation to help troubleshoot restart problems.

- ftp can restart downloads as well as uploads with most ftp servers

- ftp: if the Dir keyword is present, a cd occurs to this directory.
  Otherwise, backups are sent to the initial login directory.

- ftp keepalive: routers, firewalls, VMs, and other intermediate
  devices sometimes drop the control (command) connection while a long
  file transfer is in progress.  Then after the file transfer
  completes, a timeout occurs and HB thinks the file was not sent.
  Then a restart (usually) occurs to complete the file transfer.
  Setting keepalive may help prevent this, though whether it works
  depends on your operating system's keepalive settings (how often
  keepalive packets are sent) and how long it takes the intermediate
  device to timeout the connection.
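
  At the socket level, keepalive is a single option.  The sketch below
  shows the standard call; how HB applies it to its ftp connections is
  not described here, and the function name is hypothetical:

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```

  The option only enables the probes; their timing comes from the
  operating system's keepalive settings, as noted above.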

- ftp destinations try to keep connections to a server open for a
  while after each operation to save making another connection.  The
  idle keyword specifies how long in seconds the connection can be
  idle before HB closes it.  The default is 15 seconds.

- ftp: restart didn't work with the bsd ftp server

- if multiple destinations were setup and an old file needed to be
  sent to n destinations, it was sent n times to each destination

#1001 - April 16, 2013 - beta expires Sep 15, 2013


* bug fixes


- restart did not work on some ftp servers where it could have worked;
  now it does

- restart is temporarily disabled until it can be tested on more ftp
  servers

#1000 - April 16, 2013 - beta expires Sep 15, 2013


* ls shows /dev symlinks
* ftp restarts uploads
* dest keyword "off"


- on Linux, block devices (logical volumes) are often symbolic links
  that are user-defined names for a "real" block device.  When these
  symlinks are used on the backup command line, for example,
  /dev/mylv, the symlink is saved and also the actual block device,
  eg, /dev/dm-1.  When backups occur for several logical volumes, an
  ls listing becomes confusing because it's hard to tell which symlink
  goes with which actual device.  Using ls -l shows the symlink
  target.  In this release, the symlink target is always displayed if
  the pathname starts with /dev/ (and the user has cd access to the
  directory; this is a permission requirement to display any -l info).

- ftp destinations now restart uploads if the ftp server software
  implements the REST and SIZE commands; most ftp servers implement
  these.  Before restarting an upload, HB verifies that the first 1K
  of the local and remote files are equal, the file size must be 100K
  or greater, and the partial upload must be 100K or greater.
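
  The restart conditions above can be written as one predicate.  This
  is a sketch: the names are hypothetical, "100K" and "1K" are read as
  multiples of 1024 bytes, and the requirement that the partial upload
  be smaller than the local file is an added assumption:

```python
MIN_BYTES = 100 * 1024      # "100K", assumed to mean 100 * 1024 bytes
HEAD_BYTES = 1024           # "first 1K"


def can_restart(local_size, remote_size, local_head, remote_head):
    """Decide whether a partial upload is worth restarting."""
    return (
        local_size >= MIN_BYTES
        and MIN_BYTES <= remote_size < local_size
        and local_head[:HEAD_BYTES] == remote_head[:HEAD_BYTES]
    )
```

  The remote size would come from the SIZE command and the restart
  offset would be passed with REST; neither call is shown here.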

- a new destination keyword, "off", means to disable the destination.
  It can be re-enabled by deleting the "off" line, or changing it to
  "#off", which is a comment.  This is useful for testing and
  travelling, when you may want to temporarily disable a destination.
  It's easier than commenting out all the lines describing a
  destination.  A disabled destination prints a warning message.

#997 - April 13, 2013 - beta expires Sep 15, 2013


* can't repeat keywords in a destination
* new ftps destination (more secure)
* ftp debug keyword
* multiple transactions on ftp connections
* bug fix


- INCOMPATIBILITY NOTE: in previous releases, it was okay to repeat a
  keyword in a destination; the last keyword was used.  Now, it is a
  fatal error to use the same keyword more than once in a destination.
  Repeating a keyword can cause subtle errors, ie, you think you are
  sending files one place, but actually, they are going somewhere else
  because of a repeated Host keyword.  You can leave repeated keywords
  in the dest.conf file, with all but one commented out with a # mark.
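
  A sketch of the repeated-keyword check for the lines of a single
  destination.  The "keyword value" line format, case-insensitive
  keywords, and the function name are assumptions for illustration,
  not HB's actual parser:

```python
def parse_destination(lines):
    """Parse 'keyword value' lines; reject a keyword used twice."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # blank line or comment
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key in settings:
            raise ValueError("keyword repeated in destination: " + parts[0])
        settings[key] = parts[1].strip() if len(parts) > 1 else ""
    return settings
```

  Commented-out lines are skipped before the check, so an old value
  can stay in the file behind a # mark.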

- a new destination type, ftps, supports FTP-TLS (Transport Layer
  Security), also called FTP over SSL.  This is not FTP over an ssh
  tunnel, also called Secure FTP.  And it's not sftp, which is a
  separate file transfer protocol built in to ssh.  The names are a
  bit confusing...

  FTP uses a control connection to send authentication and commands to
  the remote FTP server.  A data connection is opened during file
  transfers.  HB's ftps destination uses SSL for the command
  connection so your userid, password, and commands are encrypted
  while talking with the FTP server.  HB does not use SSL for the data
  connection since the backup files being transferred are already
  encrypted.

  The ftp destination type (without the s) is still available for
  internal servers, or servers that do not support TLS.  If the ftps
  type is used with an FTP server that doesn't support TLS, this
  message is displayed and the destination halts:

    dest ftp: unable to start: [error_perm] 500 AUTH TLS: command not understood.

  Your backup still runs and other destinations are unaffected when
  one destination halts or doesn't start.

- ftp: the debug keyword with a positive number displays the FTP
  conversation.  Higher numbers display more output, but usually debug
  1 shows enough.

- ftp: rather than making a new connection for each file, the ftp
  connection is left open and reused, reconnecting only when

- when maxsize is used on a destination to limit file sizes, the
  permissions on the files created on the remote were rwxr-xr-x, but
  should have been rw-r--r--.

#988 - April 7, 2013 - beta expires Sep 15, 2013


* S3 supports multipart uploads
* add fsyncs for XFS zero-length file bug
* imap uses 50% less memory
* bug fix


- S3 destinations have a new 'multipart' keyword, True or False; no
  value means True.  Multipart uploads are enabled by default for
  Amazon S3 and Dreamhost Dreamobjects.  With multipart, instead of 3
  workers sending 3 different files to S3, they all work on the same
  file in parallel.  This helps minimize cache stalls during backup
  when cache-size-limit is used, and tests show more consistent and
  predictable upload rates to S3.

- added fsync calls in a few places to prevent XFS creating
  zero-length files if the system crashes
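
  The usual pattern for this in Python is flush followed by fsync.
  This is a sketch with a hypothetical function name; it does not show
  which files HB applies it to:

```python
import os


def write_durably(path, data):
    """Write bytes to a file and force them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # move Python's buffer to the OS
        os.fsync(f.fileno())    # ask the OS to write it to disk
```

  Without the fsync, a crash shortly after the write can leave a file
  that exists but has no data.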

- imap: reduced memory usage 50%, though imap destinations still
  require twice the file size during send or receive to encode the
  file.  Use a low number of workers for imap as they will use a lot
  of memory.  In tests, 2 imap workers upload as fast as 6 workers,
  and 2.5x faster than just 1 worker can upload.

- if an archive was packed but dest.db could not be written because of
  an unusual permission problem, rm / retain would stop with an error
  (correct).  But when the permission problem was corrected, the
  packed archive was not sent to destinations during the next
  backup (incorrect).  This is fixed.

#971 - March 31, 2013 - beta expires Sep 15, 2013


* bug fixes


- backup: sync could fail with an error "No destinations in dest.conf
  contain arc.x.x", even though a destination did have the file.  To
  trigger this, multiple destinations are setup, backup to them,
  delete one of the destinations, add a new destination, then do a
  backup, causing a sync, and triggering the error.

- selftest: with small backups that are in memory, ie, don't have to
  read from disk, a race condition could cause this traceback:
      File "/", line 144, in <module>
      File "/", line 682, in main
      File "/", line 201, in checkallblocks
    NameError: global name 'shaq_seq' is not defined

#961 - March 29, 2013 - beta expires Sep 15, 2013


* bug fixes


- destination initialization has changed.  Some initialization, like
  checking a hostname with DNS, was being done in every worker thread
  instead of just once, and a fatal error would occur in every worker
  instead of just one

- clear: could cause a traceback when destinations are configured
  because of a race condition while deleting dest.db:

    File "/", line 231, in loop
    File "/", line 408, in rmcmd
    File "/", line 161, in getinfo
    File "/", line 214, in __init__
    File "/", line 230, in opendb
  OSError: [Errno 2] No such file or directory: 'dest.db'

#956 - March 28, 2013 - beta expires Sep 15, 2013


* mount runs in foreground by default
* bug fixes


- recover: recent improvements in the Cloud Files driver caused
  recover to fail with this traceback:
    Traceback (most recent call last):
      File "/", line 626, in <module>
      File "/", line 279, in main
      File "/", line 107, in getfile
    AttributeError: cf instance has no attribute 'container'

- mount: because of ongoing issues with mount putting itself in the
  background, mount now runs in the foreground by default and the
  --debug option has been removed.  To run mount in the background,
  use this to suppress all output:

     $ hb mount -c backupdir mnt >/dev/null 2>&1 &

  or this if you want output in mount.out and errors in mount.err:

     $ hb mount -c backupdir mnt >mount.out 2>mount.err &

  or this if you want output and errors in mount.out:

     $ hb mount -c backupdir mnt >mount.out 2>&1 &

#954 - March 27, 2013 - beta expires Sep 15, 2013

- display file size, transfer time, and transfer rate after files are
  copied to a destination

- Rackspace Cloud Files: new destination keyword 'servicenet', True or
  False, accesses Cloud Files over the local Rackspace network.
  (faster, no download charges)

- Rackspace Cloud Files: random BadStatusLine errors and/or Broken
  pipe errors are less likely

#947 - March 26, 2013 - beta expires Sep 15, 2013

- added an optional timeout destination keyword.  The default value is
  1800, or 30 minutes.

#946 - March 26, 2013 - beta expires Sep 15, 2013

- Cloud Files, OpenStack: HB was requesting a 30 second timeout, but
  the Python Cloud Files library was not passing this correctly and the
  authentication timeout was actually 5 seconds.  If the Rackspace
  authentication servers got busy or your connection was busy doing
  other things, this 5 second timeout could easily be exceeded and
  authentication would fail, leading to unnecessary retries.

- Cloud Files, OpenStack: when certain errors occur, such as a
  timeout, hb was reusing the socket in the retry loop instead of
  opening a new one.  It could be argued that the Cloud Files library
  should clean up when a socket error occurs after an HTTP request.
  Since it doesn't, the hb retry loop was not working for these types
  of errors.  The traceback would show CannotSendRequest instead of
  the real error, which was a timeout.

#941 - March 24, 2013 - beta expires Sep 15, 2013

- Rackspace Cloud Files (destination type 'cf'): add 'location'
  keyword, with values of either us or uk.  If not specified, the
  default is us.  This is Rackspace specific, only applies when the
  destination type is 'cf', and does not apply to other OpenStack

- OpenStack (destination type 'os'): add REQUIRED 'authurl' keyword
  to specify the authentication endpoint to authenticate with
  non-Rackspace OpenStack object stores.  The version 1.0 API is
  used, so the url looks like (these are RackSpace's endpoints):''

#938 - March 22, 2013 - beta expires Sep 15, 2013

- compare: add -X option to cross mount points, like backup

#937 - March 20, 2013 - beta expires Sep 15, 2013


* new -v retain option
* bug fixes
* note that clear command resets config options


- retain has a new option, -v, to display the files being deleted.  It
  shows the file backup time, filename, version, and retain option
  that caused the file to be deleted.  For example:

    2012-10-22 02:30:12 /.DS_Store [r468] -s
    2013-03-13 00:39:11 /private/var/log/asl/2013.03.12.U0.G80.asl [r589] -x

- recover: if dest.db couldn't be fetched, recover was giving this
  traceback instead of the reason dest.db couldn't be fetched:
    Traceback (most recent call last):
      File "/", line 124, in <module>
      File "/", line 282, in main
    NameError: global name 'destdb' is not defined

- Dir destinations: when recovering files, dir destinations try to use
  symbolic links because they are much faster than copying files.  But
  some filesystems don't support symlinks and a traceback occurred:

    Traceback (most recent call last):
      File "/", line 124, in <module>
      File "/", line 279, in main
      File "/", line 36, in getfile
    OSError: [Errno 95] Operation not supported

  Now, recover (and get, selftest and mount if cache-size-limit is
  set) will copy the file when symlink fails.
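
  The fallback can be sketched as follows.  The function name and
  return values are hypothetical; this is not HB's actual code:

```python
import os
import shutil


def link_or_copy(src, dst):
    """Symlink dst to src if possible, otherwise copy the file."""
    try:
        os.symlink(src, dst)
        return "symlink"
    except OSError:
        # For example Errno 95 on filesystems without symlink support.
        shutil.copyfile(src, dst)
        return "copy"
```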

- Amazon Glacier uploads failed with the error:
     dest glac: error #1 of 3 in send arc.18.1: 'Layer2' object has no attribute 'close'

- backup: if cache-size-limit was set, this traceback could occur:
    Traceback (most recent call last):
      File "/", line 75, in <module>
      File "/", line 1973, in main
      File "/", line 684, in sync
      File "/", line 376, in initcache
    UnboundLocalError: local variable 'arcbytes' referenced before assignment

- get: if cache-size-limit was set, a directory was being restored,
  and -r was used to restore an older version, this traceback could
  occur:
    Traceback (most recent call last):
      File "/", line 103, in <module>
      File "/", line 1087, in main
      File "/", line 868, in plan
      File "/", line 922, in prefetch
    TypeError: not all arguments converted during string formatting

- when an error occurred with a capitalized command line option, eg,
  -D with no size, the error message would be:
      Argument -d: expected one argument
  instead of:
      Argument -D: expected one argument

- mount: when cache-size-limit was set, mount was run in the
  background (without --debug), backup file data was referenced
  through the mount, and a remote archive had to be downloaded, the
  background hb mount process could die and the file access would then
  fail with an input/output error

- when cache-size-limit was set to a small number, like 3, it means 3x
  the arc-size-limit.  But the limit was actually higher because a 4MB
  fudge factor was added.  For large archives, this doesn't matter,
  but for small caches and smallish archives, it is more noticeable so
  the fudge factor has been removed.
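
  The arithmetic can be illustrated with a short sketch.  The names
  are hypothetical, and MB is assumed to mean 1024 * 1024 bytes:

```python
MB = 1024 * 1024
FUDGE = 4 * MB      # the fudge factor that was removed


def cache_limit(multiplier, arc_size_limit, fudge=0):
    """Cache limit in bytes for a small cache-size-limit multiplier."""
    return multiplier * arc_size_limit + fudge
```

  With a multiplier of 3 and a 1 MB arc-size-limit, the old limit was
  7 MB instead of 3 MB; with 100 MB archives it was 304 MB instead of
  300 MB, which is why the difference mattered only for small
  archives.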

- when VMWare shared folders are used as the -c backup directory, the
  timestamps (mtime) of files hb creates in the backup directory can
  change during a transfer to a remote destination.  This isn't
  supposed to happen - it's some weirdness in VMWare's hgfs - and hb
  complains about it:

    dest rsync: stopping because of errors
    dest rsync: Traceback (most recent call last):
      File "/", line 198, in loop
      File "/", line 281, in sendcmd
    Exception: file changed during transfer: 1363391280.0 != 1363391281.23

  Now this comparison isn't done at all.  Instead, inode number and
  size are compared before and after the transfer.  This is to catch
  the odd case of someone sending the backup on top of itself, which
  has actually happened by mistake.
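
  A minimal sketch of an inode-and-size check, assuming a caller
  supplied send function.  The names file_identity and
  transfer_checked are made up for this example and are not hb's
  internals.

```python
import os


def file_identity(path):
    """Return (inode number, size in bytes) for path."""
    st = os.stat(path)
    return (st.st_ino, st.st_size)


def transfer_checked(path, send):
    """Run send(path), then fail if the file's inode or size changed."""
    before = file_identity(path)
    send(path)
    after = file_identity(path)
    if before != after:
        raise RuntimeError(
            "file changed during transfer: %r != %r" % (before, after))
```

  Unlike mtime, neither value is disturbed by the timestamp drift
  described above, but both change if the transfer overwrites or
  replaces its own source file.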

- clear: this command resets all config options to default values.  It
  has always done this, but now a note is printed.  The clear command
  should be replaced by init.  To just remove all backup data, you can
  use rm /, though this is much slower than clear.

#922 - March 7, 2013 - beta expires June 15, 2013


* rm is 2-3x faster
* bug fixes


- rm is 2-3x faster, especially with large files with a lot of dedup
  blocks.  This also speeds up retain when it is removing old
  versions of files.

- on 32-bit systems, stats could fail with a traceback:
    Traceback (most recent call last):
      File "/", line 156, in <module>
      File "/", line 106, in main
    OverflowError: Python int too large to convert to C long

- if the hash.db file didn't exist, stats would fail with a traceback:
    Traceback (most recent call last):
      File "/", line 156, in <module>
      File "/", line 187, in main
    TypeError: 'NoneType' object is not iterable

#919 - March 6, 2013 - beta expires June 15, 2013


- release #918 had a backup bug that could cause warnings about files
  changing size during the backup.  This warning is misleading: what
  actually happened is that the file wasn't backed up correctly
  because of the new feature in #918 where variable-block dedup was
  enabled for single-core backups.  This warning only occurred on
  multi-core systems.

  The backup for these files with warnings is no good, and will also
  cause selftest errors on these files.  Because it is a thread timing
  / coordination problem, you may or may not have errors.  One test
  system had the problem, another did not.  It's recommended to
  completely remove any backups made with #918 with hb rm:

  $ hb rm -c backupdir -r640 --force

  Your -r version number(s) will be different of course.  Use hb
  versions to see which backups were created with #918 and remove all
  of them.  Your next backup with #919 will re-save these files.

#918 - March 5, 2013 - beta expires June 15, 2013


* backup -p0 supports variable block dedup
* get & selftest do 4-hour delay with Amazon Glacier
* recover changes for Amazon Glacier
* bug fixes


- backup with -p0 or on a single CPU system runs in a single process
  (thread).  -p0 can be useful on multi-CPU systems to restrict hb's
  system load.  Previously, running in a single process also disabled
  variable-block dedup.  Now backup supports VB dedup with single CPUs
  and multiple CPUs.  If you had been running with -p0 or on a single
  CPU system, your next backup may be larger because files that had
  been saved with a fixed block size will be saved with a variable
  block size.  Because of the block size change, files will not dedup
  well on this next backup, but will after that.  To prevent this
  dedup gap, use -p0 -B1m, to disable variable-block dedup.

  NOTE: variable-block dedup with -p0 (or on a single CPU) usually
  takes twice as long as fixed-block dedup.  Add -B1m to disable
  variable-block dedup with a single CPU if you are more concerned
  about performance than VB dedup.

- Amazon Glacier requires two fetches separated by 4 hours to retrieve
  files.  HB's recover command was doing this, but get and selftest
  were not: if your backup data was only located on Glacier, then
  restoring a few files required an hb get (which failed with an
  error), waiting 4 hours, then retrying the get (which would work
  this time).

  In this release, if you are using Glacier as your first destination
  and running with cache-size-limit set so that your archive files are
  mostly remote, hb get will figure out which archives it needs to do
  your restore, request them all once, delay 4 hours, then start your
  restore, requesting archives again as they are needed.

  IMPORTANT: the HB recover command has lots of options for doing
  paced retrievals, but get and selftest do not have those yet;
  everything is retrieved in 4 hours.  This can be very expensive for
  very large restores, so if you need to restore everything, it may be
  cheaper to set cache-size-limit to -1, use recover to fetch all of
  your archives from Glacier to your backup directory with whatever
  pacing you need to meet your cost objectives, then do your get
  command to restore from the local backup directory.

- (Amazon Glacier) HB has two normal cache configurations:
    1. cache-size-limit is -1: local copy of all archives
    2. cache-size-limit >= 0: some archives may be on remotes
  A special case of #1 is when an archive file is missing, ie, it was
  manually deleted.  HB will download the missing archive when it is
  needed.  For Amazon Glacier, this requires two retrievals, separated
  by 4 hours.  That wasn't happening, but now it is.  Please note,
  this is a very inefficient way to run HB with Glacier, because every
  individual archive needed will require a 4 hour delay.  It's much
  better to set a cache-size-limit; then HB will request all archives
  needed for a restore, wait 4 hours, then request the archives again
  and do the restore.

- the recover --dl option did accept KB/s and Kb/s for bytes per
  second and bits per second (upper or lower case B), but expected the
  K, M, or G prefix to be uppercase and the /s, /m, /h suffix to be
  lower case.  This is too confusing, so now, the --dl option is
  changed to all lowercase, and it's no longer possible to specify a
  rate in bits per second.

- if --dl 1 was used (rate of 1 byte per second), recover would loop
  forever because there wasn't enough bandwidth to retrieve the
  largest arc file.  Now it will display a message and ignore very low
  rates since they are equivalent to Option 4, the "cheap" download
  option.  A separate change triggers a fatal error if this looping
  situation occurs in the future.
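
  Why a tiny rate can never make progress: Glacier retrievals are
  paced in 4-hour groups, so a rate only works if one group's byte
  budget can hold the largest archive.  The check below is a sketch
  under that assumption; rate_is_usable is a hypothetical name, not
  part of hb.

```python
GROUP_SECONDS = 4 * 3600   # one 4-hour Glacier download group


def rate_is_usable(rate_bytes_per_sec, arc_sizes):
    """True if a 4-hour group at this average rate fits the largest archive."""
    if not arc_sizes:
        return True                      # nothing to retrieve
    budget = rate_bytes_per_sec * GROUP_SECONDS
    return budget >= max(arc_sizes)
```

  At --dl 1 the budget is only 14400 bytes per group, far below any
  realistic arc file, so the rate is unusable.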

- a bug was introduced with the new maxsize destination keyword that
  would generate a traceback similar to this:

    Getting arc.0.7 from drobo-vaio-rsync
    Unable to download archive arc.0.7: AttributeError("rsync instance has no attribute 'getparts'",)
    Traceback (most recent call last):
    File "/", line 75, in <module>
    File "/", line 1941, in main
    File "/", line 779, in sync
    File "/", line 408, in condput
    File "/", line 141, in __init__
    File "/", line 174, in _open
    File "/", line 676, in fetch
    NoArchive: Archive does not exist: /root/vaio-drobo/arc.0.7

  This could happen in several circumstances:

  1. cache-size-limit is >= 0, do backups, set cache-size-limit to -1
  2. cache-size-limit is >= 0, add a new destination, do backup
  3. cache-size-limit is -1, arc files deleted manually

#909 - February 27, 2013 - beta expires June 15, 2013


* new backup -X option to cross mount points
* stats speed up
* --no-compress backup option removed
* bug fixes


- backup: a new option -X means to cross mount points, also called
  descending into other filesystems.  The default is not to cross
  mount points, as before.  Be careful, because -X does not
  discriminate between local filesystems and remote filesystems, so
  you can end up saving an entire NFS server for example.

- the stats command is much faster when there are millions of blocks.
  A backup with 11 million unique blocks was taking 5 minutes for
  stats, but now takes around 2.5 minutes.

- the --no-compress backup option was made obsolete when -Z0 was
  introduced in April 2012.  --no-compress has been removed in this
  release.

- the mount command would often give bad address errors when accessing
  files.  This bug was introduced a couple of weeks ago, in #871.

#902 - February 23, 2013 - beta expires June 15, 2013


* new destination keyword 'maxsize'
* new destination keyword 'randfail'
* new stats added, new stat -v option
* destination sync code is 50x faster
* 65% speedup for small archive files
* stats is 2x faster if many versions
* important Glacier bug fix: regions
* bug fixes


- a new keyword for destinations, 'maxsize', can be used to limit the
  size of files uploaded to a destination.  For example, many imap
  servers have small limits like 25MB per file.  While the
  arc-size-limit config keyword can be used to limit the size of
  archive files, there is no way to limit the size of the hb.db.n
  files.  Now, using maxsize, a destination can impose a hard size
  limit.  If a file to be uploaded is larger than maxsize, hb will
  split the file into pieces, each maxsize in length (except the last
  piece), and upload the pieces separately.  Pieces have a .pN suffix
  added.  When the file is retrieved from the destination, each piece
  is retrieved separately and the original file is reassembled.

  Maxsize should be at least 1MB larger than arc-size-limit.  Arc
  files can exceed arc-size-limit by up to 1 block (backup's -B
  parameter).  If you use -B4M for 4MB blocks, Maxsize should be 4MB
  larger than arc-size-limit.  If maxsize is equal to arc-size-limit,
  things will still work, but an archive slightly over arc-size-limit
  will have to be split on upload.  This causes more I/O during
  upload, creates more files on the remote, and the 2nd piece is less
  than a block size, which is inefficient.
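
  The splitting rule can be sketched as below.  This only plans the
  piece names and sizes; it is not hb's upload code.  Numbering the
  pieces from 0 is an assumption, since the notes above only say a
  .pN suffix is added.

```python
def split_plan(name, size, maxsize):
    """Return [(piece_name, piece_size), ...] for a file of `size` bytes."""
    if size <= maxsize:
        return [(name, size)]            # small enough: upload as one file
    pieces = []
    offset = 0
    n = 0                                # assumption: piece numbers start at 0
    while offset < size:
        length = min(maxsize, size - offset)
        pieces.append(("%s.p%d" % (name, n), length))
        offset += length
        n += 1
    return pieces
```

  For example, an archive that is 4096 bytes over a 100MB maxsize is
  planned as one 100MB piece plus a 4096-byte piece, which is the
  inefficient case described above.  Raising maxsize a little keeps
  it in one file.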

  Remote error recovery has been rewritten so that each piece will be
  retried according to the destination's retry settings.  But if one
  piece exceeds the error retries, the entire file will have to be

- a new destination keyword, randfail, can be used to simulate remote
  failures.  The value is an integer 0-100 representing the percentage
  of requests that should fail.  So 25 means 1 out of 4 requests will
  fail, 50 means 1 of 2 will fail, 75 means 3 of 4 will fail, 100
  means every request will fail.  Simulated failures do not generate
  any remote traffic.  Destination threads will stop when all requests
  fail for one file.  Randfail is for testing hb's error recovery and
  of course should not be used in normal operation (DUH!)
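
  A percentage-based failure switch like this takes only a few lines.
  The sketch below is an illustration, not hb's code; should_fail is
  a hypothetical name.

```python
import random


def should_fail(randfail, rng=random):
    """Return True for roughly `randfail` percent of calls (0-100)."""
    if not 0 <= randfail <= 100:
        raise ValueError("randfail must be 0-100")
    # rng.random() is in [0.0, 1.0), so 0 never fails and 100 always fails
    return rng.random() * 100 < randfail
```

  Because the decision is made before any request is sent, a
  simulated failure costs no remote traffic.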

- some new statistics have been added to the stats command,
  specifically the "industry standard dedup ratio" used by many other
  backup programs.  This ratio is computed as sum(bytesin) /
  sum(bytesout) and assumes that every backup was a full backup.  I
  didn't say it made sense...  Right now, HB has to compute these
  figures and it takes a while, but soon they will be recorded by the
  backup program so the stats command will not have to do so much
  work.  It takes around 5 minutes for 700K files, so could take quite
  a while if there are many millions of files in the backup.
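
  The ratio itself is simple once the per-backup totals are known.
  In this sketch each backup is a hypothetical (bytesin, bytesout)
  pair, where bytesin is read as the size of all data the backup
  covers, as if it were a full backup.

```python
def industry_dedup_ratio(backups):
    """backups: iterable of (bytesin, bytesout) pairs, one per backup."""
    backups = list(backups)
    total_in = sum(b[0] for b in backups)
    total_out = sum(b[1] for b in backups)
    if total_out == 0:
        return float("inf") if total_in else 0.0
    return total_in / total_out


# three backups of the same 100 units of data, storing 40, 2, and 3 units
ratio = industry_dedup_ratio([(100, 40), (100, 2), (100, 3)])   # 300 / 45
```

  The ratio grows with every incremental, even when nothing much was
  deduplicated, because each one is counted as a full backup.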

- a new option -v has been added to the stats command.  After each
  line of statistics, it prints a paragraph describing (to the best of
  my ability!) how the statistic is generated, what it means, and why
  it might be useful.

- in testing with 10,000 archive files, the backup sync procedure was
  taking 50 seconds to figure out which archives needed to be copied
  to remotes, even if none did.  Now it takes less than 1 second.
  This was a potential scalability issue for sites using smaller
  archive files (config variable arc-size-limit).

- archive files default to 1GB (config variable arc-size-limit), but
  some sites want to use smaller archives, either because their
  storage system requires it or because it allows HB to manage deleted
  files better.  With smaller archives, HB can delete entire archives
  more often when rm or retain delete files from the backup, rather
  than going through a download, pack, upload cycle.  But smaller
  archives also have more overhead and can slow the backup down.
  Changes in this release reduce this overhead.  A test backup that
  creates 5000 small arc files is now 65% faster: 40 seconds vs 114
  seconds.

- the stats command is 2x faster if a backup has many versions (mine
  has 600)

- with Amazon Glacier, the dest.conf has a location keyword to specify
  the region for Glacier storage.  This was not working, and all
  Glacier transfers were going to us-east-1, the default Glacier
  region.  Now, the location keyword is honored, and the region is
  also checked to make sure it's a valid Glacier region; currently
  there are fewer Glacier regions than for other AWS services.

  Related to this change, HB creates an associated S3 bucket to be
  used with Glacier.  This bucket contains the database files with
  backup metadata.  The actual backup data - the bulk of data - is
  stored in Glacier.  This S3 bucket was named:
  But, if your Glacier data is getting stored in us-west-1, you
  probably want your associated S3 bucket stored there too.  So now,
  the associated S3 bucket name for regions other than us-east-1 is:

- if a backup contained only empty directories, the new stats command
  would fail with a traceback

- if only dest.db or hb.db.n were being recovered from Glacier,
  recover had an unnecessary sleep after the message:
      Download size: 0 Files: 0

#877 - February 14, 2013 - beta expires June 15, 2013


* exclude directory if tag file is present
* new stats command to display backup statistics
* backup display stats with -v0


- a new config option, no-backup-tag, can be set to a list of
  filenames.  If a directory contains any of these files, the
  directory contents are not backed up.  The directory itself and the
  tag file are the only items backed up.  For example:

    $ hb config -c backupdir -no-backup-tag .nobackup,CACHEDIR.TAG
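
  The pruning rule can be sketched with os.walk.  This illustrates
  the behavior described above and is not hb's directory scanner;
  paths_to_back_up is a hypothetical name.

```python
import os


def paths_to_back_up(top, tags):
    """Yield paths under `top`, pruning any directory that holds a tag file."""
    for dirpath, dirnames, filenames in os.walk(top):
        yield dirpath                            # the directory itself is saved
        found = [t for t in tags if t in filenames]
        if found:
            for t in found:
                yield os.path.join(dirpath, t)   # keep only the tag file(s)
            dirnames[:] = []                     # do not descend any further
            continue
        for f in filenames:
            yield os.path.join(dirpath, f)
```

  Clearing dirnames in place is what stops os.walk from visiting
  anything below a tagged directory.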

- a new command, stats, displays statistics about a backup.  More
  stats will be added in the future, specifically about dedup ratios.

- the backup command now displays statistics after the backup, even
  with -v0.  Some sites with huge backups - millions of files - used
  -v0 to prevent displaying any pathnames, but this also suppressed
  the statistics.

#871 - February 11, 2013 - beta expires June 15, 2013


* new shell destination type
* hb runs better on hardened kernels
* less space is used on remotes for hb.db.n files
* bug fix: hb dest command was not recognized


- a new destination type, shell, allows customized programs and
  scripts to transfer files to and from remote destinations on behalf
  of hb.  In the doc subdirectory is an example shell
  implementation of the built-in Dir destination.  Excluding comments
  and blank lines, this script has only 26 Python statements: shell
  destinations can be quite easy to write.  There is a limitation that
  no state information generated by the remote, such as an object id,
  can be stored.

- on hardened Linux kernels (grsecurity), hb required paxctl -m or
  paxctl-ng -m to allow anonymous mmap with execute privilege.  This
  was only needed for the mount command, to load ctypes and libfuse
  libraries.  But all hb commands would fail with an error in the logs
  like:  denied RWX mmap of <anonymous mapping>, and a traceback like:

    $ hb init -c vinci
    Traceback (most recent call last):
      File "/", line 20, in <module>
      File "/", line 13, in <module>
      File "/", line 19, in <module>
      File "/", line 35, in <module>
      File "/", line 90, in <module>
      File "/", line 19, in <module>
      File "/", line 75, in <module>
      File "/", line 17, in <module>
    ValueError: bad marshal data (unknown type code)

  Now, only the mount command will fail on hardened kernels; all other
  hb commands will work normally, even without marking the hb binary.
  To use the mount command on hardened kernels, use paxctl-ng -m hb to
  allow anonymous mmap w/execute.

- after each backup, rm, or retain, hb.db.n is sent to remote
  destinations.  There is a trade-off between how much data is sent in
  each hb.db.n vs the total size of all hb.db.n files stored on the
  remote.  The smaller each hb.db.n file, the less data transmitted
  after each backup, but the larger the total size of all hb.db.n
  files on the remote.  In this release, hb.db.n files are slightly
  bigger but there will be fewer stored on the remote.  In tests, this
  saves significant space on remote destinations.

- the new dest command was not working in the release build of hb

#862 - January 21, 2013 - beta expires March 15, 2013


* new command "dest" to help secure dest.conf
* add progress meter when restoring large files
* new backup options --maxwait and --maxtime control backup window
* warning instead of error when backing up mounted block devices
* options accept decimal points: -D1.5g for example
* bug fixes


- when backups are sent to remote servers, the credentials for these
  servers - userids, passwords, access keys, etc. - are stored in
  dest.conf, unencrypted.  A new command, dest, can be used to load
  this file into the encrypted backup database; then it can be deleted
  and hb will read the credentials from the database.  Examples:

  * hb dest load  - load dest.conf into the database
  * hb dest show  - show dest.conf from the database
  * hb dest erase - erase dest.conf from the database

  The dest command requires that admin-passphrase be set.  Otherwise,
  anyone could use dest show to display the stored credentials in
  dest.conf.  It is possible but not recommended to remove the
  admin-passphrase after loading dest.conf into the database.

- get: add percent progress meter when restoring files 500MB or more

- backup: new option --maxwait <time> specifies the maximum time to
  wait for archives to upload to all destinations after the backup
  completes.  The time can be specified as a number (seconds), 6h, 1d,
  etc.  This is useful in these situations, maybe others:

  * for initial backups, the backup runs faster than most destinations
    can accept data.  A backup that takes only a few hours to create
    may take days or even weeks to upload over an Internet connection.
    But you may only want your connection used at night, and you still
    want daily incrementals even if the initial upload has not
    finished.  Starting the backup at midnight with --maxwait 6h is a
    way to handle this.

  * a new destination is added to a large existing backup.  It may
    take days to get all of the existing backup data transmitted to
    the new destination, which is okay, but you don't want to lock the
    backup directory for the entire time as this prevents future
    backups until the copy to the new destination has finished.

  IMPORTANT: be careful with --maxwait: if it is too short, your
  backup may never get fully synched to remote destinations and the
  remote data would always be incomplete.

- backup: new option --maxtime <time> specifies the maximum time to
  spend actually saving files.  When this time is exceeded, the backup
  stops and waits for uploads to finish using --maxwait, which is
  adjusted based on how long the backup took.  Examples:

    --maxtime 1h means to backup for up to 1 hour then wait the
      remainder of the hour to upload the data, ie, total time is
      limited to 1 hour

    --maxwait 1h means to backup everything requested, but only wait 1
      hour for uploads to finish

    --maxtime 1h --maxwait 1h means to backup for up to 1 hour, then
      wait 1 hour + the remaining backup time for uploads to finish,
      ie, the total time is limited to 2 hours

    --maxtime 1h --maxwait 1y means to backup for 1 hour, then wait a
      year for uploads to finish, ie, only the backup time is limited

  This is useful for the initial and full backups, which usually take
  much longer than incremental backups, and allows you to spread the
  full backups over many days.  It also prevents incrementals from
  running into production time when a large amount of data changes for
  some reason.
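
  The way the two options combine can be written as a small
  function.  This is a sketch of the arithmetic in the examples
  above, not hb's scheduler.  The names are hypothetical, and
  treating "y" as 365 days is an assumption.

```python
UNITS = {"h": 3600, "d": 86400, "y": 365 * 86400}   # suffixes used above


def parse_time(text):
    """'90' -> 90.0 seconds, '6h' -> 21600.0, '1d' -> 86400.0."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def upload_wait(backup_seconds, maxtime=None, maxwait=None):
    """Seconds to wait for uploads after the backup stops; None = no limit."""
    if maxtime is None:
        return maxwait                    # only --maxwait (or neither) given
    leftover = max(0.0, maxtime - backup_seconds)   # unused part of --maxtime
    return (maxwait or 0.0) + leftover
```

  For a backup that finishes in 40 minutes, --maxtime 1h alone leaves
  1200 seconds for uploads, while --maxtime 1h --maxwait 1h leaves
  4800 seconds.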

  IMPORTANT: be careful with --maxtime: if it is shorter than the time
  required for your average incremental backup and upload, your backup
  may never finish, some files may never get backed up, and/or your
  remotes may never be fully uploaded.  Backup should start where it
  left off when maxtime is set, but it doesn't do that yet.

- backup: instead of giving an error when mounted block devices are
  backed up, backup will only display a warning.  If a partition is
  mounted readonly for example, it's fine to back it up even if
  mounted.  Read/write partitions should not be backed up as they will
  be inconsistent when restored.  You may still get an error on some
  OS's, for example, on OSX:

    Warning: backing up mounted block device: /dev/disk4s1
    This is backup version: 0
    Unable to open file: Resource busy: /dev/disk4s1

- many options that only accepted integers now accept decimal points,
  for example, backup option -D1.5g, config option arc-size-limit 1.5g

- clear: when a Dir destination is set up in dest.conf, files are
  symlinked to the backup directory instead of copied.  The clear
  command deletes the destination files first, then the backup
  directory files.  But the symlinks were pointing to non-existing
  files, so clear would not remove them.  Then, backup would try to
  sync these dangling files, causing errors like:

  Exception: dest: can't access file for put: /Users/jim/hbdev/hb/hb.db.25

- backup: hb has special handling for huge directories to reduce
  memory usage.  (Use ls -ld to see the directory size.)  But if a
  huge directory was actually empty, it would cause an error:

    Traceback (most recent call last):
      File "", line 2092, in <module>
      File "", line 1953, in main
      File "", line 1569, in backupobj

- backup: in some environments, backup would halt with "database is
  locked" errors.  This seemed more likely when multiple workers were
  used on destinations, and since the default is now 3 workers, this
  has become more of a problem.  This error is sometimes hard to
  reproduce (that means I couldn't reproduce it).  Changes have been
  made in this release that should fix this.

- backup: if the initial backup ran long enough to create 1 archive
  file then was interrupted, mounting the backup would not work.
  Another backup, even of a single file, that ran to completion would
  fix the problem.

- backup: backup files without dest.conf, create dest.conf, repeat the
  same backup immediately; no files are modified, but hb.db.0 still
  needs to be transmitted and wasn't.  This is an unlikely bug because
  any non-null backup would fix the problem, but it confused me so
  it's likely to confuse others.

#847 - January 15, 2013 - beta expires March 15, 2013


* add paced retrievals in recover, for Amazon Glacier
* Glacier bug fix when local archives are missing
* add "args" keyword to ssh destination
* accept quoted arguments in rsync "args" keyword
* minor selftest bug fixes
* backup sometimes used too much memory


- Amazon Glacier: recover now has 4 options for recovering your local
  backup from Amazon Glacier:
  * Option 1: recover all files within 4 hours 

    This option is --dl now.  It's usually the most expensive option,
    though if there is only 1 archive to retrieve, this is the only
    way.  Today hb will not segment one archive to retrieve it over a
    longer period with ranged retrievals.  Because of Glacier's
    unusual pricing, it's best not to use large archives (over 1GB).

  * Option 2: retrieve over N hours or days
    This option is --dl 8h for hours, or --dl 8d for days.  Using these
    allows pacing the retrieval over the time period specified, with
    files evenly divided into 4-hour download groups.

  * Option 3: retrieve using specified bandwidth
    This option specifies the average bandwidth to use for the
    retrieval, for example, --dl 1MB would be 1 MByte/sec, --dl 1Mb
    would be 1 Mbit/sec.  You can also use G and K suffixes.  The
    actual downloads are not throttled to this download rate; the
    bandwidth number is only used to decide how much data to retrieve
    in each 4-hour download block to give this average rate.

  * Option 4: retrieve based on archive sizes
    This option, --dl cheap, minimizes your peak retrieval rate and is
    usually the least expensive option.  The only time it will not be
    least expensive is if the retrieval crosses from one month into
    the next.  In that case, retrieval costs can double and it may be
    better to keep the retrieval all in the current month using option
    2 or 3.  This is the default option.

  Using one of these options, recover will pace the retrieval of
  archives stored in Amazon Glacier in 4-hour download groups.
  Retrieval pricing in Glacier is complicated and can be expensive, so
  you should review it before doing a large Glacier retrieval.
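
  A rough sketch of the grouping for Option 2 follows.  It divides
  archives by count using round-robin, which is an assumption; the
  notes above only say files are evenly divided.  download_groups is
  a hypothetical name.

```python
import math

GROUP_HOURS = 4   # Glacier download groups are 4 hours long


def download_groups(arc_names, hours):
    """Divide archives as evenly as possible over the 4-hour groups in `hours`."""
    ngroups = max(1, math.ceil(hours / GROUP_HOURS))
    groups = [[] for _ in range(ngroups)]
    for i, name in enumerate(arc_names):
        groups[i % ngroups].append(name)   # round-robin: counts differ by <= 1
    return [g for g in groups if g]        # drop groups that stayed empty
```

  With --dl 8h and five archives this gives two groups, one of three
  archives and one of two.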

- Amazon Glacier: when archives are cached locally, get, selftest, and
  mount expect archives to be in the backup directory.  If they aren't
  there for some reason (manually deleted) they will be fetched from a
  remote destination as needed.  None of this is new.

  When Amazon Glacier is the remote in this situation, new retrieval
  jobs would be started and get/selftest/mount would fail with error
  messages, because it takes 4 hours to get files from Glacier.  The
  bug here is that if you waited 4 hours and retried the same
  operation, it should have fetched the files from Glacier.  Instead,
  it was starting new retrieval jobs.  This has been fixed, though
  having to run programs twice is not ideal.

- added an args keyword to the ssh destination.  This allows
  specifying sftp options (use man ssh_config to see the options).
  For example:

    args -oIdentityFile=~/.ssh/otherid -oLogLevel=debug

  will use an alternate private key file and enable extra debug
  messages.  You can use quotes if necessary.

- rsync: added Port example in dest.conf.rsync example.  rsync Args
  keyword now parses quotes like a shell would, allowing quoted
  strings.  For example, to use an alternate ssh port and userid:
    Args -e "ssh -p 8002 -l sshuser"
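
  Shell-style quote parsing is what Python's standard shlex module
  provides.  Whether hb uses shlex internally is an assumption, and
  parse_args_keyword is a hypothetical name, but the result shows what
  "like a shell would" means for the line above.

```python
import shlex


def parse_args_keyword(line):
    """Split a dest.conf line into (keyword, list of arguments)."""
    keyword, _, rest = line.strip().partition(" ")
    return keyword, shlex.split(rest)


keyword, args = parse_args_keyword('Args -e "ssh -p 8002 -l sshuser"')
# args is ['-e', 'ssh -p 8002 -l sshuser']: the quoted string stays together
```

  Without quote handling, a plain split on spaces would break the ssh
  command into five separate arguments.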

- selftest: if a raw block device backup was interrupted after it had
  written at least 1 archive file, selftest would display an error:
    Error: for logid 3, hash should not be null: /dev/disk4 [r0]

- selftest: if a raw block device was backed up and cache-size-limit
  was set (not all archives are local), selftest would display a
  warning message (the numbers will be different):
    Oops: planned read count 7 != actual read count 0; diff = 7

- selftest: was not running its normal file data integrity checks on
  raw block device backups

- backup: sometimes when doing a multi-thread backup of a very large
  file with a small block size (as with VM images), backup could use
  excessive memory.  This was not related to dedup, but was more an
  issue with how the OS scheduled threads and is similar to the
  selftest problem fixed in #800.  It seemed to occur more often on
  Linux, but could have happened anywhere.  Backup's memory usage is
  now much more controlled and stable.

#837 - December 26, 2012 - beta expires March 15, 2013

- Amazon Glacier support is available with a new destination type,
  glac.  Glacier is an inexpensive (1 cent/GB/mo) archive system that
  works well for backups, providing you don't need to restore very
  often.  Since HashBackup supports multiple destinations, Glacier can
  be used as cheap offsite insurance for an onsite backup.

  Example dest.conf entry for Glacier:

    destname myglac
    type glac
    accesskey <amazon access key>
    secretkey <amazon secret key>
    vault hbvault
    dir server1

  The accesskey and secretkey are the same as used by other Amazon Web
  Services such as S3.  Unlike S3 buckets, the vault name does not
  have to be globally unique: it only has to be unique for your
  account.  Using the Dir keyword, it is possible to store multiple
  backups in a single vault.

  Glacier has some unusual trade-offs for cheap storage:
  * retrieval time is around 4 hours
  * there are retrieval costs in addition to bandwidth charges
  * you get a free allowance for small retrievals
  * you have to spread large retrievals over 600 days to be 100% free
  * fast retrievals can be very expensive
  * the option to ship data on a disk is available, but you still pay
    retrieval fees plus the fee for shipping the disk

  Glacier retrievals occur in 2 stages:
  * first, request a file; then wait 4 hours
  * second, download the file from Glacier
  * retrieved files are available for about a day

  To handle this situation, HashBackup stores its databases in a
  helper S3 bucket named <accesskey>-hashbackup-glac-vaults that is
  created the first time you use Glacier.

  NOTE: your AWS access key is not a secret: it is sent on every AWS
  request and is basically your AWS userid.

  Backups to Glacier look just like backups to any other destination.
  But hb recover must be done in 2 steps:

  1a. first, databases are downloaded immediately from S3
  1b. retrieval jobs are created for archive files, messages are:
      Started retrieval job <long job id> status InProgress for arc.0.0

  After this, you must wait around 4 hours for Glacier to make your
  archive files available.  Then run recover again, and assuming all
  files are available, your archives will be downloaded to the local
  backup directory.

  If hb recover is run before all files are retrieved, there will be
  messages like:
    Queueing arc.0.0 from glac
    Retrieval job <long job id> status InProgress for arc.0.0
  If this happens, you must wait longer and run recover again.
  Eventually, you will get all your archive files back.

  Glacier works best as a "backup of last resort", where you have
  another copy of the backup locally.  It can be used as the only copy
  of the backup (if cache-size-limit is set), but large retrievals are
  expensive and may be difficult to manage.

  If you make a backup to Glacier then decide you don't want it, it's
  important to hb clear the backup before deleting the backup
  directory.  The reason is that, unlike S3, Glacier will hang on to
  your files, even if you make a new backup with the same vault and
  dir keywords.

- recover: handles S3 files that have migrated to Glacier by
  requesting a restore and holding restored files in S3 for 5 days.
  Because each Glacier restore takes 4 hours, recover has to be run
  several times:

  1. first, dest.db is requested (4 hours)
  2. then hb.db.n files are requested to recreate hb.db (4 hours)
  3. then arc.v.n files are requested (4 hours)
     NOTE: this step is omitted when cache-size-limit is set
  4. then the final recover can fetch archives from S3

  Each recover run except the last will display messages for files
  that are still transitioning from Glacier back to S3, for example:

    s3(mydest): file is being restored from Glacier to S3: hb.db.0

  When hb tries to access these files that haven't yet been
  downloaded, an error will occur:

      Loading /Users/jim/hbdev/hb/hb.db.0
    Traceback (most recent call last):
      File "", line 269, in <module>
      File "", line 218, in main
      File "/Users/jim/hbdev/", line 486, in get
      File "/Users/jim/hbdev/", line 895, in applyincdb
    IOError: [Errno 2] No such file or directory: '/Users/jim/hbdev/hb/hb.db.0'

  This isn't very elegant and might be improved in the future, but it
  has worked so far in testing.

- recover: if configured in dest.conf (Workers keyword, default is 3),
  use multiple workers to download files simultaneously, reducing
  recovery time

- validate s3-compatible bucket names before creating new buckets:
  must be 3-63 characters, start and end with a-z or 0-9, and may
  contain dashes.  Existing buckets with names violating these more
  strict rules are still accessible.
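
  The naming rule can be expressed as one regular expression.  This
  is a sketch of the rule as stated above, assuming interior
  characters are limited to a-z, 0-9, and dashes.  It is not
  necessarily the exact check hb performs.

```python
import re

# first and last character: a-z or 0-9; 1 to 61 characters between them,
# so the total length is 3 to 63
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def valid_bucket_name(name):
    """True if name satisfies the stricter bucket naming rule."""
    return _BUCKET_RE.fullmatch(name) is not None
```

  For example, my-backup-1 passes, while ab (too short), -abc
  (leading dash), and MyBucket (uppercase) are rejected.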

- with multiple workers there was a race condition in s3-compatible
  destinations when a bucket was created: several workers could try to
  create the bucket at once, causing errors.  This is fixed.

#828 - December 22, 2012 - beta expires March 15, 2013

- S3-compatible destinations now store an MD5 checksum in the database
  when a file is transmitted, and compare this to the server-generated
  MD5 checksum when a file is retrieved.
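
  The idea can be sketched with Python's hashlib.  The function names
  are hypothetical, and how HB obtains the server-generated checksum
  is not described here, so it is passed in as a plain hex string.

```python
import hashlib

def md5_hex(data):
    """MD5 of the transmitted bytes, as stored at upload time."""
    return hashlib.md5(data).hexdigest()

def checksum_matches(stored_md5, server_md5):
    """Compare the stored checksum with the one reported on retrieval."""
    return stored_md5.lower() == server_md5.lower()
```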

- S3-compatible destinations have a new Dir keyword, so that many
  backups can be stored in one bucket, and backup data can be
  segregated from other data in a bucket.

- Rackspace Cloud Files destinations have a new Dir keyword, so that
  many backups can be stored in one container, and backup data can be
  segregated from other data in a container.

- there was a bug in the previous db upgrade process with old archive
  files.  The error during upgrade was:
    Unable to upgrade your database to rev 13: write() takes exactly 5
    arguments (4 given)

- for longtime beta testers using dest.conf, an old hb.db file could
  be stored on a remote.  In the Dec 3rd release, the sync code was
  rewritten.  This new code would see the old hb.db on the remote,
  delete it, which is correct, but also delete the local copy, which
  is not correct.  This bug has been fixed.  The workaround for the
  lost hb.db is to rename hb.db.orig to hb.db and run the upgrade
  again, or use recover to rebuild hb.db from the remote copy.

- when generating keys, get as many bytes as possible from
  /dev/random, without blocking; get the rest from /dev/urandom

- OSX Snow Leopard and Lion have a bug where an unaligned disk write
  that would fill a disk does a partial write instead of returning an
  error.  In testing, this OSX bug could corrupt the dedup table.
  Code was added to detect this.

- if a disk full condition occurred at a particular point during
  backup, it could cause selftest errors later because of the
  partially saved file.  Now if a critical error occurs during backup,
  backup will display a "Fatal error:" message and stop immediately.

- related to the disk full problem, a new selftest option, --fix, has
  been added.  With this option, selftest will make corrections for
  simple errors and will not stop at 100 errors as it usually does.
  Corrections occur with -v2 (or higher).

#814 - December 14, 2012 - beta expires March 15, 2013

- added support for DreamObjects (destination type is do), an
  S3-compatible service by Dreamhost with good prices: 7 cents per GB
  for both storage and outgoing bandwidth.  For more information, see:

- selftest -v4 could fail with this error:
    Traceback (most recent call last):
      File "/", line 140, in <module>
      File "/", line 674, in main
      File "/", line 199, in checkallblocks
    UnboundLocalError: local variable 'row' referenced before assignment

#811 - December 13, 2012 - beta expires March 15, 2013

- [#797] selftest: added --debug option to print logid, pathname, and version
  for files as selftest runs

- [#798] init was failing if backup directory didn't already exist

- [#799] error messages are clearer when recover is used with the wrong key;
  removed temporary files created so that if recover is used again, it
  displays the same error messages

- [#800] on 32-bit Ubuntu Linux 12.04.1 LTS, with a 128GB file of random
  data, selftest would continually consume memory until it failed with
  an error like this:
     Error: Error reading archive: , logid 5, blockid 2050709:
        /mnt/snapshot/data/com bo.bin [r0]
     Exception in thread decrompq_loop:
      File "/", line 60, in start_thread
      File "/", line 97, in decrompq_loop
  or it was killed by the OS OOM (Out Of Memory) handler.

- [#801] mount: on Linux, du was always reporting zero for directory
  sizes.  Also changed top-level directory names from just version
  number to YYYY-MM-DD-HHMM-rV, to be more accessible.  The latest
  backup is now called 'latest' instead of 'c'.  To cd using just
  version numbers, use cd *r5 for version 5.
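
  A short sketch of the new naming scheme and why the *r5 glob picks
  out version 5 only.  The function name is hypothetical, and the
  24-hour, zero-padded time format is an assumption.

```python
from datetime import datetime
from fnmatch import fnmatchcase

def mount_dir_name(backup_time, version):
    """Build a top-level mount directory name: YYYY-MM-DD-HHMM-rV."""
    return "%s-r%d" % (backup_time.strftime("%Y-%m-%d-%H%M"), version)

names = [mount_dir_name(datetime(2012, 12, 11, 2, 30), 5),
         mount_dir_name(datetime(2012, 12, 12, 2, 30), 15)]

# "*r5" must end in "r5", so version 15 ("...-r15") does not match.
matches = [n for n in names if fnmatchcase(n, "*r5")]
```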

- [#802] mount: the Linux stat command would sometimes print different
  values for st_blocks on a real file and an hb mount containing the
  same file, with the difference being less than 8.

- [#803] compare was displaying socket files as if they were new,
  because backup never saves socket files.  Now compare ignores socket
  files, like backup.

- [#808] after a database upgrade by the 7xx and 8xx series of
  releases, version 6xx of hb was still able to access the backup
  database.  Now, after #808 or later is used, the correct error
  message is displayed when an old version of hb is used on an
  upgraded database: you need a newer version of HashBackup to access
  this backup.

- [#809] /dev/urandom is now used instead of /dev/random for key
  generation.  These are only different on Linux.  Recent versions of
  Linux running on a single-user VM frequently do not have enough
  entropy in the random pool to generate even a 128-byte key, so hb
  init can block for a very long time.  With this switch to
  /dev/urandom, hb init will not stall.

- [#810] update dest.conf README file to explain why changing destname
  after files are backed up is a bad idea and causes breakage

- [#811] backup: on Linux, filesystems that don't support flags, eg,
  FUSE, would sometimes return garbage flags.  If the nodump bit was
  set in the bogus flags, the file would not be backed up.  This has
  been fixed.

#796 - December 3, 2012 - beta expires March 15, 2013

IMPORTANT: all previous releases had a major imap bug if rm or retain
was used.  Please see the bug section if you have been using imap.


* database upgrade to dbrev 13 (may take a while)
* config command revised
* new 'cache-size-limit' config option for remote-only archives
* new 'workers' keyword for multiple uploads and downloads
* new 'retries' keyword to control destination retries
* prefetch remote archives during restores & selftest
* direct reads (no copying) from Dir destinations
* database upgrades revised for better compatibility
* non-incremental backups with --full backup option
* new 'audit-commands' config option saves command history
* ask for passphrase twice and verify they match
* Amazon S3 performance & reliability enhancements
* new 'rate' keyword to limit outgoing / upload bandwidth
* new 'pack-percent-free' config option
* new 'pack-remote-archives' config option
* removed 8GB limit on config option 'arc-size-limit'
* any config option change displays both old and new values
* improved selftest -v2/3/4 performance
* re-add 'userid' keyword to ssh destination
* imap destinations handle errors better
* minor change to inex.conf exclude handling
* bug fixes


- NOTE: this rev will do an automatic database upgrade to dbrev 13
  when any HB command is used, to support remote-only archives and a
  limited archive cache.  All archive files must be read for this
  upgrade, so the upgrade could take some time to complete

  IMPORTANT: do not enable any new features in this release until
  after your database has been upgraded.  For example, don't add the
  new 'rate' keyword to any destinations until after your next backup
  using this new release.

- HB versions all config changes so that config -rN displays the
  configuration settings of backup N.  With the old config setup, if
  an option was changed, a backup was taken, and that backup was later
  removed with rm -rN, the config option change also disappeared.
  Sometimes this is okay / desired, but usually it is unexpected,
  especially with options like admin-passphrase: users expect that if
  they put an admin-passphrase on a backup, it stays there.  A
  separate problem is that if backup N caused a database upgrade, then
  all backups N and later were removed with rm, the next operation
  would want to do a database upgrade; this failed because the upgrade
  had already occurred.

  To fix these issues, HB now has one config that is "current".  After
  each backup, the current config is saved.  But if backup versions
  are removed, it doesn't affect the current config settings.  Also,
  --revert was removed from config as it was seldom used in practice.

- HB supports remote-only archives using the new cache-size-limit
  config option.  This option defaults to -1, meaning there is no
  local cache limit and archive files will stay in the local backup
  directory as before.

  Keeping a local copy of all archive files and leaving
  cache-size-limit set to -1 has several benefits:

  - there is a redundant copy in case something happens to the remote

  - restoring from a local backup directory is much faster than
    restoring from a remote, where archives have to be downloaded

  - disk space is cheap, and most systems have room for a local copy
    of the backup.  Setting a cache size limit will save space during
    backups, but you may still need lots of local disk space during
    restore to download required archives

  - with local archives, backup never stalls waiting for remotes to
    accept archives

  - the backup directory does not have to be locked for read-only
    programs, such as mount, allowing backups to run while mount is
    running.  When the cache is limited, only one program can access
    the backup directory because archives might be coming and going

  - errors on a remote, such as it being down or having a full disk,
    do not affect a backup with local archives: the archives will be
    sent the next time the remote is available.  With a limited cache,
    the backup will halt if the cache becomes full and any remote
    stops accepting data

  But, there are environments where keeping a complete local copy of
  archives may not make sense:

  - hb is not the only backup
  - hb is being used as an archive tool
  - local disk space is extremely scarce, as on a small VPS
  - the "remote" backup target is really on a local network
  - your offsite storage provider will ship disks
  - raw partition backups

  Cache-size-limit is useful for these situations.

  If cache-size-limit is set >= 0 with hb config, the backup program
  may remove local archives after they have been transmitted to all
  remotes, to stay under the cache size limit.  Cache-size-limit zero
  means "no local archives".  This causes backup to stall after each
  archive until it is transmitted to all remotes.  A better option is
  to set cache-size-limit to 1-1000.  These small numbers mean
  "multiply by the max archive size".  So cache-size-limit 5 with the
  default arc-size-limit of 1gb means that 5GB of archive data can be
  kept locally after it is sent to remotes.  A specific size can be
  set too, for example, cache-size-limit 10gb.  There is no minimum
  cache size: HB will function correctly no matter what size is used,
  though it may need to temporarily go beyond the limit while
  executing a command.
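
  The rules in the paragraph above can be sketched as follows.  This
  is an illustration, not HB's parser: the function names are
  hypothetical, "gb" is assumed to mean 2**30 bytes, and plain
  integers above 1000 are assumed to be byte counts.

```python
GB = 1024 ** 3  # assumption: "gb" means 2**30 bytes
_UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": GB}

def parse_size(text):
    """Parse values such as '-1', '5', or '10gb' into an integer."""
    text = text.strip().lower()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)

def cache_limit_bytes(setting, arc_size_limit=GB):
    """Return the cache limit in bytes, or None when there is no limit."""
    value = parse_size(setting)
    if value < 0:
        return None                    # -1: keep all archives locally
    if 1 <= value <= 1000:
        return value * arc_size_limit  # small numbers multiply the max archive size
    return value                       # 0, or a specific size such as 10gb
```

  For example, cache_limit_bytes("5") with the default 1gb
  arc-size-limit gives 5 * 2**30 bytes, and cache_limit_bytes("-1")
  gives None.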

  IMPORTANT: cache-size-limit is ignored and a warning is displayed if
  there are no remote destinations configured.

  If a new destination is added to dest.conf and some archives are
  only stored remotely, hb has to download these archives first and
  then upload them to the new destination(s).  In this release, this
  remote to remote synchronizing blocks other operations and backup
  waits until all destinations are synchronized before starting the
  backup.  With a local backup (no cache-size-limit), this doesn't
  happen because downloads (remote to local) aren't necessary.

  Another effect of a limited local cache is that backups may be
  delayed if there are slow remotes.  For example, if the
  cache-size-limit is 2GB and you have 16GB to backup, the backup
  program may have to delay while archives are uploaded.  This doesn't
  happen with a local backup (no cache-size-limit).

  One guideline for setting cache-size-limit is to use at least the
  average size of your typical backup.  This will allow backup to
  finish without waiting for remotes to accept data.

- a new dest.conf keyword, 'workers', can be added to any destination.
  This is the number of concurrent connections to a destination.  The
  default is 3, allowing uploading or downloading 3 files at once.
  Set workers to 1 if you want to minimize the impact of hb on your
  network connection.  (You can also use the new 'rate' keyword to
  limit the network impact.)

- a new dest.conf keyword, 'retries', can be added to any destination.
  If omitted, destinations will retry 2 times on errors (3 times
  altogether), delay 5 seconds the first time, then multiply the delay
  by 2 for each retry.  This is equivalent to retry 2, 5, 2.  Up to 3
  integers can be used with the retry command, with defaults used for
  missing values.  So retry 1 means to do only 1 retry with a 5 second
  delay.
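
  The delay schedule described above can be sketched like this.  The
  function and parameter names are hypothetical; the defaults mirror
  the 2, 5, 2 values in the text.

```python
def retry_delays(retries=2, first_delay=5, multiplier=2):
    """Return the delay in seconds before each retry."""
    delays = []
    delay = first_delay
    for _ in range(retries):
        delays.append(delay)
        delay *= multiplier   # each later retry waits longer
    return delays
```

  With the defaults this gives [5, 10]: a failed first attempt is
  retried after 5 seconds and again after 10 seconds, 3 attempts
  altogether.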

- get: creates a cache plan when cache-size-limit is set, and
  downloads archives in the background in the order they are needed.
  While get is running, the archive cache might exceed its limit.
  This may be necessary to avoid downloading the same archive more
  than once.  After get finishes, it will trim the cache back to
  its limit.

- selftest: creates a cache plan when cache-size-limit is set, like
  get.  Selftest's -v option controls the level of testing.  If
  cache-size-limit is set, selftest defaults to -v2 to prevent all
  archives from being downloaded.  If -v3 or higher is requested with
  cache-size-limit set, selftest will show how many archives need to
  be downloaded, how much space will be needed, and then ask for
  confirmation.  If cache-size-limit is not set (the default, meaning
  all archives are kept locally too), selftest uses -v9 as before.

- when fetching files from a Dir destination, a symlink is created
  instead of copying the archive to the cache.  This allows reading
  directly from the Dir destination file.  This is useful with mounted
  remote storage such as Google Drive, WebDAV, etc., because they
  support reads without needing to download an entire archive.  For
  example:

  - your backup directory is /hb, on the local filesystem
  - setup dest.conf with a Dir destination for remote mounted space
  - use hb config cache-size-limit 5 for a small local cache
  - your hb.db file is in fast, local storage in /hb
  - your archives are on slower remote mounted space
  - restoring a file will access remote storage directly

- The database upgrade process was rewritten for this release.  The
  old upgrade system worked well for a single-rev upgrade, but
  sometimes failed on older databases.  This release can update
  backups created with #339 (Oct 2010) or later, and is more reliable
  for future upgrades.

- backup: a new option, --full, forces a full backup.  This adds
  redundancy to the backup and can make restores faster too by
  reducing fragmentation in the backup.  When cache-size-limit is set,
  reducing fragmentation will usually reduce restore times by reducing
  the number of remote archives that need to be downloaded.  -D can
  still be used with --full to enable dedup, but no data from previous
  backups is reused.

  Another way to achieve backup redundancy vs --full is to simply start
  a new backup directory.  The advantage here is that everything is
  redundant, including the backup database.  The disadvantage is that
  it is harder to manage and configure, because each backup directory
  has to have unique remote destination directories, and retain cannot
  be used across multiple backups.

- a new config option, audit-commands, enables audit logging for the
  commands listed, or if 'all' is used, auditing is enabled for all
  commands.  To display the audit log, use the new hb audit command.
  Audit logs cannot be removed from the database.  For more secure
  audit logging, admin-passphrase should be set and disable-commands
  should include config; this prevents someone from turning off audit
  logging without the admin passphrase.  Example audit log:

    [jim@mb hbdev]$ hb audit -c hb
    Backup directory: /Users/jim/hbdev/hb
    Showing recent history

    Started: Mon 2012-11-19 15:40:26
    Build: 764
    Uid: 501 (jim)
    Gid: 20 (staff)
    Working dir: /Users/jim/hbdev
    Command: backup -c hb doc
    Finished: Mon 2012-11-19 15:40:27
    Exitcode: 0

    Started: Mon 2012-11-19 15:40:32
    Build: 764
    Uid: 501 (jim)
    Gid: 20 (staff)
    Working dir: /Users/jim/hbdev
    Command: ls -c hb
    Finished: Mon 2012-11-19 15:40:32
    Exitcode: 0

- rekey now asks for a new passphrase twice, and verifies that they
  are the same.  Before, a typo in the passphrase could make the
  backup database inaccessible.  To abort a rekey, enter mismatched
  passphrases.

- Amazon S3: performance may be increased for large S3 transfers
  because of better use of Amazon's load balancing, and S3 error
  recovery may be better when an Amazon S3 server is having issues.

- a new destination keyword, 'rate', allows limiting upload bandwidth
  for Amazon S3, ftp, imap, dir, Rackspace Cloud Files, and rsync
  destinations.  The value is the outgoing transfer limit in bytes per
  second, for example, rate 100k would mean 102400 bytes/sec.  This is
  the upload limit for each worker on a destination, and since the
  default is to have 3 workers, the aggregate limit would be 300k for
  this example if all the workers were busy.  If rate limiting is
  used, you may want to add the workers keyword with a value of 1, to
  limit bandwidth further.  A rate limit less than 1024 raises an
  error - it's probably a typo.
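
  The arithmetic above can be sketched as follows.  The function
  names are hypothetical, and only the "k" suffix from the example is
  handled; other suffixes are not covered by this sketch.

```python
def parse_rate(text):
    """Convert a rate value such as '100k' to bytes per second."""
    text = text.strip().lower()
    if text.endswith("k"):
        rate = int(text[:-1]) * 1024
    else:
        rate = int(text)
    if rate < 1024:
        # Mirrors the rule above: a rate this small is probably a typo.
        raise ValueError("rate %d is less than 1024 bytes/sec" % rate)
    return rate

def aggregate_rate(text, workers=3):
    """Upper bound on upload bandwidth when every worker is busy."""
    return parse_rate(text) * workers
```

  Here parse_rate("100k") is 102400 bytes/sec per worker, and
  aggregate_rate("100k") is 307200 bytes/sec (300k) with the default
  3 workers.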

- after removing files from the backup, either with the rm or retain
  commands, some archives may have empty space.  If there is enough
  empty space (25%), HB would compress archives to save local disk
  space.  If any archives shrank 50%, hb would retransmit them to
  remote destinations to free up remote disk space.  This works well
  when archives are local.  But when archives are only stored remotely
  because cache-size-limit is set, this packing operation requires a
  download first.

  So a new config option has been added, pack-percent-free.  This
  takes a number, which is a percent.  If an archive has this much
  free space or more, it will be packed to save disk space.  The
  default is 50, so archives are packed when 50% or more space is
  free.  Another useful value is 100: archives are never packed, but
  are deleted when they are completely empty.  It may or may not be
  cost effective to set this, depending on whether your cache is
  limited and the rates you pay for outgoing bandwidth, incoming
  bandwidth, and storage costs.

  NOTE: old format archives created before Sep 2012 cannot be
  packed.

- a new yes/no config option, 'pack-remote-archives', specifies
  whether to pack archives that are not stored locally.  The default
  is no.  This option exists because many cloud storage vendors charge
  for download bandwidth, so it may cost more to download an archive
  for packing than it is worth: it might be cheaper to just pay for
  the storage.  This option only makes sense when cache-size-limit is
  also set.
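
  How pack-percent-free and pack-remote-archives interact can be
  sketched as a single decision function.  This is an illustration
  under stated assumptions, not HB's logic: the names are
  hypothetical, and deleting a completely empty archive regardless of
  where it is stored is an assumption.

```python
def pack_action(total_bytes, free_bytes, pack_percent_free=50,
                is_local=True, pack_remote_archives=False):
    """Return 'delete', 'pack', or 'keep' for one archive."""
    if free_bytes >= total_bytes:
        return "delete"            # completely empty archive
    if pack_percent_free >= 100:
        return "keep"              # 100 means archives are never packed
    if not is_local and not pack_remote_archives:
        return "keep"              # avoid paying for a download just to pack
    percent_free = 100.0 * free_bytes / total_bytes
    return "pack" if percent_free >= pack_percent_free else "keep"
```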

- the CRC on archive blocks was removed.  This CRC was intended to
  allow remotes to do some simple archive validation, but that's not
  possible with the other changes in this release.  Each block still
  has a full, encrypted SHA1 hash for data verification during
  restores, but remotes can't access it: they don't have the key.  And
  each file still has a full SHA256 as a double check to verify
  restored files.

- mount: because there is no way to predict which archives might be
  accessed, the mount command cannot prefetch remote archives.  This
  may lead to slow file data access while remote archives are
  downloaded.  The cache size limit is ignored while mount is running,
  to avoid downloading the same archive more than once.  When the HB
  backup filesystem is unmounted, the cache is trimmed back to
  its limit.

- in previous releases, read-only programs like mount did not lock the
  backup directory.  But when the cache size is limited, all programs
  must lock the backup directory since archives may be moving in and
  out of the cache.

- the config option arc-size-limit sets the maximum size of archive
  files created by backups.  Previously this was limited to 8gb; now
  there is no upper limit.  The default is still 1gb.  To facilitate
  testing, the lower limit has been changed from 1mb to 10000 bytes,
  but small sizes like this should not be used for real backups.

- config: when a config option is changed, both the new and old value
  are displayed

- selftest -v2 -p0 (one core) at a customer site was 7x faster than -v2
  -p4 (4 cores).  This performance issue, which also affected -v3 and
  -v4, has been fixed.

- multi-core selftest could report an error with a file, then
  "Verified x files with 0 errors", then at the end, "1 error"

- the Userid keyword was added back to the ssh destination.  This was
  accidentally removed when hb switched to using sftp.  Also, this
  destination always tried to create the target directory on the ssh
  host.  Now, it will only do this when files are sent to the host -
  not on a remove or fetch.

- before, the inex.conf rule ex /tmp/ meant the same thing as ex
  /tmp/*.  The problem is it also prevents requesting a backup of
  /tmp/jim; it saves the directory but not the contents, and it's very
  confusing: "Why won't it work!?"  Hey, if it confuses me, which it
  did, it's confusing!  Now, ex /tmp/ works as before when backing up
  /, and will not save the contents of /tmp.  But you can request a
  backup of /tmp/jim and it will work as expected.  To get the old
  behavior, use a rule ex /tmp/*.  Then when requesting a backup of
  /tmp/jim, the directory itself is saved since you explicitly
  requested it, but the contents are excluded by inex.conf.  Also, a
  new warning is displayed when a requested directory's contents are
  excluded by inex.conf.

- imap destinations would sometimes reuse a connection when an error
  occurred.  But sometimes these errors are fatal, for example, if the
  imap server resets a connection.  Now when an error occurs, the old
  connection is closed and a new connection is created.  hb retries 3
  times when a destination encounters problems, and imap retries up to
  20 minutes with an exponential backoff.  So altogether, imap will
  retry for around an hour before giving up.

- imap performance is slightly improved by sending one less imap
  command per file sent, received, or removed

- IMPORTANT IMAP BUG FIX: in all previous versions of hb, rm and
  retain could delete too many archives on the imap server, rendering
  the remote backup incomplete.  The symptom of this problem is that
  when an archive arc.v.n is removed, all archives with arc.v.n as a
  prefix are removed from the imap server.  For example, if retain
  removes arc.0.3, arc.0.30 and arc.0.31 are also removed incorrectly.
  A similar problem occurred with hb.db.n files.  This only occurs on
  imap destinations.
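
  The symptom can be reproduced with a few lines of Python.  This
  only illustrates prefix matching versus exact matching on file
  names; it does not show how HB talks to the imap server, and the
  function names are hypothetical.

```python
def remove_by_prefix(names, target):
    """Buggy behavior: removes every name that starts with target."""
    return [n for n in names if not n.startswith(target)]

def remove_exact(names, target):
    """Intended behavior: removes only the exact name."""
    return [n for n in names if n != target]

archives = ["arc.0.3", "arc.0.30", "arc.0.31", "arc.0.4"]
after_bug = remove_by_prefix(archives, "arc.0.3")   # arc.0.30 and arc.0.31 are lost too
after_fix = remove_exact(archives, "arc.0.3")       # only arc.0.3 is removed
```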

  To correct this, the database upgrade procedure will tag files that
  need to be resent to imap destinations.  After the next backup
  completes, the remote imap destinations should have the missing
  files.

  To verify your remote imap backup is correct, wait until your next
  backup completes with this new version.  Then create a new temporary
  backup directory, copy your key.conf and dest.conf files there, and
  run hb recover -c tempdir.  This will download all of your remote
  backup files from your imap server.  Run hb selftest -c tempdir to
  verify there are no errors.  Then tempdir can be deleted.

  If you still have selftest errors because of missing archives, edit
  your real dest.conf file in your production -c backup directory and
  change the destname of your imap destination.  For example, if it
  now says destname imapjim, change it to imapjim2.  On your next
  backup, all backup files will be uploaded to the imap server.  Then
  repeat the selftest in the previous paragraph to verify your remote
  imap backup has been fixed.

- imap: would sometimes fail to parse server responses correctly if
  additional information was included, causing unnecessary retries.

- imap: added a debug keyword.  debug > 0 will cause imap FETCH
  responses to be dumped, which can help diagnose parse errors.  debug
  >= 4 will dump the entire imap conversation.

- Dedup statistics were inconsistent.  For example, backup might say:
      Dedup enabled, 28% of current, 7% of max
  But then retain or rm would say:
      Dedup enabled, 28% of current, 0% of max
  Rm and retain never expand the dedup table, so they don't have the
  -D option to specify a maximum dedup table size.  Without knowing a
  maximum size, it was assumed to be huge; so the 2nd stat was always
  0%.  To avoid confusion, only backup prints dedup statistics now.

- if hb.db was deleted, running backup or recrypt would create a new
  (empty) hb.db.  When this empty hb.db was sent to remotes, it would
  cause all hb.db.n files to be deleted.  Now, backup and recrypt will
  issue an error message that hb.db doesn't exist.  The usual remedy
  would be to run recover to regenerate hb.db.

- recover checks that hb.db recovered from a remote is bit identical
  to the original hb.db.  In a specific, unusual situation, recover
  would report that all signatures matched on each hb.db.n file, but
  then would report 'Database HMAC mismatch' on the final database and
  stop.  The recovered database was equivalent to the original, but it
  was not identical.  This bug has been fixed and the error message is
  now a warning; the database is available if you choose to use it.

- get: setuid, setgid, and the sticky bit were being saved but not
  restored on regular files.  This has been fixed.  These bits are
  also now restored if the numeric userid running get is the same as
  the numeric userid of the restored file.

- ls: setuid, setgid, and the sticky bit are displayed with ls -l

- when the backup directory is locked and can't be accessed, the error
  message displays the process id owning the lock

- get: if a directory was restored with --orig and the parent
  directory didn't exist, get would fail with an error message
  "ValueError: too many values to unpack"

- get: if two hard links to the same file were listed on the command
  line, the first file existed before the restore, and the restore
  replaced it, the restore of the 2nd hard link would fail with an
  error like:
     Unable to hardlink: No such file or directory:
  followed by two pathnames .../hb-540858.tmp -> .../hb-297485.tmp

- clear: if hb.db didn't exist, clear stopped with an error; it should
  have cleared the rest of the directory

- clear: would sometimes print the error message "database schema has
  changed"

- versions: could fail with the error message KeyError: 'getpwuid():
  uid not found: nnn' if a backup from one system was transferred to
  another system with different userid -> name mappings

- add a test to all programs that the -c backup directory is actually
  a directory

- when reading passphrases, a warning was displayed and characters
  still echoed on the screen

- backup: when the dedup table was full, it was resized (to the same
  size) at the beginning of the backup

- large dedup tables could cause MemoryExceptions and other problems
  when a program started.  A 1GB dedup table was actually temporarily
  requiring 4GB of memory.  As a side effect of fixing this, loading a
  large dedup table is now much faster.

- mount, backup, selftest: if certain files are backed up with dedup
  enabled, fixed block sizes in one backup version, and variable block
  sizes in a later backup version, reading them via mount may cause a
  Bad address error.  This is extremely data dependent: on my backup
  of 800K files, this occurred with 4 files.  The cause of the problem
  is a bug in the backup program, and that has been fixed.  selftest
  has been changed to detect this problem (requires -v9).  The
  database upgrade will correct files with this problem.

#668 - September 16, 2012 - beta expires December 15, 2012


* validate destination keywords
* new rsync args keyword wasn't working


- bogus keywords in a destination (typos) were silently ignored.
  Unrecognized keywords now generate a fatal error.

- the new rsync Args keyword was not getting inserted into the rsync
  command line.

#664 - September 12, 2012 - beta expires December 15, 2012


* suppress echo when entering passwords
* add args keyword to rsync destinations to insert arguments
* enable/disable commands with config options & admin passphrase
* backup bug fix


- previously, passphrases were read from stdin and displayed during
  keyboard entry.  Now echoing is suppressed and passwords are read
  from /dev/tty.

- rsync destinations have an Args keyword, and any options listed here
  will be inserted into the rsync command line.  For example:
    Args --bwlimit=64

- three new config keywords were added:

  1. admin-passphrase: defaults to ''.  If set to something else, this
     passphrase has to be entered to view or change the config.

  2. disable-commands: (default '').  A comma-separated list of hb
     commands that require the admin passphrase.  For example:

     $ hb config -c hb disable-commands clear,recover,rekey,recrypt,retain,rm

  3. enable-commands: (default '').  A comma-separated list of hb
     commands that do NOT require the admin passphrase to be executed.
     All other commands will require the admin passphrase.  For
     example:

     $ hb config -c hb enable-commands backup,help,ls

  It is an error to set both enable-commands and disable-commands.  It
  is more secure to set enable-commands because if new commands are
  added to hb they will automatically be disabled.  It's not possible
  to disable upgrade because it is not associated with an hb database,
  and that's where the config information is stored.
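
  The rules for these two options can be sketched as below.  This is
  an illustration only: the function name is hypothetical, and it
  ignores whether an admin-passphrase has actually been set.

```python
def requires_admin_passphrase(command, enable_commands="", disable_commands=""):
    """Return True if command needs the admin passphrase."""
    enabled = {c for c in enable_commands.split(",") if c}
    disabled = {c for c in disable_commands.split(",") if c}
    if enabled and disabled:
        raise ValueError("set only one of enable-commands and disable-commands")
    if command == "upgrade":
        return False                   # upgrade cannot be disabled
    if enabled:
        return command not in enabled  # everything not listed is disabled
    return command in disabled
```

  With enable-commands set to backup,help,ls, a command added to hb
  later would require the passphrase automatically, which is why that
  option is the more secure choice.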

- backup: if an hb.db.n file from a previous backup needed to be sent
  to a destination, an error could occur:
    Traceback (most recent call last):
      File "/", line 41, in <module>
      File "/", line 1835, in main
      File "/", line 607, in sync
    NameError: global name 'HBDB' is not defined

#656 - September 10, 2012 - beta expires December 15, 2012

- on Linux, if the random number entropy pool is low, reading from
  /dev/random is supposed to block.  Instead, it may return a short
  read, causing problems such as:

    Exception in thread mashq_loop: AES key must be either 16, 24, or 32
    bytes long
      File "/", line 60, in start_thread
      File "/", line 742, in mashq_loop

  Now, HB will stay in a loop until all random bytes are read and will
  print a message every 5 seconds that it is waiting for random data.
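
  A minimal sketch of such a read loop, assuming a file-like object
  whose read() may return fewer bytes than requested.  The names are
  hypothetical, and the 5-second progress message is left out.

```python
import io

def read_exact(stream, nbytes):
    """Keep reading until exactly nbytes have been collected."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

class ShortReader:
    """Stand-in for a device that returns at most 5 bytes per read."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n):
        return self._buf.read(min(n, 5))

key = read_exact(ShortReader(bytes(range(32))), 32)  # always 32 bytes
```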

#655 - September 10, 2012 - beta expires December 15, 2012


* security is enhanced with a new encryption system
* key can be protected with a passphrase
* old backup data can be recrypted with a different key
* remote files are hardened against remote tampering
* multiple CPU cores are used in get and selftest
* raw block device, partition, and logical volume backups
* performance enhancements
* bug fixes


- NOTE: this rev will do an automatic database upgrade to dbrev 12
  when any HB command is used, to support the new encryption system

- security: HashBackup's encryption system is enhanced in this release
  to prevent a certain kind of information leak that could allow
  someone to determine if your backup contained data in common with
  their backup.  Existing backups remain accessible and future backups
  will use the new encryption system.

  NOTE: to exploit this, someone must have copies of your encrypted
  backup files and *unencrypted* copies of files to test in common.

- security: local and remote copies of dest.db are now encrypted.
  There is little value in hacking this file, which is why it wasn't
  encrypted before, but putting any data on a remote server
  unencrypted is not a good practice.

- security: remote copies of dest.db now have an HMAC (Hashed Message
  Authentication Code) signature.  This signature is generated on
  upload and verified on recovery.

  NOTE: an HMAC signature is similar to a regular hash, like SHA1,
  MD5, etc., but combined with a secret key.  Without knowing the
  key, the HMAC cannot be regenerated by an outsider, whereas other
  hashes can.  HMACs provide a stronger guarantee against tampering
  than regular hashes.
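
  The difference between a regular hash and an HMAC can be shown with
  Python's standard hmac and hashlib modules.  This is a sketch only:
  SHA-256 and the key and data values are assumptions chosen for the
  example, not a statement of what HB uses internally.

```python
import hashlib
import hmac

def plain_hash(data):
    # Anyone can compute this; no secret is involved.
    return hashlib.sha256(data).hexdigest()

def keyed_mac(key, data):
    # Requires the secret key to compute or to verify.
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(key, data, expected_mac):
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(keyed_mac(key, data), expected_mac)

key = b"secret key known only to the owner"
original = b"contents of dest.db"
tag = keyed_mac(key, original)

tampered = b"contents of dest.dB"
forged_hash = plain_hash(tampered)               # outsider can regenerate this
forged_tag = keyed_mac(b"guessed key", tampered) # but not a valid HMAC
```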

- security: previously, database files were encrypted with AES-128 and
  arc files used AES-256.  Now, AES-128 is used everywhere, because:

  -- this was advised after a security review of HashBackup by a
     well-known, published security expert: AES-256 has key schedule
     weaknesses that might make it less secure than AES-128.

  -- although all variants of AES have demonstrated weaknesses, no
     practical attack is known

  -- if you have data that is so valuable that AES-128 is not enough
     protection, it's more likely that the data will be obtained
     through other means such as stealing your computer and/or
     physical coercion rather than obtaining your backup and breaking
     the encryption

  -- a quote from the 2nd reference link below:
     "Recovering a key is no five minute job ... the number of steps
     required to crack AES-128 is an 8 followed by 37 zeroes.
     'To put this into perspective: on a trillion machines, that each
     could test a billion keys per second, it would take more than two
     billion years to recover an AES-128 key,' the Leuven University
     researcher added."

  -- References:

  The encryption change is implemented in a compatible way so that
  existing backups are still accessible.

- init: normally init generates a random key automatically, and stores
  the key in the key.conf file.  But in some situations, others may
  have access to the key.conf file.  Examples are hosted virtual
  private servers, managed servers where someone has root access, and
  Google Drive and other "remote drive" services.

  A new -p option has been added to init and rekey to protect the key
  with a passphrase.  hb init -p ask will get the passphrase from the
  keyboard for every hb command.  Even someone with access to key.conf
  cannot access your backup without also knowing this passphrase.

  The key.conf file has a new format to support passphrases.
  Old-style key.conf files are also accepted for compatibility.


  1. You still must make copies of your key.conf file!

  2. If you write your backup directly to a remote drive like Google
     Drive, the key will also be stored there.  To protect your
     backup, you MUST use a passphrase with the -p option to init.

- to increase security, pbkdf2 key stretching is used.  This may
  introduce a short delay (1 or 2 seconds) on every hb command.  Key
  stretching slows down an outsider's attempts to guess your key or
  passphrase by running both through thousands of one-way hashes.  Any
  attempt to guess a key / passphrase has to repeat all this hashing
  work for each guess.
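
  As a rough illustration of key stretching, here is PBKDF2 from
  Python's standard library.  The hash (SHA-256), salt size,
  iteration count, and output length are assumptions for the sketch;
  the values HB actually uses are not stated here.

```python
import hashlib
import os

def stretch(passphrase, salt, iterations=100_000, length=32):
    """Derive a key by running the passphrase through many hash rounds."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, length)

salt = os.urandom(16)   # random salt, stored alongside the derived data
key = stretch("the fat green martian landed his shiny silver spacecraft", salt)
```

  Every guess an outsider makes has to pay for the same number of
  iterations, which is where the slowdown comes from.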

  All of HashBackup's security comes from your key.  This is why hb
  init creates a random key by default: it is next to impossible for
  someone to guess a long random key.  Here are some suggestions for
  creating a strong passphrase to further protect your key:

  1. make up a sentence that you will remember and use this as your
     passphrase.  A sentence is easier to type than a password like
     Wjd0$p2^! and is stronger because it is longer.  Length wins over
     weird, hard to type, hard to remember passwords.
     Example: the fat green martian landed his shiny silver spacecraft

  2. make up a sentence that you will remember and use the first
     letter of each word as your passphrase.  For the first sentence
     in this paragraph, that would be: muastywrautfloewayp

  3. adding special symbols (other than spaces) will increase the
     passphrase strength.  One easy way to do this is to use special
     symbols before, after, and/or between words, for example:
     But make up your own special symbol rule.  Even adding just
     one special symbol will increase your passphrase strength.

  4. adding a number, especially in the middle, will increase your
     passphrase strength

  5. use a password manager program.  These store lists of passwords
     and passphrases in an encrypted file, protected by a master
     passphrase.  They often have password generators built in and you
     can cut and paste a passphrase when needed.

  6. To learn more about the importance and methods of choosing a good
     passphrase, do a search for:
     - strong password / passphrase
     - password / passphrase strength
     - password / passphrase entropy
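
  Suggestion 2 above is easy to try out.  A minimal sketch (the
  function name first_letters is made up for this example):

```python
def first_letters(sentence):
    """Build a passphrase from the first letter of each word."""
    return "".join(word[0] for word in sentence.split())

sentence = ("make up a sentence that you will remember and use the first "
            "letter of each word as your passphrase")
passphrase = first_letters(sentence)
print(passphrase)
```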

- init: similar to -p ask, -p env can be used.  You first set the
  shell environment variable HBPASS to your passphrase.  For example,
  using the bash shell, you would say: export HBPASS='secret phrase',
  then run hb init -p env.  To make the environment variable
  permanent, export it in .profile in your home directory (for bash).
  Using -p env is less secure than -p ask, because every program you
  run has access to HBPASS.  But -p env is more convenient since you
  only have to set the environment variable once in your login
  session.  Setting HBPASS in .profile is probably less secure than
  typing the export HBPASS=mysecret command after you login.  If you
  do store your password in .profile, restrict its file permissions with:
        $ chmod 400 ~/.profile
  The comments above about -p ask also apply to -p env.

- init/rekey: it is possible to use -p ask/env with -k ''.  This is
  less secure, because only the passphrase is used for encryption.
  But it is more convenient, because the key.conf file can be
  re-created more easily if it is lost.

- rekey: supports -p ask and -p env.  When using env variables for
  rekey, leave HBPASS set to the old passphrase before giving the rekey
  command; then update HBPASS with the new passphrase after rekey.

- rekey -k will now create a key.conf file if it is missing,
  however, no rekey occurs in this unusual situation.  This may be
  useful when recovering, to create a key file in an empty directory
  without using an editor program.

- rekey now recovers from interruptions.  If rekey is interrupted, no
  hb commands will work until the rekey is retried and completes.

- security: add a time delay if the key is incorrect.  User-generated
  keys and passphrases are not as strong as random keys and this delay
  may help slow down attempts to guess ("brute force") a key.  To
  prevent using the delay as a signal that the key is wrong, HB will
  also randomly delay even when the key is correct.
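
  One way such a delay could work is sketched below.  The delay
  range, the probability of delaying on a correct key, and the name
  check_key are all assumptions for illustration; HB's real behavior
  is not documented here.

```python
import hmac
import random
import time

def check_key(candidate, actual, sleep=time.sleep,
              rng=random.SystemRandom(), p_delay_ok=0.25):
    """Compare keys; delay on failure and sometimes on success."""
    ok = hmac.compare_digest(candidate, actual)
    delay = rng.uniform(0.5, 2.0)        # assumed delay range, in seconds
    if not ok:
        sleep(delay)                     # wrong key: always delay
    elif rng.random() < p_delay_ok:
        sleep(delay)                     # right key: delay now and then
    return ok
```

  Because a delay can also follow a correct key, seeing a delay does
  not by itself prove that a guess was wrong.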

- backup: creates a new random backup key for every 64GB of backup
  data to avoid using the same backup key "too long".  Backup keys
  are managed by HashBackup and are not in the key.conf file.  In this
  release, a file bigger than 64GB will use only one key.
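
  The rotation rule can be pictured like this.  The class and method
  names are hypothetical and the key is just 16 random bytes; the
  sketch only shows why a single file larger than the limit ends up
  using one key: the limit is checked between files, not inside them.

```python
import os

class BackupKeys:
    """Hand out a backup key, starting a new one after `limit` bytes."""

    def __init__(self, limit=64 * 2**30):
        self.limit = limit
        self.used = 0                  # bytes encrypted with the current key
        self.key = os.urandom(16)
        self.generated = 1             # how many keys have been created

    def key_for_file(self, size):
        # Rotate only at a file boundary, so one file never spans two keys.
        if self.used >= self.limit:
            self.key = os.urandom(16)
            self.used = 0
            self.generated += 1
        self.used += size
        return self.key
```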

- recrypt: this is a new command that will re-encrypt backup data with
  the new encryption system.  Recrypt is different than rekey:

  -- rekey creates a new key in key.conf and re-encrypts hb.db; no
     archive files (these contain your backup data) are modified.
     This is used if the key.conf file may have been compromised.

  -- recrypt re-encrypts archive files containing backup data.
     Specifically, recrypt operates on archive files that were
     created before the last rekey command.

  To force re-encryption of all of your data, run rekey first, then
  run recrypt.


- recover: a public file signature (hash) is added to hb.db.n files
  sent to remote destinations.  This allows integrity checks on the
  remote side without the key.conf file, and it's also checked during

- recover: a private file signature (HMAC) is added to hb.db.n files.
  Recover verifies the HMAC to ensure that the file has not been
  changed during transmission or while it was on the remote.  Even
  before this release, tampering with encrypted hb.db.n files would
  have very likely caused a program fault during or after recovery;
  HMAC provides a stronger guarantee against undetected tampering.

- recover: a 2nd private file signature (HMAC) is added to hb.db.n
  files to ensure that the hb.db file created by recover is identical
  to the original.  This can detect more errors, for example, if an
  hb.db.n file is not applied, not applied correctly because of a
  software bug, or a valid hb.db.n file is copied over a different
  hb.db.n file on the remote side.

- get: multiple CPU cores may be utilized during a restore.  This is
  mainly beneficial for bzip2 compression, though restores with normal
  gzip compression are somewhat faster too.  A -p option is added to
  get, like backup; -p0 will use only 1 core.  The default is to use
  all cores.

- selftest: multiple CPU cores may be used, and the -p option was
  added.  The default is to use all cores.

- backup: when a fatal error occurred, backup would sometimes freeze
  after displaying the error message.  This has been improved, though
  may not be completely fixed.  It is a race condition while shutting
  down and hard to manage when an exception occurs.

- backup: the HB build number used to create each backup is now stored
  and displayed by the versions command.  It is not displayed for old

- selftest: renaming hard links a certain way could cause a selftest
  error, "for logid x, hlogid y is also hard-linked".  The backup data
  is actually okay - this was a bug in selftest.

- get: the previous hard link renaming scenario could cause restoring
  a renamed hard link to fail with a file size mismatch or a file hash

- mount: the previous hard link renaming scenario could cause access
  to a renamed hard link to fail with an error "Bad address".

- backup: hb can't yet handle ACLs on zfs, nfsv4, or Windows
  filesystems since their ACLs are not Posix compatible.  FreeBSD
  returns an error "Invalid argument" when these filesystems are
  backed up with hb.  Instead of printing this error on every file, hb
  will only print it once, and print a note that ACLs on this
  filesystem aren't supported.

- backup: on Linux, backing up sshfs filesystems caused a fatal error
  because sshfs returns the wrong error code when reading flags.

- selftest: could fail with an error on line 351:
    TypeError: not enough arguments for format string

- ls: display symlink targets with -l, like ls -l
- ls: display hard links with -lv

- get: if there was a problem reading a symlink target, get would
  abort; it should have printed an error and continued the restore

- raw block device / partition backup is supported.  Before, hb only
  saved the block device information you would see with ls -l.  Now, a
  block device pathname on the command line causes the contents of the
  block device to be saved.  For example:

      hb backup -c backupdir /dev/sda1

  would backup the first partition on disk /dev/sda.  You could also
  backup /dev/sda, which would save all partitions on the physical
  disk sda.

  Before doing a block device backup, make sure the device is not
  mounted.  hb does a basic check for this and won't backup anything
  that is displayed by df -l.  If you do backup a mounted block
  device, you'll likely have a corrupt device on restore.  Logical
  volumes can also be backed up using this method.  To get a clean
  backup without unmounting a LV, make a snapshot LV first and backup
  the snapshot rather than the actual LV.  To create a snapshot LV,
  there must be free space available on the same physical volume
  containing the LV (Linux).

  hb is not yet smart enough to backup only the used blocks in a
  partition.  It is safer (and easier!) to backup all blocks rather
  than reading the filesystem structures on the device to find used
  blocks, because hb doesn't need to know about filesystem details.
  But it can be slower if there is a lot of free space in the
  filesystem.  Image backups are very space-efficient if dedup is

  Dedup with a block size of -B4K works well with most filesystems,
  but the flipside is that this small block size does not compress
  very well.  Depending on the compressibility of your data and the
  amount of data changed, a larger block size like -B1M may work
  better.  Experiment with your actual data to decide which options
  are best.
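
  To see why block size matters for dedup, here is a small sketch of
  fixed-block dedup using a set of block hashes.  It is an
  illustration only: the function name, the use of SHA-1, and the
  synthetic "image" are assumptions, not HB's implementation.

```python
import hashlib

def dedup_stats(data, block_size):
    """Return (total blocks, unique blocks) for fixed-size blocks."""
    seen = set()
    total = 0
    for off in range(0, len(data), block_size):
        seen.add(hashlib.sha1(data[off:off + block_size]).digest())
        total += 1
    return total, len(seen)

K = 4096
A, B, C, D = (bytes([c]) * K for c in b"ABCD")
image = A + B + A + C + A + B + A + D      # eight 4K blocks, four distinct

print(dedup_stats(image, 4 * 1024))        # small blocks find the repeats
print(dedup_stats(image, 16 * 1024))       # large blocks see no repeats
```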

  If you are backing up several block devices and want dedup to work
  across all of them, for example, each block device has a similar
  Linux VM, you will have to use -B4K, because even when different
  block devices contain the same data, the block placement is not

  hb get will also restore entire block devices.  The block device
  path must be given on the get command line for a full image restore.
  If --orig is used, the data is restored to the same device backed
  up.  To restore to a different device, use:

    hb get -c backupdir /dev/sda1 --todev /dev/sdb1

  The target block device must not be mounted.  If it is mounted, your
  restore will almost certainly trash any file system there, and the
  restore will also be bad because an active filesystem is writing to
  the same device.

  If neither --orig nor --todev are used, a file image is created from
  the backup as if you had done a dd from the device to a file named
  sda1 in the current directory.

  When restoring a block device with lots of free space, it may be
  faster to use -p0 to disable multi-core operation.

  On Mac OSX, the raw read block size is apparently fixed at 4K, so
  reading from a raw block device is slower than reading from the
  normal filesystem.

- backup: to avoid confusion, display a message if dedup is not
  enabled, and display dedup utilization statistics when it is
  enabled.  The statistics show whether -D would benefit from an
  increase in the dedup table size.

- get: when a file with special flags was restored, get would display
  an error: 'module' object has no attribute 'chflags' on BSD and OSX.

- selftest: could display an error message:
     Error: blockid n has version v
  where v is a version that was deleted.  The backup is fine and would
  restore correctly; this was a bug in selftest.

- backup: performance improved ~8% for small block sizes (VM images).

- ls: if -r was used to display a specific backup version and a
  directory was not backed up in that version but was backed up in an
  earlier version, ls would sometimes not display the directory at
  all.  Also, if a file in one backup was replaced with a directory by
  the same name in a later backup, ls -a would display the directory,
  but not the earlier file backup.  It should have displayed both.

#550 - April 9, 2012 - beta expires September 15, 2012

- backup: a new -Z option controls compression.  The default is -Z3,
  which is equivalent to the old behavior.  The possible values are:

    -Z0 = disable compression (replaces --no-compress option)
    -Z1-7 = gzip level 2-8
    -Z8 = bzip2 level 1
    -Z9 = bzip2 level 3*

  Dedup is not affected by the type or level of compression used, and
  different kinds/levels of compression can be intermixed in the same
  -c backup directory without problems.


  -- using -Z often triples backup time and doubles restore time,
     especially with bzip2.

  -- on a multi-core system, HB will use all cores with bzip2.  With
     gzip, HB will only use 2-4 cores by default.  You can raise this
     with -p but should probably experiment with your system first.

  -- disabling compression with -Z0 may be slower than -Z1 on files
     that are compressible, because more backup data is written; only
     use -Z0 if you are very sure that your files will not compress

  For even more control, there is an alternate -Z syntax:

    -Zgz   = use gzip at hb's recommended level
    -Zgz,n = use gzip at level n (1-9)
    -Zbz   = use bzip2 at hb's recommended level
    -Zbz,n = use bzip2 at level n (1-9)

  * In tests, higher bzip2 levels use more memory and take more time,
    but don't seem to improve compression ratios very much over level
    1; gzip level 9 takes longer but is rarely better than level 8.
    So don't use -Zbz,9 thinking you will get the best results; often
    it will just make your backup and restore take longer.
    Compression is always data dependent, so testing with your actual
    data is the only good way to select a compression type & level.
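
  A quick way to get a feel for this on your own data, outside of
  HB, is to compress a sample with Python's zlib and bz2 modules at
  a few levels.  This is only a rough stand-in: zlib uses the same
  deflate algorithm as gzip but a different container, the labels
  merely mimic the -Z syntax, and the sample data is made up.

```python
import bz2
import zlib

def compare(data, levels=(1, 6, 9)):
    """Return compressed sizes keyed by a label like 'gz,6' or 'bz,9'."""
    sizes = {}
    for level in levels:
        sizes["gz,%d" % level] = len(zlib.compress(data, level))
        sizes["bz,%d" % level] = len(bz2.compress(data, level))
    return sizes

sample = b"HashBackup changelog sample line\n" * 2000
for label, size in sorted(compare(sample).items()):
    print(label, size, "of", len(sample))
```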

- the --no-compress option is obsolete and should be changed to -Z0.
  It will still be honored for a few months to give everyone time to
  change cron jobs, scripts, etc.  -Z has priority over --no-compress.

- backup: compression was often disabled when backing up files on
  single CPU systems or when -p0 was used.  This bug was just noticed
  but has been crawling around since #339.

- backup: if dedup is not initially used, but is used in later
  backups, a scan of all blocks was occurring to update the dedup
  table even though nothing would change.  This scan is now avoided.

#543 - April 5, 2012 - beta expires September 15, 2012

- compare: sometimes showed files as new even if excluded in inex.conf

#542 - April 3, 2012 - beta expires September 15, 2012

- get: permissions were set after a file was restored.  For large
  files that might take a while to restore, the permissions were lax
  during the restore.  Now the permission bits are correct during the
  restore.

- get: similar to above, directory permissions were set after a
  directory and all its contents were restored, and were too lax
  during the restore.  Unlike a file, directory permissions cannot be
  set correctly while the directory is being restored.  For example,
  if restoring a directory with r-x permission, it would not be possible
  to restore the directory contents because w access is needed.  So
  during the restore, a directory's permissions will be set to rwx for
  the owner, none for others.  Then after the restore is complete, the
  correct permissions are set.
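
  The directory sequence described above looks roughly like this in
  Python.  It is a sketch of the idea only; restore_dir and the file
  contents are hypothetical.

```python
import os
import shutil
import stat
import tempfile

def restore_dir(path, final_mode, files):
    """Restore a directory: owner-only rwx while writing, real mode last."""
    os.mkdir(path, 0o700)              # rwx for the owner, none for others
    for name, data in files.items():   # writing contents needs w access
        with open(os.path.join(path, name), "wb") as f:
            f.write(data)
    os.chmod(path, final_mode)         # apply the saved permissions at the end

root = tempfile.mkdtemp()
target = os.path.join(root, "restored")
restore_dir(target, 0o500, {"a.txt": b"hello"})   # 0o500 is r-x for the owner
final_mode = stat.S_IMODE(os.stat(target).st_mode)
restored_names = os.listdir(target)

os.chmod(target, 0o700)                # make the example directory removable
shutil.rmtree(root)
```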

- during a sync operation, where archive files from a previous
  operation needed to be sent to remotes, files were not sent in

#540 - April 2, 2012 - beta expires September 15, 2012

- mount: reading files through an hb mount point was very slow for
  large files backed up with fixed block sizes.  Here are performance
  comparisons for a 10GB VM image saved with 4K blocks (smaller block
  sizes benefit more than larger block sizes, but both are faster):

  Read at offset 0:
    was: 258215424 bytes transferred in 30.039307 secs (8.595918 Mbytes/sec)
    now: 550768128 bytes transferred in 30.244884 secs (18.210291 Mbytes/sec)

  Read at offset 512M:
    was:  22285824 bytes transferred in 32.114154 secs (693.956 Kbytes/sec)
    now: 693599744 bytes transferred in 30.717192 secs (22.580181 Mbytes/sec)

  Read at offset 1G:
    was:   6557184 bytes transferred in 31.744395 secs (206.562 Kbytes/sec)
    now: 577826304 bytes transferred in 31.432984 secs (18.382801 Mbytes/sec)

#539 - March 31, 2012 - beta expires September 15, 2012

- backup: some future changes were committed by mistake in #537 and
  were backed out

#538 - March 31, 2012 - beta expires September 15, 2012

- mount: reading a file sequentially is twice as fast for files backed
  up with -D.  This is on top of the improvements in #537, where an
  n^2 algorithm was replaced by an n log n algorithm.
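
  The actual algorithms are not described here, but a typical
  example of this kind of change is finding which block holds a
  given file offset.  Scanning from the first block on every read
  costs n^2 over a whole file of n blocks; a binary search over the
  block start offsets costs n log n.  All names below are
  hypothetical.

```python
import bisect
from itertools import accumulate

def build_index(block_sizes):
    """Start offset of each block, for variable-sized blocks."""
    return [0] + list(accumulate(block_sizes))[:-1]

def find_linear(block_sizes, offset):
    # O(n) per lookup: walk the blocks from the start every time.
    pos = 0
    for i, size in enumerate(block_sizes):
        if offset < pos + size:
            return i
        pos += size
    raise ValueError("offset past end of file")

def find_bisect(starts, offset):
    # O(log n) per lookup; the caller must keep offset inside the file.
    return bisect.bisect_right(starts, offset) - 1

sizes = [4096, 1000, 70000, 12]
starts = build_index(sizes)
```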

#537 - March 31, 2012 - beta expires September 15, 2012

- backup: the config keywords no-dedup-ext, no-compress-ext, and
  no-backup-ext specify file extension of files that you don't want to
  dedup, compress, or backup.  The expected way of specifying these
  was: hb config -c backupdir no-dedup-ext 'jpg,jpeg'.  Matching of
  suffixes is case-independent.  The change in this release is that
  extensions can also be specified with spaces and/or with leading
  periods, so this is valid:
      hb config -c backupdir no-dedup-ext 'jpg jpeg, deb .iso'
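
  A parser that accepts all of these forms might look like the
  sketch below.  The function names are made up and this is not HB's
  code; it just shows commas, spaces, and leading periods being
  treated alike, with case-independent matching.

```python
import re

def parse_ext_list(value):
    """Split on commas and/or whitespace, drop leading periods, lowercase."""
    parts = re.split(r"[,\s]+", value.strip())
    return {p.lstrip(".").lower() for p in parts if p.strip(".")}

def has_listed_ext(filename, exts):
    """True if the filename's extension is in exts, ignoring case."""
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in exts

exts = parse_ext_list("jpg jpeg, deb .iso")
```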

- backup: if a backup is terminated with kill -9, it doesn't get a
  chance to clean up archive files; the next backup could print a
  negative number for the space used if the successful backup was
  smaller than the terminated backup

- hb would not upgrade rev 9 databases to rev 11; now it will

- security: hb would display a specific error message if a padding
  error was detected during decryption.  This can often be used in a
  "padding oracle" attack.  It doesn't exactly apply to hb because an
  attacker must be able to request decryption of chosen ciphertext,
  and hb will not perform decryption without a correct key file.  But
  as a precaution:
  -- hb will no longer distinguish between padding errors, decompression
     errors, and hash mismatches; all of these will cause a hash mismatch
  -- padding now uses random bytes

- if a directory destination (Type dir) in dest.conf didn't exist, it
  caused a confusing error message like:

    dest hb2: error in send arc.0.0: [Errno 2] No such file or directory: '/Users/jim/hbdir/arc.0.0.tmp'

  Now it will print a more direct error:

    dest hb2: Traceback (most recent call last):
      File "/", line 75, in loop
      File "/", line 22, in loopinit
    err: dir(hb2): directory doesn't exist: /Users/jim/hbdir

- if an error occurred while initializing a destination, the
  destination name was not always included in the error message

- mount: when a large file was backed up with -D, reading the file
  via an hb mount point would take a long time

#526 - March 1, 2012 - beta expires June 15, 2012

- backup: on BSD and OSX, hb indirectly used sysctl (a system command)
  to determine the number of CPU cores.  But sysctl may not be
  available to cron jobs with the default path, so PATH had to be
  changed in the crontab file.  A different method is used now that
  doesn't require setting PATH.
  NOTE: this change was supposed to go in release #512

- backup: display an error message rather than a traceback when
  non-integer values are used for keywords in dest.conf that are
  supposed to have integer values, such as a port number.

- get, ls: using -rX where X is a deleted version would cause a
  traceback.  Now it displays an error that the version doesn't exist.

#520 - February 21, 2012 - beta expires June 15, 2012

- IMPORTANT NOTE: this release contains critical fixes to the recover
  feature.  Recover is needed when some or all of the local backup
  directory is lost, to recover data from a remote backup.  Everyone
  should apply this upgrade if using a remote backup (in dest.conf file)

- recover: sometimes an error could occur during recovery:
    Exception: Incremental db missing? 787521536 > 785825792: /hbtest/hb.db.163
  The backup data is all fine, but there was an incorrect test in the
  recovery code that is now fixed.

- recover: in an unusual situation where backups are running but a
  destination is unavailable (so no files are being transferred), then
  the destination becomes available and you recover from the
  destination before backup is able to sync the destination, it could
  cause the error:
    OSError: [Errno 2] No such file or directory: '/hbbackup/hb.db.229'
  This has been corrected.

- backup: socket pathnames caused a "Pathid xxx unused" error in
  selftest.  This is harmless and selftest would remove the unused
  paths, but the cause is now fixed

- selftest: in the previous release, selftest level 4 verified that
  every file block could be decrypted and uncompressed, and that the
  block and file checksums were correct.  For backups with a lot of
  dedup, VM backups for example, the same data block might need to be
  decrypted, uncompressed, and hashed many times to compute the file
  checksum.  Now, -v4 will only process each block once and will not
  verify whole file checksums.  -v5 will verify file-level checksums
  and is equivalent to -v4 in previous releases.  Running selftest
  without -v is the same as -v5, the highest level of checking.

- backup: HB is more efficient about storing hb.db.nnn files.  The
  first backup after installing this release may delete more hb.db.nnn
  files than usual.  Keep in mind that the hb.db.nnn file sequence
  numbers do not necessarily correspond to the backup version, ie,
  backup #5 may create hb.db.7.  This change will also make recovery
  times shorter.

#515 - February 7, 2012 - beta expires June 15, 2012

- backup: sometimes would report a database locked error when
  synchronizing old backup files to a destination

#513 - December 10, 2011 - beta expires March 15, 2012

- extend beta expiration date to March 15

#512 - September 4, 2011 - beta expires December 15, 2011

- NOTE: this rev will do an automatic database upgrade to dbrev 11.
  Any previously backed up socket files are deleted during the
  upgrade, since these cannot be restored by hb get and generated
  errors during a restore.

- still working on the option of remote-only backups

- backup: backups on BSD failed with an error message: object has no
  attribute 'setbackup'

- get: in a large restore of thousands of files, an error could
  sometimes occur on a few files:

    Warning: partially restored file: (filename shown here)
    Exception: free variable 'cur' referenced before assignment in enclosing scope
    Continuing restore

- backup: on BSD and OSX, hb indirectly used sysctl (a system command)
  to determine the number of CPU cores.  But sysctl may not be
  available to cron jobs with the default path, so PATH had to be
  changed in the crontab file.  A different method is used now that
  doesn't require setting PATH.

- in #505, a 30-second timeout was added to rsync destinations.  On a
  very slow target (an unslung NSLU2 NAS), rsync might repeatedly
  time out when sending an updated archive after a retain or rm
  operation.  Now, the default timeout is 3600 seconds (1 hour), but
  it can be changed with the Timeout keyword in dest.conf for rsync

- ssh destinations sometimes failed with authentication errors when
  OSX or BSD tried to connect to a remote ssh server running CentOS
  5.5.  To improve compatibility, hb now uses the system sftp program
  instead of connecting directly to the remote ssh server.

- socket files were backed up in previous versions of hb, but these
  can never be restored and hb get would generate an error when it
  tried to restore sockets.  Sockets are no longer backed up.

#510 - May 22, 2011 - beta expires September 15, 2011

- development of the "remote only" backup option is not quite ready,
  so this very minor update is being issued to extend the beta
  expiration date to September

- a few doc files were updated

- backup -c <dir> to an empty directory (no hb init) creates hb.lock,
  but then hb init refused to run because the directory was not empty.
  Now, hb.lock is ignored by init

#505 - Apr 13, 2011 - beta expires June 15, 2011

- NOTE: this rev will do an automatic database upgrade to dbrev 10.
  Some items were moved from archives to the main database, so many
  archives may have items removed and your hb.db file may grow

- IMPORTANT USAGE NOTE: if a new destination is added to an existing
  backup, the new destination is not fully synchronized until after
  the next successful backup.  This has not changed, just making
  everyone aware

- COMPATIBILITY NOTE: in previous versions, the hb executable was
  copied to all remote destinations if it changed.  Now, the hb
  executable is only copied if the config variable copy-executable is
  True.  The default in this version is False, which is a change in
  behavior.  To re-enable this, use: hb config copy-executable True

- Google Storage for Developers, an Amazon S3-like service currently
  in beta and available by Google invitation only, is now supported.
  The destination type is gs, the dest.conf config variables are
  accesskey, secretkey, and bucket, as with S3 destinations, and the
  environment vars are GS_ACCESS_KEY_ID and GS_SECRET_ACCESS_KEY;
  environment vars are only used if accesskey and secretkey are not
  specified in dest.conf. For more information, see

  NOTE: Google Storage for Developers is not the same as Google
  Storage for Docs.  HashBackup does not yet support using Google
  Storage for Docs as backup space.
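
  The lookup order for Google Storage credentials can be pictured
  as below.  The function name and the dict standing in for a parsed
  dest.conf entry are assumptions; only the keyword and environment
  variable names come from the description above.

```python
import os

def gs_credentials(dest_conf, environ=os.environ):
    """dest.conf values win; environment variables are the fallback."""
    access = dest_conf.get("accesskey") or environ.get("GS_ACCESS_KEY_ID")
    secret = dest_conf.get("secretkey") or environ.get("GS_SECRET_ACCESS_KEY")
    if not access or not secret:
        raise ValueError("missing Google Storage credentials")
    return access, secret
```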

- S3-compatible services are supported with the new Host and Port
  config variables on an S3 destination.  This can be used with
  Eucalyptus' Walrus S3-like service for example.  Eucalyptus / Walrus
  is an open source S3 clone that provides an S3-like service.  For
  more information see

  NOTE: the Host and Port keywords are not necessary with destination
  types s3 or gs, and will default to the correct Amazon or Google
  values.  For other S3-like services, Host and/or Port are required.

- Rackspace Cloud Files storage is supported in this version.  The
  destination type is cf, Userid and Accesskey keywords are required,
  and Container is where you want your backup stored.  Unlike S3 and
  Google Storage, Cloud Files containers are per-userid, so there is
  no need to find a globally unique name.  Rackspace charges for
  incoming bandwidth are about half of Amazon S3 and Google Storage.
  For more information see

- rsync destinations will time out after 30 seconds if the remote is
  unresponsive.  Before, it took a very long time for an rsync
  destination to time out

- if a local archive file is missing during a get (restore), hb
  automatically downloads it as needed; this is not new.  Destinations
  in dest.conf should be listed fastest download speed first, and the
  fastest destination with the most up-to-date version of a file is
  the one that will be selected for downloading.  Before this release,
  the destination chosen was unpredictable.  This does not apply to
  recover (get all files) since recover uses only the destination you

- if a destination had a failure, hb would display an error and
  process the next request for that destination.  Now hb will try
  requests up to 3 times, and if they all fail, the destination is
  stopped and no more requests are sent to it.  hb will "catch up" the
  destination on the next backup.
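
  The retry behavior can be sketched like this.  The class, its
  attributes, and the choice of OSError as the failure type are
  hypothetical; only the limit of 3 tries and stopping the
  destination come from the description above.

```python
class Destination:
    MAX_TRIES = 3

    def __init__(self, name, send):
        self.name = name
        self.send = send          # callable that transfers one item
        self.stopped = False

    def request(self, item):
        """Try an item up to MAX_TRIES times; stop the destination if all fail."""
        if self.stopped:
            return False          # no more requests go to a stopped destination
        for attempt in range(1, self.MAX_TRIES + 1):
            try:
                self.send(item)
                return True
            except OSError as err:
                print("dest %s: try %d failed: %s" % (self.name, attempt, err))
        self.stopped = True       # to be caught up on the next backup
        return False
```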

- when multiple destinations were configured and an error occurred on
  one destination, the other destinations continued to work correctly;
  but once the working destinations were finished, a backup would
  sometimes "hang" waiting for the failed destination to finish (it
  never would finish since it failed)

- selftest -v3 sometimes displayed the backup size larger than the
  actual size.

- selftest: data structures created during selftest are more

- when backup is waiting for destinations to finish, it prints a list
  of destinations that are still busy with transfers, and updates this
  list as destinations finish their work

- display a message when the hb executable program is being copied to
  destinations; this can sometimes take a while, and it isn't always
  obvious why there is a longish delay

#487 - Mar 27 2011 - beta expires June 15, 2011

- S3 destinations could fail with a traceback like:
      File "/", line 38, in __init__
      File "/", line 26, in baseinit
    TypeError: string indices must be integers, not str

#486 - Feb 20, 2011 - beta expires June 15, 2011

- in some situations, the db upgrade procedure of #485 could cause
  hb.db to be removed by mistake

#485 - Feb 15, 2011 - beta expires June 15, 2011

- NOTE: this rev will do an automatic database upgrade to dbrev 9

- the major new feature in this release is incremental transmission of
  hb.db, the main HashBackup(TM) database.  The major new feature of
  the next release is making optional the local copy of the backup.

- COMPATIBILITY NOTE (backup): the inex.conf file previously allowed
  excluding files from the backup and overriding those excludes with
  includes.  There were several bugs and points of confusion, so this
  was simplified by eliminating the include keyword: now, files can
  only be excluded.  To include a file that would be excluded by an
  inex.conf rule, list the pathname on the backup command line.
  Include processing may be added again later, depending on user
  feedback.  As a side benefit, the incremental backup file scan to
  find modified files is 10% faster.

- COMPATIBILITY NOTE (backup): in previous releases, data from a prior
  backup was often used for dedup even if the -D option wasn't
  specified.  Now, the -D option is required to enable dedup.  Backup
  data created *without* -D is not used to dedup future backups, in
  most cases.

- config: a new config parameter, no-backup-ext, has been added.  This
  is a list of filename extensions that should not be backed up.  For
  example, you might use:
       hb config -c /hb no-backup-ext avi,mov,o
  to skip backup of files ending with .o, .avi, and .mov.  This is
  faster than using exclude patterns like ex *.o in inex.conf.
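
  A minimal sketch of how an extension list like this could be applied,
  assuming a case-sensitive match on the text after the last period.
  The names parse_ext_list and skip_by_extension are hypothetical, not
  part of hb.  A single set lookup per file is one plausible reason this
  is cheaper than matching exclude patterns:

```python
import os


def parse_ext_list(value):
    """Turn a comma-separated value like 'avi,mov,o' into a set of suffixes."""
    return {ext.strip() for ext in value.split(",") if ext.strip()}


def skip_by_extension(path, no_backup_ext):
    """Return True if the filename's extension is in the skip set."""
    # os.path.splitext keeps the leading period, so drop it before the lookup.
    ext = os.path.splitext(path)[1]
    return ext[1:] in no_backup_ext if ext else False


exts = parse_ext_list("avi,mov,o")
print(skip_by_extension("/src/main.o", exts))
print(skip_by_extension("/src/main.c", exts))
```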

- config: a new config parameter, dedup-mem, sets the default amount
  of memory to use for dedup operations.  This can be overridden by
  backup's -D option.  The default is zero, for no dedup.  In this
  release it's also possible to have dedup-mem set to some value you
  usually use, like 1gb, and use the -D0 backup command line option to
  disable dedup just for that backup.  See doc/ for more
  information about the amount of memory to use for dedup.

- a new command, hb init, is required to initialize the -c backup
  directory before the first backup.  This allows modifying exclude
  rules before the first backup, and allows setting the encryption key
  to a user-specified value using the -k option rather than hb
  choosing a random key.

  SECURITY NOTE: for higher security, it is recommended that you let
  hb init choose a random key string, as with previous releases.  Or,
  you can choose a long phrase that is easy to remember for your key,
  for example: my cat chases my dog.  For less security, you can use
  -k '', which specifies encryption with a blank key.  With a blank
  key, there's no need to store the key securely.  Spaces are removed
  from the key, so key abc def is the same as abcdef.
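
  The space-removal rule can be sketched as follows.  This assumes only
  the space character is stripped, as stated above; normalize_key is a
  hypothetical name, not an hb function:

```python
def normalize_key(key):
    """Remove spaces so 'abc def' and 'abcdef' name the same key."""
    return key.replace(" ", "")


print(normalize_key("abc def") == normalize_key("abcdef"))
```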

- a new command, hb rekey, can be used to change the database
  encryption key.  Usage is similar to hb init.  After the database is
  rekeyed, it is transmitted to any remote destinations you may have
  set up in dest.conf.  If you want some security but don't want to
  fiddle with storing encryption keys separately, you could rekey to
  an easily remembered phrase like 'my dogs name is spot'.  For less
  security, you could rekey to the blank key ''.

- backup: the Freq keyword is no longer supported.  This was used to
  defer transmits offsite, for instance, once per week.  The only time
  or bandwidth savings was for transferring the hb.db file, and this
  is now sped up in a different way.

- sending incremental backups offsite is much faster, especially if
  not many files were modified.  For rsync destinations, the first
  backup after the upgrade will be much slower than usual, while other
  methods (ftp, etc) will be about the same.  After that, it will be
  faster than usual for all destination types.

- clear: files stored on destinations are also removed; a new warning
  about this is displayed unless --force is used

- mount: when accessing the mounted HB filesystem, data near the end
  of a file would sometimes not be returned correctly.  The backup
  itself was fine - only accessing it via mount was affected.

- backup: dir destinations were creating the target directory, even
  though dest.conf.example said the directory had to exist

- backup: if a pathname on the command line is a symbolic link, backup
  saves both the symlink and its target.  For example, on BSD systems,
  /home is a symlink to /usr/home, so a backup of /home saves both the
  /home symlink and the complete /usr/home tree.  The new behavior in
  this release is that if a backup pathname *contains* a symlink, the
  pathname is resolved and the target pathname is used instead.  So
  for example, if /home/jim is saved, a message is displayed that the
  pathname was changed to /usr/home/jim, and this tree is saved.  No
  /home/jim pathnames will appear in the backup, since they don't
  actually exist in the filesystem.
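
  A rough sketch of this kind of pathname resolution, assuming that
  only the directories leading to the final component are resolved, so
  a path that is itself a symlink is left alone.  resolve_backup_path
  is a hypothetical helper; hb's actual logic may differ:

```python
import os


def resolve_backup_path(path):
    """Return (resolved_path, changed) for a backup command line path."""
    absolute = os.path.abspath(path)
    parent, name = os.path.split(absolute)
    # realpath() follows every symlink in the parent directory chain.
    resolved = os.path.join(os.path.realpath(parent), name)
    return resolved, resolved != absolute

# On a system where /home is a symlink to /usr/home, calling
# resolve_backup_path("/home/jim") would report the /usr/home/jim path
# and changed=True, which is when a notice would be displayed.
```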

- ls would not display the contents beneath a symlink to a directory.
  For example, on BSD systems, /home is a symlink to /usr/home.  If
  /home/jim was given to the backup program, all of /home/jim would be
  saved, but ls would only print / and /home.  This situation cannot
  occur going forward, because in this release, the backup program
  resolves the symlink in the pathname (see previous note).

- ls displays "(parent, partial)" when a directory is backed up
  because it is the parent of a file that was requested.  For example,
  if /Users/jim/x is backed up, /, /Users, and /Users/jim are also
  listed in the backup, all marked "(parent, partial)"
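
  The set of parent directories in the example above can be derived
  like this.  It is only an illustration of which paths get the
  "(parent, partial)" note; parent_directories is a hypothetical name:

```python
def parent_directories(path):
    """List every ancestor of an absolute path, from / downward."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    parts = [p for p in path.split("/") if p]
    # Joining the first i components yields the i-th ancestor; i=0 is "/".
    return ["/" + "/".join(parts[:i]) for i in range(len(parts))]


print(parent_directories("/Users/jim/x"))  # ['/', '/Users', '/Users/jim']
```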

- get: if a file or directory being restored already exists, get
  prints a warning.  If the user interrupted the restore, deleted or
  renamed the existing file, and then continued the restore, get would
  fail when trying to delete the old object

- backup: sometimes a hard-linked file would be saved on every backup

- backup: create hash.db without execute permission bits

- rm & retain: if an archive was removed while some destinations were
  offline or inaccessible, it was not removed later when the
  destinations were accessible

- rm & retain: if one file such as /Users/jim/x is backed up and then
  removed, a small archive was left in the backup directory if there
  were extended attributes or ACLs on any of the parent directories

- error messages were sometimes being sent to stdout instead of stderr,
  which can be an issue for scripting hb

#426 - Dec 9, 2010 - beta expires March 15, 2011

- ls: failed when a file or pathname was listed on the command line.
  This bug first occurred in #408.

- versions: with no options, display the 5 most recent backups instead of 1

- in #416, the upgrade command was changed to set the owner of the hb
  executable as it was before the upgrade.  But a typo caused the error:
     AttributeError: 'module' object has no attribute 'state'
  After this error, there will be an hb.tmp file in the same directory
  as the old hb executable.  To finish the upgrade, do:  mv hb.tmp hb

#416 - Dec 5, 2010 - beta expires March 15, 2011

- backup: if a single zero-length file was saved, the empty archive
  file created should have been deleted

- delay transmitting archives after a retain or remove if they don't
  shrink very much.  This was unintentionally removed at #408

#410 - Nov 29, 2010 - beta expires March 15, 2011

- in release #408, the help command didn't work if the fuse library
  was missing.  Now, only the mount command will fail when fuse isn't

#408 - Nov 26, 2010 - beta expires March 15, 2011

- COMPATIBILITY NOTE: the undocumented --no-dupcheck option to backup
  has been removed, as no dedup is now the default and has been for
  several releases.  To enable dedup, use the -D option, for example,
  -D1g will dedup using 1GB of memory.

- this version of HashBackup has a new archive format that uses up to
  9% less space to backup the same amount of data from VM images, and
  is also faster to create and access.  Old format archives are still
  supported and are converted to the new format when they are
  accessed.  You can leave old format archives on remotes, and when
  recovered, hb will convert them.  If you want to force all local
  archives to be converted immediately, use hb selftest -v3 after
  installing this new version.  The converted archives will not be
  uploaded to destinations unless they shrink much smaller than the
  remote archive.  If you want to force all converted archives to be
  uploaded, run selftest as described, then delete dest.db from the
  backup directory.  All archives will be uploaded during your next

  NOTE: during conversion, new format archives are created in a temp
  file and then replace the original.  Because it would double your
  backup space requirements, no copies of the original archives are
  kept.  If you want a backup of your old archive files, copy them
  before running this version of hb.

- a new command "config" will display HashBackup's config settings.
  These can also be changed with the config command to modify the
  operation of hb.  In this release there are several config settings:
  arc-size-limit: this controls how large individual archive files can
  grow before a new archive file is started.  The default is 1gb, as
  before.  The minimum value is 1mb and the maximum value is 8gb.  Be
  careful not to set the size larger than your remote destinations can
  handle, for example, Amazon S3 is limited to 5gb file sizes, so the
  archive size limit should probably be 4gb at most.  GMail has a size
  limit of 25mb, so a limit of 20mb should be used.

  no-dedup-ext: files with a suffix listed here will not be deduped.
  Any suffixes listed here will add to the built-in list that backup
  uses.  Suffixes should be listed with commas but without periods,
  for example: hb config no-dedup-ext bz2,bz

  no-compress-ext: files with a suffix listed here will not be
  compressed.  For example: hb config no-compress-ext avi,mp3

  Config settings are versioned and match the corresponding backup.
  This allows you to see the config settings used for any prior
  backup, and to revert to the config settings of a prior backup.
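
  A sketch of the arc-size-limit bounds described above (1mb minimum,
  8gb maximum) combined with a destination's own file size limit.  It
  assumes the kb/mb/gb suffixes are binary multiples; parse_size and
  check_arc_size_limit are hypothetical helpers, not hb functions:

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}  # assumed binary units


def parse_size(text):
    """Convert a value like '20mb' or '4gb' to a byte count."""
    text = text.strip().lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)


MIN_ARC = parse_size("1mb")
MAX_ARC = parse_size("8gb")


def check_arc_size_limit(text, dest_max=None):
    """Validate a limit against the 1mb..8gb range and a destination cap."""
    size = parse_size(text)
    if not MIN_ARC <= size <= MAX_ARC:
        raise ValueError("arc-size-limit must be between 1mb and 8gb")
    if dest_max is not None and size > dest_max:
        raise ValueError("arc-size-limit exceeds the destination's file limit")
    return size


# 4gb fits under a 5gb destination cap; 20mb fits under a 25mb cap.
print(check_arc_size_limit("4gb", dest_max=parse_size("5gb")))
print(check_arc_size_limit("20mb", dest_max=parse_size("25mb")))
```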

- a new command "compare" will compare a filesystem path to a backup
  and display any differences.  Compare only supports comparison to
  the current backup version, though in the future it may be extended
  to compare to older backups ("What has changed since -r10?")

- the built-in help has been cleaned up

- if recover -a is used to fetch files from remote destinations, and
  .old files exist in the local backup directory, recover would warn
  that archive files exist and would be renamed, even if they didn't
  really exist; the .old files were confusing recover.  This could
  happen if recover was executed twice for example.  Everything
  worked, just the warning was sometimes incorrect.

- when using a GMail account to store backups, hb.dbz and dest.dbz
  would keep appending to their email conversation, instead of
  replacing the old files.  HB was never designed to work this way,
  and didn't work this way with other email providers.  Now, there
  will only be 1 hb.dbz and 1 dest.dbz, to save space in the GMail
  account.  Be sure to set arc-size-limit (see above) when using an
  email account to store backups.

- for future upgrades, hb upgrade will display only the changes since
  the version of hb that is already installed.  So for example, if you
  are at #406, miss the upgrade to #407, and do the upgrade to #408,
  it will display only the changes in #407 and #408 - not the whole
  change log.
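
  One way to produce such a filtered change log, sketched under the
  assumption that each release starts with a "#NNN" header line as in
  this file and that newer releases come first.  changes_since and the
  sample entries are hypothetical:

```python
import re

HEADER = re.compile(r"^#(\d+)\b")

# Placeholder entries for illustration only, not real release notes.
SAMPLE_LOG = """\
#408 - hypothetical entry
- (changes in 408)

#407 - hypothetical entry
- (changes in 407)

#406 - hypothetical entry
- (changes in 406)
"""


def changes_since(changelog_text, installed_build):
    """Keep only the sections whose build number is newer than installed."""
    shown = []
    keep = False
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A header line switches output on or off for its whole section.
            keep = int(match.group(1)) > installed_build
        if keep:
            shown.append(line)
    return "\n".join(shown)


print(changes_since(SAMPLE_LOG, 406))
```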

- with release #339, the upgrade command sometimes had trouble finding
  and replacing the hb executable.  As a workaround, use:
     /path/to/hb upgrade

  to upgrade the executable, ie, a full pathname.  This was fixed in
  #346, and upgrade should work fine with that release.

#346 - Nov 3, 2010 - beta expires March 15, 2011

- the October build was created on newer versions of Linux and BSD 8.
  Unfortunately, on Linux it required glibc 2.7, but CentOS, RHEL, and
  other versions of Linux do not have this newer C library.  This
  version of HashBackup was built on CentOS 5.5 and should run on both
  older and newer versions of Linux.  The BSD version was built on BSD
  7, and should run on both BSD 7 and 8.

- new command "upgrade" upgrades your hb executable to the latest
  version.  This should be run from a userid with sufficient
  permission to replace the hb command.  For example, if hb is in
  /usr/local/bin, you may need to run hb upgrade as root.  The old hb
  executable will be renamed to hb.bak after a successful upgrade.

- backup: on Mac/OSX, files would sometimes be backed up that hadn't
  changed.  OSX changes ctime on files, and this is what hb used to
  detect changes.  This is fixed with a more detailed test using both
  ctime and mtime.

- backup: related to the previous issue, some sites may not want to
  trust mtime, because it can be set by user programs.  For example, a
  file's data can be changed and then mtime reset to its previous
  value; this makes it appear that the file data hasn't been changed.
  It's usually fine to trust mtime, but if your site does not, use the
  --no-mtime backup option and hb will compare the entire file to the
  previous backup when mtime is the same, to ensure the file data is
  also still the same.  --no-mtime will make your backup run somewhat

#339 - Oct 9, 2010 - beta expires December 15, 2010

- NOTE: this rev will do an automatic database upgrade to dbrev 8

- backup: a new variable block dedup method allows HashBackup to dedup
  more files, such as:
  - office document edits
  - tagged mp3, m4a (iTunes) files
  - email attachments
  - database dumps
  - uncompressed tar files
  - gzip --rsyncable files

  Variable block dedup is enabled when -D is used on multi-core
  systems, for example, -D1g to use 1gb of memory for dedup.

  Variable block dedup is not used:
  - for files smaller than 128K
  - when -p0 is used to disable multicore backup
  - on single-core systems
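
  The conditions above can be summarized in a small predicate.  This is
  a sketch that assumes 128K means 128 * 1024 bytes;
  uses_variable_block_dedup and its parameters are hypothetical names:

```python
def uses_variable_block_dedup(file_size, dedup_enabled, extra_tasks, cpu_cores):
    """Apply the listed conditions to one file."""
    if not dedup_enabled:         # -D was not given
        return False
    if cpu_cores < 2:             # single-core system
        return False
    if extra_tasks == 0:          # -p0 disables multicore backup
        return False
    if file_size < 128 * 1024:    # files smaller than 128K
        return False
    return True


print(uses_variable_block_dedup(10 * 1024 * 1024, True, 2, 4))
```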

- backup: with rev #321, an error message:
      OperationalError: database arc is locked
  was sometimes displayed on incremental backups larger than 1GB.

- add the Port option to ftp destinations, userid and password are
  optional for anonymous ftp access

- a new hb command sha256 computes the sha256 hash for a file.
  This is handy if your OS doesn't have a built-in sha256 command.
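
  For reference, an equivalent hash can be computed with Python's
  standard hashlib module.  This is a generic sketch, not hb's own
  code; sha256_file is a hypothetical name:

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

  Reading in chunks keeps memory use flat even for very large files.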

- ls accepts a new -v option.  When combined with -l (long display),
  -v also displays the inode, ctime, and sha256 hash

- backup: on Mac/OSX, if backing up / and the current directory was
  not /, an incorrect error message was displayed, like:
     Pathname changed: / => /Users/jim/
  This bug was introduced in r288

- backup: if a single zero-length file was saved in version 0, or all
  backup data was removed with hb rm /, the next backup would
  immediately fail with an error

#321 - Sep 20, 2010 - beta expires December 15, 2010

- exclude /home/*/.gvfs for Linux

- backup: -B16k displayed an error message

- backup: fixed bug when writing to remote destinations:
    ImportError: No module named dump

- rm: the current version was always printed with the "you are not the
  owner" error message; now the correct version is displayed

#311 - Sep 3, 2010 - beta expires December 15, 2010

- NOTE: this rev will do a minor automatic database upgrade to dbrev
  7.  If you use -D, your first backup after this upgrade may be
  slower to get started while it rebuilds the dedup database.

- a 64-bit build of hb is now available for Linux and BSD; the Intel
  OSX build is already 64-bit.  The advantage is that 64-bit builds
  allow dedup (-D) tables larger than 2GB, and 32-bit compatible
  system libraries don't have to be installed.

- backups with dedup (-D) are a little faster because of less I/O

- improved error handling for multi-core backups

- a new destination type, sftp, is added.  This is equivalent to the
  old ssh destination type, except now either will work with ssh
  servers that allow sftp access but may not allow terminal sessions

- get: redundant pathnames are an error, not a warning.  Better safe
  than sorry

- if HB was waiting for a yes/no question to be answered, ^z was used
  to suspend the program, then fg was used to start it again, an error
  about an interrupted system call would be displayed.  Now, the
  question will be repeated.

#293 - August 12, 2010 - beta expires December 15, 2010

- NOTE: this rev will do an automatic database upgrade to dbrev 6.
  The hb.db file may shrink up to 50% after this upgrade, because some
  data is moved into a separate database, hash.db.  The advantage is
  that when hb.db is sent offsite, it may be much smaller; hash.db is
  not sent offsite and is generated from hb.db when necessary

- backup: archive files were sometimes being transmitted twice

- backup: a new option, -p, uses multiple CPU cores to backup large
  (>128K) files.  This can speed up the backup of large files by 30%
  or more.  If the -p option is not used, HB will automatically use
  multiple cores when run on a multi-core system.  The -p option
  allows finer control of this feature:

    -p0 = do not use multiple CPU cores for backing up
    -pn = use n extra tasks
    no -p = use 2 extra backup processes on a multi-core computer

  You might be tempted to use -p8 on an 8-core system, but it could
  actually make your backup slower.  If you have very fast disks, you
  may want to try increasing -p to 3 or 4.
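
  The -p rules above amount to the following.  The sketch assumes that
  no extra tasks are used on a single-core system when -p is absent,
  which the notes do not state; extra_backup_tasks is a hypothetical
  name:

```python
def extra_backup_tasks(args, cpu_cores):
    """Return the number of extra backup tasks for a command line."""
    for arg in args:
        # -p0, -p3, ... : the digits after -p give the task count.
        if arg.startswith("-p") and arg[2:].isdigit():
            return int(arg[2:])
    # No -p option: 2 extra processes on multi-core (0 otherwise, assumed).
    return 2 if cpu_cores > 1 else 0


print(extra_backup_tasks(["-p0"], cpu_cores=8))
print(extra_backup_tasks([], cpu_cores=8))
```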

- backup: a new option, -D, controls data dedup.  This version of
  HashBackup uses a new dedup method that is much more scalable.  The
  value after -D is the amount of memory you want to use for dedup
  information.  The -D option may be tweaked in the next few releases.
  For detailed information, see doc/

  NOTE: in previous releases, dedup was enabled by default; now, it is
  disabled by default, because it requires some thought about the
  amount of memory to use for dedup.  HashBackup may still do some
  dedup on incremental backups, even if the -D option isn't used,
  especially with VM images, logs, mailboxes, and databases.

- backup: a new option, -B, controls the block size.  This can be 1K,
  2K, 4K, 8K, 16K, 32K, 64K (default), 128K, 256K, 512K, 1M, 2M, or
  4M.  Tests show that a large block size doesn't usually speed up
  I/O, but it allows HB to scale to larger backups when very large
  files are being saved.  A trade-off is that the dedup mechanism may
  not find as much duplicate data with larger block sizes.  With small
  block sizes, dedup is better, but overhead is higher and the backup
  may be slower.  The default block size is either 64K or 4K bytes,
  depending on the file type.
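
  The accepted -B values are the powers of two from 1K to 4M.  A sketch
  of validating them, assuming the K/M suffix is case-insensitive and
  means multiples of 1024; parse_block_size is a hypothetical name:

```python
# Build the table 1K, 2K, ..., 512K, 1M, 2M, 4M by repeated doubling.
BLOCK_SIZES = {}
size = 1024
while size <= 4 * 1024 * 1024:
    if size < 1024 * 1024:
        label = f"{size // 1024}K"
    else:
        label = f"{size // (1024 * 1024)}M"
    BLOCK_SIZES[label] = size
    size *= 2


def parse_block_size(text):
    """Return the block size in bytes, or raise ValueError if not allowed."""
    key = text.upper()
    if key not in BLOCK_SIZES:
        raise ValueError("block size must be one of: " + ", ".join(BLOCK_SIZES))
    return BLOCK_SIZES[key]


print(parse_block_size("64K"))
```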

- backup: print backup directory space used just for this backup and
  in total

- backup: print compression statistics for this backup as both a
  percentage and compression factor

- backup: a new option, -m/--max-file-size n, will skip files larger
  than n bytes

- backup: display an error message if a directory destination is the
  backup directory.  This copied the backup onto itself and caused
  warning messages like:
    dir(destname): file changed during transfer: /hb/arc.0.77
  The backup remained intact, but there would be a lot of unnecessary
  disk I/O.

- backup: if -c backup target is a FAT filesystem, such as most flash
  drives, the initial backup would fail with an error message like:
    OSError: [Errno 1] Operation not permitted: '/mnt/hb/key.conf'
  HashBackup was trying to make the key file read-only, but this is not
  possible on a FAT filesystem.  Trying the backup again would succeed.

- backup: on Mac/OSX, strip trailing slashes on command line pathnames
  to prevent errors like:
    Pathname changed: /Users/jim/backup/rest => /Users/jim/backup/rest/
    Unable to stat file: No such file or directory: /Users/jim/backup/rest/

- get: if the same pathname was listed on the command line twice, or
  redundant pathnames such as /abc and /abc/def were used (because
  restoring /abc would also restore /abc/def), get would fail with an
  error on OSX like:

    Unable to hardlink: Operation not permitted: /Users/jim/hb-551625.tmp -> /Users/jim/xxx
    Not restored: /Users/jim/xxx

  On Linux, get would fail with an error about using bind --mount.
  The get command incorrectly believed the directories should have
  been hardlinked.  Now, a warning is printed about skipping the
  redundant pathname.
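
  Detecting redundant pathnames can be sketched as a prefix check on
  whole path components.  This illustrates the idea only and is not
  hb's implementation; redundant_paths is a hypothetical name:

```python
def redundant_paths(paths):
    """Return paths that repeat, or lie beneath, another listed path."""
    normalized = [p.rstrip("/") or "/" for p in paths]
    redundant = []
    for i, path in enumerate(normalized):
        for j, other in enumerate(normalized):
            if i == j:
                continue
            # Only later copies of an identical path count as duplicates.
            duplicate = path == other and j < i
            # Compare against "other/" so /abcd is not treated as under /abc.
            prefix = other if other.endswith("/") else other + "/"
            nested = path != other and path.startswith(prefix)
            if duplicate or nested:
                redundant.append(paths[i])
                break
    return redundant


print(redundant_paths(["/abc", "/abc/def"]))  # ['/abc/def']
```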

#256 - June 22, 2010 - beta expires September 15

- the correct README file is included

#255 - May 24, 2010 - beta expires August 15

- backup: copy hb program to local backup area as executable (755 mode
  instead of 644)

- get: when a directory specified on the command line already exists, hb
  would correctly do the restore into a temp directory; but when it
  tried to remove the old existing directory, the remove could fail
  with a "Not a directory" error if the old directory contained a
  symbolic link to a directory.  ("directory" 6x - Ugh!)

- get: don't complain about existing symbolic links if they point to
  the correct destination.  This doesn't change hb's restore behavior,
  but avoids unnecessary error message displays.

- get: setuid and setgid mode bits were not being restored (these are
  for privileged commands like "mount"; see chmod command)

- get: as a security precaution, setuid, setgid, and the sticky bit
  are only restored when running as root.  Otherwise, a warning is
  printed for files with these bits set; see chmod command.
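
  The precaution can be sketched with the standard stat constants.  The
  function below only computes the mode to apply and whether a warning
  is due; restore_mode is a hypothetical name, not hb's code:

```python
import stat

SPECIAL_BITS = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX


def restore_mode(saved_mode, is_root):
    """Return (mode_to_apply, warn) for a file being restored."""
    if saved_mode & SPECIAL_BITS and not is_root:
        # Drop setuid, setgid, and sticky; keep the ordinary permissions.
        return saved_mode & ~SPECIAL_BITS, True
    return saved_mode, False


mode, warn = restore_mode(0o4755, is_root=False)
print(oct(mode), warn)
```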

- get: if an error occurred during restore of one path, get would
  (incorrectly) say that there were errors in all subsequent paths,
  even though the paths were restored without errors

#254 - May 22, 2010 - beta expires August 15

- recover: non-rsync destinations could fail with the error:
  object has no attribute 'cursor'

- recover: could display the warning message:
  Unable to set mtime on hb.db: Cannot operate on a closed database.

- recover: now displays a summary warning if any files were not

- get: refused to restore / to a non-empty directory.  But this is
  sometimes necessary for a system rescue, so now this error is a
  warning.  As always, be careful when restoring files from a backup!

- get: if / was restored to some directory (not /), without using
  --orig, the pathname displayed for restored files was incorrect and
  symbolic and hard linking did not work correctly.  This situation
  typically occurs when booting from a CD to restore a crashed root
  filesystem that is temporarily mounted under /mnt.

#252 - May 11, 2010 - beta expires August 15

- NOTE: this rev will do an automatic database upgrade to dbrev 5

- recover: there was a bug in the new feature to remove unreferenced
  blocks from downloaded archives.  If you ever ran recover with
  version #249, run selftest -v3 to verify your backup's integrity.
  IMPORTANT NOTE: this is a critical bug in #249 and everyone is urged
  to upgrade

- when this build is installed, archives are scanned to remove any
  stale file data.  This is necessary because of a bug in version #249
  retain (see next item).  This scan will take some time for large
  backups.  If you have to interrupt it, it will restart the next time
  you use hb; ie, you won't trash your backup if it's interrupted.
  The hb.db file itself is not modified, other than to change the rev.
  The archive scan is removing orphaned data blocks and possibly
  compressing archives.  If any archives are sufficiently compressed,
  they will be transmitted on the next backup to your remote

- retain: sometimes the message:
    Unable to remove block xxxx: no such table: rmlist
  was displayed.  Retain still removed older versions of files, but
  the file data itself wasn't being removed from the archive files.
  The effect of this bug is that archives do not shrink when they
  should, but all backup data is intact.

- HashBackup's Amazon S3 destination now accepts any value for the
  Location keyword (S3 Region), without validating it.  The values and
  their meanings as of May 5, 2010 are:
  -- no Location, US, or blank location = US Standard.  Data will be
  stored on the east coast or west coast, whichever is closest
  -- us-west-1 = US west coast.  It costs more to store data using
  this name vs using the more generic US region
  -- EU = Ireland
  -- ap-southeast-1 = Singapore

- rm/retain: transmitting an older archive over rsync after a rm or
  retain uses less CPU time than with version #249

- selftest: if an error occurred with -v0 (just read the database
  file), the displayed error count should have been 1 but was
  actually a library error code value, like 256.

- selftest: added a new -v1 verify level that does not traverse each
  file's block info since this takes time for backups with very large
  files such as VM images.  -v2 is now like the old -v1:
    -v0: read each page of main database, like cat hb.db
    -v1: check database, don't traverse file blocks, don't read archives
    -v2: check database, traverse file blocks, don't read archives
    -v3: v2 + read all archive blocks and verify crc
    -v4: v3 + decrypt and decompress all data, verify block hashes,
     verify file hashes.  Like a restore, without writing to disk.
    -v9: v4 + low-level database integrity check
  As before, the default verify level is -v9.

- selftest: verify levels -v1 and -v2 (the new one) are a bit faster
  if the database isn't already cached in memory

- selftest: -v3 (old -v2) would sometimes display X GB verified, where
  X was much bigger than the entire set of backup data.  Related to
  this, -v3 (old -v2) may run faster, depending on your backup data

#249 - March 6, 2010 - beta expires June 15

- recover: remove unreferenced blocks from downloaded archives

- add /.hotfiles.btree to inex.conf for Mac/OSX

- if arc files exist on the local system and a recover -f command
  is issued, the existing arc files are renamed with a .old suffix.
  These .old files are now ignored during archive synchronization.

- backup: if a pathname ending in . was used on the backup command
  line, /Users/jim/backup/. for example, a message like:
     Pathname changed: /Users/jim/backup => /Users/jim/backup/.
  was displayed, and the stored pathnames also contained .

- rm/retain: with hundreds of archive files, rm and retain could run
  out of file descriptors when a large number of files were being
  removed, especially on systems like OSX where the number of open
  files is limited to 256 by default.  rm and retain now use just a
  few file descriptors.

- backup: exclude /private/tmp/ on OSX (/tmp symlinks here).  To
  update an existing inex.conf, add ex /private/tmp/

#246 - January 4, 2010 - beta expires March 15

- INCOMPATIBILITY NOTE: the -n option (dry run) has been removed from
  the retain command, but the longer forms --dryrun and --dry-run are
  still available.  This is in preparation for -n to mean "don't
  transmit the database", as with the backup command, to allow retain
  to run more than once before transmitting hb.db

- NOTE: this rev will do an automatic database upgrade to dbrev 4

- beginning with this release, HashBackup releases are identified by
  the build number rather than a version number

- the Linux build of 0.9.10 failed: ImportError: No module named acl

- in some cases, the database upgrade in 0.9.10 could fail with:
      TypeError: 'NoneType' object is unsubscriptable

- if the backup command did the database upgrade (vs any other hb
  command), it would take 10x longer than it should have, because IO
  buffering was disabled.  On a Fedora test machine, the upgrade took
  20 minutes with the backup command, but only 2 minutes with this fix

- if the backup command did the database upgrade, it would then fail
  with an error like "not a database or encrypted: arc.x.x-journal".
  The next backup command would work.  This was a bug in the archive
  synchronization procedure

- the change in 0.9.9 to use multiple CPUs to prepare the database
  wasn't actually enabled in previous beta builds

- VMWare memory images, *.vmem, are now excluded when a new inex.conf
  is created.  For existing inex.conf files, add an ex *.vmem line

- ls: was looping when a wildcard filename like '*.vmem' was used

- selftest: added multiple levels of selftest, taking increasing time:
    -v0: read each page of main database, like cat hb.db
    -v1: database consistency; no archive files are read
    -v2: v1 + all archive blocks are read and crc verified
    -v3: v2 + all data decrypted and decompressed, block hash verified,
     file hash verified.  Like a restore, without writing to disk.
    -v9: v3 + low-level database integrity check
  The default level is -v9.  This checks everything possible, as in
  earlier versions of selftest.  On my MacBook, with 45GB backed up,
  using 22GB of backup space, the verify times are:
    -v1:  4 minutes
    -v2: 24 minutes
    -v3: 71 minutes
    -v9: 75 minutes
  For comparison, it takes 10 minutes just to read the entire 22GB of
  backup data from disk at the maximum speed of 35 MB/sec with the
  command: time cat /hb/* >/dev/null

- selftest: delete unused paths with -v1 or greater; this is normally
  not necessary, but unused paths may occur in some circumstances

- rsync destination: the Dir keyword is checked to make sure it has
  the proper format, specifically, that it contains a : or ::

- rsync destination: HB was always adding /filename to the end of the
  Dir keyword to form the target path, but if the Dir path ends in :
  then a slash should not be added
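
  The two Dir rules above can be sketched together.  The host and
  module names in the examples are made up, and rsync_target is a
  hypothetical helper rather than hb's code:

```python
def rsync_target(dir_value, filename):
    """Form the rsync target path for a file from the Dir keyword."""
    if ":" not in dir_value:
        raise ValueError("Dir must contain : or ::")
    if dir_value.endswith(":"):
        # "host:" already names the remote default directory; add no slash.
        return dir_value + filename
    return dir_value + "/" + filename


print(rsync_target("backup@example.com:", "arc.0.0"))
print(rsync_target("example.com::hbmodule", "arc.0.0"))
```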

- rsync destination: added debug keyword.  If value is 1 or more, the
  rsync command line is printed and -v is added so that rsync is more
  verbose.  If debug is 2, -vv is added (even more verbose), etc.

- rsync destination: improved transfer efficiency, especially for rm
  and retain

- S3 destination: added debug keyword.  With a value of 1, data being
  sent to and received from Amazon S3 is displayed.  With a value of
  99, any exception during a file transfer will cause a traceback and
  HB will hang (use Ctrl C to terminate it).

- S3 destination: added a DNS lookup during startup to display a
  better error message when a system's DNS is not configured correctly

- backup: version 0.9.10 would fail on very long (>1023 bytes) symlinks,
  ACLs, and extended attributes with the error message:
      TypeError: an integer is required

- retain: added directory retention.  Previously, retain only removed
  files, which could leave empty directories in the database

- made "hb restore" an alias for "hb get"

- rm and retain now overlap archive compression, archive transmission,
  and database compression

- better cleanup of archive journal files

- rsync destination: work around an rsync bug: rsync 3.0.4 client
  (PCBSD 7.1.1) with rsync 2.6.9 server (Mac OSX) gives "unknown option"

0.9.10 - November 21, 2009

- this version has a database format change and will automatically
  upgrade your backup database the first time hb is used.  All backup
  data is maintained, except extended attributes (SELinux); they will
  be saved again on the next backup

- ACLs are supported on Mac (OSX) and BSD systems (Linux ACLs were
  already supported)
  NOTE: OSX 10.5 (Leopard), FreeBSD 7.1, and PCBSD 7.1 have an
  operating system bug that causes a small memory leak for every ACL
  restored.  A patch to fix this was committed in the FreeBSD tree

- the mount command (FUSE) is available on FreeBSD/PCBSD

- mount: reading files from a mounted backup (FUSE) was sometimes
  extremely slow and CPU intensive because of a bad database query

- on OSX, filenames are case-insensitive; but if /users/jim is used
  on the backup command line, it must still be saved as /Users/jim,
  and HB will print a notice:
  Pathname changed: /users/jim/backup/x => /Users/jim/backup/x
  NOTE: HB exclude/include processing is always case sensitive

- on BSD & OSX, file flags are saved and restored as in the Linux
  version (see man chflags)

- on BSD & OSX, extended attributes on symbolic links are now saved
  and restored.  (Linux symlinks cannot have extended attributes)

- get: on BSD & OSX, symbolic links with a different mode than their
  link target now have the correct mode after a restore

- queue database to transmit next when the backup is finished.  If
  transmitting all archive files takes a long time (days or weeks for
  a huge backup), there will be a database saved on the remote side to
  restore the archives that did finish transmitting

- if the backup database doesn't exist but the compressed database
  does, HB will ask if you want to expand the compressed database.
  This is useful to run selftest directly on a destination directory,
  for example, an external USB hard drive.  Or, if the disk area
  storing the database itself goes bad (very unlikely, but possible),
  the compressed DB file can be expanded and used instead

- recover: fix 1% failure with index out of range error

- recover: print numbers so it's clear that recover isn't stuck

- a problem restoring a symbolic link or extended attributes could
  cause the get command to abort.  Now it will print an error message
  and continue the restore

- some HashBackup data files had x (execute) permissions

- a directory could be saved without its extended attributes (SELinux)

- destination handlers were sending error messages to stdout instead
  of stderr

- when key.conf is created, a second line is written with spaces
  every 4 hex digits to make the key easier to copy by hand
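
  As an illustration only: the key below is made up, and group_hex
  and ungroup_hex are hypothetical helper names, not HashBackup's own
  code.  Grouping a hex key every 4 digits could be sketched in Python
  like this:

```python
def group_hex(key, width=4):
    """Return key with a space inserted after every `width` characters."""
    return " ".join(key[i:i + width] for i in range(0, len(key), width))


def ungroup_hex(grouped):
    """Undo group_hex by removing all whitespace."""
    return "".join(grouped.split())


key = "0123456789abcdef0123456789abcdef"   # made-up example key
print(group_hex(key))
# prints: 0123 4567 89ab cdef 0123 4567 89ab cdef
```

  Stripping the whitespace again recovers the original string, so the
  grouped form carries the same information as the first line.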

- backup: if a pathname requested for backup is a symbolic link, for
  example, /home points to /usr/home on FreeBSD/PCBSD, the symbolic
  link's target (/usr/home) is added to the backup with a notice:
    Adding symlink target to backup: /home -> /usr/home
  This prevents the serious mistake of believing the files "in" /home
  are being backed up when in reality, only a symbolic link to
  /usr/home would be backed up.
  IMPORTANT: "symlink following" only occurs for command line paths!
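
  A minimal Python sketch of the idea, assuming the check is a plain
  "is this command line path a symlink?" test.  The function name
  expand_symlink_args and the demo directory names are hypothetical;
  this is not HashBackup's actual code:

```python
import os
import tempfile


def expand_symlink_args(paths):
    """Return the paths to back up: every command line path, plus the
    resolved target for any path that is itself a symbolic link."""
    result = []
    for path in paths:
        result.append(path)
        if os.path.islink(path):
            target = os.path.realpath(path)
            print(f"Adding symlink target to backup: {path} -> {target}")
            result.append(target)
    return result


# Demo: "home" is a symlink to "usr_home" inside a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "usr_home")
    link = os.path.join(d, "home")
    os.mkdir(real)
    os.symlink(real, link)
    demo = expand_symlink_args([link])
    expected = [link, os.path.realpath(real)]
```

  Only the paths passed in are examined, which mirrors the note above:
  symlinks found while walking a directory are not followed.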

- the rsync destination now accepts a port keyword, to allow the rsync
  daemon to run on a port other than the standard port 873.  This only
  works with rsync modules, ie, two colons used in the dir path.

- cache sizes have been scaled back in this version; determining the
  optimum cache size needs further study

0.9.9 - October 27, 2009

- add Password keyword to rsync destinations, to set the rsync module
  password when using the two colon form of rsync (direct to rsyncd)

- expanded documentation for rsync destination in dest.conf.example

- preparing the database for transmission is 35% faster on multi-core

- ls: added a note, noaccess, when directory contents can't be shown
  because of insufficient permissions

- ls: improved performance 10% when listing specific files

- Amazon S3 uploads would sometimes fail with "bad marshal data" or
  "No parsers found", depending on the system's configuration

- Amazon S3 uploads would sometimes fail with an error like:
      s3(xxx): sending <pathname>: sent XXX of YYY bytes
  where XXX was much greater than YYY.

- native FreeBSD/PCBSD build added to beta site

- removed timeout code from all destinations; it caused some problems,
  especially with FTP, and wasn't very useful since each destination
  runs in its own thread

0.9.8 - October 24, 2009

- repeat ad infinitum: test more before release, test more before release, ...

- the rsync timeout was set too low, causing the next file transfer to
  start, concurrently, every 15 seconds

- fixed a selftest bug: OperationalError: no such column: blockshas.sha
  The database was fine - this was a bug in the selftest code

- the recover command didn't work with an rsync destination

- fixed KeyError problem on PCBSD when sizing memory

- fixed "Unable to read flags" error on PCBSD, in Linux compatibility
  mode; file flags (man chflags) are not yet supported on BSD / Mac

- ACLs are not yet supported on BSD / Mac

- if FUSE wasn't installed, mount would throw an exception

0.9.7 - October 22, 2009

- mount: on OSX, the umount command is used to unmount FUSE filesystems

- added Intel binary for Mac; 0.9.6 was compiled only for PowerPC and
  ran in emulation mode on Intel

- new destination type: rsync; see dest.conf.example

- the Amazon S3 destination was broken in 0.9.6, but is fixed

- database was being prepared for transmission even if it was deferred
  on all destinations; changed to avoid unnecessary work

- the backup program is copied to the backup directory if necessary

- imap (email) and S3 connection handling is improved

0.9.6 - October 16, 2009

- IMPORTANT NOTE: this version has a database change; use hb clear to
  remove beta test backups created with earlier versions, or create a
  new backup directory to use with this release.  The database format
  will be forward compatible at release 1.0

- improved scalability for backups >100GB

- backup: saving VM images (.vmdk, .hdd, .qcow2, etc) will use more
  disk space for the initial backup, but incremental backups will be
  much smaller for typical work loads

- space: document this 0.9.3 command on the beta site

- space: performance improved 5x

- get: regulate memory usage for large restores with millions of hard
  links (this change was for a 500GB restore with 31M files, more than
  half of which were hard links)

- ftp: changed block write timeout from 30 seconds to 2 minutes

- ftp: display a message when a timeout occurs

- dir destination expands ~jim in directory name

- a default inex.conf file (include/exclude) tailored for each
  computer system is created on the first backup.  Create an empty
  inex.conf or edit the file if you don't want the default exclusions

- initial backups directly to NFS are ~15% faster, but still slower
  than backing up to a local drive.  Incremental backups with few
  modified files are fast, comparable to backup on local drives

- added check for unrecognized arguments to commands

- fix typo in mount message: fusermount to unmount, not fuseumount

- added destname to recover command's help display

0.9.5 - September 14, 2009

- backup: if control-c was pressed at exactly the right time on the
  first backup, the database could be only partially initialized

- selftest: error counter was not always incremented, so error
  messages could sometimes be displayed but not counted

- selftest: detect missing root pathname record in hb.db

0.9.4 - September 7, 2009

- recover: if hb.db exists in the target directory but is not
  readable, for example, it's empty, recover would say "run a
  backup first"

- recover: a change in 0.9.2 caused the recover command to fail
  with a "transactions cannot be nested" message

- directory destinations: when copying to a directory, .tmp files
  would be left if the target disk runs out of space

0.9.3 - August 25, 2009

- clear: remove journals too

- backup: uses half as much memory to track hard links

- backup: backup huge directories (15M files) in ~50MB of memory

- backup: incremental backup of a huge directory (15M empty files
  with 32000 hard links) on a 1GB test machine takes 45 mins vs 105
  mins

- new command "space" to show how backup space is being used

0.9.2 - August 21, 2009

- backup: reincarnated memory savings from version 0.3 for incremental
  backups to improve scalability on huge directories (>1M files)

- backup: remove warning "Unable to stat file" on deleted files

- backup: the built-in excluded path list (/proc, /tmp, etc) was
  removed because it caused confusion with /tmp backups, and /proc
  and /sys were excluded anyway as separate filesystems

- backup: hitting control-C at just the right time after starting a
  backup could cause a database to be half-initialized

- backup: hitting control-C during a backup could cause selftest to
  display a warning about high reference counts

- retain: if backup is run with -n (no transmit), then retain must
  transmit the database even if retain didn't remove any files

- backup: if a file had extended attributes with names containing
  characters >= 0x80, no extended attributes were saved for that file

- mount: fix Bad address error when trying to read ACLs on the fuse
  root or next-level directories

- mount: extended attributes fix

- mount: an error message was incorrectly sent to stdout instead of
  stderr

0.9.1 - August 14, 2009

- initial FreeBSD compatibility testing.  This is not a FreeBSD native
  build yet and still relies on the Linux compatibility layer.  There
  has only been very light testing, but backup, versions, ls, and get
  appear to work fine with only 1 minor change so far

- don't try to read file system flags on FreeBSD systems

- S3 is not working yet on FreeBSD, but all other destinations have
  been tested and seem to work (FTP, ssh, IMAP, Gmail, directories)

- clear: added -f option to force clear; used by test programs

0.9 - August 9, 2009
NOTE: the 0.9.x releases will be for important fixes, in preparation
      for the 1.0 release, and instead of expiring in 1 month, these
      beta releases will expire in 3 months (only the backup command
      expires; backup data is still accessible anytime)

- database is 7-10% smaller.  Run hb clear first to remove existing
  beta test backups.  The database format will be stable and forward
  compatible at version 1.0

- mount: improved performance of file open and close when the backup
  is mounted as a filesystem

- get: if file1 and file2 were hard linked, restoring file2 w/o file1
  and not using --orig would cause a checksum mismatch and an empty
  file was restored.  Now, file2 is restored correctly, but will not
  be hard-linked to the existing file1; to cause the restored files to
  be hard-linked either use --orig or restore both files together

- get: in some circumstances, a file that was hard-linked would be
  restored without being hard-linked

- get: restoring a symbolic link that was also hard-linked could raise
  an exception

- get: restoring a hard-linked file with extended attributes could
  cause an exception

- mount: supports extended attributes (SELinux, ACLs)

- backup: if HASHBACKUP_DIR environment variable was set but was blank
  or empty, backup files were written to the current directory.  Set
  the environment variable to . to get this behavior

- if /var/hashbackup exists but the current user doesn't have write
  access, hb would stop with an insufficient access message.  Now hb
  will ignore this directory if the current user doesn't have access,
  and offer to create ~/hashbackup as usual
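
  A rough Python sketch of the directory choice described in the two
  entries above.  The order of the checks, treating a blank
  HASHBACKUP_DIR the same as an unset one, and the function name
  choose_backup_dir are all assumptions made for illustration:

```python
import os


def choose_backup_dir(environ, can_write):
    """Pick a backup directory.

    environ   -- mapping like os.environ
    can_write -- callable(path) -> bool, e.g. a wrapper around os.access
    """
    env_dir = environ.get("HASHBACKUP_DIR", "")
    if env_dir.strip():
        # An explicit "." selects the current directory on purpose.
        return env_dir
    # Blank or unset: never fall back to the current directory silently.
    if can_write("/var/hashbackup"):
        return "/var/hashbackup"
    # No usable system directory: use the per-user one instead.
    return os.path.expanduser("~/hashbackup")


print(choose_backup_dir({"HASHBACKUP_DIR": "."}, lambda p: False))
# prints: .
```

  Passing the environment and the access check in as arguments keeps
  the decision easy to test without touching the real filesystem.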

- backup: display a message after waiting 5 seconds for destination
  copies to finish, add a stat line for wait time

- add write lock to ensure single user write access to backup data
  for backup, clear, recover, retain, rm, and selftest commands

- versions: if a backup was interrupted, the versions command would
  sometimes print the current time as the backup's ending time

- retain: now refuses to run if the previous backup didn't finish

- retain: add -f/--force option to override previous safety feature

- retain: an error in the -x option, for example, -x30p, printed
  an error message (correct), but retain would run anyway (incorrect)

- retain: time-based retention (-t option) was based on the current
  time, ie, -t7d meant "in the last 7 days"; now it is relative to
  the last backup's finish time.  This prevents the problem where
  backups haven't been run for a while, then a retain is run and
  removes all but the most recent backup of current files because
  the backup is very old
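
  The difference between the two reference points can be shown with a
  little date arithmetic.  This is only a sketch: the dates are made
  up, and retention_cutoff is a hypothetical name, not the retain
  command's real code:

```python
from datetime import datetime, timedelta


def retention_cutoff(last_backup_finished, keep):
    """Backups older than the returned time fall outside the window."""
    return last_backup_finished - keep


last_finish = datetime(2009, 6, 1, 2, 0)   # made-up: last backup finished
now = datetime(2009, 8, 9, 12, 0)          # retain is run two months later
keep = timedelta(days=7)                   # corresponds to -t7d

old_cutoff = now - keep                    # earlier behavior: based on "now"
new_cutoff = retention_cutoff(last_finish, keep)

print(old_cutoff > last_finish)   # True: every backup is outside the window
print(new_cutoff)                 # 2009-05-25 02:00:00
```

  With the cutoff tied to the last backup, the final 7 days of backup
  history stay inside the window no matter how late retain is run.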

0.8 - July 31, 2009

- removed internal nice -19, because a backup that took 10 secs
  on an idle machine took 790 seconds when one CPU-bound program
  was running; let users do nice -19 or ionice -c3 if needed

- mount: reading large files was very slow in ver 0.7

- backup: add -n option to defer copying the main database to
  destinations.  If retain is going to be run immediately after the
  backup, retain will upload the database

- add -v (verbose) option to backup; default level is -v2.  -v3 will
  print paths either excluded or with the "no dump" attribute set,
  -v1 prints no filenames, -v0 prints no statistics

- add -v (verbose) option to get; default level is -v2

- get: ask whether to remove the partially restored file when a
  control-c is pressed

- get: display a warning that the current file was only partially
  restored if an error occurs during restore

- for a command line like hb ugh /home, "unknown command: ugh" was
  displayed (correct) followed by "Unrecognized command: /home"

- backup: if unable to stat a file, print full pathname

- backup: better handling of hard-linked files that change during the
  backup.  Because of this change, databases created before 0.8
  may fail the selftest; use hb clear to remove beta test backups

- backup: bypass exclusion checks when saving parent directories
  of a requested file

- backup: exit code was the number of errors, but should have been
  either 0 for no errors or 1 if there were errors
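
  One general reason a raw error count makes a poor exit code (offered
  as background, not as the stated motivation for this change): on
  POSIX systems only the low 8 bits of the status reach the parent
  process.  A small Python sketch, with hypothetical function names:

```python
def exit_status(error_count):
    """Map an error count to a conventional status: 0 = ok, 1 = errors."""
    return 1 if error_count > 0 else 0


def truncated_status(error_count):
    """Status the parent would see if the raw count were used on POSIX."""
    return error_count & 0xFF


errors = 256                        # made-up error count
print(truncated_status(errors), exit_status(errors))
# prints: 0 1
```

  With the raw count, a run with exactly 256 errors would look like a
  success to a calling script; the 0/1 mapping avoids that.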

- versions: align columns for neater output when userid varies

- backup: add number of files excluded to statistics

- get: trap exceptions and if --orig wasn't used (ie, we're restoring
  to a temp file), keep going.  If restoring with --orig, ask before
  continuing the restore

- get: if there were any errors during the restore, ask before
  replacing the original file or directory

- ls: display root path too

- get: print full pathnames instead of filenames

- recover: after recovering the database from a remote site, it was
  functional but was larger than the original

- open: the command to set the archive size limit isn't available yet
  (the limit is set to 1GB).  Backups to IMAP servers may need to
  lower this limit, and huge backups may want a higher limit

- ls: deleted files in earlier versions were not being displayed, even
  with -a

0.7 - July 25, 2009

- NOTE: database schema has changed - run hb clear if you have
  test backups from previous beta versions.  The database schema
  will be forward compatible beginning with the 1.0 release

- database uses up to 25% less space for very large files

- database uses much less space for virtual machine disk images

- get: verify file SHA hash matches after a restore

- mount: fixed error message when mount directory doesn't exist

- mount: fixed Bad address error when accessing non-current backups

- backup: after the first backup finishes, display a large notice
  about copying the key.conf and dest.conf files to safe locations

- changed some common error messages to prevent traceback displays

- backup: added the line number to exclude/include error messages

- backup/ls: fifos and devices were listed as partially backed up

- backup: a development assertion failed when hard-linked files had
  the "nodump" chattr attribute set or were not readable because of
  permission restrictions

- get: sparse files ending with zeros were not restored correctly

- get: verify with OS that a restored file is the correct size

- get: verify sparse file hash with a separate read pass after restore
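
  The size and hash checks added to get in this release can be
  pictured with the following Python sketch.  The entries above do not
  say which SHA variant is used, so SHA-1 here is an assumption, and
  verify_restored is a hypothetical name:

```python
import hashlib
import os
import tempfile


def verify_restored(path, expected_size, expected_hex, algo="sha1"):
    """Return True if the file at path has the expected size and hash."""
    # Cheap check first: ask the OS for the size of the restored file.
    if os.stat(path).st_size != expected_size:
        return False
    # Separate read pass over the restored file to recompute its hash.
    digest = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Demo with a throwaway file whose content ends with zeros.
data = b"hello" + b"\0" * 4096
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
good_hash = hashlib.sha1(data).hexdigest()
ok = verify_restored(tmp.name, len(data), good_hash)
bad = verify_restored(tmp.name, len(data) - 1, good_hash)
os.unlink(tmp.name)
print(ok, bad)
# prints: True False
```

  Reading the file back from disk, rather than hashing the data while
  it is written, is what lets a check like this notice a restored file
  that is shorter than expected.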

- review and start to standardize error message displays

- selftest: didn't correctly handle a symbolic link that was also a
  hard link (yep, you can actually do this with Linux/ext3)

- mount: generated an error when accessing a symbolic link that was
  also a hard link

- mount: could return all zeroes when reading a hard linked file

- mount: now returns EIO if there is a problem reading a file

0.6 - July 18, 2009

- simplified retain -t and -x to only accept 1 time option, not NyNm

- new Freq keyword for destinations, like retain -t; defers copy
  until enough time has passed since the last copy to this destination

- bug fix in ssh destination when target directory did not exist

- ftp copy could leave a file open if an error occurred

- dest.db was being sent even if a destination was deferred

- dest.db is encrypted before sending offsite (other files already are)

- rm displayed "Removing all files from version x" when -r was used,
  but should have displayed "Removing requested files..." if paths
  were also listed

- rm and retain may defer archive uploads to save bandwidth

- rm was sometimes leaving a few blocks that should have been removed

- added block consistency tests to selftest

- get would stop on files with attributes that require root privilege
  to restore, for example, journal mode (j).  An error is now displayed

- get from a specific version (-r) would fail on directories

- get restores all directory attributes if parent directories have to
  be created with --orig (Ex: get --orig /a/b/c but only /a exists)

- get --orig failed with "Not a directory" when restoring /a/b/c, /a/b
  already exists, but b is a file and not a directory.  This still
  fails (it has to fail), but with a better explanation

- get will download an archive file from a destination if it is missing

- help command added

- get: if a and b were hard linked, only one was restored with --orig,
  and the other already existed, they weren't linked after the restore

- backup: added /tmp to the platform excluded directories

- get: clearer error message for pathnames ending with slash

- get: fixed existing file mtime check when multiple files restored

- get: instead of a warning, refuse to restore file over existing
  directory, or directory over existing file

- get: instead of a warning, refuse to restore a partially backed up
  file/directory over an existing file/directory, unless the existing
  directory is empty

- get: instead of a warning, refuse to restore / into a non-empty
  directory

- get: for safety, removed -f (force) option

- prevent tracebacks when expected error messages are displayed

- renamed to HashBackup

- added -a/--all option to mount, to allow all users access to the
  mounted backup filesystem.  Standard Unix permission checks are
  still performed on all accesses within the backup filesystem
  NOTE: by default, -a is only allowed by root, but it can be
  enabled for others with a /etc/fuse.conf setting

- backup: removed /mnt from internal exclusion list; /mnt is still
  skipped if a filesystem is mounted, unless it is listed on the
  backup command line

- backup: the backup directory contents were automatically excluded,
  but the directory itself was not.  This could cause the version to
  increment with one file changed, even if nothing else changed

- recover: if only 1 destination is set up, use it if none is specified

0.5 - July 7, 2009

- **NOTE: remove dest.db before using this rev

- backup/rm/retain: database transmit is 3-4x faster

- mount: new -f/--full option to show full backups in each version

- new destination type: ssh (see dest.conf.example)

- mount: if the backup directory was mounted, the backup, rm,
  and retain commands would fail with "Database is locked".

- mount: accessing a file that didn't exist caused a "Bad address"
  error instead of the correct "No such file or directory"

- rm/ls: wrong version displayed for the first file of a backup

- rm: didn't copy database to remotes if it didn't need to be compressed

- rm: don't display "Remove logid ..." - it's slow on very large removes

- selftest: display file counts instead of log id's - confusing

- all: prevent stack traceback when piping output into head
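
  When output is piped into head, the reader exits early and further
  writes fail with a broken pipe error.  A sketch of one common way to
  handle this quietly, written in present-day Python rather than the
  2009 code; write_lines and ClosedPipe are hypothetical names:

```python
import io


def write_lines(lines, out):
    """Write lines to out; return False quietly if the reader is gone."""
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
    except BrokenPipeError:
        # The reader (for example head) has exited: stop without a traceback.
        return False
    return True


class ClosedPipe(io.StringIO):
    """Test double acting like a pipe whose reader has already exited."""

    def write(self, s):
        raise BrokenPipeError


buf = io.StringIO()
print(write_lines(["a", "b"], buf), write_lines(["a"], ClosedPipe()))
# prints: True False
```

  A real command would pass sys.stdout as out and exit once
  write_lines returns False.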

- backup: revert backup/memory reduction in 0.3 because of a database
  limitation: backup would fail after 5 minutes

- backup: remove empty archive if backup is interrupted

- backup: backups larger than 1GB fixed - nextarc typo

- recover: -f option fixed

- recover: no longer prompts for confirmation if no action would be taken

- backup: removed features to simplify code testing:
  removed -n option (dry run) from backup
  removed raw device backup
  removed --log option for separate log files (capture stderr instead)

- backup now prints a message when it skips a directory because the
  "no dump" attribute is set

0.4 - June 28, 2009

- mount command is available to view backups as a filesystem
  (requires Linux fuse kernel module, fusermount,

- s3 dest.conf accepts a new Location EU keyword to create European buckets

- selftest: verifying 11M files required 650MB of virtual memory,
  but now ~325M files can be verified in 650MB

- -c option failed - code typo

- using environment variable PALBACKUP_DIR failed - code typo

- readme: permissions should be 0700 on backup directory, not 0600

- readme: removed --force-full documentation; the option is
  still there, but it's mostly just confusing to new users

- readme: ls -a never required selection strings

- readme: retain Nn time means minutes, not seconds

0.3 - June 26, 2009

- backup: dumb error - extra comma caused immediate failure

- backup: decreased memory requirements of incremental backup for 1M
  file directory from 271MB to 110MB for better scalability

- ls: -r option was not showing any results

0.2 - June 26, 2009

- explain GET and RECOVER commands in README

- backup: fixed immediate abort with -n

- backup: increased speed of version 1+ backups 13x
  for large directories

- retain: --dry-run was incorrectly forced on

- selftest: fixed size display for zero length files

0.1 - June 25, 2009

- first beta release