

                        HashBackup Security Details
                            Jun 4, 2020

2012-05-24: misc typos
2012-09-05: edits before release
2012-11-08: removed public CRC from arc files
2012-12-13: use /dev/random + /dev/urandom for keys
            because of long hangs on Linux VMs
2013-10-24: Since Jan 2013, dest.conf can be stored in hb.db
2014-09-30: encrypted dest.db is copied directly to remotes
2016-01-05: whole file hash changed to SHA1
2016-03-10: updated text to match previously listed changes
2017-04-14: periodic review, minor changes for clarity
2020-06-04: periodic review, minor edits for clarity

HashBackup (HB) is a Unix backup program with these design goals:

- easy to use
- accurate backups and restores
- space-efficient storage of backup data
- bandwidth-efficient network transfers
- local and/or remote backup storage
- local backup storage for fast restores
- remote backup storage for disaster recovery
- remote storage using common protocols: ftp, ssh, rsync, S3, imap, etc.
- backup data sent directly to user's remote storage
- ability to use "dumb"/passive remote storage
- use of untrusted remote storage with client-side encryption and private keys
- authentication of restored data

This note outlines in detail the security procedures in HB, with the
goal of others being able to do a security assessment.


HB reads a file system, or selected files, and saves file data, file
metadata, and backup metadata to a local backup directory.  Data is
stored primarily in two places: an encrypted database named hb.db
stores metadata, and archive files store encrypted file data.

During backups, HB breaks files into blocks, suppresses redundant
blocks, compresses and encrypts blocks, and stores blocks in arc
files.  Archive files are named arc.v.n, where v is the backup number
(version) and n is a sequence number within the backup.  Archive files
are 100MB by default, but this is configurable.  So a backup of 500MB
will create arc.0.0 - arc.0.4, each ~100MB.  The next backup creates
arc.1.0, etc.  If remote destinations are configured in dest.conf (a
text file), arc files are sent offsite to all remotes during the
backup.  Local arc files may be deleted during the backup if
cache-size-limit is set, the cache becomes full, and the arc files
have been sent to all destinations.

File and backup metadata is stored in the hb.db database.  File
metadata includes Unix permission bits, ACLs, file dates and so on.
Backup metadata includes cryptographic hashes for blocks, which blocks
belong to which files, and which archive file contains a block.

hb.db accumulates historical information for all backups: it knows
which blocks are needed to reconstruct version 0 of a file, and which
blocks are needed to reconstruct the most recent version.  After a
backup, the local backup directory has an updated hb.db database, and
incremental update files containing the hb.db changes are sent to all
remote destinations.

There are other utilities to remove files from the backup, list files,
perform retention, mount a backup as a Unix filesystem, etc.


The security goal of HashBackup is secure remote storage of backup
data to untrusted storage providers.  Securing local backup data is
not a primary goal: if someone has access to a computer, there are
many avenues to monitor and attack it, such as logging keystrokes,
intercepting shared library calls, or installing MITM device drivers.

Some data in HB's local backup directory stays local, for example, the
dedup table and the main key in key.conf.  Other data may be
sent offsite, with the goal of being able to reconstruct the backup
in a disaster situation: the computer is stolen, fire, flood, or the
hard disk containing the local backup crashes.

The main security goals then are to:

- send encrypted backup data to remote storage
- ensure the original data is not accessible via remote storage without the key
- retrieve the backup data
- ensure the data was not modified while on remote storage
- ensure that restored files are identical to the original

Different sites will have different security needs.  For example,
"remote" storage may be a company-owned FTP server connected over a
LAN.  Insecure protocols like FTP are supported because the
convenience they provide may be more important to a particular
customer than the protocol's security issues, so HashBackup supports
both secure and insecure-but-convenient transport methods.


There are several types of data sent to remote storage:

1. hb.db, an encrypted database, must be sent since it contains all
metadata.  It is potentially a large file.  To avoid sending the whole
file after every backup, hb.db is sent offsite as incrementals.  The
recover command downloads these incrementals to regenerate the
original hb.db file if the local backup directory is lost, i.e., a
disaster recovery situation.

2. arc files (encrypted, compressed file data blocks) are sent offsite

3. an encrypted database, dest.db, maintains lists of HB's files that
are stored on each destination.  Its primary purpose is to avoid
needing to "list" the files on each remote, because sometimes this is
difficult or expensive.  When a file is sent to or deleted from a
remote, an entry is made in dest.db.  dest.db is sent to every remote
after each backup.  If the local backup directory is lost, dest.db is
fetched first, to tell HashBackup which other files need to be fetched
and where they are located (which remote has them).
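
As an illustration only, the kind of bookkeeping dest.db performs could
be modeled with a single table like the one below.  This is a
hypothetical schema in plain SQLite (the real dest.db is encrypted and
its actual layout is not documented here); the table and column names
are invented for the sketch.

    import sqlite3

    # Hypothetical schema, for illustration only; not HB's actual dest.db layout.
    con = sqlite3.connect("dest.db")
    con.execute("""CREATE TABLE IF NOT EXISTS files (
                       dest TEXT,      -- destination name from dest.conf
                       name TEXT,      -- e.g. arc.0.0 or hb.db.3
                       size INTEGER,   -- bytes transferred
                       sent REAL       -- transfer timestamp
                   )""")

    def record_sent(dest, name, size, when):
        # an entry is made when a file is sent to a remote
        con.execute("INSERT INTO files VALUES (?,?,?,?)", (dest, name, size, when))
        con.commit()

    def record_deleted(dest, name):
        # and removed when the file is deleted from that remote
        con.execute("DELETE FROM files WHERE dest=? AND name=?", (dest, name))
        con.commit()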


Two files are never sent offsite, but contain important information
that is needed to recover the backup directory from a remote site:

key.conf contains the main key.  hb init creates this key, by default
using the system random number generators (/dev/random & /dev/urandom)
to get a 256-bit key.  This is then used to derive the rest of the
keys.  The key.conf file is never sent offsite.  The key may be
protected with a passphrase.  The permissions are set to read-only
for the user running init when the key file is created.

dest.conf is a text file containing remote configuration information.
For example, connecting to a remote FTP server requires a host name,
port number, user id, and password.  Each remote destination will have
a section in dest.conf with parameters needed to connect to the
remote.  dest.conf is not encrypted, so remote account passwords and
access tokens are in the clear.  It is not possible to use hash codes
for passwords in dest.conf, because passwords must be supplied to
remote services for validation.  Users are encouraged to put very
restrictive system access permissions on dest.conf.

For more security, the "hb dest load" command can be used to import
the dest.conf file into the encrypted hb.db file, and then the dest.conf
file is deleted.  For best security, hb.db should also be protected
with an admin passphrase and/or main key passphrase to prevent a local
user from recovering remote credentials.

Copies of the key.conf and dest.conf text files must be stored
separately and/or printed, to be used for disaster recovery (loss of
the local backup directory).


HB uses random numbers to generate:
- keys
- AES initialization vectors (IV)
- AES-CBC padding

To generate keys, HB first tries to read as much of the key as
possible from /dev/random, with a non-blocking read.  If this read is
unable to return a complete key because the entropy pool is depleted,
the rest of the key is read from /dev/urandom.  While /dev/random is
preferred for key generation, it often blocks for very long times on
Linux VMs, especially right after startup, to the point of being
unusable.  On BSD machines, /dev/random and /dev/urandom are usually
equivalent and never block.
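
A minimal sketch of this strategy in Python; the function name is
illustrative and this is not HB's actual code:

    import os

    def random_key(nbytes=32):
        """Read as much of the key as possible from /dev/random without
        blocking, then fill the remainder from /dev/urandom."""
        key = b""
        try:
            fd = os.open("/dev/random", os.O_RDONLY | os.O_NONBLOCK)
        except OSError:
            fd = None                      # no /dev/random on this system
        if fd is not None:
            try:
                while len(key) < nbytes:
                    chunk = os.read(fd, nbytes - len(key))
                    if not chunk:
                        break
                    key += chunk
            except BlockingIOError:
                pass                       # entropy pool depleted; stop reading
            finally:
                os.close(fd)
        if len(key) < nbytes:
            key += os.urandom(nbytes - len(key))   # /dev/urandom fills the rest
        return key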

The other use of random data is for AES IVs and CBC padding.  Up to 32
bytes may be required for each backup data block.  This can require
a significant amount of random data during backup operations, and
might deplete the system's entropy pools.  /dev/urandom is a
non-blocking version of /dev/random, but using it also might deplete
the system's entropy pools during backups, especially on Linux VMs.

To prevent system entropy pool depletion, HB uses AES-128 in OFB mode
to generate cryptographically secure random numbers.  The RNG is
seeded with 48 bytes of data from /dev/urandom:

- 16 bytes for the AES-128 key
- 16 bytes for the AES-128 IV
- 16 bytes for the plaintext data to encrypt
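
A minimal sketch of such a generator, assuming the third-party
"cryptography" package; the class name is illustrative and HB's actual
implementation may differ:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    class PRNG:
        """Pseudo-random bytes from AES-128 in OFB mode, seeded from /dev/urandom."""
        def __init__(self):
            seed = os.urandom(48)                    # 48 bytes of seed material
            key, iv, self.plain = seed[:16], seed[16:32], seed[32:]
            self.enc = Cipher(algorithms.AES(key), modes.OFB(iv)).encryptor()

        def read(self, n):
            out = b""
            while len(out) < n:
                # each call advances the OFB keystream, so re-encrypting the
                # same 16-byte block keeps producing new pseudo-random output
                out += self.enc.update(self.plain)
            return out[:n]

A single instance can then supply the per-block IV and padding bytes,
for example rng.read(32) for one backup data block.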


Analysis of Linux RNG:

Using AES as a CSRNG:
Empirical Evidence Concerning AES
Peter Hellekalek, Stefan Wegenkittl
See section 4, Findings, first 3 paragraphs (RNG mode)


The main sequence of events to use HB is:

1. create a backup directory: hb init -c backupdir; only once

2. make a series of backups: hb backup -c backupdir /home

3. perform file retention, remove files, list, etc. (utilities) as needed

4. rekey if needed

5. restore files

6. recover the backup from a remote (disaster recovery)

INIT WALKTHROUGH: hb init -c backupdir

1. create backupdir if it doesn't exist
   if it does exist, raise an error if it is not empty
   set protection to 700 (owner RWX, group none, others none)
   Here's an example backupdir created by hb init:

   $ ls -ld hb
   drwx------  18 jim  staff  612 May 14 19:05 hb

2. create key.conf:
   the string in key.conf is not used directly as an encryption key;
   it is used to derive other keys (see later)
   not stored anywhere except in key.conf
   hashes of it are not stored
   never sent offsite
   protected 400 (owner Read, group none, others None)
   Here is a summary of key creation options.  It is explained in more
   detail later in its own section.
   a. the default key is a 256-bit random value read from the system
   random number generators (see above, key generation).  This is
   hex-encoded and written to key.conf as groups of 4 hex digits to
   make it easier to transcribe the key.  Here's an example key.conf:

   $ ls -l hb/key.conf
   -r--------  1 jim  staff  334 Feb 24 16:37 hb/key.conf

   $ cat hb/key.conf
   # HashBackup Key File - DO NOT EDIT!
   Version 1
   Build 1481
   Created Wed Feb 24 16:37:56 2016 1456349876.07
   Host Darwin | mb | 10.8.0 | Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 | i386
   Keyfrom random
   Key 8907 0c37 0d9a 8852 807c 1f26 b179 3b15 a994 c165 0313 5cbd c9a3 a500 5cd3 9b3a

   b. hb init also accepts a -k option.  This allows the user to set
   their own key.  It is potentially less secure, and the instructions
   and HB warn about this.  This option lets users set a key they will
   be able to remember without copying the key.conf file to safe
   places.  For example, hb init -c backupdir -k jim would write the
   key 'jim' to key.conf.

   c. to accommodate users' requests for "no encryption", -k '' creates
   a null key.  Backups are still encrypted, but there is no key to
   remember.  Users are warned that anyone with access to the
   encrypted backup data can recover the original, unencrypted data.

   d. a passphrase can be added to the key with the -p option.  -p ask
   (add "ask" after -p) means to read the passphrase from the keyboard
   on every HB command.  A passphrase secures backup data: 1) for users
   in hosted or managed environments, like a VPS; 2) when the backup
   directory is on USB thumb drives; 3) when the backup directory is on
   mounted storage like Google Drive, Amazon Cloud Drive, Dropbox, etc.

   e. the -p env (add "env" after -p) option means HB will read the
   passphrase from the environment variable HBPASS.  This is more
   convenient than -p ask because the user can set the environment
   variable once in their login session.  But it is less secure
   because any program the user runs can also read the passphrase.
   Users can add a command to their .profile startup file to set the
   environment variable, but this is potentially less secure and not
   recommended unless the user's home directory is encrypted by the
   OS.  The protection on .profile should be set to 0400 (user read,
   group none, others none).

3. by using the data in key.conf and procedures described in KEY
   CREATION, a single 256-bit value is generated, called key.  Then:

   - an encryption key k1 is created by sha256(salt1 + key) (a sketch
     of this derivation follows the walkthrough)

   - an authentication key k2 is created by sha256(salt2 + key)

   - the salts are random constants built into the HB executable.  The
     purpose of salting the key is to obfuscate the actual encryption
     key so that the local hb.db file cannot be accessed outside of
     HB, perhaps changing it accidentally and causing bizarre bugs.

   - hb.db is encrypted with an actual AES-128 key of k1[0:16)

   - k2 is used with HMAC-SHA1 to authenticate remote files

4. after generating the actual key, hb.db is created.  It is encrypted
   with AES-128 in OFB mode, with a unique IV for each database page.
   The IV is a 4-byte page number plus 12 random bytes from an
   RC4-based random number generator, seeded from /dev/urandom.
   hb.db is created with SQLite Encryption Edition, a paid version of
   SQLite created by the authors of SQLite.

5. an initial AES-128 backup key is generated using the system random
   number generators (see above, key generation) and stored in hb.db.
   This key will be used to encrypt 64GB of backup data (2^32 blocks
   of 16 bytes each), then a new backup key is added to the list.
   This is recommended for AES in CBC mode to prevent using the same
   key "too long".


hb init creates key.conf by combining the -k and -p options.  There
are 8 possible combinations, some being more secure than others.  The
less secure options also tend to be more convenient to the user.  They
are described here from most secure to least secure.  Plus signs are
used for "pros" while dash signs are used for "cons".

1. no -k, -p ask
   Store 'ask' + 256-bit random in key.conf
   Read passphrase from keyboard
   Stretch passphrase + random with pbkdf2

   - must back up key.conf
   - must remember passphrase
   - can't automate HB commands (like in a cron job)

   Security: 5
   + access to key.conf is insufficient
   + random key cannot be guessed
   + passphrase is not stored

2. no -k, no -p (default)
   Store 256-bit random key in key.conf
   No stretch needed

   - must back up key.conf
   + nothing to remember
   + can automate HB commands

   Security: 4
   - access to key.conf gives all access
   + random key cannot be guessed

3. no -k, -p env
   Store 'env' + 256-bit random in key.conf
   Read passphrase from environment variable
   Stretch passphrase + random with pbkdf2

   - must back up key.conf
   - must remember or store passphrase
   + can automate HB commands

   Security: 4
   - other programs have access to env vars
   - user might store key in .profile file
   + random key cannot be guessed

4. -k jim -p ask
   Store 'ask' + jim in key.conf
   Stretch passphrase with 'jim' as salt

   + easier to reconstruct key.conf manually
   - can't automate HB commands

   Security: 3
   - salt is not random
   + salt is not HB constant
   + password is not stored

5. -k '' -p ask
   Store only 'ask' in key.conf
   Stretch passphrase w/HB constant salt

   + doesn't need to back up key.conf
   - must remember passphrase
   + easy to reconstruct key.conf
   - can't automate HB commands

   Security: 2
   - constant salt is embedded in HB program
   - constant salt = possible table attack
   + password is not stored

6. -k '' -p env
   Store only 'env' in key.conf
   Stretch passphrase w/HB constant salt

   + doesn't need to store key.conf
   - must remember or store passphrase
   + easy to reconstruct key.conf
   + can automate HB commands

   Security: 1
   - other programs have access to env vars
   - might set env var in .profile file
   - constant salt is embedded in HB program
   - constant salt = possible table attack
   + there is a key!

7. -k jim
   Store 'jim' in key.conf
   Stretch key w/HB constant salt

   ? may need to store key.conf for complex keys
   ? may be easy to reconstruct key.conf for simple keys
   + can automate HB commands

   Security: 1
   - access to key.conf gives all access
   - constant salt is embedded in HB program
   - constant salt -> possible table attack
   + there is a key!

8. -k ''
   Store nothing in key.conf

   + doesn't need to store key.conf
   + nothing to remember
   + can automate HB commands

   Security: 0
   - no security if backup data is accessible

After the key.conf file is created with one of these combinations of
-k and -p, the actual encryption keys are generated.  The procedure is:

   - the key.conf file is read

   - for passphrase type 'ask', the user is prompted for a passphrase

   - for passphrase type 'env', the HBPASS shell variable is read

   - for ask and env, there may be a salt in the key.conf file.  If
     there is, it is used.  If not, a constant built into HB is used
     instead.  This constant was generated from /dev/random.

   - for passphrases (-p) and user-specified keys (-k jim), pbkdf2 is
     used to stretch the passphrase and/or key.  A 256-bit salt is
     used, with many thousands of iterations.  For -k jim -p ask/env,
     the salt would only be 3 bytes (jim).  (A sketch follows this
     list.)

   - for random keys with no passphrase (no -k or -p option), pbkdf2
     is unnecessary and not used.

   - the result is a 256-bit 'master key' that all actual keys are
     derived from (see 3. in prior hb init section)
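
A minimal sketch of the stretching step, using Python's standard
library.  The PRF (SHA1) and iteration count shown here are
assumptions; HB's exact PBKDF2 parameters are not given above:

    import hashlib

    ITERATIONS = 100_000          # placeholder for "many thousands of iterations"

    def stretch(passphrase: bytes, salt: bytes) -> bytes:
        # salt is the random key stored in key.conf when one exists, the -k
        # string (e.g. b"jim"), or the constant built into HB
        return hashlib.pbkdf2_hmac("sha1", passphrase, salt, ITERATIONS, dklen=32)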

BACKUP WALKTHROUGH: hb backup -c backupdir /

The backup function reads through the filesystem, breaks files into
chunks, stores metadata in hb.db, stores file data in HB archive
files, and optionally sends files offsite.  Block creation is
described here; sending files offsite is described later.

1. Split the file to be backed up into blocks of either fixed or
   variable size, depending on the file type and options set.

2. For each block, compute SHA1(data).

3. Look up the SHA1 hash in the dedup table.  If found, refcount this
   block without writing to the archive file.

   NOTE: the dedup table is a local file and never sent offsite.  But
   before blindly using results of this dedup lookup, the hb.db
   database is consulted to verify this block's SHA1 hash matches the
   value in the dedup table.  If not, the dedup lookup fails.  This
   prevents the situation where someone hacks the local dedup table,
   or the dedup table logic has a bug, which could cause backups to be
   scrambled.  This would be detected on restore and selftest, because
   *file* hashes would not match, but would not be detected at backup
   time without this extra check.

4. For a new SHA1 hash not seen before, data is written to the archive
   file.  The steps are:

   a. compress the data block
   b. get 256 bits (32 bytes) from the HB RNG
   c. use 128 bits (16 bytes) as the AES IV
   d. use the rest for CBC padding
   e. encrypt with AES-128-CBC
   f. write the IV and encrypted block to the archive file

5. record the new block metadata in hb.db
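
A condensed sketch of steps 2-5, assuming the third-party
"cryptography" package and zlib compression.  The dedup table and
hb.db lookups are reduced to in-memory dictionaries, and os.urandom()
stands in for the AES-OFB generator described earlier:

    import hashlib, os, zlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def backup_block(data, backup_key, dedup, hbdb_sha, arc, next_id):
        sha = hashlib.sha1(data).digest()                   # step 2: block hash
        blockid = dedup.get(sha)                            # step 3: dedup lookup...
        if blockid is not None and hbdb_sha.get(blockid) == sha:
            return blockid                                  # ...cross-checked against hb.db
        comp = zlib.compress(data)                          # 4a: compress
        rand = os.urandom(32)                               # 4b: 32 bytes (HB uses its AES-OFB RNG)
        iv = rand[:16]                                      # 4c: 16-byte AES IV
        padlen = 16 - len(comp) % 16
        comp += rand[16:16 + padlen]                        # 4d: random CBC padding
        enc = Cipher(algorithms.AES(backup_key), modes.CBC(iv)).encryptor()
        arc.write(iv + enc.update(comp) + enc.finalize())   # 4e/4f: encrypt and write
        dedup[sha] = next_id                                # step 5: record block metadata
        hbdb_sha[next_id] = sha
        return next_id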


Archive files are sent offsite as is, without modification.  The
transport method (ftp, rsync, etc) is used to transfer the files and
an entry is logged in dest.db on success.  Details about the archive
file format are discussed later.  The main protection for archive data
is the SHA1 hash stored in hb.db for each block, and the SHA1 hash for
each file backed up, also stored in hb.db.


It's much trickier to send hb.db offsite because it is a single,
potentially large file that grows as backups are added.  To avoid using
more
bandwidth every backup, hb.db is sent incrementally to remotes (only
modified data is sent).  These increments are numbered, hb.db.0,
hb.db.1, etc.  By combining these increments, the original hb.db can
be constructed.

Each increment is compressed and encrypted with AES-128, and contains
several hashes / HMACs:

 1. public SHA1 digest of the entire increment file

 2. HMAC-SHA1 digest of this SHA1 digest, keyed with the k2
    authentication key

 3. HMAC-SHA1 digest of the original hb.db file, keyed with the k2
    authentication key.

The first public SHA1 digest allows anyone to verify the integrity of
the hb.db.n file.

Since the public SHA1 digest is over the entire hb.db.n file, it is
possible to change the file data and update this digest.  The private
HMAC-SHA1 digest makes this impossible without knowing the
authentication key.

It would still be possible for someone on the remote side to copy
hb.db.1 over hb.db.0.  This would verify with both the SHA1 and
HMAC-SHA1, so another HMAC-SHA1 is added - this one over the original
hb.db file.  In addition to catching the copy problem, it also may
catch software bugs, for example, if an increment is applied
improperly, in the wrong order, twice or perhaps not at all.
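
The three digests can be sketched with the standard library as
follows; how they are laid out inside the hb.db.n file is not shown
and the names are illustrative:

    import hashlib, hmac

    def increment_digests(increment_data, full_hbdb, k2):
        public_sha1 = hashlib.sha1(increment_data).digest()            # 1: anyone can verify
        keyed_sha1  = hmac.new(k2, public_sha1, hashlib.sha1).digest() # 2: keyed with k2
        hbdb_hmac   = hmac.new(k2, full_hbdb, hashlib.sha1).digest()   # 3: ties increment to hb.db
        return public_sha1, keyed_sha1, hbdb_hmac

Recover checks 1 and 2 for each downloaded increment and checks 3
against the rebuilt hb.db, as described later.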

After the hb.db.n increment file is created, it is uploaded to the
remote just like archive files, using the configured transport method.


The dest.db file is a manifest of all files sent or waiting to be sent
to each remote.

The information in dest.db is not particularly sensitive.  It tells
which archive files and hb.db.n increments are on which remotes, the
size of these files, and the dates they were transferred.  It is
encrypted like the hb.db file, with AES-128, using the same key as hb.db.

Like archive files, the dest.db file is copied directly to remote
destinations.  After the copy, more I/O may occur to the local
dest.db, so the local and remote dest.db may not always match exactly.


Restoring a file uses local backup data if available, or downloads
arc files from remote destinations if necessary.

During restore, hb.db is used to get a file's original metadata
(permissions, etc) and a list of blocks that make up the file data.

For each block in the file:

- the block is read from an archive file
- the block is decrypted using the backup key stored in hb.db
- the block is unpadded if necessary
- it is decompressed if necessary
- a SHA1 is computed and compared with the block's original SHA1 in hb.db
- if different, an error occurs and the restore aborts for this file
- otherwise, the block is written to disk and the SHA1 file hash is updated
- restore continues with the remaining blocks

As a precaution against padding oracle attacks (which don't exactly
apply to HB because the key is present, but it is better to be
cautious), depadding and decompression errors are not reported as
distinct errors.  Instead, the only error reported is "block hash
mismatch".

After all blocks have been written to the restored file, the restored
file's SHA1 hash is compared with the original SHA1 hash created
during backup.  Because this is a file hash, it can detect errors such
as restore bugs, dedup problems, and so on.  For example, in the very
unlikely event there is a SHA1 block hash collision during dedup, this
file hash will detect it, although at that point nothing can be done
to fix it.
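
A condensed sketch of the per-block and whole-file checks just
described; decrypt_block() and decompress() are hypothetical stand-ins
for the decryption, depadding, and decompression steps:

    import hashlib

    def restore_file(blocks, out, expected_file_sha):
        """blocks: iterable of (encrypted_block, expected_block_sha, backup_key)."""
        file_sha = hashlib.sha1()
        for encrypted, expected_block_sha, backup_key in blocks:
            try:
                data = decompress(decrypt_block(encrypted, backup_key))  # hypothetical helpers
            except Exception:
                data = b""                 # depad/decompress errors are not reported directly
            if hashlib.sha1(data).digest() != expected_block_sha:
                raise IOError("block hash mismatch")   # the only error reported here
            out.write(data)
            file_sha.update(data)
        if file_sha.digest() != expected_file_sha:
            raise IOError("file hash mismatch")        # catches restore/dedup bugs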


If the local backup directory is lost because the hard drive crashes,
the computer is stolen, a fire occurs, etc., the only copy of the
backup is on remote destinations.  To recover it requires:

- the key.conf file
- the dest.conf file (connection parameters for remotes)

If a -p 'ask' or 'env' passphrase was used, then the user needs to
know the passphrase too.

Recover connects to a destination chosen by the user and downloads
dest.db.  This contains a list of all files backed up at the destination.
The hb.db.n increments are downloaded, verified, and applied to
re-create the original hb.db database.  The archive files may or may
not be downloaded, depending on options used.  When the local backup
directory has been re-created, backups and restores can occur again.

Details for recovering each file follow.


The dest.db file is encrypted like hb.db, but it is a relatively small
file, so it is sent and retrieved "as is", like arc files.

It would be possible to attack this on the remote by saving a dest.db
and re-installing it every day on the remote, ignoring uploads of
newer versions.  Or, someone administering the remote storage could
just delete all of the backup data.  These types of attacks are
possible whenever remote storage is used, and it is difficult or
impossible to protect against this kind of remote manipulation.  One
solution is to send backup data to multiple remote sites, so that if
one misbehaves, others are still available.

Another option might be to display, log, or email some kind of HMAC
fingerprint for the dest.db and hb.db files after every backup.
Recover could display its calculated fingerprint and the user could
decide whether they match.  This could be implemented at any time, but
for now, it seems like overkill and doubtful that users would follow
this kind of procedure, so it would add little extra security.


This is basically the reverse process of generating an increment.
Before processing an hb.db.n increment file, the SHA1 hash and HMAC of
the file are verified.  The recovery aborts if there is an error in
either of these.

Data from the increment is decrypted, decompressed, and stored in the
hb.db file being reconstructed.

After all increments have been restored, an HMAC is taken over the
entire hb.db file.  This must match the 2nd HMAC recorded in the
last increment.  If it doesn't, an error is displayed and the recover
aborts.

REKEY: generating a new key.conf key

HB has a rekey operation to generate a new key.conf key.  In its
default operation, hb rekey -c backupdir will generate a new 256-bit
random key, follow the procedure described above under 'init' for
generating keys, then re-encrypt the hb.db and dest.db files.  This
re-encrypt occurs within a database transaction, so it's an
all-or-nothing operation.  A commit makes the changes permanent, the
old key file is renamed key.conf.orig, and the new key.conf file is
installed.  Rekey has special handling for interrupts occurring at any
time during the rekey operation, to roll back the operation.

No archive data is modified on a rekey operation.  Archive blocks are
encrypted with backup keys, which are stored in hb.db, not key.conf.
These are not changed on a rekey operation.  The goal of rekey is that
anyone with an old copy of key.conf can no longer access the backup
after the rekey.
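
At a high level the rekey flow might look like the following sketch;
reencrypt() and write_key_conf() are hypothetical stand-ins (the
actual re-encryption happens inside the encrypted SQLite databases),
and random_key() is the key-generation sketch shown earlier:

    import os

    def rekey(backupdir):
        new_key = random_key(32)          # new 256-bit key (see key generation above)
        # hb.db and dest.db are re-encrypted inside one transaction: all or
        # nothing, and an interrupt rolls the whole operation back
        reencrypt(os.path.join(backupdir, "hb.db"), new_key)
        reencrypt(os.path.join(backupdir, "dest.db"), new_key)
        # only after the commit is the old key file set aside and the new one installed
        os.rename(os.path.join(backupdir, "key.conf"),
                  os.path.join(backupdir, "key.conf.orig"))
        write_key_conf(os.path.join(backupdir, "key.conf"), new_key)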


Archive files are a collection of encrypted blocks.

Before any data block is used from an archive file, a SHA1 hash is
computed from the decrypted, decompressed data, and then compared to
the SHA1 hash originally stored in hb.db for that block.  This acts
like an HMAC: an attacker changing archive data would also have to
update the SHA1 in hb.db, and to do that, would need the key.

An attacker with write access to an archive on a remote could also
just delete it, causing data loss.  This can be mitigated by copying
backups to multiple destinations.  They cannot cause incorrect data
to be restored because of the block and file hashes stored in hb.db.