Backblaze B2

Advantages:

  • Free to upload backup data

  • Low cost storage: 0.5 cents/GB/month

  • No minimum storage time

  • No minimum file size

  • First 10GB of storage is free every month

  • Low cost to retrieve data: 1 cent/GB

  • Free 1GB per day retrieval allowance

  • Supports selective downloads to lower retrieval costs

  • Backblaze will ship your data on a disk, free if you return the drive

  • Each file is spread across 20 servers; any 17 of them can reconstruct the original

  • Data is actively "scrubbed" by Backblaze while on B2

  • SHA1 hash verification on file transfers
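As a rough illustration of how the prices above combine, here is a sketch that estimates one month's bill. It assumes the figures listed on this page (0.5 cents/GB/month storage, first 10GB free, 1 cent/GB downloads, 1GB/day free downloads, free uploads) and that the daily download allowance does not carry over between days. Current Backblaze pricing may differ, and the function name is hypothetical.

```python
# Prices from the list above, in cents; check Backblaze's current price list before relying on them.
STORAGE_CENTS_PER_GB_MONTH = 0.5
DOWNLOAD_CENTS_PER_GB = 1.0
FREE_STORAGE_GB = 10
FREE_DOWNLOAD_GB_PER_DAY = 1


def estimate_monthly_cents(stored_gb: float, downloads_gb_per_day: list[float]) -> float:
    """Estimate one month's B2 charges in cents (uploads are free).

    stored_gb: average GB stored during the month.
    downloads_gb_per_day: GB downloaded on each day of the month.
    """
    billable_storage = max(0.0, stored_gb - FREE_STORAGE_GB)
    # Assumption: the free allowance applies day by day, so unused
    # allowance on one day does not carry over to the next.
    billable_download = sum(max(0.0, gb - FREE_DOWNLOAD_GB_PER_DAY)
                            for gb in downloads_gb_per_day)
    return (billable_storage * STORAGE_CENTS_PER_GB_MONTH
            + billable_download * DOWNLOAD_CENTS_PER_GB)


# 500GB stored, one 50GB restore on a single day, no other downloads:
days = [0.0] * 30
days[0] = 50.0
print(estimate_monthly_cents(500, days))  # 490*0.5 + 49*1.0 = 294.0 cents
```

Spreading the same 50GB restore over several days would use more of the daily free allowance and lower the download part of the bill slightly.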

File Transfer Verification

Uploads: HB generates a SHA1 hash for every file uploaded to B2. The B2 service verifies that the SHA1 sent by HB matches the SHA1 B2 generates for the data it received. This verifies that the file was received correctly by B2. If this check fails, B2 signals an error and HB retries the upload.
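The upload check can be modeled as a retry loop. This is a minimal sketch, not HB's code or the B2 API: it assumes a hypothetical `send` callable that transmits the data with the client's SHA1 and returns True when the server's SHA1 of the bytes it received matches. A simulated server that corrupts the first transfer shows the retry.

```python
import hashlib
from typing import Callable

# Hypothetical transport: sends the data plus the client's SHA1 and returns
# True if the server's SHA1 of the bytes it received matched.
SendFn = Callable[[bytes, str], bool]


def upload_with_verification(data: bytes, send: SendFn, max_tries: int = 3) -> int:
    """Upload data, retrying on SHA1 mismatch; return the attempt that succeeded."""
    sha1 = hashlib.sha1(data).hexdigest()
    for attempt in range(1, max_tries + 1):
        if send(data, sha1):
            return attempt
    raise IOError(f"upload failed SHA1 verification after {max_tries} tries")


def make_flaky_server() -> SendFn:
    """Simulated server whose first transfer loses the last byte."""
    calls = [0]

    def send(data: bytes, claimed_sha1: str) -> bool:
        calls[0] += 1
        received = data[:-1] if calls[0] == 1 else data
        return hashlib.sha1(received).hexdigest() == claimed_sha1

    return send


print(upload_with_verification(b"arc.0.0 contents", make_flaky_server()))  # 2
```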

Downloads: B2 sends the file’s SHA1 with the downloaded file. HB computes the SHA1 of the file it receives and compares it to the SHA1 B2 sent to make sure there were no transmission errors. HB also compares this to the SHA1 sent when the file was uploaded. This verifies that the file received by HB is exactly what it uploaded. If any tests fail, an error is signaled and HB retries the download.
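The two download checks can be sketched as a single predicate; a caller would retry the download when it returns False. The function name and arguments are illustrative, not HB internals, and how HB stores the upload-time SHA1 is not covered here.

```python
import hashlib


def download_ok(data: bytes, b2_sha1: str, uploaded_sha1: str) -> bool:
    """Apply both download checks described above.

    b2_sha1: the SHA1 B2 sent with the downloaded file.
    uploaded_sha1: the SHA1 recorded when the file was uploaded.
    """
    received_sha1 = hashlib.sha1(data).hexdigest()
    no_transfer_error = received_sha1 == b2_sha1       # check 1: transmission
    same_as_uploaded = received_sha1 == uploaded_sha1  # check 2: end to end
    return no_transfer_error and same_as_uploaded


h = hashlib.sha1(b"block data").hexdigest()
print(download_ok(b"block data", h, h))  # True
print(download_ok(b"block dat", h, h))   # False: a byte was lost in transit
```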

HB sometimes uses partial file downloads to save download costs. The B2 file SHA1 cannot be verified with partial file downloads, but HB always verifies the SHA1 of every backup block before using it, and verifies the SHA1 of every file restored.
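With a partial (byte-range) download, verification moves from the whole file to individual blocks. A sketch, assuming a hypothetical block index that records each block's offset, length, and SHA1 within an arc file; HB's actual metadata layout is not described on this page.

```python
import hashlib
from typing import NamedTuple


class BlockRef(NamedTuple):
    """Hypothetical index entry: a block's position in an arc file and its SHA1."""
    offset: int
    length: int
    sha1: str


def extract_verified_block(ranged_data: bytes, range_start: int, ref: BlockRef) -> bytes:
    """Pull one block out of a partial download and verify its SHA1."""
    start = ref.offset - range_start
    if start < 0:
        raise ValueError("block starts before the downloaded range")
    block = ranged_data[start:start + ref.length]
    if len(block) != ref.length or hashlib.sha1(block).hexdigest() != ref.sha1:
        raise ValueError(f"block at offset {ref.offset} failed SHA1 verification")
    return block


# An arc file holding three blocks; only bytes 4..9 (the middle block) are fetched.
arc = b"AAAA" + b"BBBBBB" + b"CC"
ref = BlockRef(offset=4, length=6, sha1=hashlib.sha1(b"BBBBBB").hexdigest())
print(extract_verified_block(arc[4:10], 4, ref))  # b'BBBBBB'
```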

Versioning

B2 versions files by default. That means if file1 is created in the bucket and then deleted, it is not really deleted: B2 hides it but keeps the old version. If a file with the same name is uploaded again, the bucket holds 2 copies. This behavior is not useful for HashBackup, so the first time a bucket is accessed, HB sets a lifecycle rule that means "only keep 1 version". If the bucket already has a lifecycle rule, HB does not change it. HB also immediately deletes the extra versions B2 creates, to save storage costs.
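For reference, B2 lifecycle rules are JSON objects with `fileNamePrefix`, `daysFromUploadingToHiding`, and `daysFromHidingToDeleting` fields, and a "keep only the last version" rule is commonly written with the values below. Whether HB uses exactly these values is an assumption, and the helper name is hypothetical; the sketch only shows the "leave an existing rule alone" decision.

```python
from typing import Optional

# Keep only the current version: hidden (older) versions are deleted
# 1 day after being hidden, and nothing is hidden automatically.
# Assumption: HB's own rule may use different values.
KEEP_ONE_VERSION = {
    "fileNamePrefix": "",
    "daysFromUploadingToHiding": None,
    "daysFromHidingToDeleting": 1,
}


def rules_to_apply(existing_rules: list) -> Optional[list]:
    """Return lifecycle rules to set on first access, or None to leave the bucket alone."""
    if existing_rules:
        return None  # respect a rule the user already configured
    return [KEEP_ONE_VERSION]


print(rules_to_apply([]))                            # [KEEP_ONE_VERSION]
print(rules_to_apply([{"fileNamePrefix": "tmp/"}]))  # None
```

Because a rule like this only removes a hidden version after at least a day, deleting the extra versions immediately is what keeps them from lingering in the meantime.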

5GB File Size Limit

B2 has a file upload size limit of 5GB unless special upload procedures are used. HB does not use B2’s "large file upload" procedures, so set the arc-size-limit config keyword to 4GB or less. If you already have a backup with arc files larger than 5GB and want to migrate it to B2, use maxsize 5GB in dest.conf. This causes arc files to be split, which is inefficient because it doubles arc file disk I/O, but the uploads will work.
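Before migrating, it can help to see which local arc files are over the limit. A sketch, assuming arc files are named `arc.*` in the backup directory (as in the arc.0.0 example below), that 5GB means 5 * 10**9 bytes, and that splitting produces pieces of at most maxsize bytes; both helper names are hypothetical.

```python
from pathlib import Path

# Assumption: 5GB is taken as 5 * 10**9 bytes; check B2's documentation for the exact figure.
B2_SINGLE_UPLOAD_LIMIT = 5 * 10**9


def pieces_needed(size: int, limit: int) -> int:
    """How many pieces a file of `size` bytes needs if no piece may exceed `limit`."""
    return (size + limit - 1) // limit


def oversized_arcs(backup_dir: str, limit: int = B2_SINGLE_UPLOAD_LIMIT) -> dict:
    """Map each local arc file larger than `limit` to the pieces it would be split into."""
    return {
        p.name: pieces_needed(p.stat().st_size, limit)
        for p in Path(backup_dir).glob("arc.*")
        if p.stat().st_size > limit
    }
```

For example, a 12GB arc file would need `pieces_needed(12 * 10**9, B2_SINGLE_UPLOAD_LIMIT)`, or 3, pieces; an empty result from `oversized_arcs` means every arc file fits in a single upload.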

B2 Storage Optimization

When retain and rm are used to remove files from the backup, they create "holes" in the arc files. HashBackup can do a pack operation to remove these holes and minimize backup storage space and costs. Packing is controlled by several config options.

With Backblaze B2, download costs are only about twice the monthly storage cost, so remote arc files can be packed more aggressively. By default, remote arc files with at least pack-percent-free percent free space are downloaded and repacked periodically (every 7 days) when rm and retain are run, for up to 950MB of downloaded data. For more aggressive packing that is still cost efficient, these config settings are recommended for B2:

  • pack-download-limit <N>GB, where N is a reasonable value for your site. Even a high value for N is cost efficient on B2 because download costs are only 2x monthly storage costs.

  • pack-percent-free 60
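Why aggressive packing pays off on B2 can be checked with a little arithmetic. This sketch assumes the prices listed at the top of this page, that packing downloads the whole arc file once, and that re-uploading the packed file is free; it ignores the free daily download allowance. The function name is hypothetical.

```python
DOWNLOAD_CENTS_PER_GB = 1.0
STORAGE_CENTS_PER_GB_MONTH = 0.5


def pack_break_even_months(arc_gb: float, percent_free: float) -> float:
    """Months of storage savings needed to repay downloading an arc file to pack it."""
    if percent_free <= 0:
        raise ValueError("nothing to reclaim")
    download_cost = arc_gb * DOWNLOAD_CENTS_PER_GB
    monthly_saving = arc_gb * (percent_free / 100) * STORAGE_CENTS_PER_GB_MONTH
    return download_cost / monthly_saving


print(pack_break_even_months(1, 60))  # about 3.33 months at pack-percent-free 60
```

The arc size cancels out, so under these prices any arc file that is 60% free repays its download in a little over 3 months, regardless of how large pack-download-limit is.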

For even more aggressive remote packing, use a smaller value for pack-age-days. 0 means to pack every time rm and retain are run.
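How pack-percent-free, pack-age-days, and pack-download-limit interact can be sketched as a selection pass. This is a simplified model, not HB's actual algorithm: it assumes pack-age-days gates the whole pass, that the fullest-of-holes arc files are considered first, and that pack-download-limit caps the bytes downloaded in one pass. All names are hypothetical.

```python
from typing import NamedTuple


class RemoteArc(NamedTuple):
    name: str
    size: int            # bytes stored remotely
    percent_free: float  # share of the file that is holes


def select_arcs_to_pack(arcs: list[RemoteArc], days_since_last_pack: int,
                        pack_percent_free: float = 60, pack_age_days: int = 7,
                        pack_download_limit: int = 950 * 10**6) -> list[str]:
    """Pick arc files to download and repack under a simplified model."""
    if days_since_last_pack < pack_age_days:
        return []  # too soon; pack_age_days 0 lets every rm/retain run qualify
    candidates = sorted((a for a in arcs if a.percent_free >= pack_percent_free),
                        key=lambda a: a.percent_free, reverse=True)
    chosen, downloaded = [], 0
    for arc in candidates:
        if downloaded + arc.size > pack_download_limit:
            continue  # would exceed the download budget; a smaller file may still fit
        chosen.append(arc.name)
        downloaded += arc.size
    return chosen


arcs = [RemoteArc("arc.0.0", 400, 80), RemoteArc("arc.0.1", 500, 65),
        RemoteArc("arc.0.2", 300, 30)]
print(select_arcs_to_pack(arcs, days_since_last_pack=10, pack_download_limit=800))
# ['arc.0.0']: arc.0.1 would push the total to 900 bytes, arc.0.2 is only 30% free
```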

B2’s dest.conf Keywords

type (required)

type b2

accountid

Your B2 Account Id. This keyword has priority over environment variables, which are described below. Either accountid or keyid is required, and only one should be used.

appkey

When the accountid keyword is used, appkey should be your master application key. When keyid is used, appkey should be the corresponding application key. This keyword has priority over environment variables.

Environment variables are checked if accountid and/or appkey are not specified in dest.conf. The environment variable names are:

B2_ACCOUNT_ID
B2_APP_KEY

These environment variables can be set and exported in your login script, for example in .profile in your home directory:

export B2_ACCOUNT_ID=123456789012
export B2_APP_KEY=1234567890123456789012345678901234567890

If you add these to your .profile (or .bash_profile), protect the file so only you can read it: chmod 600 .profile
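The precedence rules for credentials can be summarized in a few lines. A sketch with a hypothetical function name; it assumes keyid has no environment variable fallback (none is described here), while accountid falls back to B2_ACCOUNT_ID and appkey to B2_APP_KEY.

```python
import os
from typing import Mapping, Optional


def resolve_b2_credentials(conf: Mapping[str, str],
                           env: Optional[Mapping[str, str]] = None) -> tuple[str, str, str]:
    """Return (id kind, id value, appkey), preferring dest.conf over the environment."""
    env = os.environ if env is None else env
    if "accountid" in conf and "keyid" in conf:
        raise ValueError("use either accountid or keyid, not both")
    if "keyid" in conf:
        id_kind, id_value = "keyid", conf["keyid"]
    else:
        # dest.conf wins; B2_ACCOUNT_ID is only a fallback
        id_kind, id_value = "accountid", conf.get("accountid") or env.get("B2_ACCOUNT_ID")
    appkey = conf.get("appkey") or env.get("B2_APP_KEY")
    if not id_value or not appkey:
        raise ValueError("missing B2 id or appkey in dest.conf and environment")
    return id_kind, id_value, appkey


env = {"B2_ACCOUNT_ID": "123456789012", "B2_APP_KEY": "envkey"}
print(resolve_b2_credentials({}, env))  # ('accountid', '123456789012', 'envkey')
```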

bucket (required)

Each B2 destination requires a bucket keyword. B2 buckets have both a bucket name and a bucket id. HB tries the value as a bucket name first, then as a bucket id. HB will create a private B2 bucket if the bucket name you use doesn’t exist. Each B2 account can have many buckets. Bucket names are globally unique, so names like "backup" are probably taken and will need something added to make them unique.
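The lookup order for the bucket value can be modeled as follows. `names_to_ids` stands in for the account's existing buckets (for example, from a bucket listing), and the function and its return values are hypothetical. Name matching ignores case, following the case rule described under the name restrictions.

```python
def resolve_bucket(value: str, names_to_ids: dict[str, str]) -> tuple[str, str]:
    """Decide how to treat a dest.conf bucket value: existing name, existing id, or create."""
    # Try the value as a bucket name first (B2 names behave case-insensitively).
    by_name = {name.lower(): bid for name, bid in names_to_ids.items()}
    if value.lower() in by_name:
        return ("name", by_name[value.lower()])
    # Then try it as a bucket id.
    if value in names_to_ids.values():
        return ("id", value)
    # Otherwise a new private bucket would be created with this name.
    return ("create", value)


buckets = {"hbbackup-1": "4a5b6c"}
print(resolve_bucket("hbbackup-1", buckets))  # ('name', '4a5b6c')
print(resolve_bucket("4a5b6c", buckets))      # ('id', '4a5b6c')
print(resolve_bucket("newbucket", buckets))   # ('create', 'newbucket')
```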

Bucket name restrictions:

  • 6-50 characters

  • starts and ends with a letter or digit

  • contains only letters, digits, and dashes

B2 allows mixed case in bucket names but behaves as if all letters are the same case, so AbcDef and abcdef are the same bucket.
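The name restrictions fit in one regular expression: one letter or digit, 4 to 48 letters, digits, or dashes, then a final letter or digit, for 6 to 50 characters total. A sketch assuming "letters" means ASCII letters; the function names are hypothetical.

```python
import re

# 6-50 characters; letters, digits, and dashes; starts and ends with a letter or digit.
BUCKET_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{4,48}[A-Za-z0-9]")


def valid_bucket_name(name: str) -> bool:
    return BUCKET_NAME_RE.fullmatch(name) is not None


def same_bucket(a: str, b: str) -> bool:
    # B2 treats bucket names as if all letters were the same case.
    return a.lower() == b.lower()


print(valid_bucket_name("hbbackup"))   # True
print(valid_bucket_name("hb_backup"))  # False: underscore not allowed
print(same_bucket("AbcDef", "abcdef")) # True
```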

dir

This keyword allows many backups to be stored in the same bucket by prepending the value to the backup filename. Without the dir keyword, a backup will create arc.0.0 in the top level of the bucket. With the dir keyword and value abc/xyz, the first backup will create abc/xyz/arc.0.0 in the bucket. Leading slashes are stripped because B2 does not allow leading slashes.
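The naming rule can be sketched as a small function. The function name is hypothetical, and dropping a trailing slash from the dir value is an assumption; the page only states that leading slashes are stripped.

```python
from typing import Optional


def remote_name(filename: str, dir_value: Optional[str] = None) -> str:
    """Build the B2 file name for a backup file, given the dest.conf dir value."""
    if not dir_value:
        return filename  # no dir keyword: top level of the bucket
    prefix = dir_value.lstrip("/")  # B2 does not allow leading slashes
    prefix = prefix.rstrip("/")     # assumption: a trailing slash is dropped too
    return f"{prefix}/{filename}" if prefix else filename


print(remote_name("arc.0.0"))              # arc.0.0
print(remote_name("arc.0.0", "/abc/xyz"))  # abc/xyz/arc.0.0
```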

If you have an existing B2 backup and want to start using dir, first move the backup files already stored in the bucket into the new directory by hand, then add the dir keyword to dest.conf.

keyid

Selects a specific B2 application key by its key id. Each B2 application key has a key id and a key value. These are specified in dest.conf with the keyid and appkey keywords.

rate

Limits upload bandwidth per worker. See Destination Setup for details.

workers

Backblaze B2 has somewhat higher latency than Amazon S3 or Google Storage. To compensate, you may want to increase the number of workers (default is 4) for higher performance. Add workers 2-4 at a time until there is no performance improvement. More than 20 workers probably doesn’t make sense. See Destination Setup for details.
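The tuning advice above amounts to a simple search. A sketch, assuming a hypothetical `measure(n)` that runs a test backup with `workers n` and returns throughput; the step of 4 and the 5% "no improvement" threshold are illustrative choices, while the starting value of 4 and the ceiling of 20 come from the text.

```python
from typing import Callable


def tune_workers(measure: Callable[[int], float], start: int = 4, step: int = 4,
                 max_workers: int = 20, min_gain: float = 0.05) -> int:
    """Add workers `step` at a time until throughput stops improving; return the best count."""
    best_n, best_rate = start, measure(start)
    n = start
    while n + step <= max_workers:
        n += step
        rate = measure(n)
        if rate < best_rate * (1 + min_gain):
            break  # less than min_gain improvement: stop adding workers
        best_n, best_rate = n, rate
    return best_n


# Made-up throughput figures (MB/s) that level off around 12 workers:
rates = {4: 10.0, 8: 18.0, 12: 24.0, 16: 24.5, 20: 25.0}
print(tune_workers(rates.__getitem__))  # 12
```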

debug

If set to 1 or higher, all traffic is logged to <destname>.<timestamp>.log in the backup directory. Confidential data, including authentication tokens and headers, is replaced with xxx in debug logs. All debug values generate the same log data, though this may change in the future. Setting workers to 1 makes debugging easier, and using retries 0 can make failures show up sooner.
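The kind of redaction described above can be illustrated with a header filter. This is a sketch only: which headers HB treats as confidential is not listed here, so the single-entry set below is an assumption, and the function name is hypothetical.

```python
# Assumption: HB's real list of confidential headers may be longer.
SENSITIVE_HEADERS = {"authorization"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Copy headers for a debug log, replacing confidential values with xxx."""
    return {k: ("xxx" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in headers.items()}


print(redact_headers({"Authorization": "4_secret_token", "Content-Length": "1024"}))
# {'Authorization': 'xxx', 'Content-Length': '1024'}
```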

Example dest.conf for B2

destname b2
type b2
accountid 0123456789ab
appkey 0123456789abcdef0123456789abcdef0123456789
bucket hbbackup
dir myhost1
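To sanity-check a B2 section like this before running a backup, a small parser can read the keyword/value lines and apply the rules from this page: type b2, a bucket, and not both accountid and keyid. This is a hypothetical checker, not how HB parses dest.conf; it handles a single destination, and ignoring blank lines and # comments is an assumption.

```python
def parse_dest_conf(text: str) -> dict[str, str]:
    """Parse simple 'keyword value' lines for a single destination."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and # comments are ignored
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def check_b2_conf(conf: dict[str, str]) -> list[str]:
    """Return a list of problems found in a B2 destination (empty if none)."""
    problems = []
    if conf.get("type") != "b2":
        problems.append("type must be b2")
    if not conf.get("bucket"):
        problems.append("bucket is required")
    if "accountid" in conf and "keyid" in conf:
        problems.append("use only one of accountid or keyid")
    return problems


EXAMPLE = """\
destname b2
type b2
accountid 0123456789ab
appkey 0123456789abcdef0123456789abcdef0123456789
bucket hbbackup
dir myhost1
"""
print(check_b2_conf(parse_dest_conf(EXAMPLE)))  # []
```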