Amazon S3
Amazon S3: https://aws.amazon.com/s3
Google Storage: https://cloud.google.com/storage
Minio: https://www.minio.io
Tebi: https://tebi.io
Wasabi: https://wasabi.com
HashBackup supports Amazon’s S3 object storage service for offsite backup storage, as well as S3-compatible services such as Google Storage, Minio, Tebi, and Wasabi. This is the reference page for Amazon’s S3 service. Compatible services may have a separate page explaining special considerations, or see the example configurations for compatibles below.
Config File
HashBackup uses Boto version 2 to connect to S3 services. Boto has many available config settings and several ways of locating its config file. For example, the environment variable BOTO_CONFIG can be set to the path of a config file; if it is not set, HashBackup sets it to ~/boto.cfg and looks for the Boto settings there.
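As an illustration only, a minimal ~/boto.cfg could look like the following; num_retries and http_socket_timeout are standard Boto 2 settings, and whether you need a Boto config file at all depends on your environment:
[Boto]
# retry failed requests a few extra times
num_retries = 5
# give up on stalled connections after 60 seconds
http_socket_timeout = 60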
Selective Download
Selective download is supported on S3 and compatibles, allowing HB to download only the parts of files that are needed for an operation, saving time, bandwidth, and download costs.
Amazon Lifecycle Policies & Glacier Transitioning
Amazon S3 has lifecycle policies that allow transitioning S3 files to
Glacier and automatically deleting files after a certain time. These
should not be used with HashBackup since HB cannot access files
transitioned to Glacier, and file retention and deletion are managed by
HB. As an alternative to Glacier transitioning, use S3’s Infrequent
Access storage class to reduce expenses (see class
keyword below).
File Transfer Verification
Uploads: HB generates a hash, the file is uploaded along with the hash, then S3 generates a hash of the file it receives and verifies that it matches the hash HB sent. HB may use multipart upload, where all workers cooperate to send a single large file. Upload hash verification occurs for regular and multipart uploads.
Downloads: if a file was sent without multipart upload, HB verifies that the MD5 hash of a downloaded file is the same MD5 that was sent. Multipart uploads are not verified on download (but keep reading).
HB often requests partial file downloads to save download costs. The S3 file hash cannot be verified with partial file downloads, but HB always verifies the SHA1 of every backup block before using it, and verifies the SHA1 of every file restored.
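Upload verification relies on the standard S3 Content-MD5 mechanism (applied per part for multipart uploads): the client sends a base64-encoded MD5 with the request, S3 recomputes the hash on the data it receives, and the upload fails if the two differ. A minimal Python sketch of that calculation, illustrating the protocol rather than HB’s actual code:
import base64, hashlib

def content_md5(path):
    # MD5 of the file body, base64-encoded as required by the
    # Content-MD5 request header; S3 recomputes this hash server-side
    # and rejects the upload if it does not match
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")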
System Clock
It is important that your system clock is set accurately for S3 because a timestamp is sent with every request as part of the S3 protocol. If your system clock is too far off, it will cause 403 Access Forbidden errors from any S3-like destination.
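To check how far off the clock is without changing it, an NTP query is enough; for example (tool availability varies by OS, so treat this as a sketch):
ntpdate -q pool.ntp.org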
Free Tier Egress
Amazon S3 offers a 100GB per month free egress (download) allowance
that can be put to good use by HashBackup to verify your backup data
with incremental -inc -v4
selftest and to keep your backup data
compacted with the pack-download-limit
config option. Adjust the
limits on these two features to stay under the 100GB download limit
per month.
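As a rough sketch of that tuning (the backup directory path and values are placeholders, and the 1d/30d incremental schedule shown here is an assumption about the syntax; check the selftest and config documentation for the exact forms):
hb config -c /backupdir pack-download-limit 50gb
hb selftest -c /backupdir -v4 -inc 1d/30d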
S3’s dest.conf Keywords
type
(required)
- s3 for Amazon S3
- gs for Google Cloud Storage’s S3 interface
- s3 for other S3-compatible services; requires the host keyword and, for services on a non-standard port, the port keyword
host
This optional keyword is used with S3-compatibles to specify the region and host name, sometimes called the endpoint. It is not used for Amazon’s S3 service but is required for compatibles. For example:
host s3.wasabisys.com
port
This optional keyword is used to specify the port the S3 service is using. The default is port 80 for regular http connections or port 443 for secure connections.
secure
This optional true / false keyword enables SSL. It also enables SSL if used without a value. The default is false because even over regular http the S3 protocol is resistant to attacks:
- each S3 request is signed with your secret key and a timestamp
- all user data sent by HB is encrypted
- the only real attack vector is a replay attack
- replay attacks can only happen 5-15 mins after the original request
- HB rarely reuses filenames & filenames are part of the signature
If your bucket name contains a dot (not recommended), SSL will not
work because of an AWS restriction on bucket names. You may be able
to use subdomain false
to temporarily get around this limitation,
but AWS is deprecating URL-based bucket addressing.
Many S3-compatibles only work with SSL enabled, so for these services the secure keyword is required.
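For example, a dest.conf fragment for an SSL-only S3-compatible might include the following (the host name is a placeholder):
type s3
host s3.example.com
secure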
subdomain
This optional true / false keyword is useful for S3-compatibles that
do not support bucket addressing as part of the host name. Using
subdomain false
makes self-hosted Minio installations easier to
manage because adding new buckets does not require DNS changes. The
default is true.
AWS is deprecating path-based addressing (subdomain false).
accesskey
and secretkey
Your access and secret keys can be specified in the dest.conf
file
with the accesskey
and secretkey
keywords. These can be used with
any S3-compatible destination and take priority over environment
variables (below).
SECURITY NOTE: your access key is not a secret, does not have to be protected, and is sent in the headers with every request. It is like a username or account id. But, YOUR SECRET KEY SHOULD BE PROTECTED AND NEVER DISCLOSED!
Your secret key is not sent with your request. It is the password to your S3 account and is used to sign requests. Files containing your secret key should be protected.
If the accesskey
and/or secretkey
keywords are not in dest.conf
,
environment variables are checked. These environment variables have
different names for each provider, allowing you to have both Amazon
and Google Storage accounts configured with environment variables.
The environment variable names are:
Amazon S3: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
Google Storage: GS_ACCESS_KEY_ID and GS_SECRET_ACCESS_KEY
The environment variables are set and exported in your login script.
For example, in .bashrc
in your home directory:
export AWS_ACCESS_KEY_ID=myverylongaccesskey
export AWS_SECRET_ACCESS_KEY=myverylongsecretkey
For Google Storage, you must generate Developer Keys. See: https://developers.google.com/storage/docs/migrating#migration-simple
bucket
(required)
S3 destinations require a bucket name. The bucket will be created if it doesn’t exist. If the location keyword is not used, the bucket is created in the default region, us-east-1.
Bucket names are globally unique, so names like "backup" are probably taken. Add a company name, host name, random number, or random text as a prefix or suffix, perhaps with a dash, to make your bucket name unique.
For S3-compatible services you may need to create the bucket before using HashBackup, especially to customize bucket settings like storage class.
Bucket names must be 3-63 characters, must start and end with a letter or digit, and can contain only letters (case insensitive), digits, and dashes. More bucket name rules at: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
class
S3 and compatible services often have multiple storage classes, with different classes having different cost structures, features, and/or restrictions. Some services such as Google Storage set the storage class at the bucket level, while others such as Amazon S3 set the storage class at the object (file) level. For services with bucket-level storage classes, use their website to set the storage class. The class keyword is used for Amazon S3 to set the storage class of uploaded files. The value can be:
- standard sets the standard S3 storage class (default if class isn’t used)
- ia sets the Infrequent Access storage class
- anything else is passed directly to S3 after uppercasing
HashBackup may store individual files in the standard class if it will be cheaper. For example, small files are cheaper to store in standard storage because the other storage classes have a 128K minimum billable file size. Files that might be deleted soon, such as hb.db.N files, are stored in standard storage to avoid the early delete penalty, though a file’s lifetime is usually hard to predict.
dir
This keyword allows many backups to be stored in the same bucket by
prepending the keyword value to the backup filename. Without the
dir
keyword, backup will create arc.0.0 in the top level of the
bucket. With the dir
keyword and value abc/xyz
, backup will
create abc/xyz/arc.0.0
in the bucket.
If you have an existing S3 backup and want to start using
dir , you will have to use an S3 utility to move the backup files
already stored. The easiest way is to use S3 copy requests to create
the new objects, then delete the old objects, then add the dir
keyword.
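As a sketch of that migration with the AWS CLI (the bucket name is a placeholder and abc/xyz matches the example above; aws s3 mv performs server-side copies then deletes the source objects, and the --exclude prevents already-moved objects from being picked up again):
aws s3 mv s3://mybucket/ s3://mybucket/abc/xyz/ --recursive --exclude "abc/xyz/*"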
location
(recommended)
Specifies the Amazon region where a bucket is created or located. If omitted, US is used (us-east-1). Possible values are:
US = same as us-east-1
EU = same as eu-west-1
any other valid S3 region
Region names are on Amazon’s S3 site: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
Buckets live in a specific region. It’s important that a bucket’s correct region be specified or all requests are sent to us-east-1 and then redirected to the proper region. Sometimes these redirects fail and cause connection reset errors.
partsize
This keyword specifies a fixed partsize for multipart S3 uploads. The
default is 0, meaning HB chooses a reasonable part size from 5MB (the
smallest allowed) to 5GB (the largest allowed), based on the file
size. When the partsize
keyword is used, HB uses this part size to
determine the number of parts needed, then "levels" the part size
across all parts. For example, if uploading a 990MB file with a
partsize of 100M, HB will use 10 parts of 99M each. The size can be
specified as an integer number of bytes or with a suffix, like 100M or
100MB. The suffix is interpreted as a power of 2, so M means MiB, i.e., 100M = 100 * 1024 * 1024.
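A rough Python sketch of that leveling arithmetic, matching the 990MB example above (a guess at the calculation, not HB’s actual code):
def leveled_partsize(file_size, partsize):
    # number of parts needed at the requested part size...
    nparts = (file_size + partsize - 1) // partsize
    # ...then spread the file evenly across that many parts
    return nparts, (file_size + nparts - 1) // nparts

MB = 1024 * 1024
print(leveled_partsize(990 * MB, 100 * MB))   # (10, 103809024), i.e. 10 parts of 99M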
multipart
This true/false keyword controls whether HB uses multipart uploads and downloads. The default is true. For Google Storage, the default is false because their S3-compatible API does not support multipart uploads.
When multipart upload is enabled, large arc files > 5GB are supported.
If multipart upload is disabled or not available, the S3 file size
limit is 5GB, so the config option arc-size-limit
should not be set
larger than 4GB.
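For example, if multipart is disabled, capping the arc size might look like this (a sketch; the backup directory path is a placeholder and the value syntax should be checked against the config documentation):
hb config -c /backupdir arc-size-limit 4gb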
debug
Controls debugging level. When set to 1 or higher, extra debugging
messages are either displayed or sent to a log file <destname>.log
in the backup directory.
timeout
(not supported)
The timeout for S3 connections is 5 minutes and cannot be changed.
rate
Specifies the maximum upload bandwidth per worker. See Destination Setup for details.
Example S3 dest.conf
destname myS3
type s3
location US
accesskey myaccesskey
secretkey mysecretkey
bucket myaccesskey-hashbackup
dir myhost1
class ia
Amazon Infrequent Access has a 30-day delete penalty on all
objects, meaning that if you upload 1GB and then delete it the next
day, you still pay for 30 days of storage. To minimize download
costs, the default for config option pack-age-days is 30. This
avoids deleting arc files until they are at least 30 days old, with
the idea that since you are paying for the space anyway, 30-day-old
arc files will have less active data to download for repacking.
HashBackup uses the S3 STANDARD storage class for small files and
files that will likely be deleted soon, such as hb.db.N files,
because it is cheaper than IA storage when delete penalties and
minimum file sizes are considered.
Example Minio dest.conf
destname minio
type s3
host play.minio.com
port 9000
multipart false
subdomain false
accesskey xxx
secretkey xxx
bucket xxx
location US
Example Tebi dest.conf - 25GB free!
destname tebi
type s3
host s3.tebi.io
accesskey xxx
secretkey xxx
bucket xxx
Example Wasabi dest.conf
destname wasabi
type s3
host s3.wasabisys.com
accesskey xxx
secretkey xxx
bucket xxx
Wasabi has a 90-day delete penalty on all objects, meaning
that if you upload 1GB and then delete it the next day, you still pay
for 3 months of storage. Unlike Amazon S3, Wasabi does not have a
standard storage class without delete penalties for files such as
hb.db.N that are not normally kept for 90 days, so you will have extra
fees for them.