Amazon S3
Amazon S3: https://aws.amazon.com/s3
Google Storage: https://cloud.google.com/storage
Minio: https://www.minio.io
Tebi: https://tebi.io
Wasabi: https://wasabi.com
HashBackup supports Amazon’s S3 object storage service for offsite backup storage, as well as S3-compatible services such as Google Storage, Minio, Tebi, and Wasabi. This is the reference page for Amazon’s S3 service. Compatible services may have a separate page explaining special considerations, or see the example configurations for compatibles below.
Config File
HashBackup uses Boto version 2 to connect to S3 services. Boto has many available config settings and several ways of locating its config file. For example, the environment variable BOTO_CONFIG can be set to the path of a config file; if it is not set, HashBackup sets it to ~/boto.cfg and looks for the Boto settings there.
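As an illustration only, a minimal ~/boto.cfg could look like the following; num_retries and http_socket_timeout are standard Boto 2 settings, and whether you need a Boto config file at all depends on your environment:
[Boto]
# retry failed requests a few extra times
num_retries = 5
# give up on stalled connections after 60 seconds
http_socket_timeout = 60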
Selective Download
Selective download is supported on S3 and compatibles, allowing HB to download only the parts of files that are needed for an operation, saving time, bandwidth, and download costs.
Amazon Lifecycle Policies & Glacier Transitioning
Amazon S3 has lifecycle policies that allow transitioning S3 files to
Glacier and automatically deleting files after a certain time. These
should not be used with HashBackup since HB cannot access files
transitioned to Glacier, and file retention and deletion are managed by
HB. As an alternative to Glacier transitioning, use S3’s Infrequent
Access storage class to reduce expenses (see class
keyword below).
File Transfer Verification
Uploads: HB generates a hash, the file is uploaded along with the hash, then S3 generates a hash of the file it receives and verifies that it matches the hash HB sent. HB may use multipart upload, where all workers cooperate to send a single large file. Upload hash verification occurs for regular and multipart uploads.
Downloads: if a file was sent without multipart upload, HB verifies that the MD5 hash of a downloaded file is the same MD5 that was sent. Multipart uploads are not verified on download (but keep reading).
HB often requests partial file downloads to save download costs. The S3 file hash cannot be verified with partial file downloads, but HB always verifies the SHA1 of every backup block before using it, and verifies the SHA1 of every file restored.
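Upload verification relies on the standard S3 Content-MD5 mechanism (applied per part for multipart uploads): the client sends a base64-encoded MD5 with the request, S3 recomputes the hash on the data it receives, and the upload fails if the two differ. A minimal Python sketch of that calculation, illustrating the protocol rather than HB’s actual code:
import base64, hashlib

def content_md5(path):
    # MD5 of the file body, base64-encoded as required by the
    # Content-MD5 request header; S3 recomputes this hash server-side
    # and rejects the upload if it does not match
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")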
System Clock
It is important that your system clock is set accurately for S3 because a timestamp is sent with every request as part of the S3 protocol. If your system clock is too far off, it will cause 403 Access Forbidden errors from any S3-like destination.
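To check how far off the clock is without changing it, an NTP query is enough; for example (tool availability varies by OS, so treat this as a sketch):
ntpdate -q pool.ntp.org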
Free Tier Egress
Amazon S3 offers a 100GB per month free egress (download) allowance
that can be put to good use by HashBackup to verify your backup data
with incremental -inc -v4
selftest and to keep your backup data
compacted with the pack-download-limit
config option. Adjust the
limits on these two features to stay under the 100GB download limit
per month.
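As a rough sketch of that tuning (the backup directory path and values are placeholders, and the 1d/30d incremental schedule shown here is an assumption about the syntax; check the selftest and config documentation for the exact forms):
hb config -c /backupdir pack-download-limit 50gb
hb selftest -c /backupdir -v4 -inc 1d/30d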
S3’s dest.conf Keywords
type
(required)
- s3 for Amazon S3
- gs for Google Cloud Storage’s S3 interface
- s3 for other S3-compatible services; requires the host keyword and, for services on a non-standard port, the port keyword
host
This optional keyword is used with S3-compatibles to specify the region and host name, sometimes called the endpoint. It is not used for Amazon’s S3 service but is required for compatibles. For example:
host s3.wasabisys.com
port
This optional keyword is used to specify the port the S3 service is using. The default is port 80 for regular http connections or port 443 for secure connections.
secure
This optional true / false keyword enables SSL. It also enables SSL if used without a value. The default is false because even over regular http the S3 protocol is resistant to attacks:
- each S3 request is signed with your secret key and a timestamp
- all user data sent by HB is encrypted
- the only real attack vector is a replay attack
- replay attacks can only happen 5-15 mins after the original request
- HB rarely reuses filenames & filenames are part of the signature
If your bucket name contains a dot (not recommended), SSL will not
work because of an AWS restriction on bucket names. You may be able
to use subdomain false
to temporarily get around this limitation,
but AWS is deprecating URL-based bucket addressing.
Many S3-compatibles only work with SSL enabled, so for these services the secure keyword is required.
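For example, a dest.conf fragment for an SSL-only S3-compatible might include the following (the host name is a placeholder):
type s3
host s3.example.com
secure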
subdomain
This optional true / false keyword is useful for S3-compatibles that
do not support bucket addressing as part of the host name. Using
subdomain false
makes self-hosted Minio installations easier to
manage because adding new buckets does not require DNS changes. The
default is true.
AWS is deprecating path-based addressing (subdomain false).
accesskey
and secretkey
Your access and secret keys can be specified in the dest.conf
file
with the accesskey
and secretkey
keywords. These can be used with
any S3-compatible destination and take priority over environment
variables (below).
SECURITY NOTE: your access key is not a secret, does not have to be protected, and is sent in the headers with every request. It is like a username or account id. But, YOUR SECRET KEY SHOULD BE PROTECTED AND NEVER DISCLOSED!
Your secret key is not sent with your request. It is the password to your S3 account and is used to sign requests. Files containing your secret key should be protected.
If the accesskey
and/or secretkey
keywords are not in dest.conf
,
environment variables are checked. These environment variables have
different names for each provider, allowing you to have both Amazon
and Google Storage accounts configured with environment variables.
The environment variable names are:
Amazon S3: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
Google Storage: GS_ACCESS_KEY_ID and GS_SECRET_ACCESS_KEY
The environment variables are set and exported in your login script.
For example, in .bashrc
in your home directory:
export AWS_ACCESS_KEY_ID=myverylongaccesskey
export AWS_SECRET_ACCESS_KEY=myverylongsecretkey
For Google Storage, you must generate Developer Keys. See: https://developers.google.com/storage/docs/migrating#migration-simple
bucket
(required)
S3 destinations require a bucket name. The bucket will be created if it doesn’t exist. If the location keyword is not used, the bucket is created in the default region, us-east-1.
Bucket names are globally unique, so names like "backup" are probably taken. Add a company name, host name, random number, or random text as a prefix or suffix, perhaps with a dash, to make your bucket name unique.
For S3-compatible services you may need to create the bucket before using HashBackup, especially to customize bucket settings like storage class.
Bucket names must be 3-63 characters, must start and end with a letter or digit, and can contain only letters (case insensitive), digits, and dashes. More bucket name rules at: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
class
S3 and compatible services often have multiple storage classes, with different classes having different cost structures, features, and/or restrictions. Some services such as Google Storage set the storage class at the bucket level, while others such as Amazon S3 set the storage class at the object (file) level. For services with bucket-level storage classes, use their website to set the storage class. The class keyword is used for Amazon S3 to set the storage class of uploaded files. The value can be:
- standard sets the standard S3 storage class (default if class isn’t used)
- ia sets the Infrequent Access storage class
- anything else is passed directly to S3 after uppercasing
HashBackup may store individual files in the standard class if it will be cheaper. For example, small files are cheaper to store in standard storage because the other storage classes have a 128K minimum billable file size. Files that might be deleted soon, such as hb.db.N files, are stored in standard storage to avoid the early delete penalty, though a file’s lifetime is usually hard to predict.
dir
This keyword allows many backups to be stored in the same bucket by
prepending the keyword value to the backup filename. Without the
dir
keyword, backup will create arc.0.0 in the top level of the
bucket. With the dir
keyword and value abc/xyz
, backup will
create abc/xyz/arc.0.0
in the bucket.
If you have an existing S3 backup and want to start using
dir , you will have to use an S3 utility to move the backup files
already stored. The easiest way is to use S3 copy requests to create
the new objects, then delete the old objects, then add the dir
keyword.
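As a sketch of that migration with the AWS CLI (the bucket name is a placeholder and abc/xyz matches the example above; aws s3 mv performs server-side copies then deletes the source objects, and the --exclude prevents already-moved objects from being picked up again):
aws s3 mv s3://mybucket/ s3://mybucket/abc/xyz/ --recursive --exclude "abc/xyz/*"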
location
(recommended)
Specifies the Amazon region where a bucket is created or located. If omitted, US is used (us-east-1). Possible values are:
US = same as us-east-1
EU = same as eu-west-1
any other valid S3 region
Region names are on Amazon’s S3 site: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
Buckets live in a specific region. It’s important that a bucket’s correct region be specified or all requests are sent to us-east-1 and then redirected to the proper region. Sometimes these redirects fail and cause connection reset errors.
partsize
This keyword specifies a fixed partsize for multipart S3 uploads. The
default is 0, meaning HB chooses a reasonable part size from 5MB (the
smallest allowed) to 5GB (the largest allowed), based on the file
size. When the partsize
keyword is used, HB uses this part size to
determine the number of parts needed, then "levels" the part size
across all parts. For example, if uploading a 990MB file with a
partsize of 100M, HB will use 10 parts of 99M each. The size can be
specified as an integer number of bytes or with a suffix, like 100M or
100MB. The suffix is interpreted as a power of 2, so M means MiB, i.e., 100M = 100 * 1024 * 1024.
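A rough Python sketch of that leveling arithmetic, matching the 990MB example above (a guess at the calculation, not HB’s actual code):
def leveled_partsize(file_size, partsize):
    # number of parts needed at the requested part size...
    nparts = (file_size + partsize - 1) // partsize
    # ...then spread the file evenly across that many parts
    return nparts, (file_size + nparts - 1) // nparts

MB = 1024 * 1024
print(leveled_partsize(990 * MB, 100 * MB))   # (10, 103809024), i.e. 10 parts of 99M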
multipart
This true/false keyword controls whether HB uses multipart uploads and downloads. The default is true. For Google Storage, the default is false because their S3-compatible API does not support multipart uploads.
When multipart upload is enabled, large arc files > 5GB are supported.
If multipart upload is disabled or not available, the S3 file size
limit is 5GB, so the config option arc-size-limit
should not be set
larger than 4GB.
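For example, if multipart is disabled, capping the arc size might look like this (a sketch; the backup directory path is a placeholder and the value syntax should be checked against the config documentation):
hb config -c /backupdir arc-size-limit 4gb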
debug
Controls debugging level. When set to 1 or higher, extra debugging
messages are either displayed or sent to a log file <destname>.log
in the backup directory.
timeout
(not supported)
The timeout for S3 connections is 5 minutes and cannot be changed.
rate
Specifies the maximum upload bandwidth per worker. See Destination Setup for details.
Example S3 dest.conf
destname myS3
type s3
location US
accesskey myaccesskey
secretkey mysecretkey
bucket myaccesskey-hashbackup
dir myhost1
class ia
Amazon Infrequent Access has a 30-day delete penalty on all
objects, meaning that if you upload 1GB and then delete it the next
day, you still pay for 30 days of storage. To minimize download
costs, the default for config option pack-age-days is 30. This
avoids deleting arc files until they are at least 30 days old, with
the idea that since you are paying for the space anyway, 30-day-old
arc files will have less active data to download for repacking.
HashBackup uses the S3 STANDARD storage class for small files and
files that will likely be deleted soon, such as hb.db.N files,
because it is cheaper than IA storage when delete penalties and
minimum file sizes are considered.
Example Minio dest.conf
destname minio
type s3
host play.minio.com
port 9000
multipart false
subdomain false
accesskey xxx
secretkey xxx
bucket xxx
location US
Example Tebi dest.conf - 25GB free!
destname tebi
type s3
host s3.tebi.io
accesskey xxx
secretkey xxx
bucket xxx
Example Wasabi dest.conf
destname wasabi
type s3
host s3.wasabisys.com
accesskey xxx
secretkey xxx
bucket xxx
Wasabi has a 90-day delete penalty on all objects, meaning
that if you upload 1GB and then delete it the next day, you still pay
for 3 months of storage. Unlike Amazon S3, Wasabi does not have a
standard storage class without delete penalties for files such as
hb.db.N that are not normally kept for 90 days, so you will have extra
fees for them.