Glacier EOL

HashBackup's support for Amazon Glacier ends September 2016

In September 2016, Amazon Glacier support will be removed from
HashBackup.  Details about how to migrate your backups off Glacier and
onto other storage services are further down.

NOTE: all of the information below applies both to Glacier
destinations and S3 destinations with Glacier transitioning enabled.

The reasons for removing Glacier support are:

* When Glacier was launched in August 2012, it was 10x cheaper than S3.
  Now, Amazon S3's Infrequent Access costs 1.25 cents/GB vs Glacier's
  0.7 cents/GB, so S3 is not even twice as expensive as Glacier.

* Other strong competitors (Google Nearline at 1 cent/GB and Backblaze
  B2 at 0.5 cent/GB) have similar services that are simpler to use.

* Glacier has always had a very complex retrieval model.  It's hard to
  program and nearly impossible for most people to understand.
  Downloading from Glacier can lead to very high and unexpected costs,
  not because of the data transfer costs, which are similar to S3, but
  because of separate retrieval costs.

* To avoid high costs, large downloads have to be spread out over a
  very long time (days or weeks) and grouped into batches with 4-5
  hour delays.  Batches have to be evenly sized and sized to match
  your download capacity, and costs are dependent on how much data you
  have stored, how much you have downloaded at other times during the
  month, and how fast these other downloads occurred.

* Sending or retrieving large amounts of data on disk drives (AWS
  Import/Export) is fully supported on S3, but only imports are
  supported for Glacier.  In an emergency, you can't get a huge
  backup shipped to you from Glacier, but can from S3.

* Importing to Glacier from disk drives (i.e., mailing them to Amazon)
  only imports an entire disk image, not files.  This is unusable for
  HashBackup, and probably for anything else except keeping images of
  hard drives that are decommissioned.  With S3, individual files are
  imported from hard drives to S3 buckets.  This can be used to
  transfer a HashBackup backup directory to S3.

* S3 has a mechanism to transition data to Glacier after a certain
  time.  But: transitioned data uses a completely different Glacier
  mechanism.  It's best to think of it as a separate animal, half
  Glacier and half S3.  The native Glacier APIs do not work with it,
  yet it is still subject to Glacier's complex retrieval procedures
  and costs.  And S3 APIs do not work with it either, not really,
  because of the 4-hour retrieval delay issue.  Glacier tools won't
  work with transitioned data at all, and S3 tools can't handle it
  either unless they make programming allowances for it.  In summary,
  it's very easy to set up transitioning on Amazon's site, but very
  difficult to recover data after it has been transitioned.

* Once S3 data has been transitioned to Glacier, which is very easy to
  set up, there is no easy way to change your mind and put the data
  back in S3 permanently.  There is no "transition back to S3" feature
  on AWS.  To accomplish this, objects have to be "restored" from
  Glacier to S3 temporarily, using the complex Glacier retrieval model
  and costs.  These temporary S3 objects then need to be copied to
  new, "real" S3 objects with the S3 copy API, after which the
  temporary S3 objects are deleted (or they will time out and be
  deleted automatically).

* If it still seems like Glacier is a good place for your backups,
  use this Glacier cost estimator to see how much it could cost to
  retrieve your entire backup from Glacier in the event that a disk
  dies.

* If it still seems like Glacier is a good idea, read this blog post
  by a Glacier user who was billed $150 for a 60GB Glacier download
  (to Amazon's credit, they did refund it).
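
The "restore, then copy" workaround described above for transitioned
objects can be sketched in Python.  This is only an illustration and is
not part of HashBackup.  It assumes `s3` is a boto3 S3 client (for
example `boto3.client("s3")`); the function name, the `new_key`
parameter and the polling interval are made up for this example.

```python
import time


def restore_to_standard(s3, bucket, key, new_key, days=2, poll_seconds=900):
    """Bring one Glacier-transitioned S3 object back as a regular S3 object.

    `s3` is assumed to be a boto3 S3 client; all other names are
    hypothetical and only used for this sketch.
    """
    # 1. Request a temporary restored copy.  This is where Glacier's
    #    retrieval delay and retrieval charges apply.
    s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": days})

    # 2. Poll until S3 reports that the restore has finished.  The
    #    "Restore" field contains ongoing-request="true" while the
    #    request is still being processed.
    while True:
        status = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
        if 'ongoing-request="false"' in status:
            break
        time.sleep(poll_seconds)

    # 3. Copy the temporary object to a new, permanent S3 object.
    s3.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass="STANDARD",
    )
```

A single copy request like this only works for objects up to 5 GB, and
every object has to go through all three steps, so this gets tedious
(and costly) for a large backup.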

In summary, Glacier was novel when it came out in August of 2012
because of its low cost, but now the slight cost savings don't justify
keeping it in HashBackup.
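
To put "slight cost savings" in numbers, here is a small Python sketch
that compares storage-only costs using the per-GB prices quoted above.
It assumes those prices are per month and ignores retrieval, request
and transfer fees; the function names are made up for this example.

```python
# Storage prices in US cents per GB, as quoted in this article.
PRICES_CENTS_PER_GB = {
    "Amazon Glacier": 0.7,
    "Amazon S3 Infrequent Access": 1.25,
    "Google Nearline": 1.0,
    "Backblaze B2": 0.5,
}


def storage_cost_dollars(size_gb, cents_per_gb):
    """Storage-only cost in dollars (no retrieval or transfer fees)."""
    return size_gb * cents_per_gb / 100.0


def compare(size_gb):
    """Return {service name: storage cost in dollars} for a backup of size_gb."""
    return {
        name: storage_cost_dollars(size_gb, price)
        for name, price in PRICES_CENTS_PER_GB.items()
    }


if __name__ == "__main__":
    # Print the services from cheapest to most expensive for 1000 GB.
    for name, dollars in sorted(compare(1000).items(), key=lambda kv: kv[1]):
        print(f"{name:30s} ${dollars:6.2f}")
```

For a 1000 GB backup this works out to $7.00 for Glacier versus $12.50
for S3 Infrequent Access, a difference of $5.50.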

Glacier Migration Strategy
--------------------------

Moving off Glacier onto another storage provider should not be too
difficult since HB supports multiple destination syncing.  The exact
method varies depending on how your backup is configured:

1. If hb config cache-size-limit is -1 (the default):

   This is the easiest migration.  You have a local copy of all your
   backup files in the backup directory.  To migrate, add a new
   destination to dest.conf (don't reuse Glacier's destination name!)
   and add the keyword "off" to your Glacier destination.  Your next
   backup will send all of your backup files to the new destination.

   If your backup is very large, you can add --maxwait 20h to your
   backup command.  Backup will figure out which files need to be
   sent to the new destination, start sending them, and start your
   backup.  When your backup is finished, it will continue sending
   files for up to 20 hours, then it will stop.  Eventually, all
   files will be sent to the new destination.  You can check this by
   looking at backup's output to see if it is still sending old arc
   files or current files.

2. If hb config cache-size-limit is >= 0 and you have multiple
   destinations:

   In this case, some backup files are in the backup directory while
   others are only on the remote destinations.  Since you have
   backups at Glacier and non-Glacier destinations, one option is to
   just remove the Glacier destination from dest.conf.  You still
   have a remote backup at the other destination.

   If you want to replace the Glacier destination with a new
   destination, the process is:

   a) add the new destination to dest.conf
   b) don't reuse Glacier's destination name!
   c) use hb backup -c backupdir /dev/null to back up a small file

   The backup will trigger a resync, which will download all of your
   files from the first non-Glacier destination in dest.conf and
   upload them to your new destination.  If your cache-size-limit is
   zero, the sync will take longer.  Consider changing it to 5
   temporarily, to cache 5 arc files, until the sync is finished.

   If you have a huge backup, this synchronization may take quite some
   time.  If the sync is stopped, that's fine: the next backup will
   continue the sync where it left off.  Until the sync is finished,
   however, you will not be able to do backups: backup cannot run
   while this kind of "destination-to-destination" sync is in
   progress.

3. If hb config cache-size-limit is >= 0 and Glacier is your only
   destination in dest.conf:

   For this setup, you have some backup files in your backup
   directory, but most are only on Glacier.  With Glacier, HB is not
   able to do a normal destination-to-destination sync with the
   backup command because of Glacier's 4-hour delay and pacing
   requirements.

   Instead, you must use the recover command to download the entire
   backup into the local backup directory.  Recover will do the
   necessary delays and pacing for Glacier, and give lots of options
   for how much to spend downloading the backup files.  Basically,
   the more spread out the download, the cheaper the cost.

   BUT, the first step is to set cache-size-limit back to -1.  With
   the Glacier destination still active, do this:

   a) hb config -c backupdir cache-size-limit -1
   b) hb backup -c backupdir /dev/null

   Step b) sends the new config settings to Glacier.  Then:

   c) hb recover -c backupdir

   HB will schedule downloads of all your Glacier backup files.
   During recovery, the nightly backup command will not be able to
   run because the backup directory will be locked.

   After recover completes, you will have all backup files in your
   local backup directory.  Edit dest.conf, add your new
   destination, delete the Glacier destination, and if you want, set
   cache-size-limit back to its previous value.  If you have the
   disk space, it is suggested you keep cache-size-limit set to -1
   since it gives you a quick local copy of the backup and makes
   some operations such as archive packing much more efficient (no
   download is required before packing).

   If you have a large backup, you may want to consider starting an
   EC2 instance (Amazon AWS virtual computer), copying your backup key
   there, and doing the hb recover operation on EC2.  This will give
   you very fast download and upload times between Glacier, EC2 (your
   "computer"), and your new destination, whether it is Amazon S3,
   Google Nearline, Backblaze B2, or something else.  Your EC2
   instance should be in the same region as your Glacier data to
   minimize download costs.  Upload to your destination is usually
   free, though you will need to pay EC2 outgoing bandwidth costs.

   IMPORTANT: while syncing your backup to a new destination with EC2,
   you cannot run backups.  You have to wait until the sync is
   finished.

   After you have finished syncing your new destination with your EC2
   instance, copy dest.conf, dest.db, hb.db* and hb.sig back to your
   regular backup directory (you might want to make a copy of the old
   files first, just in case).  Then you can continue regular backups.

4. To remove the backup stored on Glacier:

   Edit dest.conf, remove the "off" keyword for Glacier, and use
   this command to remove your Glacier backup files:

   a) hb dest -c backup clear glacdest

   where "glacdest" is the name of your Glacier destination.  Then
   add the "off" keyword again, or remove the Glacier lines from
   dest.conf.

5. After your backup is running the way you want, you may still have
    files in Glacier.  Unfortunately, Amazon doesn't make it easy to delete
    these, and HashBackup can only delete the files it has archive ids for.
    If you have made test backups with HB, then deleted the backup directory,
    HB has no way to delete these Glacier files.

    To delete your Glacier vaults, try this tool.  Many have had success
    with it, although when tried on the HashBackup testing vault, it said
    it deleted the files, but they were still there and the vault couldn't
    be deleted.  A tech support request had to be submitted to Amazon to
    have them delete the vaults.

    This tool has to request an archive list from Glacier, which, you
    guessed it, takes 4-5 hours.  Then it deletes all the archives, then
    deletes your vault.  Or, maybe just complain to Amazon and let them
    do it!

    Also, HashBackup stores some Glacier files in your S3 account.  These
    are in an S3 bucket named <accountid>-hashbackup-glac-vaults.  You
    can delete this bucket with the S3 control panel at aws.amazon.com.

    Good luck!
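
As a recap, the choice between migration cases 1, 2 and 3 above can be
written as a small Python function.  This is just a sketch of the
decision, not something HashBackup provides; the function name and the
"glacier" label used for destinations are made up for this example.
Remember that an S3 destination with Glacier transitioning enabled
counts as a Glacier destination here.

```python
def migration_case(cache_size_limit, destinations):
    """Return which migration case (1, 2 or 3) applies.

    cache_size_limit: value of `hb config cache-size-limit`
                      (-1 is the default).
    destinations:     one label per destination in dest.conf, using the
                      hypothetical label "glacier" for Glacier (or S3
                      with Glacier transitioning) and anything else
                      for other storage.
    """
    if cache_size_limit == -1:
        # Case 1: all backup files are in the local backup directory.
        return 1
    if any(dest != "glacier" for dest in destinations):
        # Case 2: a non-Glacier destination already holds the backup.
        return 2
    # Case 3: Glacier is the only remote copy; hb recover is needed.
    return 3


if __name__ == "__main__":
    print(migration_case(-1, ["glacier"]))
    print(migration_case(0, ["glacier", "s3"]))
    print(migration_case(5, ["glacier"]))
```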
