Retain

Keeps selected versions of files in your backup history to implement a backup file retention policy, and removes files that are not needed according to your policy.  Retain will never remove the most recent backup for an active (not deleted) file.

  $ hb retain [-c backupdir] [-s sched or -t time]
             [-m copies] [-x time] [-v] [--dryrun]
             [pathnames ...]


The -s, -t, -m and -x options are most commonly used.  The other options, including pathnames, are special purpose:

 -v          show files that are deleted and the reason each was removed
 --dryrun    display what would be retained and deleted, without doing anything

There are two main ways to specify a retention policy:

  -s retain by schedule
  -t retain by time

The -s and -t options cannot be used together.  If neither is used, the default is -t all, which means to retain all versions (options -m and -x will still cause files to be removed).

The -m and -x options can be used in combination with either -s or -t to further limit the files retained:

  -m limit the maximum number of copies of each file
  -x limit the retention time of deleted files

Be careful with -m and -x because they override your primary retention policy; read on for more details.



Retain, Versions, "Holes", and Recovering Backup Space

Some backup programs do retention by versions: they delete old backup versions entirely.  HashBackup does retention by files: when there are too many versions of a file, it deletes individual versions of that file.  Versions do not disappear from a version listing unless rm -r is used to delete an entire version, which doesn't happen in normal operation.

As an example, if you back up file A in version 0 and it stays on your filesystem but never changes, it is never backed up again.  The only version containing file A is version 0.

When retain and rm delete files, they create "holes" or empty space in arc files.  When this empty space becomes too big, HB compacts the arc file by creating a new arc file without the empty space.  This compacting process is controlled by several config options beginning with pack-.  See the config page for details.

Retention by time: -t

The -t time option is used when your goal is to preserve all backup history within a certain time.  The time argument specifies how far back to keep versions, from the most recent backup.  For example, 2y means the last 2 years and 4m means the last 4 months.  Time may be any one of:

  Ny = the last N years
  Nq = the last N quarters (90 days)
  Nm = the last N months (30 days)
  Nw = the last N weeks
  Nd = the last N days
  Nh = the last N hours
  Nn = the last N minutes
  Ns = the last N seconds (used for testing)
  all = retain all backups (this is the default)

The advantage of the -t option is that it allows you to restore files from any time within your retention period.  It is the safest retention method and easy to understand.  The disadvantage is that more versions (copies) of the same file may be kept, especially frequently modified files, so more backup disk space is used.
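
As a rough illustration of how a -t value maps to a cutoff, the sketch below subtracts the period from the time of the most recent backup.  It is not HashBackup's code: the function name cutoff and the UNIT table are hypothetical, a year is assumed to be 365 days, and quarters are left out.

```python
import re
from datetime import datetime, timedelta

# Unit lengths are assumptions for this sketch: a month is 30 days as the
# list above states, a year is taken as 365 days, and quarters are omitted.
UNIT = {
    "y": timedelta(days=365),
    "m": timedelta(days=30),
    "w": timedelta(weeks=1),
    "d": timedelta(days=1),
    "h": timedelta(hours=1),
    "n": timedelta(minutes=1),
    "s": timedelta(seconds=1),
}

def cutoff(spec, last_backup):
    """Return the oldest time covered by a -t spec, or None for 'all'."""
    if spec == "all":
        return None
    match = re.fullmatch(r"(\d+)([ymwdhns])", spec)
    if match is None:
        raise ValueError(f"unsupported time spec: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    # The period is counted back from the most recent backup, not from now.
    return last_backup - count * UNIT[unit]

last = datetime(2010, 6, 29, 12, 35, 1)
print(cutoff("7d", last))   # 2010-06-22 12:35:01
```

Counting back from the most recent backup rather than from the current time follows the rule given in the NOTES on this page.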

Retention by schedule: -s

Retaining files by schedule is a more sophisticated option.  It is used when you want to save several recent versions of files, and some, but not all, older versions.  For example, if you are doing daily backups, you may want to save the last 7 backups to be able to restore any files changed within the past week.  You also may want to keep one backup per month for the last 3 months in case there is a problem you didn't notice within 7 days.  You could use -t 3m to keep all backups made within the last 3 months, but that would keep 90 daily backups.  Using -s 7d3m will save the last 7 daily backups, plus 1 backup per month for the last 3 months - 10 backups altogether.

The -s retention option works by examining each file's history and deciding which versions of that file to keep and which to remove based on your retention schedule.  It does not save or remove entire backup versions.

-s Examples

7d - retain 1 backup from each of the last 7 days.  If you had performed backups every hour for a week, some versions of files would be removed, leaving you with 1 version per day for the last 7 days.  An easier way to accomplish this would be to only do backups once per day, but you may need hourly backups while a project is very active and only daily backups once things calm down and files change less often.

7d4w - similar to above, but also retains 1 weekly backup for each of the last 4 weeks.  This provides more history in case a version of a file is needed that is more than 7 days old.  With this policy, there would be 11 versions retained.

28d - in contrast to 7d4w, which retains 11 versions, a 28d retention policy would retain 1 backup for each of the last 28 days.

7d4w12m - retain 1 backup from each of the last 7 days, 1 from each of the last 4 weeks, and 1 from each of the last 12 months, for a total of 23 prior backups for the year.

365d - retain 1 backup from each of the last 365 days, for a total of 365 prior backups.  This policy could be used for data such as security log files, where a specific day might be needed and a more coarse retention policy such as 7d4w12m would not allow the restore of a specific day from a few months ago.

7d4w3m4q5y - retain 1 backup from each of the last 7 days, 1 from each of the last 4 weeks, 1 from each of the last 3 months, 1 from each of the last 4 quarters (3 months each), and 1 from each of the last 5 years.  This is a very safe retention policy that uses less disk space than -t 5y.  This can be selected with -s safe.

For -s, the retention options must be listed in order, from shortest time period to longest, so 5y7d will not work: it must be 7d5y.

The retention schedule options are:

  Xn = X minutes
  Xh = X hours
  Xd = X days
  Xw = X weeks
  Xm = X months
  Xq = X quarters
  Xy = X years
  safe = 7d4w3m4q5y

It is easy to confuse the meaning of retention policies.  For example, a policy of -s 3m means to retain 1 backup from each of the last 3 months.  It does not mean to retain all backups made in the last 3 months; that would be specified with -t 3m.  Even if you made backups every hour, a policy of -s 3m would leave you with at most 3 prior backups of each file, one for each of the last 3 months.  By contrast, a policy of -s 90d would retain 1 backup from each of the last 90 days, leaving you with a daily history for the prior 3 months.  A policy of -s 1825d (365 times 5) would retain a daily backup version from the last 5 years, whereas -s 5y would retain only 5 prior versions, 1 from each of the last 5 years.

TIP: An easy way to remember how -s retention policies work is to realize that -s retentions will keep the same number of prior versions of a file as the sum of the numbers in your policy: for 7d4m you will have 11 prior versions.
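
The TIP and the ordering rule can be checked with a small sketch like the one below.  It is only an illustration, not HashBackup's parser: parse_schedule and max_prior_versions are hypothetical names, and rejecting a period letter that appears twice is an assumption.

```python
import re

# Period letters from shortest to longest, as in the list of schedule options.
ORDER = "nhdwmqy"

def parse_schedule(schedule):
    """Split a schedule such as '7d4w12m' into (count, unit) pairs."""
    if schedule == "safe":
        schedule = "7d4w3m4q5y"
    parts = re.findall(r"(\d+)([nhdwmqy])", schedule)
    # Reject strings with leftover characters that the pattern skipped.
    if not parts or "".join(c + u for c, u in parts) != schedule:
        raise ValueError(f"not a valid schedule: {schedule!r}")
    ranks = [ORDER.index(u) for _, u in parts]
    # Assumption: each period may appear once, in shortest-to-longest order.
    if ranks != sorted(set(ranks)):
        raise ValueError("periods must run from shortest to longest")
    return [(int(c), u) for c, u in parts]

def max_prior_versions(schedule):
    """Sum of the numbers in the schedule, as described in the TIP."""
    return sum(count for count, _ in parse_schedule(schedule))

print(max_prior_versions("7d4w12m"))  # 23
print(max_prior_versions("safe"))     # 23
```

Passing 5y7d to parse_schedule raises ValueError, matching the rule that periods must be listed from shortest to longest.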
          
Retention of deleted files: -x

The -x option is used to specify a different retention time for files that have been deleted.  For example, if you have been backing up files for the last 2 years and still have them on your computer, you may want to keep the entire 2 year history, so you would use -t 2y.  But if you deleted some files a year ago, you may not want to keep their backup history for 2 years; maybe keeping deleted files in your backup history for 3 months is enough, just in case you delete a file by mistake.  Use -t 2y -x 3m to remove deleted files after 3 months. 

If -x is not specified, it defaults to the same value as -t, or the longest period with -s.  So, for -s 7d4w3y, the default for -x would be 3y.
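
The default described above can be written out as a short sketch.  The function name default_deleted_retention is hypothetical, and returning "all" when neither -t nor -s is given is an inference from the stated default of -t all.

```python
import re

def default_deleted_retention(t=None, s=None):
    """Return the -x value implied when -x itself is not given."""
    if t is not None and s is not None:
        raise ValueError("-s and -t cannot be used together")
    if s is not None:
        if s == "safe":
            s = "7d4w3m4q5y"
        # Periods are listed shortest to longest, so the last one is the longest.
        return re.findall(r"\d+[nhdwmqy]", s)[-1]
    # Assumption: with neither option, -x follows the default of -t all.
    return t if t is not None else "all"

print(default_deleted_retention(s="7d4w3y"))  # 3y
print(default_deleted_retention(t="2y"))      # 2y
```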

If your backup seems to be "too big", check to make sure you used -x with your retain command, because deleted files can accumulate quickly.  For example, the HashBackup development server had a retention of -s30d12m (keep last 30 days plus 1 backup per month).  Adding -x3m to remove files deleted more than 3 months ago trimmed 1/3rd of the backup space - 36GB.  Keeping every deleted file for 12 months can use a lot of backup space!

Retention by number of copies: -m


The -m option is simple to understand: the most recent N copies of each file are retained, and any older copies are deleted.  This option can be used with either -s or -t.

You can use -x and/or -m without using -s or -t, because -t all, keep all versions, is the default retention option.

IMPORTANT: if -m or -x is used, it may limit your ability to restore older versions.  For example, if you use the -t 1y option, HashBackup keeps all files backed up within the last year.  But if you add -m 3 to this, then if more than 3 versions of a file were saved in the last year, only the most recent 3 will be kept; you will lose the ability to restore the older files, even though they were backed up within the last year.
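
To make the interaction concrete, here is a simplified model of -t combined with -m for the versions of one file.  It is a sketch under stated assumptions, not HashBackup's implementation: keep_versions is a hypothetical name, and the boundary behavior covered in the NOTES is ignored.

```python
from datetime import datetime, timedelta

def keep_versions(times, cutoff, max_copies=None):
    """Return the backup times of one file's versions that survive."""
    newest_first = sorted(times, reverse=True)
    # Primary policy: keep everything inside the retention period.
    kept = [t for t in newest_first if t >= cutoff]
    # -m then trims the result to the most recent N copies.
    if max_copies is not None:
        kept = kept[:max_copies]
    return kept

last = datetime(2010, 6, 29)
times = [last - timedelta(days=30 * i) for i in range(5)]
one_year = last - timedelta(days=365)

print(len(keep_versions(times, one_year)))                # 5
print(len(keep_versions(times, one_year, max_copies=3)))  # 3
```

All five versions fall within the year, yet adding the equivalent of -m 3 drops the two oldest, which is the loss of history the warning above describes.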

Retention with pathnames

Pathnames can be used with retain so that it operates on just these files or directories.  This is useful when you want to retain fewer copies of certain parts of the backup.  For example, if you backup /home with several users, each user could have a different retention policy by running retain with /home/user1, then /home/user2, etc.  As another example, you may have a log directory where you only want to keep the last 30 days, even though for the rest of the backup, you want to keep 1 year of backups.


It's a good idea to continue running retain on the entire backup, without a pathname, unless you are sure that the specific retains will cover all parts of the backup.

NOTES:
  1. The retention time is always computed based on the most recent backup's completion time rather than the current time.  This can be confusing, but it avoids the situation where running retain repeatedly eventually deletes all backup history.
  2. The most recent version of an active (not deleted) file is never deleted by retain. 
  3. Retain may sometimes keep 1 version more than you expect.  This example illustrates why:
    • backup a file at 3:00
    • change the file
    • back it up again at 5:00
    • run retain -t1h at 5:30
    • the 3:00 version will not be removed, because your retention policy is to retain 1 hour of backups.  Your last backup was at 5:00, so 1 hour before that is 4:00.  The 3:00 version is still within your retention time of 1 hour, because it is the state of the file at 4:00
    • if you re-run retain after 6:00, the 3:00 version would still not be removed, because retain -t 1h always goes 1 hour before the last backup; the time you run retain does not matter.  The reason for this is that if you continued running retain without doing backups, eventually your backup history would disappear.
    • if you run a backup at 6:00 and follow that with retain -t1h, the 3:00 version would be deleted: the cutoff is now 5:00, and the 5:00 backup already records the state of the file at that time
    • on a monthly scale, assume you have a retention time of 30 days, more or less, and a January 1st and February 1st backup.  The January 1st backup will not be removed until March 1st, to give you a complete 30 day retention history. Though confusing, this is correct behavior: if the January 1st backup were deleted when it was 30 days old, then when you did a backup and retain on February 1st, the January backup would be removed and you would have only 1 day of backup history. 
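
The hourly example in note 3 can be modeled with the sketch below.  It assumes, as the note suggests, that retain keeps every version newer than the cutoff plus the newest version at or before the cutoff, since that version is the state of the file at the cutoff.  The name retain_by_time is hypothetical and this is not HashBackup's actual code.

```python
from datetime import datetime, timedelta

def retain_by_time(versions, last_backup, period):
    """Return the versions of one file kept by a -t style policy."""
    cutoff = last_backup - period
    inside = [v for v in versions if v > cutoff]
    at_or_before = [v for v in versions if v <= cutoff]
    # The newest version at or before the cutoff is the file's state at the cutoff.
    boundary = [max(at_or_before)] if at_or_before else []
    return sorted(boundary + inside)

day = datetime(2010, 6, 29)
v3, v5 = day.replace(hour=3), day.replace(hour=5)
one_hour = timedelta(hours=1)

# Last backup at 5:00: the cutoff is 4:00, so the 3:00 version is still needed.
print([v.hour for v in retain_by_time([v3, v5], v5, one_hour)])                    # [3, 5]
# After a 6:00 backup the cutoff is 5:00 and the 3:00 version can go.
print([v.hour for v in retain_by_time([v3, v5], day.replace(hour=6), one_hour)])   # [5]
```

The time at which retain itself runs never appears in the function; only the time of the last backup matters.
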

EXAMPLE

Here is a typical example of a HashBackup file retention session.  The command is:

$ hb retain -c /hb -s 7d4w12m -x7d

This will keep one backup for each of the last 7 days, one for each of the last 4 weeks, and one for each of the last 12 months.  Files deleted in the last 7 days will be kept; files deleted more than 7 days ago from the live system will be removed from the backup.

HashBackup build 256 (C) HashBackup, LLC
Backup directory: /hb
Most recent backup version: 14
Backup finished at: 2010-06-29 12:35:01
Retention schedule: 7d4w12m
Deleted file retention time: 7d (keep files since 2010-06-22 12:35:01)
33890 files deleted, 218184 files retained
Writing archive 0.0
Writing archive 3.0
Writing archive 4.0
Writing archive 5.0
Compressing archive 5.0
Writing archive 6.0
Writing archive 8.0
Compressing archive 8.0
Writing archive 9.0
Compressing archive 9.0
Writing archive 11.0
Compressing archive 11.0
Writing archive 12.0
Writing archive 13.0
Copied hb.db to g5rsync
Copied arc.5.0 to g5rsync
Copied arc.8.0 to g5rsync
Copied arc.9.0 to g5rsync
Copied arc.11.0 to g5rsync
Copied dest.db to g5rsync

real    2m39.243s
user    2m0.172s
sys    0m14.242s

Notice that not all archives are compressed.  "Compressing" in this output is the compacting described earlier: HashBackup rewrites archives that have had around 50% of their data removed (configurable with the pack-percent-free config keyword).  This is to balance performance, disk space usage, and bandwidth.  For example, if one small file is removed from a 1GB archive file, it doesn't make sense to spend time compressing it or use bandwidth to transmit it to the remote server for such a small space reduction.
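
The threshold rule can be pictured with a sketch like this one.  The function name should_rewrite_archive is hypothetical, the default of 50 comes from the "around 50%" figure above, and using "at least" rather than "more than" at the threshold is an assumption.

```python
def should_rewrite_archive(archive_bytes, freed_bytes, pack_percent_free=50):
    """Decide whether enough of an archive is empty to justify rewriting it."""
    return freed_bytes * 100 >= archive_bytes * pack_percent_free

GB = 1024 ** 3

# One small file removed from a 1GB archive: not worth the time or bandwidth.
print(should_rewrite_archive(GB, 10 * 1024))       # False
# More than half of the archive is now empty space.
print(should_rewrite_archive(GB, 600 * 1024 ** 2)) # True
```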