Retain
Keeps selected versions of files in your backup history to implement a backup file retention policy and removes files that are not needed according to your policy. Retain will never remove the most recent backup of an active (not deleted) file. If you want to keep all files in your backup there is no need to run retain.
$ hb retain [-c backupdir] [-s sched] [-t time]
[-m copies] [-x time] [-vN] [-rN] [-n]
[pathnames ...]
The -s, -t, -m, and -x options are most commonly used. The other options, including pathnames, are special purpose:
-v0
don’t show pathnames from command line
-v1
shows files deleted
-v2
also shows files kept, with the reason each file was removed or kept
-rN
apply retention policy only to backup versions N and below
-n
displays information about what would be retained and deleted, without doing anything
There are two options to specify a retention policy:
-s
retain by schedule
-t
retain by time
The -s and -t options can be used together: first -t is used to keep all files saved recently within the -t time interval, then -s is used after that. So -t has the effect of delaying the start of -s. If neither is used, the default is to retain all files, though options -m and -x may also cause files to be removed.
The -m and -x options further limit the files retained:
-m
limit the maximum number of copies of each file
-x
limit the retention time of deleted files
Be careful with -m and -x because they override your -s and -t retention policy; read on for more details.
Retain, Versions, "Holes", and Recovering Backup Space
Some backup programs do retention by versions, deleting entire versions of old backups. HashBackup does retention by files: when there are too many versions of a file, it deletes individual versions of that file.
As an example, if you back up file A in version 0 and it stays on your filesystem but never changes, it is never backed up again. The only version containing file A is version 0, so removing version 0 as a whole would remove the only backup of file A.
When retain and rm delete files, they create "holes" or empty space in arc files. When this empty space becomes too big, HB compacts the arc file by creating a new arc file without the empty space. This packing process is controlled by several config options beginning with pack-. See the Config page for details.
Retention by time: -t
The -t time option is used when your goal is to preserve all backup history within a certain time. The time argument specifies how far back to keep versions from the finish of the most recent backup. For example, 2y means the last 2 years and 4m means the last 4 months. Time may be any one of:
Ny = the last N years
Nq = the last N quarters (90 days)
Nm = the last N months (30 days)
Nw = the last N weeks
Nd = the last N days
Nh = the last N hours
Nn = the last N minutes (note the use of n, not m, for minutes)
Ns = the last N seconds (used for testing)
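The interval arithmetic can be sketched in Python. This is only an illustration, not HashBackup's own code: the names parse_interval and cutoff are made up here, and the 365-day year and 90-day quarter (three 30-day months) are assumptions of the sketch.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit. The 365-day year and the 90-day quarter are
# assumptions made for this sketch.
UNIT_SECONDS = {
    "y": 365 * 86400,
    "q": 90 * 86400,
    "m": 30 * 86400,
    "w": 7 * 86400,
    "d": 86400,
    "h": 3600,
    "n": 60,  # minutes use n, not m
    "s": 1,
}


def parse_interval(spec: str) -> timedelta:
    """Convert a single interval such as '2y' or '4m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([yqmwdhns])", spec)
    if match is None:
        raise ValueError(f"bad time interval: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(seconds=count * UNIT_SECONDS[unit])


def cutoff(last_backup_finished: datetime, spec: str) -> datetime:
    """Oldest timestamp that is still inside the retention window."""
    return last_backup_finished - parse_interval(spec)


finished = datetime(2010, 6, 29, 12, 35, 1)
print(cutoff(finished, "7d"))  # 2010-06-22 12:35:01
```

The window is measured back from the finish of the most recent backup rather than from the current time, which is why cutoff takes that timestamp as its first argument.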
The advantage of the -t option is that it allows you to restore files from any time within your retention period and is easy to understand. The disadvantage is that more versions (copies) of the same file may be kept, especially with frequently modified files and frequent backups, so more backup disk space is used.
Retention by schedule: -s
Retaining files by schedule is a more sophisticated option. It is used to save several recent versions of files, and some, but not all, older versions. For example, when doing daily backups you may want to save the last 7 backups to be able to restore any files changed within the past week. You also may want to keep one backup per month for the last 3 months in case there is a problem you didn’t notice within 7 days. You could use -t 3m to keep all backups made within the last 3 months, but that would keep 90 daily backups. Using -s 7d3m will save the last 7 daily backups, plus 1 backup per month for the last 3 months: 10 backups altogether.
The -s retention option works by examining each file’s history and deciding which versions of that file to keep and which to remove based on your retention schedule. It does not save or remove entire backup versions.
At Most vs At Least: retain-extra-versions
The retention schedule -s1m can be interpreted different ways, either:
- Keep 1 version at most 1 month old. This version could be 1 day old or 30 days old, so it does not guarantee a full 30-day recovery window. For this, set the config option retain-extra-versions to False.
- Keep a version at least 1 month old for a longer recovery window. The config option retain-extra-versions is set to True, adding 1 to each -s time interval. -s1m becomes -s2m so the recovery window is 30-60 days. This is the default.
Stacked vs Overlapping Retention Periods
The retention schedule -s7d4w3m can be interpreted different ways:
- First keep 7 daily backups, plus 4 weekly backups, plus 3 monthly backups. The intervals are "stacked", so the total retention time period is 4 months and 7 days from the finish of the most recent backup. This is how retain actually operates. If retain-extra-versions is set, the actual schedule is -s8d5w4m and the total retention period is about 5 and a half months (8 days + 5 weeks + 4 months).
- If you expect a maximum 3 month retention period with -s7d4w3m, then one of the 7 daily backups is also a weekly backup and one of the 4 weekly backups is a monthly backup. So use a schedule of -s7d3w2m.
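The stacked interpretation is easy to check with a few lines of Python. This is a rough sketch rather than HashBackup's implementation: parse_schedule and stacked_days are hypothetical names, months are taken as 30 days, and the 90-day quarter and 365-day year are assumptions. Only day-or-longer units are handled.

```python
import re

# Days per unit; q = 90 and y = 365 are assumptions of this sketch.
DAYS = {"d": 1, "w": 7, "m": 30, "q": 90, "y": 365}


def parse_schedule(schedule: str) -> list[tuple[int, str]]:
    """Split a schedule such as '7d4w3m' into [(7, 'd'), (4, 'w'), (3, 'm')]."""
    parts = re.findall(r"(\d+)([dwmqy])", schedule)
    if not parts or "".join(n + u for n, u in parts) != schedule:
        raise ValueError(f"bad schedule: {schedule!r}")
    return [(int(n), u) for n, u in parts]


def stacked_days(schedule: str, extra_versions: bool = False) -> int:
    """Total span in days when the intervals are laid end to end.

    With extra_versions=True, 1 is added to each interval, mirroring the
    retain-extra-versions behaviour described above.
    """
    bump = 1 if extra_versions else 0
    return sum((n + bump) * DAYS[u] for n, u in parse_schedule(schedule))


print(stacked_days("7d4w3m"))                       # 125
print(stacked_days("7d4w3m", extra_versions=True))  # 163
```

So -s7d4w3m spans 7 + 28 + 90 = 125 days, and the effective -s8d5w4m spans 8 + 35 + 120 = 163 days.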
Examples
-s7d - retain 1 backup from each of the last 7 days. If you had performed backups every hour for a week, some versions of files would be removed leaving you with 1 version per day for the last 7 days. An easier way to accomplish this would be to only do backups once per day, but perhaps you need hourly backups while a project is in certain very active stages, then a daily backup is okay when things calm down and files are changing less often.
-s7d4w - similar to above, but also retains 1 weekly backup for each of the last 4 weeks. This provides more history in case a version of a file is needed that is more than 7 days old. With this policy, there would be 11 versions retained.
-s28d - in contrast to 7d4w, which retains 11 versions, a 28d retention policy would retain 1 backup for each of the last 28 days.
-s7d4w12m - retain 1 backup from each of the last 7 days, 1 from each of the last 4 weeks, and 1 from each of the last 12 months, for a total of 23 prior backups for the year.
-s365d - retain 1 backup from each of the last 365 days, for a total of 365 prior backups. This policy could be used for data such as security log files, where a specific day might be needed and a more coarse retention policy such as 7d4w12m would not allow the restore of a specific day from a few months ago.
-s7d4w3m4q5y - retain 1 backup from each of the last 7 days, 1 from each of the last 4 weeks, 1 from each of the last 3 months, 1 from each of the last 4 quarters (3-month periods), and 1 from each of the last 5 years. This is a very safe retention policy that uses less disk space than -t 5y. This can be selected with -s safe.
For -s, the retention options must be listed in order, from shortest time period to longest, so 5y7d will not work: it must be 7d5y.
The retention schedule options are:
Nn = N minutes
Nh = N hours
Nd = N days
Nw = N weeks
Nm = N months
Nq = N quarters
Ny = N years
safe = 7d4w3m4q5y
Note that minutes are specified with n and months are specified with m.
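The ordering rule and the safe shorthand can be illustrated with a small validator. This is a hypothetical sketch, not the parser hb uses: normalize_schedule is an invented name, and rejecting a repeated unit (for example 7d3d) is an assumption of the sketch rather than documented behaviour.

```python
import re

UNIT_ORDER = "nhdwmqy"  # shortest unit to longest
ALIASES = {"safe": "7d4w3m4q5y"}


def normalize_schedule(schedule: str) -> str:
    """Expand the 'safe' alias and check shortest-to-longest ordering."""
    schedule = ALIASES.get(schedule, schedule)
    parts = re.findall(r"(\d+)([nhdwmqy])", schedule)
    if not parts or "".join(n + u for n, u in parts) != schedule:
        raise ValueError(f"bad schedule: {schedule!r}")
    ranks = [UNIT_ORDER.index(unit) for _, unit in parts]
    # Strictly increasing: a repeated unit is also rejected (an assumption).
    if any(a >= b for a, b in zip(ranks, ranks[1:])):
        raise ValueError("intervals must run from shortest to longest")
    return schedule


print(normalize_schedule("safe"))  # 7d4w3m4q5y
print(normalize_schedule("7d5y"))  # 7d5y
try:
    normalize_schedule("5y7d")
except ValueError as err:
    print("rejected:", err)
```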
It is easy to confuse the meaning of retention policies. For example, a policy of -s3m means to retain 1 backup from each of the last 3 months. It does not mean to retain all backups made in the last 3 months; that would be specified with -t3m. Even if you made backups every hour, a policy of -s3m would leave you with at most 3 prior backups of each file, one for each of the last 3 months. By contrast, a policy of -s90d would retain 1 backup from each of the last 90 days, leaving you with a daily history for the prior 3 months. A policy of -s1825d (365 times 5) would retain a daily backup version from the last 5 years, whereas -s5y would retain only 5 prior versions, 1 from each of the last 5 years.
-s retention policies keep the same number of prior versions of
a file as the sum of the numbers in your policy: for 7d4m you will
have 11 prior versions, assuming you are doing backups at least as
frequently as your shortest retention period.
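As a quick check of this rule, the count can be computed by summing the numbers in a schedule. The helper name prior_versions is made up for this sketch, and it ignores the extra version that retain-extra-versions may keep.

```python
import re


def prior_versions(schedule: str) -> int:
    """Sum the numbers in a schedule such as '7d4w12m'."""
    return sum(int(n) for n in re.findall(r"\d+", schedule))


for schedule in ("7d4m", "7d4w", "7d4w12m", "365d"):
    print(schedule, prior_versions(schedule))
# 7d4m 11
# 7d4w 11
# 7d4w12m 23
# 365d 365
```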
Retention of deleted files: -x
The -x option is used to specify a different retention time for files that have been deleted. For example, if you have been backing up files for the last 2 years and still have them on your computer, you may want to keep the entire 2 year history, so you would use -t2y. Or you don’t need to run retain at all to keep all files. But if you deleted some files a year ago, you may not want to keep their backup history for 2 years; maybe keeping deleted files in your backup history for 3 months is enough, just in case you delete a file by mistake. Use -t2y -x3m to remove deleted files after 3 months.
If -x is not specified, it defaults to the sum of intervals for -t and -s. For -t1w -s7d4w, the default for -x would be 6 weeks.
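The default can be reproduced by adding up the intervals. The sketch below is illustrative only: default_x_days is a hypothetical helper, months are taken as 30 days, the 90-day quarter and 365-day year are assumptions, and any effect of retain-extra-versions is ignored.

```python
import re
from typing import Optional

# Days per unit; q = 90 and y = 365 are assumptions of this sketch.
DAYS = {"d": 1, "w": 7, "m": 30, "q": 90, "y": 365}


def interval_days(spec: str) -> int:
    """Total days in a spec such as '1w' or '7d4w'."""
    return sum(int(n) * DAYS[u] for n, u in re.findall(r"(\d+)([dwmqy])", spec))


def default_x_days(t_spec: Optional[str] = None, s_spec: Optional[str] = None) -> int:
    """Sum of the -t and -s intervals, used when -x is not given."""
    return sum(interval_days(spec) for spec in (t_spec, s_spec) if spec)


days = default_x_days("1w", "7d4w")
print(days, "days =", days // 7, "weeks")  # 42 days = 6 weeks
```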
If your backup seems to be "too big", check to make sure you used -x with your retain command, because deleted files can accumulate quickly. For example, the HashBackup development server had a retention of -s30d12m (keep last 30 days plus 1 backup per month for a year). Adding -x3m to remove files deleted more than 3 months ago trimmed 1/3rd of the backup space - 36GB. Keeping every deleted file for 12 months can use a lot of backup space!
Retention by number of copies: -m
The -m option is simple to understand: the most recent N copies of each file are retained, and any older copies are deleted. This option can be used with either -s or -t.
You can use -x and/or -m without using -s or -t.
Using -m or -x may limit your ability to restore older versions. For example, if you use the -t1y option, HashBackup keeps all files backed up within the last year. But if you add -m3 to this and more than 3 versions of a file were saved in the last year, only the most recent 3 will be kept; you will lose the ability to restore the older versions, even though they were backed up within the last year.
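The interaction can be modelled for a single file as follows. This is a simplified sketch under stated assumptions, not how retain is implemented: keep_versions is an invented function, it models only -t followed by -m, and it treats the file as active so that its newest version is always kept.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def keep_versions(saved_at: List[datetime], last_backup: datetime,
                  window: timedelta,
                  max_copies: Optional[int] = None) -> List[datetime]:
    """Return the timestamps of the versions of one file that survive."""
    newest_first = sorted(saved_at, reverse=True)
    # -t: keep everything saved inside the time window.
    kept = [ts for ts in newest_first if ts >= last_backup - window]
    # -m: then cap the number of copies, newest first.
    if max_copies is not None:
        kept = kept[:max_copies]
    # The most recent version of an active file is never removed.
    if newest_first and not kept:
        kept = newest_first[:1]
    return kept


last_backup = datetime(2010, 6, 29, 12, 35, 1)
# Six versions of one file, saved 30 days apart.
versions = [last_backup - timedelta(days=30 * i) for i in range(6)]
one_year = timedelta(days=365)

print(len(keep_versions(versions, last_backup, one_year)))     # 6
print(len(keep_versions(versions, last_backup, one_year, 3)))  # 3
```

With the copy limit of 3, the three oldest versions are dropped even though all six fall inside the one-year window.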
Retention With Pathnames
Pathnames can be used with retain so that it operates on just these files or directories. This is useful when you want to retain fewer copies of certain parts of the backup. For example, if you back up /home with several users, each user could have a different retention policy by running retain with /home/user1, then /home/user2, etc.
As another example, you may have a log directory where you only want
to keep the last 30 days even though for the rest of the backup you
want to keep 1 year of backups.
It’s a good idea to continue running retain on the entire backup, without a pathname, unless you are sure that the path-specific retains will cover all parts of the backup.
NOTES:
- The retention time is always computed based on the most recent backup’s completion time rather than the current time. This can be confusing, but it avoids the situation where running retain repeatedly eventually deletes all backup history.
- The most recent version of an active (not deleted) file is never deleted by retain.
- Retain may sometimes keep 1 version more than you expect. The config option retain-extra-versions controls this and defaults to True.
Example Retain Session
Here is a typical example of a HashBackup file retention session. The command is:
$ hb retain -c /hb -s 7d4w12m -x7d
This will keep one backup for each of the last 7 days, one for each of the last 4 weeks, and one for each of the last 12 months. Files deleted in the last 7 days will be kept; files deleted more than 7 days ago from the live system will be removed from the backup.
HashBackup build 256 (C) HashBackup, LLC
Backup directory: /hb
Most recent backup version: 14
Backup finished at: 2010-06-29 12:35:01
Retention schedule: 7d4w12m
Deleted file retention time: 7d (keep files since 2010-06-22 12:35:01)
33890 files deleted, 218184 files retained
Writing archive 0.0
Writing archive 3.0
Writing archive 4.0
Writing archive 5.0
Compressing archive 5.0
Writing archive 6.0
Writing archive 8.0
Compressing archive 8.0
Writing archive 9.0
Compressing archive 9.0
Writing archive 11.0
Compressing archive 11.0
Writing archive 12.0
Writing archive 13.0
Copied hb.db to g5rsync
Copied arc.5.0 to g5rsync
Copied arc.8.0 to g5rsync
Copied arc.9.0 to g5rsync
Copied arc.11.0 to g5rsync
Copied dest.db to g5rsync
real 2m39.243s
user 2m0.172s
sys 0m14.242s
Notice that not all archives are compressed. HashBackup compresses archives that have had around 50% of their data removed (configurable with the pack-percent-free config keyword). This is to balance performance, disk space usage, and bandwidth. For example, if one small file is removed from a 1GB archive file, it doesn’t make sense to spend time compressing it or using bandwidth to transmit it to the remote server for such a small space reduction.