Exclude Details
HashBackup uses a text file, inex.conf, to list files and directories
to exclude from the backup. It’s typical to exclude temporary files,
unimportant files, browser cache directories, database files, VM
memory image files, and swap or paging files. It may seem surprising
that database files are not backed up. To get a usable backup, a
database is usually dumped with a database utility while a transaction
read lock is held, and the dump is backed up instead of the actual
database files.
Exclude File Format
The exclude file is a simple text file with each line being:
- a blank line, which is ignored
- a comment line beginning with #, also ignored
- a rule line
Each rule line begins with a rule type followed by a rule pattern.
The rule type can be a single letter up to the full name of the rule
type. So the exclude rule type can be abbreviated as e, ex,
exc, etc. There are 4 rule types, explained below. Each rule
pattern following the rule type describes a filename or pathname to be
excluded. Every rule line must have both a rule type and rule pattern
with the exception of a rule line beginning with /. This is a
shorthand for the x rule type to allow using absolute pathnames,
perhaps from the find command, as exclude rules.
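The line format described above can be sketched as a small parser. This is only an illustration, not HashBackup's own code: the function name parse_rule_line is hypothetical, and the full rule type names (xclude, glob, regex, exclude) are assumed from the section headings further down.

```python
# Assumed full rule type names, taken from the section headings of this page.
RULE_TYPES = ("xclude", "glob", "regex", "exclude")


def parse_rule_line(line):
    """Return (rule_letter, pattern), or None for blank and comment lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None  # blank lines and comments are ignored
    if text.startswith("/"):
        # Shorthand: a bare absolute pathname is treated as an x rule.
        return ("x", text)
    parts = text.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"rule needs both a type and a pattern: {line!r}")
    rule_type, pattern = parts
    # A rule type may be any prefix of its full name: e, ex, exc, ...
    matches = [name for name in RULE_TYPES if name.startswith(rule_type)]
    if len(matches) != 1:
        raise ValueError(f"unknown rule type: {rule_type!r}")
    return (matches[0][0], pattern)
```

For example, parse_rule_line("ex /home/*/.cache/") yields the e rule type with the pattern /home/*/.cache/, while a line such as /home/jim/abc is returned as an x rule.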
HashBackup has very efficient wildcard support, and exclude files containing thousands of rules, even with wildcards, will not cause performance issues. Sites often use automated tools to generate exclude rules or share exclude files between multiple users, which can produce very large lists of exclude rules.
Note: Exclude rules are evaluated during backups with a regular expression engine that does not handle more than three star wildcards in one rule very efficiently. It works, but may affect backup performance.
Initial Exclude File
When a backup directory is created with hb init, a default rule file
is also created. This file is system-specific for the Mac and Linux
and contains 10-15 rules that apply to those systems. If the exclude
file is ever deleted, the backup command will install a new default
rule file. To have no exclusions, use an empty rule file or one with
only comments.
Example Exclude File (Linux):
g /home/*/.cache/
g /home/*/.gvfs
g *.vmem
x /proc/
x /tmp/
x /var/tmp/
Filenames vs Directory Names
The rule pattern that follows the rule type can describe either filenames or directories. If there is no slash in the pattern, it will match a file or directory anywhere in the filesystem.
x xclude Rule
The x rule type is for non-wildcard patterns. Any characters are
allowed in the pattern that are legal in pathnames, without special
meaning, so a star pattern character matches a star in the pathname.
A pattern without slashes will match anywhere in the filesystem, for example:
x abc.tmp
matches all abc.tmp files and directories anywhere.
An x pattern ending in slash specifies a directory’s contents.
Sometimes it is useful to back up a directory so that it is re-created
on restore, but it is not necessary to back up the directory contents.
If a directory pattern ends with /, then the directory itself is saved
but not the contents. If a pattern doesn’t end with /, then neither
the directory nor its contents are saved. For example:
x /home/jim/.cache/
x /home/jim/.cache
The first rule will save an empty .cache directory without its contents. The second rule will not save anything about the .cache directory.
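The trailing-slash behavior can be pictured with a short sketch. It is a simplified model, not HashBackup's implementation: classify_x_rule is a hypothetical name, only absolute x patterns are handled, and paths are assumed to be given without a trailing slash.

```python
def classify_x_rule(pattern, path):
    """Describe what an absolute x rule does to a path (simplified sketch)."""
    keep_directory = pattern.endswith("/")  # trailing slash: keep the directory itself
    base = pattern.rstrip("/")
    if path == base:
        # The named directory is saved (empty) only when the rule ends in a slash.
        return "directory only" if keep_directory else "excluded"
    if path.startswith(base + "/"):
        # Everything beneath the directory is excluded in both cases.
        return "excluded"
    return "saved"
```

With the pattern /home/jim/.cache/ the path /home/jim/.cache is reported as "directory only", while with /home/jim/.cache it is reported as "excluded". Paths below the directory are "excluded" under either rule.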
Absolute pathnames beginning with a slash can be used as shorthand for
the x rule, without any rule type. Such pathnames could come from the Unix
find command. For example, note the missing x in:
/home/jim/abc
/home/jim/def
g glob Rule
The g rule type is the most commonly used and allows wildcards in
the pattern, making it possible to exclude a file for all users,
exclude all temporary files, all picture files, etc. Glob rules
support the following wildcards:
- ? means any one character other than slash
- * means zero or more characters other than slash
- ** means zero or more characters including slash
- /**/ means zero or more directories
- [abc] means one character either a, b, or c
- [!abc] means one character not a, b, nor c
- [a-zA-J] means one character a-z or A-J
As with x rules, glob patterns without slashes match anywhere, for
example:
g *.tmp
g abc.xyz
These rules would exclude any file or directory with a .tmp extension
anywhere in the filesystem and exclude the file or directory abc.xyz
anywhere in the filesystem. An x rule could have been used for the
2nd rule. The main reason to use an x rule is if the pattern
contains special characters that should not be a wildcard, for
example, the pathname contains star, bracket, or question mark
characters.
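Since exclude rules are evaluated with a regular expression engine, one way to understand the glob wildcards is to translate them into regular expressions. The sketch below is an assumption about how such a translation could look, not HashBackup's actual code. The names glob_to_regex and glob_matches are hypothetical, ? is treated as not matching a slash (as stated in the e rule section), and the trailing-slash directory handling is left out.

```python
import re


def glob_to_regex(pattern):
    """Translate a g rule pattern into a regular expression string (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            out.append("/(?:.*/)?")      # zero or more directories
            i += 4
        elif pattern.startswith("**", i):
            out.append(".*")             # any characters, including slash
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")          # any characters except slash
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")           # one character except slash
            i += 1
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]    # [!abc] becomes [^abc]
            out.append("[" + body + "]")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return "".join(out)


def glob_matches(pattern, path):
    """True if the glob pattern matches the absolute path (sketch)."""
    regex = glob_to_regex(pattern)
    if "/" not in pattern:
        # A pattern without a slash matches a file or directory name anywhere.
        return any(re.fullmatch(regex, part) for part in path.split("/") if part)
    return re.fullmatch(regex, path) is not None
```

Under these assumptions, glob_matches("*.tmp", "/var/data/report.tmp") is true, and glob_matches("/home/*/.gvfs", "/home/jim/sub/.gvfs") is false because a single star does not cross a slash.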
A typical use of glob patterns is to match for all users. For example, on Linux:
g /home/*/.cache/
g /home/**/.cache/
The first rule excludes the .cache directory contents for all users. It only applies to .cache directories directly beneath the users' home directories. The 2nd rule excludes .cache directories anywhere under /home. To exclude all .cache directory contents anywhere, these rules are all equivalent:
g .cache/
g .cache/*
g */.cache/
g */.cache/*
g .cache/**
g **/.cache/
g /**/.cache/**
r regex Rule
Regular expression patterns allow more complex matching. The special characters are:
. matches one character
^ matches the beginning of the pathname
$ matches the end of the pathname
* matches zero or more of the previous character or group
+ matches one or more of the previous character or group
? makes the previous character or group optional
[abc] matches one character a, b, or c
[^abc] matches one character not a, b, nor c
[a-zA-J] matches one character a-z or A-J
\ next character has no special meaning, so \* matches * character
abc|def matches either abc or def
( start of a group, so (abc)? matches nothing or abc
) end of a group
{m,n} matches m to n repeats of the previous character or group
Regular expression patterns must always match the beginning of the pathname, and
pathnames passed to the exclude handler are always absolute (begin
with a slash), so it’s important that r patterns either begin with
a slash or have a wildcard at the beginning. For example:
r .cache/                 will not match anything
r .*/.cache/              matches .cache contents anywhere
r /home/.*/.cache/        matches .cache contents anywhere under /home
r /home/[^/]+/.cache/     matches .cache contents directly under user directories
Regular expressions do not have to match the complete pathname. To make sure a regex matches the whole pathname, use $ at the end.
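The anchoring behavior described above (matching starts at the beginning of the pathname but need not reach the end) is the same as that of Python's re.match, so the example rules can be tried out in a few lines. This assumes the exclude engine behaves like re.match for these patterns; r_rule_matches is a hypothetical helper name.

```python
import re


def r_rule_matches(pattern, path):
    """True if the pattern matches at the start of the path (sketch)."""
    # re.match anchors at the beginning of the string but not at the end.
    return re.match(pattern, path) is not None


# The pathname is absolute, so a pattern must account for the leading slash.
print(r_rule_matches(".cache/", "/home/jim/.cache/a.png"))
print(r_rule_matches(".*/.cache/", "/home/jim/.cache/a.png"))

# Without $ a pattern may match only a prefix of the pathname.
print(r_rule_matches(r".*\.tmp", "/data/report.tmp.bak"))
print(r_rule_matches(r".*\.tmp$", "/data/report.tmp.bak"))
```

The first call returns False because the pattern cannot match starting at the leading slash, and the last call returns False because $ requires the pathname to end in .tmp. The other two return True.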
Regular expressions are powerful but can be confusing, so make sure
to test them carefully. The -v4 option to hb backup shows all
excluded paths and can be useful for testing exclude rules.
e exclude Rule
The e rule is an obsolete form of the g rule and is only preserved
for compatibility with earlier releases. e rules are very similar
to g rules except for the handling of wildcards: e rule * and ?
wildcards match across slashes whereas g rule * and ? do not. So
for the rules:
e /home/*/.cache/
g /home/*/.cache/
g /home/**/.cache/
The first rule matches .cache contents anywhere under /home user directories, but not /home/.cache/. The 2nd rule matches .cache contents directly under users' directories. The 3rd rule matches .cache contents anywhere, even /home/.cache/, because /**/ can match zero directories.
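The difference between the three rules can be checked by writing each one as a rough regular-expression equivalent. These translations are hand-written assumptions for illustration, not the expressions HashBackup generates internally, and matching_rules is a hypothetical helper.

```python
import re

# Hand-written regex equivalents of the three rules (illustrative assumptions).
RULES = {
    "e /home/*/.cache/": r"/home/.*/\.cache/",        # e rule: * crosses slashes
    "g /home/*/.cache/": r"/home/[^/]*/\.cache/",     # g rule: * stops at a slash
    "g /home/**/.cache/": r"/home/(?:.*/)?\.cache/",  # /**/ is zero or more directories
}


def matching_rules(path):
    """Return the rules whose regex equivalent matches the start of the path."""
    return [rule for rule, regex in RULES.items() if re.match(regex, path)]


for path in ("/home/jim/.cache/x", "/home/jim/proj/.cache/x", "/home/.cache/x"):
    print(path, matching_rules(path))
```

In this model all three rules match /home/jim/.cache/x, the e rule and the ** rule match /home/jim/proj/.cache/x, and only the ** rule matches /home/.cache/x.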