Exclude Details

HashBackup uses a text file, inex.conf, to list files and directories to exclude from the backup. It’s typical to exclude temporary files, unimportant files, browser cache directories, database files, VM memory image files, and swap or paging files. It may seem surprising that database files are not backed up. To get a usable backup, a database is usually dumped with a database utility while a transaction read lock is held, and the dump is backed up instead of the actual database files.

Exclude File Format

The exclude file is a simple text file with each line being:

  • a blank line which is ignored

  • a comment line beginning with #, also ignored

  • a rule line

Each rule line begins with a rule type followed by a rule pattern. The rule type can be abbreviated to any prefix of its full name, down to a single letter, so the exclude rule type can be written as e, ex, exc, etc. There are 4 rule types, explained below. The rule pattern following the rule type describes a filename or pathname to be excluded. Every rule line must have both a rule type and a rule pattern, with one exception: a rule line beginning with / is shorthand for the x rule type. This allows absolute pathnames, perhaps from the find command, to be used directly as exclude rules.
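The line format above can be sketched as a small parser. This is an illustrative sketch only, not HashBackup's code: parse_line is a hypothetical helper, the full rule type names are taken from the section headings in this document, and treating leading and trailing whitespace as insignificant is an assumption.

```python
# Full rule type names as they appear in this document's section headings.
RULE_TYPES = ("xclude", "glob", "regex", "exclude")


def parse_line(line):
    """Classify one exclude-file line (hypothetical helper).

    Returns None for blank and comment lines, otherwise a
    (rule_letter, pattern) tuple.
    """
    text = line.strip()  # assumption: surrounding whitespace is not significant
    if not text or text.startswith("#"):
        return None
    if text.startswith("/"):
        # Shorthand: a bare absolute pathname is treated as an x rule.
        return ("x", text)
    parts = text.split(None, 1)
    if len(parts) != 2:
        raise ValueError("rule line needs a rule type and a pattern: %r" % line)
    rule_type, pattern = parts
    # Accept any prefix of a full rule type name, down to one letter.
    for name in RULE_TYPES:
        if name.startswith(rule_type):
            return (name[0], pattern)
    raise ValueError("unknown rule type: %r" % rule_type)
```

With this sketch, parse_line("exc /tmp/") yields the e rule type, while parse_line("/home/jim/abc") yields an x rule because of the leading slash.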

HashBackup has very efficient wildcard support, and exclude files containing thousands of rules, even with wildcards, will not cause performance issues. Sites often use automated tools to generate exclude rules or share exclude files between multiple users, which can produce very large lists of exclude rules.

Exclude rules are evaluated during backups with a regular expression engine that does not handle more than three star wildcards in one rule very efficiently. It works, but may affect backup performance.

Initial Exclude File

When a backup directory is created with hb init, a default rule file is also created. This file is system-specific for Mac and Linux and contains 10-15 rules that apply to those systems. If the exclude file is ever deleted, the backup command will install a new default rule file. To have no exclusions, use an empty rule file or one with only comments.

Example Exclude File (Linux):

g /home/*/.cache/
g /home/*/.gvfs
g *.vmem
x /proc/
x /tmp/
x /var/tmp/

Filenames vs Directory Names

The rule pattern that follows the rule type can describe either filenames or directories. If there is no slash in the pattern, it will match a file or directory anywhere in the filesystem.

x xclude Rule

The x rule type is for non-wildcard patterns. Any character that is legal in a pathname is allowed in the pattern and has no special meaning, so a star in the pattern matches a literal star in the pathname.

A pattern without slashes will match anywhere in the filesystem, for example:

x abc.tmp

matches all abc.tmp files and directories anywhere.

An x pattern ending in slash specifies a directory’s contents. Sometimes it is useful to back up a directory so that it is re-created on restore, even though it is not necessary to back up the directory contents. If a directory pattern ends with /, then the directory itself is saved but not the contents. If a pattern doesn’t end with /, then neither the directory nor its contents are saved. For example:

x /home/jim/.cache/
x /home/jim/.cache

The first rule will save an empty .cache directory without its contents. The second rule will not save anything about the .cache directory.
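The trailing-slash behavior can be expressed as a short predicate. This is a minimal sketch under stated assumptions, not HashBackup's implementation: x_rule_excludes is a hypothetical name, it handles only absolute patterns, and it assumes directory paths are passed without a trailing slash.

```python
def x_rule_excludes(pattern, path):
    """Return True if an absolute x pattern would exclude the absolute path.

    Hypothetical helper: assumes `path` has no trailing slash, even
    when it names a directory.
    """
    if pattern.endswith("/"):
        # Trailing slash: only entries beneath the directory are excluded,
        # so the directory itself is still saved.
        return path.startswith(pattern)
    # No trailing slash: the entry itself and everything beneath it.
    return path == pattern or path.startswith(pattern + "/")
```

Under these assumptions, the rule x /home/jim/.cache/ excludes /home/jim/.cache/file but not /home/jim/.cache itself, while x /home/jim/.cache excludes both.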

Absolute pathnames beginning with a slash can be used as shorthand for the x rule, without any rule type. Such a list of pathnames could come from the Unix find command. For example, note the missing x in:

/home/jim/abc
/home/jim/def

g glob Rule

The g rule type is the most commonly used and allows wildcards in the pattern, making it possible to exclude a file for all users, exclude all temporary files, all picture files, etc. Glob rules support the following wildcard forms:

  • ? means any character

  • * means zero or more characters other than slash

  • ** means zero or more characters including slash

  • /**/ means zero or more directories

  • [abc] means one character either a, b, or c

  • [!abc] means one character not a, b, nor c

  • [a-zA-J] means one character a-z or A-J
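One way to reason about the wildcard forms listed above is to translate them into an ordinary regular expression. The sketch below is an assumption about how the wildcards could map to regex syntax, not HashBackup's implementation; glob_to_regex and glob_matches are hypothetical names. It treats ? as not matching a slash, which follows the statement in the e rule section that the g rule * and ? do not match across slashes. It covers only the wildcard translation, not the rule that slash-free patterns match anywhere, and it does not handle malformed bracket expressions.

```python
import re


def glob_to_regex(pattern):
    """Translate the glob wildcard forms into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            out.append("/(?:.*/)?")      # zero or more directories
            i += 4
        elif pattern.startswith("**", i):
            out.append(".*")             # any characters, including slash
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")          # any characters other than slash
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")           # one character other than slash
            i += 1
        elif pattern[i] == "[" and pattern.find("]", i + 2) != -1:
            end = pattern.find("]", i + 2)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]    # [!abc] becomes [^abc]
            elif body.startswith("^"):
                body = "\\" + body       # keep a leading ^ literal
            out.append("[" + body + "]")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def glob_matches(pattern, path):
    """Return True if the whole path matches the translated pattern."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None
```

For instance, glob_matches("/home/*/.cache/", "/home/jim/.cache/") is true, while the same pattern does not match "/home/jim/sub/.cache/" because * stops at a slash.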

As with x rules, glob patterns without slashes match anywhere, for example:

g *.tmp
g abc.xyz

These rules would exclude any file or directory with a .tmp extension anywhere in the filesystem and exclude the file or directory abc.xyz anywhere in the filesystem. An x rule could have been used for the 2nd rule. The main reason to use an x rule is that the pattern contains special characters that should not be treated as wildcards, for example, when the pathname contains star, bracket, or question mark characters.

A typical use of glob patterns is to match for all users. For example, on Linux:

g /home/*/.cache/
g /home/**/.cache/

The first rule excludes the .cache directory contents for all users. It only applies to .cache directories directly beneath the users' home directories. The 2nd rule excludes .cache directories anywhere under /home. To exclude all .cache directory contents anywhere, these rules are all equivalent:

g .cache/
g .cache/*
g */.cache/
g */.cache/*
g .cache/**
g **/.cache/
g /**/.cache/**

r regex Rule

Regular expression patterns allow more complex matching. The special characters are:

. matches one character
^ matches the beginning of the pathname
$ matches the end of the pathname
* matches zero or more of the previous character or group
+ matches one or more of the previous character or group
? makes the previous character or group optional
[abc] matches one character a, b, or c
[^abc] matches one character not a, b, nor c
[a-zA-J] matches one character a-z or A-J
\ next character has no special meaning, so \* matches * character
abc|def matches either abc or def
( start of a group, so (abc)? matches nothing or abc
) end of a group
{m,n} matches m to n repeats of the previous character or group

Regular expression patterns must always match the beginning of the pathname, and pathnames passed to the exclude handler are always absolute (begin with a slash), so it’s important that r patterns either begin with a slash or have a wildcard at the beginning. For example:

r .cache/              will not match anything
r .*/.cache/           matches .cache contents anywhere
r /home/.*/.cache/     matches .cache contents anywhere under /home
r /home/[^/]+/.cache/  matches .cache contents directly under user directories

Regular expressions do not have to match the complete pathname. To make sure a regex matches the whole pathname, use $ at the end.
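The anchoring behavior described above, where a pattern must match at the start of the pathname but need not reach the end, is the same behavior Python's re.match has, so it can be used to experiment with r patterns. This is only an approximation for testing ideas: HashBackup's regex engine may differ from Python's in details, and r_rule_matches is a hypothetical helper.

```python
import re


def r_rule_matches(pattern, path):
    """Return True if pattern matches at the start of path.

    re.match anchors at the beginning of the string but does not
    require the match to extend to the end.
    """
    return re.match(pattern, path) is not None


# Sample absolute pathnames (made up for illustration).
direct = "/home/jim/.cache/data"
nested = "/home/jim/sub/.cache/data"

checks = [
    r_rule_matches(".cache/", direct),              # first "." consumes the leading slash, then "c" fails
    r_rule_matches(".*/.cache/", nested),           # leading wildcard reaches any depth
    r_rule_matches("/home/[^/]+/.cache/", direct),  # exactly one directory level under /home
    r_rule_matches("/home/[^/]+/.cache/", nested),  # two levels deep, so no match
    r_rule_matches("/home/jim/abc", "/home/jim/abcdef"),   # prefix match succeeds
    r_rule_matches("/home/jim/abc$", "/home/jim/abcdef"),  # $ forces a match to the end
]
```

Note that an unescaped . in these patterns matches any character, so .cache also matches names such as xcache; write \.cache to match only a literal dot.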

Regular expressions are powerful but can be confusing, so make sure to test them carefully. The -v4 option to hb backup shows all excluded paths and can be useful for testing exclude rules.

e exclude Rule

The e rule is an obsolete form of the g rule and is only preserved for compatibility with earlier releases. e rules are very similar to g rules except for the handling of wildcards: the e rule * and ? wildcards match across slashes, whereas the g rule * and ? do not. So for the rules:

e /home/*/.cache/
g /home/*/.cache/
g /home/**/.cache/

The first rule matches .cache contents anywhere under /home user directories, but not /home/.cache/. The 2nd rule matches .cache contents directly under users' directories. The 3rd rule matches .cache contents anywhere under /home, even /home/.cache/, because /**/ can match zero directories.
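The difference between the three rules can be checked with hand-written regular expressions. The translations below are assumptions made for illustration (e rule * as ".*", g rule * as "[^/]*", and /**/ as an optional run of directories); they are not taken from HashBackup's code, and match_table is a hypothetical helper.

```python
import re

# Assumed regex equivalents of the three rules above.
RULES = {
    "e /home/*/.cache/": r"/home/.*/\.cache/",        # e rule: * crosses slashes
    "g /home/*/.cache/": r"/home/[^/]*/\.cache/",     # g rule: * stops at a slash
    "g /home/**/.cache/": r"/home/(?:.*/)?\.cache/",  # /**/: zero or more directories
}

# Directory pathnames at three depths under /home.
PATHS = ["/home/.cache/", "/home/jim/.cache/", "/home/jim/sub/.cache/"]


def match_table():
    """Map each rule to a list of booleans, one per entry in PATHS."""
    return {
        rule: [re.fullmatch(rx, path) is not None for path in PATHS]
        for rule, rx in RULES.items()
    }
```

Under these assumed translations, the e rule matches the second and third paths, the g rule with * matches only the second, and the g rule with /**/ matches all three.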