Trim
Finds files and directories with large amounts of unshared data and either generates a report or starts an interactive session to delete files and reduce the backup size.
$ hb trim [-c backupdir] [-i] [-p] [-n toppercent] [-s skipfilesize]
Without options, trim generates a report with 3 sections:
- versions using more than average backup space
- files using more than average backup space
- directories using more than average backup space
Options
-i
starts an interactive session to remove files and/or exclude them from the backup. Without -i, trim generates a report ordered by files and directories using the most backup space.
-n toppercent
percentage of paths to check; the default is 50, for the top 50% by file size. Trim starts faster with lower percentages.
-p
sorts by pathname instead of backup space
-s skipfilesize
ignores files smaller than skipfilesize or with less unique data than skipfilesize. -s10g skips files < 10 GB; -s1 is the same as -s1m and skips all files < 1 MB. The default is to skip files below the average file size (or 1 MB, whichever is larger). Trim starts faster when -s is larger.
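The following Python sketch models how the -s and -n values might select candidate files. It is not HashBackup's code. The decimal unit multipliers, the k/m/g/t suffix set, rounding the -n count up, and the helper names parse_skip_size, default_skip_size and top_percent are all assumptions made for illustration. Only two behaviors are taken from the text above: a bare number means MB, and the default threshold is the average file size or 1 MB, whichever is larger.

```python
import math
import re
from typing import Dict, List

# ASSUMPTION: decimal multipliers and this suffix set are illustrative only.
UNITS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
MB = UNITS["m"]


def parse_skip_size(arg: str) -> int:
    """Convert an -s value such as '10g' or '1' to bytes; a bare number means MB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmgt]?)", arg.strip().lower())
    if match is None:
        raise ValueError(f"bad size: {arg!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit or "m"])


def default_skip_size(file_sizes: List[int]) -> int:
    """Average file size or 1 MB, whichever is larger."""
    average = sum(file_sizes) / len(file_sizes) if file_sizes else 0
    return max(int(average), MB)


def top_percent(sizes: Dict[str, int], percent: int = 50) -> List[str]:
    """Paths in the top `percent`% by file size (ASSUMPTION: the count rounds up)."""
    ranked = sorted(sizes, key=lambda path: sizes[path], reverse=True)
    return ranked[: math.ceil(len(ranked) * percent / 100)]
```

The test directory in the Examples section has an average file size of about 8.3 MB. Under this model, default_skip_size would leave three files above the threshold, which is consistent with the "3 files above 8.3 MB" line trim prints there.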
Trim Reporting
Trim first scans the entire backup database to find pathnames that have the most unshared data. This is tricky in a deduplicated backup, because different pathnames may reference common data. Deleting any one pathname will not reduce backup space because that space is still used by the other pathnames.
Trim shows the versions, files, and directories using the most unshared backup space. Finding these is a hard problem, even for trim! Instead of showing filesystem file sizes as seen in a Unix ls listing, trim shows how much backup space files are using after deduplication and compression.
Trim only reports the unshared space used by individual pathnames. If space is shared between different pathnames, the shared space is not counted, because freeing it would require deleting multiple files.
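A toy model makes the idea concrete. The Python sketch below assumes each file is a list of block hashes and that each block has a stored size. These data structures and the function name unshared_space are hypothetical and do not reflect HashBackup's internal format. Space is credited to a pathname only when no other pathname references the block.

The example at the end applies the model to the test directory from the Examples section. It assumes that file12's data deduplicates cleanly against file1 and file2. Under that assumption, every file reports zero unshared bytes, which matches the trim report shown there.

```python
from collections import defaultdict
from typing import Dict, List, Set


def unshared_space(files: Dict[str, List[str]], block_size: Dict[str, int]) -> Dict[str, int]:
    """Bytes each path would free if it alone were deleted.

    files maps a pathname to the block hashes its data references;
    block_size maps a block hash to its stored (deduplicated, compressed) size.
    """
    # Which pathnames reference each block?
    owners: Dict[str, Set[str]] = defaultdict(set)
    for path, blocks in files.items():
        for block in blocks:
            owners[block].add(path)

    unshared = {}
    for path, blocks in files.items():
        # A block counts only if this path is its sole owner; set() avoids
        # double-counting a block repeated inside one file.
        unshared[path] = sum(block_size[b] for b in set(blocks) if owners[b] == {path})
    return unshared


# Hypothetical blocks: "A" holds file1's data, "B" holds file2's (sizes in MB).
example = {
    "/test/file1": ["A"],
    "/test/file2": ["B"],
    "/test/file12": ["A", "B"],
    "/test/file1copy": ["A"],
}
print(unshared_space(example, {"A": 10, "B": 5}))  # every path reports 0
```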
The directory section shows how much unshared backup space each directory is using based on the sum of unshared space used by large files in the directory. Small files are not counted, so the actual space used by the directory is larger than what trim shows.
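Directory totals can be sketched the same way. The hypothetical Python helper directory_totals adds each large file's unshared bytes to every ancestor directory. It skips files the way -s does: files smaller than the threshold, or with less unique data than it, are ignored. Leaving out the root directory is an assumption, made to match the sample reports, which start at /Applications.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict


def directory_totals(unshared: Dict[str, int], file_size: Dict[str, int], skip_size: int) -> Counter:
    """Sum the unshared bytes of large files into each ancestor directory."""
    totals: Counter = Counter()
    for path, unique in unshared.items():
        # Like -s: ignore files smaller than, or with less unique data than, skip_size.
        if file_size[path] < skip_size or unique < skip_size:
            continue
        for parent in PurePosixPath(path).parents:
            if parent != PurePosixPath("/"):  # ASSUMPTION: root is not reported
                totals[str(parent)] += unique
    return totals
```

Because skipped small files never reach the totals, a directory's real backup usage can be larger than the figure computed here, as the paragraph above notes.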
Interactive Trim
Sites or individual users with more control over their backup data can use the -i option to start an interactive trim session. This calculates space usage as described in the Trim Reporting section, then allows entering single-keystroke commands to manipulate the largest files in the backup.
With interactive trim, the list of files and directories is easily navigated and files and directories can be marked with several states:
- k = keep
- d = delete from backup
- D = delete from backup and the live filesystem
- x = delete from backup, add to inex.conf so the file isn’t saved again
- X = delete from backup and the live filesystem, then add to inex.conf
Files and directories marked "keep" have this recorded in the backup database, so it is remembered the next time interactive trim is used. This can be overridden, of course.
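One way to model the mark states is a table from keystroke to effect. The Python sketch below is illustrative only: the Mark class, the MARKS table and the deleted_ancestor helper are hypothetical names. deleted_ancestor mirrors what the interactive session in the Examples section shows, where a file under a directory marked d is reported as "parent ... deleted".

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class Mark:
    delete_backup: bool  # remove from the backup
    delete_live: bool    # also remove from the live filesystem
    exclude: bool        # add to inex.conf so it isn't saved again


# Keystroke -> effect. "k" deletes nothing; the keep is remembered in the backup database.
MARKS = {
    "k": Mark(False, False, False),
    "d": Mark(True, False, False),
    "D": Mark(True, True, False),
    "x": Mark(True, False, True),
    "X": Mark(True, True, True),
}


def deleted_ancestor(path: str, marks: Dict[str, str]) -> Optional[str]:
    """Nearest parent directory marked for deletion from the backup, if any."""
    for parent in PurePosixPath(path).parents:
        key = marks.get(str(parent))
        if key is not None and MARKS[key].delete_backup:
            return str(parent)
    return None
```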
To make trimming faster, an autoskip feature can be enabled and disabled with the a keystroke. This will show and skip files already marked as keep or delete.
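The hypothetical Python helper below shows the effect of autoskip on navigation. With autoskip on, moving to the next entry passes over anything already marked keep or delete. The function name and the list-of-paths representation are assumptions, and the brief display of skipped entries is not modeled.

```python
from typing import Dict, List, Optional


def next_index(items: List[str], current: int, marks: Dict[str, str], autoskip: bool) -> Optional[int]:
    """Index of the next entry to stop at, or None past the end of the list."""
    for i in range(current + 1, len(items)):
        # marks holds only entries currently marked keep or delete
        if not autoskip or items[i] not in marks:
            return i
    return None
```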
After finishing the interactive trim session with the f key, trim will ask for confirmation to delete items, then ask again before committing deletes to the backup database. After the second confirmation, deletes are permanent.
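The finish step can be modeled as a summary followed by two separate yes/no prompts. In the Python sketch below, summarize and confirm_twice are hypothetical names. Formatting sizes in decimal megabytes is an assumption; the prompt wording is copied from the example session in the Examples section. With the two directories deleted in that session (71 MB and 40 MB), the summary ends with "-> 111 MB backup space to delete".

```python
from typing import Callable, Dict, List


def summarize(pending: Dict[str, int]) -> List[str]:
    """Lines listing pending deletes, largest first, then the total."""
    lines = ["Files to be deleted:"]
    for path, size in sorted(pending.items(), key=lambda item: item[1], reverse=True):
        lines.append(f"{size / 1e6:.0f} MB {path}")
    lines.append(f"-> {sum(pending.values()) / 1e6:.0f} MB backup space to delete")
    return lines


def confirm_twice(ask: Callable[[str], bool]) -> bool:
    """Nothing is deleted unless both prompts are answered yes."""
    # `and` short-circuits, so a "no" to the first prompt skips the second.
    return ask("Delete these? ") and ask("Commit deletes? ")


pending = {
    "/Applications/Kofax Power PDF for Mac.app": 71_000_000,
    "/Applications/Install Pianoteq 7.app": 40_000_000,
}
print("\n".join(summarize(pending)))
```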
Examples
First, a test directory is created with a few files.
# file1 is a 10MB file of random data (use bs=10M, capital M, on Linux)
$ dd if=/dev/random of=test/file1 bs=10m count=1
1+0 records in
1+0 records out
10485760 bytes transferred in 0.320128 secs (32754902 bytes/sec)
# file2 is a 5MB file of random data
$ dd if=/dev/random of=test/file2 bs=5m count=1
1+0 records in
1+0 records out
5242880 bytes transferred in 0.157587 secs (33269789 bytes/sec)
# file12 is file1 + file2
$ cat test/file1 test/file2 >test/file12
# file1copy is a copy of file1
$ cp test/file1 test/file1copy
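The dd size suffix differs between platforms (bs=10m on macOS, bs=10M on Linux). As a portable alternative, the Python sketch below creates the same four test files. make_test_files is a hypothetical helper and is not part of HashBackup; os.urandom stands in for /dev/random. The sizes match the ls listing below: 10 MiB (10,485,760 bytes) and 5 MiB (5,242,880 bytes).

```python
import os
from pathlib import Path

MIB = 1024 * 1024  # dd's "m"/"M" suffix means 1,048,576 bytes


def make_test_files(root: str = "test") -> None:
    """Create file1, file2, file12 (file1 + file2) and file1copy under root."""
    test = Path(root)
    test.mkdir(parents=True, exist_ok=True)
    file1 = os.urandom(10 * MIB)  # 10 MiB of random data
    file2 = os.urandom(5 * MIB)   # 5 MiB of random data
    (test / "file1").write_bytes(file1)
    (test / "file2").write_bytes(file2)
    (test / "file12").write_bytes(file1 + file2)  # same as: cat file1 file2 > file12
    (test / "file1copy").write_bytes(file1)       # same as: cp file1 file1copy
```

Calling make_test_files() from the directory where hb will be run creates test/ there.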
Let’s see the test directory:
$ ls -l test
total 81920
-rw-r--r-- 1 jim staff 10485760 Apr 2 17:20 file1
-rw-r--r-- 1 jim staff 15728640 Apr 2 17:21 file12
-rw-r--r-- 1 jim staff 10485760 Apr 2 17:21 file1copy
-rw-r--r-- 1 jim staff 5242880 Apr 2 17:20 file2
Now create a backup of the test directory:
$ hb init -c hb
HashBackup #2876 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Permissions set for owner access only
Created key file /hb/key.conf
Key file set to read-only
Setting include/exclude defaults: /hb/inex.conf
VERY IMPORTANT: your backup is encrypted and can only be accessed with
the encryption key, stored in the file:
/hb/key.conf
You MUST make copies of this file and store them in secure locations,
separate from your computer and backup data. If your hard drive fails,
you will need this key to restore your files. If you have setup remote
destinations in dest.conf, that file should be copied too.
Backup directory initialized
$ hb backup -c hb test
HashBackup #2876 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Backup start: 2022-04-02 17:21:43
Copied HB program to /hb/hb#2876
This is backup version: 0
Dedup enabled, 0% of current size, 0% of max size
/
/hb
/hb/inex.conf
/test
/test/file1
/test/file12
/test/file1copy
/test/file2
Time: 0.9s
CPU: 0.5s, 49%
Mem: 78 MB
Checked: 8 paths, 41943562 bytes, 41 MB
Saved: 8 paths, 41943562 bytes, 41 MB
Excluded: 0
Dupbytes: 25990241, 25 MB, 61%
Compression: 61%, 2.6:1
Efficiency: 53.33 MB reduced/cpusec
Space: +15 MB, 16 MB total
New files using the most space:
10 MB /test/file1
5.3 MB /test/file12
No errors
This shows that even though we have 40MB of data in the test directory, only 15MB is using backup space. Here’s how trim shows this in a report:
$ hb trim -c hb
HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 0% of current size
Skipping files below average or 1MB
Versions:
15 MB 0
Scanning files I
5 files, average file size 8.3 MB, total 41 MB
3 files above 8.3 MB total 36 MB
Scanning files II
0 files total 0 bytes unique bytes
$
Trim has figured out that these files are all sharing data, so removing any one of them will not reduce backup space.
The next example is a backup of /Applications on Mac OSX.
$ hb backup -c hb /Applications -v1
HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Backup start: 2022-04-03 21:14:06
Copied HB program to /hb/hb#2882
This is backup version: 0
Dedup enabled, 0% of current size, 0% of max size
Backing up: /Applications
Backing up: /hb/inex.conf
Time: 51.7s
CPU: 59.0s, 114%
Mem: 107 MB
Checked: 68014 paths, 1939557240 bytes, 1.9 GB
Saved: 68014 paths, 1939555008 bytes, 1.9 GB
Excluded: 0
Dupbytes: 197630484, 197 MB, 10%
Compression: 53%, 2.2:1
Efficiency: 16.77 MB reduced/cpusec
Space: +901 MB, 901 MB total
New files using the most space:
101 MB /Applications/Firefox.app/Contents/MacOS/XUL
40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma
24 MB /Applications/iTunes.app/Contents/Resources/Assets.car
18 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat
14 MB /Applications/iTunes.app/Contents/MacOS/iTunes
12 MB /Applications/Firefox.app/Contents/Resources/browser/omni.ja
11 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_imgproc.dylib
9.7 MB /Applications/Firefox.app/Contents/Resources/omni.ja
9.2 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/MacOS/Pianoteq 7
9.1 MB /Applications/PDFScanner.app/Contents/Resources/tesseract/fin.traineddata
No errors
$
The trim report is:
HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 13% of current size
Skipping files below average or 1MB
Versions:
900 MB 0
Scanning files I
60773 files, average file size 31 KB, total 1.9 GB
243 files above 1.0 MB total 1.2 GB
Scanning files II
71 files total 260 MB unique bytes
Files
43 MB /Applications/Firefox.app/Contents/MacOS/XUL
40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma
21 MB /Applications/iTunes.app/Contents/Resources/Assets.car
15 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat
6.4 MB /Applications/iTunes.app/Contents/MacOS/iTunes
6.1 MB /Applications/Firefox.app/Contents/Resources/browser/omni.ja
5.2 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_Emoji_Safari-Animals.699.png
4.8 MB /Applications/Firefox.app/Contents/Resources/omni.ja
4.5 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_imgproc.dylib
4.3 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_Emoji_Vacation.609.png
4.1 MB /Applications/PDFScanner.app/Contents/Resources/tesseract/fin.traineddata
3.9 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_core.dylib
...
1.0 MB /Applications/Books.app/Contents/PlugIns/iBAReaderKit.bundle/Contents/MacOS/iBAReaderKit
1.0 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_California-Dogface-Butterfly.661.png
Directories
260 MB /Applications
56 MB /Applications/Firefox.app
56 MB /Applications/Firefox.app/Contents
45 MB /Applications/Firefox.app/Contents/MacOS
40 MB /Applications/Install Pianoteq 7.app
40 MB /Applications/Install Pianoteq 7.app/Contents
40 MB /Applications/Install Pianoteq 7.app/Contents/Resources
...
This report (usually much longer!) can be analyzed offline or given to storage users to figure out which files can be easily removed to save backup space. The -p option may be useful for reports because it sorts by pathname, grouping files alphabetically in their normal directory structure.
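A small hypothetical Python helper, order_report, shows the difference between the default ordering and -p. By default, rows are sorted by backup space, largest first. With by_path=True, which stands in for -p, rows are sorted by pathname so each directory's entries appear together. Breaking ties by pathname in the default order is an assumption.

```python
from typing import Dict, List, Tuple


def order_report(entries: Dict[str, int], by_path: bool = False) -> List[Tuple[int, str]]:
    """Return (bytes, path) rows sorted like the default report or like -p."""
    if by_path:
        rows = sorted(entries.items())  # alphabetical by pathname
    else:
        rows = sorted(entries.items(), key=lambda item: (-item[1], item[0]))  # largest first
    return [(size, path) for path, size in rows]
```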
Here’s a short example of interactive trimming. The keystroke for each step has been added in parentheses to help show what’s happening.
$ hb trim -c /hb -i
HashBackup #2883 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 13% of current size
Skipping files below average or 1MB
Versions:
899 MB 0
Scanning files I
60773 files, average file size 31 KB, total 1.9 GB
243 files above 1.0 MB total 1.2 GB
Scanning files II
116 files total 498 MB unique bytes
----------------------------------------------------------------
Navigation
n = no change, next (Enter) b = no change, back (Backspace)
< = up a directory > = down a directory
123 Enter = goto line 123 a = autoskip toggle
Space = next or prev 25
Changes
k = keep file K = keep everything in directory
d = delete backup D = delete backup & live file
x = delete backup, exclude X = delete backup & live, exclude
- = remove keep or delete / = toggle directory slash
Other
l = list directory s = show deletes
h = help f = finished
q, ctrl-c, ctrl-d = quit without doing any deletes.
IMPORTANT: nothing is changed without confirmation.
----------------------------------------------------------------
Files
1. 101 MB /Applications/Firefox.app/Contents/MacOS/XUL (n) next
2. 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma (n) next
3. 25 MB /Applications/iTunes.app/Contents/Resources/Assets.car (b) back
2. 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma (<) up
Directories
CAREFUL! Deleting live directories will delete all
contents, including files not in the backup.
135. 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources (<) up
134. 40 MB /Applications/Install Pianoteq 7.app/Contents (<) up
133. 40 MB /Applications/Install Pianoteq 7.app (d) delete backup
Files
2. 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma parent Install Pianoteq 7.app deleted (n) next
3. 25 MB /Applications/iTunes.app/Contents/Resources/Assets.car (n) next
4. 18 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat (f) finish
Directories
117. 498 MB /Applications (n) next
118. 126 MB /Applications/Firefox.app (n) next
119. 126 MB /Applications/Firefox.app/Contents (n) next
120. 104 MB /Applications/Firefox.app/Contents/MacOS (n) next
121. 71 MB /Applications/Kofax Power PDF for Mac.app (d) delete backup
122. 71 MB /Applications/Kofax Power PDF for Mac.app/Contents parent Kofax Power PDF for Mac.app deleted (f) finish
Files to be deleted:
121. 71 MB /Applications/Kofax Power PDF for Mac.app delete backup
133. 40 MB /Applications/Install Pianoteq 7.app delete backup
-> 111 MB backup space to delete
Delete these? y
Remove backup: /Applications/Kofax Power PDF for Mac.app
Remove backup: /Applications/Install Pianoteq 7.app
Commit deletes? y
Packing archives
Packing arc.0.2 into arc.0.9
Packing arc.0.3 into arc.0.9
Packing arc.0.4 into arc.0.9
Mem: 75 MB
Removed: 289 MB, 1668 files, 2 arc files
Space: -167 MB, 748 MB total
$
The backup space removed is higher than trim reported because trim only considers large files for interactive trimming. When a directory is removed, all of the files in it are removed, including small files that trim did not count.
HashBackup manages backup space by periodically packing arc files to remove empty space. Depending on how the pack- config keywords are set up, backup space may not be freed immediately following a trim, but it will be freed the next time arc files are packed.