Trim

Finds files and directories with large amounts of unshared data and either generates a report or starts an interactive session to delete files and reduce the backup size.

$ hb trim [-c backupdir] [-i] [-p] [-n toppercent] [-s skipfilesize]

Without options, trim generates a report with three sections:

  • versions using more than average backup space

  • files using more than average backup space

  • directories using more than average backup space

Options

-i starts an interactive session to remove files and/or exclude them from the backup. Without -i trim generates a report ordered by files and directories using the most backup space.

-n toppercent sets the percentage of paths to check. The default is 50, which checks the top 50% by file size. Trim starts faster with lower percentages.

-p sorts by pathname instead of backup space

-s skipfilesize ignores files smaller than or with less unique data than skipfilesize. -s10g skips files < 10 GB, -s1 is the same as -s1m and skips all files < 1 MB. The default is to skip files below the average file size (or 1 MB, whichever is larger). Trim starts faster when -s is larger.
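
As a rough illustration of combining these options, the sketch below wraps a narrower, faster report in a shell function. The function name quick_trim_report and the HB variable are hypothetical conveniences, not part of HashBackup. Only the -c, -n, -s, and -p options described above are used, and the values 10 and 100m are arbitrary.

```shell
# Hypothetical helper, not part of HashBackup: a faster trim report that
# checks only the top 10% of paths, skips files under 100 MB, and sorts
# the result by pathname.
# HB can be overridden (for example HB=echo) to preview the command line.
quick_trim_report() {
    backupdir=$1
    "${HB:-hb}" trim -c "$backupdir" -n 10 -s100m -p
}

# Usage: quick_trim_report /hb
```

Running it with HB=echo prints the command line instead of executing it, which is a cheap way to check the options before scanning a large backup.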

Trim Reporting

Trim first scans the entire backup database to find the pathnames with the most unshared data. This is tricky in a deduplicated backup, because different pathnames may reference common data. Deleting just one of those pathnames does not free the shared data, because it is still used by the other pathnames.

Trim shows the versions, files, and directories using the most unshared backup space. This is a hard problem even for trim! Instead of the filesystem file sizes seen in a Unix ls listing, trim shows how much backup space files use after deduplication and compression.

Trim only reports the unshared space used by individual pathnames. If space is shared between different pathnames the shared space is not counted because freeing it would require deleting multiple files.
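
To make the idea of unshared space concrete, here is a minimal sketch. It is not HashBackup's actual algorithm. It assumes a hypothetical input with one "path block_id size" line per block reference, paths without spaces, and each (path, block) pair listed only once. A block counts toward a path only when no other path references it.

```shell
# Sketch only: unshared bytes per path from "path block_id size" lines.
# The input format and the function name are assumptions for illustration.
unshared_by_path() {
    awk '
        { refs[$2]++; path[NR] = $1; block[NR] = $2; size[NR] = $3; seen[$1] = 1 }
        END {
            # Credit a block to its path only if nothing else references it.
            for (i = 1; i <= NR; i++)
                if (refs[block[i]] == 1) total[path[i]] += size[i]
            for (p in seen) printf "%d %s\n", total[p], p
        }
    ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Whole-file "blocks" A, B, and C stand in for real dedup blocks;
# file3 is a made-up file with data no other path uses.
unshared_by_path <<'EOF'
test/file1 A 10485760
test/file2 B 5242880
test/file12 A 10485760
test/file12 B 5242880
test/file1copy A 10485760
test/file3 C 1048576
EOF
```

Only the made-up file3 is listed with unshared space. The other four paths report 0 because each of their blocks is referenced by at least one other path.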

The directory section shows how much unshared backup space each directory is using based on the sum of unshared space used by large files in the directory. Small files are not counted, so the actual space used by the directory is larger than what trim shows.
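
The directory totals can be sketched in a few lines of shell. This is an illustration under stated assumptions, not HashBackup's implementation: the input is a hypothetical list of tab-separated "size, path" lines holding per-file unshared sizes for large files only, and each file is credited to every ancestor directory.

```shell
# Sketch only: sum per-file unshared sizes into every ancestor directory.
# Input: "size<TAB>path" lines (a hypothetical format); output: "size<TAB>dir".
dir_rollup() {
    tab=$(printf '\t')
    awk -F"$tab" '
        {
            size = $1; p = $2
            # Strip the last path component repeatedly, crediting each ancestor.
            while (sub("/[^/]*$", "", p) && p != "")
                total[p] += size
        }
        END { for (d in total) printf "%d\t%s\n", total[d], d }
    ' | LC_ALL=C sort -t "$tab" -k1,1nr -k2,2
}

# Made-up sizes in MB; small files are simply absent from the input.
printf '43\t/Applications/Firefox.app/Contents/MacOS/XUL\n6\t/Applications/Firefox.app/Contents/Resources/omni.ja\n' | dir_rollup
```

Because small files never appear in the input, every directory total in this sketch is a lower bound, which matches the caveat above.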

Interactive Trim

Sites or individual users with more control over their backup data can use the -i option to start an interactive trim session. This calculates space usage as described in the Trim Reporting section, then allows entering single-keystroke commands to manipulate the largest files in the backup.

Interactive trim makes it easy to navigate the list of files and directories and to mark each one with one of several states:

  • k = keep

  • d = delete from backup

  • D = delete from backup and the live filesystem

  • x = delete from backup, add to inex.conf so file isn’t saved again

  • X = delete from backup and the live filesystem, then add to inex.conf

Files and directories marked "keep" have this recorded in the backup database so it is remembered the next time interactive trim is used. A keep mark can still be overridden later.

To make trimming faster, an autoskip feature can be enabled and disabled with the a keystroke. This will show and skip files already marked as keep or delete.

After finishing the interactive trim session with the f key, trim will ask for confirmation to delete items, then ask again before committing deletes to the backup database. After the second confirmation, deletes are permanent.

Examples

First a test directory is created with a few files.

# file1 is a 10MB file of random data (use bs=10M, capital M, on Linux)
$ dd if=/dev/random of=test/file1 bs=10m count=1
1+0 records in
1+0 records out
10485760 bytes transferred in 0.320128 secs (32754902 bytes/sec)
# file2 is a 5MB file of random data
$ dd if=/dev/random of=test/file2 bs=5m count=1
1+0 records in
1+0 records out
5242880 bytes transferred in 0.157587 secs (33269789 bytes/sec)
# file12 is file1 + file2
$ cat test/file1 test/file2 >test/file12
# file1copy is a copy of file1
$ cp test/file1 test/file1copy
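
The bs=10m and bs=10M spellings differ between BSD and GNU dd. A portable variant of the same setup is sketched below. Giving the block size in bytes and reading from /dev/urandom are substitutions made here, not what the original session used, and the mkdir is added so the commands can run from an empty directory.

```shell
# Portable variant of the setup above: bs is given in bytes so the same
# commands work with both BSD (macOS) and GNU (Linux) dd.
mkdir -p test
dd if=/dev/urandom of=test/file1 bs=1048576 count=10 2>/dev/null   # 10 MiB of random data
dd if=/dev/urandom of=test/file2 bs=1048576 count=5  2>/dev/null   # 5 MiB of random data
cat test/file1 test/file2 > test/file12                            # file1 followed by file2
cp test/file1 test/file1copy                                       # exact copy of file1
ls -l test
```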

Let’s see the test directory:

$ ls -l test
total 81920
-rw-r--r--  1 jim  staff  10485760 Apr  2 17:20 file1
-rw-r--r--  1 jim  staff  15728640 Apr  2 17:21 file12
-rw-r--r--  1 jim  staff  10485760 Apr  2 17:21 file1copy
-rw-r--r--  1 jim  staff   5242880 Apr  2 17:20 file2

Now create a backup of the test directory:

$ hb init -c hb
HashBackup #2876 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Permissions set for owner access only
Created key file /hb/key.conf
Key file set to read-only
Setting include/exclude defaults: /hb/inex.conf

VERY IMPORTANT: your backup is encrypted and can only be accessed with
the encryption key, stored in the file:

    /hb/key.conf

You MUST make copies of this file and store them in secure locations,
separate from your computer and backup data.  If your hard drive fails,
you will need this key to restore your files.  If you have setup remote
destinations in dest.conf, that file should be copied too.

Backup directory initialized

$ hb backup -c hb test
HashBackup #2876 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Backup start: 2022-04-02 17:21:43
Copied HB program to /hb/hb#2876
This is backup version: 0
Dedup enabled, 0% of current size, 0% of max size
/
/hb
/hb/inex.conf
/test
/test/file1
/test/file12
/test/file1copy
/test/file2

Time: 0.9s
CPU:  0.5s, 49%
Mem:  78 MB
Checked: 8 paths, 41943562 bytes, 41 MB
Saved: 8 paths, 41943562 bytes, 41 MB
Excluded: 0
Dupbytes: 25990241, 25 MB, 61%
Compression: 61%, 2.6:1
Efficiency: 53.33 MB reduced/cpusec
Space: +15 MB, 16 MB total
New files using the most space:
     10 MB /test/file1
    5.3 MB /test/file12
No errors

This shows that even though the test directory holds 40 MB of data, only 15 MB of backup space is used. Here’s how trim shows this in a report:

$ hb trim -c hb
HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 0% of current size
Skipping files below average or 1MB

Versions:
 15 MB      0

Scanning files I
5 files, average file size 8.3 MB, total 41 MB
3 files above 8.3 MB total 36 MB
Scanning files II
0 files total 0 bytes unique bytes
$

Trim has figured out that these files are all sharing data, so removing any one of them will not reduce backup space.
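
The figures in the backup output above can be cross-checked with a little shell arithmetic. This is only a sanity check on the numbers shown, not a description of how HashBackup computes them, and the variable names are illustrative.

```shell
# Byte counts copied from the ls -l and hb backup output above.
file1=10485760; file2=5242880; file12=15728640; file1copy=10485760
checked=41943562     # "Checked: 8 paths, 41943562 bytes"
dupbytes=25990241    # "Dupbytes: 25990241, 25 MB, 61%"

logical=$(( file1 + file2 + file12 + file1copy ))
unique=$(( file1 + file2 ))          # file12 and file1copy add no new data
echo "logical bytes in test/: $logical"
echo "unique bytes:           $unique"
echo "ideal duplicate bytes:  $(( logical - unique ))"
echo "reported dup percent:   $(( dupbytes * 100 / checked ))"
```

The unique data is just file1 plus file2, roughly the 15 MB of space the backup reports. The reported Dupbytes figure is slightly below the ideal duplicate count, presumably because deduplication works on blocks rather than whole files. Since every piece of data is referenced by at least two pathnames, no single file has unshared space for trim to report.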

The next example is a backup of /Applications on Mac OS X.

$ hb backup -c hb /Applications -v1
HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Backup start: 2022-04-03 21:14:06
Copied HB program to /hb/hb#2882
This is backup version: 0
Dedup enabled, 0% of current size, 0% of max size
Backing up: /Applications
Backing up: /hb/inex.conf

Time: 51.7s
CPU:  59.0s, 114%
Mem:  107 MB
Checked: 68014 paths, 1939557240 bytes, 1.9 GB
Saved: 68014 paths, 1939555008 bytes, 1.9 GB
Excluded: 0
Dupbytes: 197630484, 197 MB, 10%
Compression: 53%, 2.2:1
Efficiency: 16.77 MB reduced/cpusec
Space: +901 MB, 901 MB total
New files using the most space:
    101 MB /Applications/Firefox.app/Contents/MacOS/XUL
     40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma
     24 MB /Applications/iTunes.app/Contents/Resources/Assets.car
     18 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat
     14 MB /Applications/iTunes.app/Contents/MacOS/iTunes
     12 MB /Applications/Firefox.app/Contents/Resources/browser/omni.ja
     11 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_imgproc.dylib
    9.7 MB /Applications/Firefox.app/Contents/Resources/omni.ja
    9.2 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/MacOS/Pianoteq 7
    9.1 MB /Applications/PDFScanner.app/Contents/Resources/tesseract/fin.traineddata
No errors
$

The trim report is:

HashBackup #2882 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 13% of current size
Skipping files below average or 1MB

Versions:
900 MB      0

Scanning files I
60773 files, average file size 31 KB, total 1.9 GB
243 files above 1.0 MB total 1.2 GB
Scanning files II
71 files total 260 MB unique bytes

Files

 43 MB /Applications/Firefox.app/Contents/MacOS/XUL
 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma
 21 MB /Applications/iTunes.app/Contents/Resources/Assets.car
 15 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat
6.4 MB /Applications/iTunes.app/Contents/MacOS/iTunes
6.1 MB /Applications/Firefox.app/Contents/Resources/browser/omni.ja
5.2 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_Emoji_Safari-Animals.699.png
4.8 MB /Applications/Firefox.app/Contents/Resources/omni.ja
4.5 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_imgproc.dylib
4.3 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_Emoji_Vacation.609.png
4.1 MB /Applications/PDFScanner.app/Contents/Resources/tesseract/fin.traineddata
3.9 MB /Applications/PDFScanner.app/Contents/Frameworks/libopencv_core.dylib
...
1.0 MB /Applications/Books.app/Contents/PlugIns/iBAReaderKit.bundle/Contents/MacOS/iBAReaderKit
1.0 MB /Applications/Safari.app/Contents/Resources/Background Images/Safari-Background_California-Dogface-Butterfly.661.png

Directories

260 MB /Applications
 56 MB /Applications/Firefox.app
 56 MB /Applications/Firefox.app/Contents
 45 MB /Applications/Firefox.app/Contents/MacOS
 40 MB /Applications/Install Pianoteq 7.app
 40 MB /Applications/Install Pianoteq 7.app/Contents
 40 MB /Applications/Install Pianoteq 7.app/Contents/Resources
...

This report (usually much longer!) can be analyzed offline or given to storage users to figure out which files can be easily removed to save backup space. For reporting, the -p option may be useful: it sorts by pathname, so files are grouped alphabetically within the normal directory structure.

Here’s a short example of interactive trimming. The keystroke entered at each line has been added in parentheses to help show what’s happening.

$ hb trim -c /hb -i
HashBackup #2883 Copyright 2009-2022 HashBackup, LLC
Backup directory: /hb
Most recent backup version: 0
Dedup loaded, 13% of current size
Skipping files below average or 1MB

Versions:
899 MB      0

Scanning files I
60773 files, average file size 31 KB, total 1.9 GB
243 files above 1.0 MB total 1.2 GB
Scanning files II
116 files total 498 MB unique bytes

----------------------------------------------------------------
Navigation
  n = no change, next (Enter)   b = no change, back (Backspace)
  < = up a directory            > = down a directory
  123 Enter = goto line 123     a = autoskip toggle
  Space = next or prev 25
Changes
  k = keep file                 K = keep everything in directory
  d = delete backup             D = delete backup & live file
  x = delete backup, exclude    X = delete backup & live, exclude
  - = remove keep or delete     / = toggle directory slash
Other
  l = list directory            s = show deletes
  h = help                      f = finished

q, ctrl-c, ctrl-d = quit without doing any deletes.
IMPORTANT: nothing is changed without confirmation.
----------------------------------------------------------------

Files

   1. 101 MB /Applications/Firefox.app/Contents/MacOS/XUL  (n) next
   2.  40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma  (n) next
   3.  25 MB /Applications/iTunes.app/Contents/Resources/Assets.car  (b) back
   2.  40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma  (<) up

Directories

CAREFUL! Deleting live directories will delete all
         contents, including files not in the backup.

 135.  40 MB /Applications/Install Pianoteq 7.app/Contents/Resources  (<) up
 134.  40 MB /Applications/Install Pianoteq 7.app/Contents  (<) up
 133.  40 MB /Applications/Install Pianoteq 7.app  (d) delete backup

Files

   2.  40 MB /Applications/Install Pianoteq 7.app/Contents/Resources/Install Pianoteq 7.pkg.lzma  parent Install Pianoteq 7.app deleted  (n) next
   3.  25 MB /Applications/iTunes.app/Contents/Resources/Assets.car  (n) next
   4.  18 MB /Applications/Pianoteq 7/Pianoteq 7.app/Contents/Resources/presources.dat  (f) finish

Directories

 117. 498 MB /Applications  (n) next
 118. 126 MB /Applications/Firefox.app  (n) next
 119. 126 MB /Applications/Firefox.app/Contents  (n) next
 120. 104 MB /Applications/Firefox.app/Contents/MacOS  (n) next
 121.  71 MB /Applications/Kofax Power PDF for Mac.app  (d) delete backup
 122.  71 MB /Applications/Kofax Power PDF for Mac.app/Contents  parent Kofax Power PDF for Mac.app deleted  (f) finish

Files to be deleted:
 121.  71 MB /Applications/Kofax Power PDF for Mac.app  delete backup
 133.  40 MB /Applications/Install Pianoteq 7.app  delete backup
   -> 111 MB backup space to delete

Delete these? y
Remove backup: /Applications/Kofax Power PDF for Mac.app
Remove backup: /Applications/Install Pianoteq 7.app

Commit deletes? y

Packing archives
Packing arc.0.2 into arc.0.9
Packing arc.0.3 into arc.0.9
Packing arc.0.4 into arc.0.9
Mem: 75 MB
Removed: 289 MB, 1668 files, 2 arc files
Space: -167 MB, 748 MB total
$

The backup space removed is higher than reported by trim because trim only considers large files for interactive trimming. When a directory is removed, all of its files are removed, including the small files trim did not count.

HashBackup manages backup space by periodically packing arc files to remove empty space. Depending on how the pack- config keywords are set up, backup space may not be freed immediately after a trim, but it will be freed the next time arc files are packed.