Hello!

Questions / comments / suggestions / bug reports are always welcome and appreciated!

Here is a fairly typical support question and answer session about HashBackup, to give new customers an idea of the kind of support we like to provide.


From: Thomas Neuber
To: info@hashbackup.com
Subject: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 11:14 AM

Hi,

I have been using your software, HashBackup, for a very long time. Recently you dropped the 32-bit Linux version that I used on my (very) old Synology NAS, which does not support the 64-bit Linux version. Therefore I migrated all saved data to new hardware, which is 64-bit capable, and I want to continue with the existing backup history. The inodes of the saved files have changed, as well as permissions and probably some timestamps. Do you recommend continuing the backup with the --no-inode option? I want to back up the files that have changed since the last working backup from the old system, but I want to avoid a full backup.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 12:59 PM

Hi Thomas! Could you explain a bit more what you are backing up? It’s not clear to me if you are backing up the files residing on your NAS or if you are using your NAS as storage and backing up files from other computer systems attached via network to your NAS.

Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 1:33 PM

Hi Jim! I am backing up files residing on my NAS. I think the biggest part of it is my photo archive, containing approximately 3 TB of photos and videos, but there are also other documents, Maildir, etc. The backup does not contain backup images from other attached computer systems.

Because of the large size and number of files, I want to avoid a new full backup. I have a local copy of that backup and one copy at Backblaze. I think a new full backup would take around 18 months and consume a lot of additional space.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 2:36 PM

Hi Thomas - since you copied data from your old NAS to your new one, some file attributes have probably changed, as you mentioned. The --no-ino option doesn’t help you in this case: it is designed to handle the situation where, on some networked file systems, the inode number changes randomly. This happens because the file server doesn’t have fixed inode numbers and assigns them as files are opened and closed. The --no-ino option ignores these changing inode numbers, but HB will still look at all the other file attributes, and if any of them have changed, it will try to back up the file.

If you have always backed up with dedup enabled, then even though HB will read every file, it will not create much new data in the way of arc files, because most of it will dedup. If you’re not sure you’ve always had dedup enabled on every backup (it used to be disabled by default), you could send me an export of your backup or the output of hb stats and I can probably tell.

Questions:

  1. Are the pathnames on your new NAS the same or different from your old NAS?

  2. How much RAM is used for dedup, either with the dedup-mem config setting or -D on the backup command?

  3. How much RAM do you have on your new NAS, and how much is available (free)?

  4. How big is hash.db?

Let me know and I can give you some advice.

Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 3:30 PM

Hi Jim!

#1. Almost all files have the same pathnames.

#2. The dedup memory was configured to 200m. It was set with the config command. I did not use the -D option.

#3. The old system reports with "cat /proc/meminfo"

MemTotal:         716092 kB
MemFree:          115764 kB
Buffers:           76040 kB
Cached:           117660 kB
SwapCached:        54036 kB
Active:           158996 kB
Inactive:         157736 kB
Active(anon):      70500 kB
Inactive(anon):    71784 kB
Active(file):      88496 kB
Inactive(file):    85952 kB
Unevictable:        1436 kB
Mlocked:            1436 kB
HighTotal:        296968 kB
HighFree:          11636 kB
LowTotal:         419124 kB
LowFree:          104128 kB
SwapTotal:       2527156 kB
SwapFree:        2174156 kB
Dirty:               524 kB
Writeback:             0 kB
AnonPages:         86632 kB
Mapped:            46000 kB
Shmem:             17820 kB
Slab:              65316 kB
SReclaimable:      31472 kB
SUnreclaim:        33844 kB
KernelStack:        3656 kB
PageTables:         7300 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2885200 kB
Committed_AS:    1718364 kB
VmallocTotal:     600064 kB
VmallocUsed:      206704 kB
VmallocChunk:     321220 kB
DirectMap4k:       14328 kB
DirectMap4M:      417792 kB

The new system reports with "cat /proc/meminfo"

MemTotal:        3983740 kB
MemFree:         1227932 kB
MemAvailable:    2443620 kB
Buffers:           17892 kB
Cached:          1257892 kB
SwapCached:        30072 kB
Active:          1047352 kB
Inactive:         874896 kB
Active(anon):     338372 kB
Inactive(anon):   337988 kB
Active(file):     708980 kB
Inactive(file):   536908 kB
Unevictable:        1516 kB
Mlocked:            1516 kB
SwapTotal:       4489140 kB
SwapFree:        4240112 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        630384 kB
Mapped:            82712 kB
Shmem:             28384 kB
Slab:             604752 kB
SReclaimable:     533292 kB
SUnreclaim:        71460 kB
KernelStack:        9824 kB
PageTables:        14408 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6481008 kB
Committed_AS:    3130268 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
DirectMap4k:       17940 kB
DirectMap2M:     3065856 kB
DirectMap1G:     1048576 kB

#4. The hash.db consumes 201M on disk.

Are you interested in some statistics?

HashBackup #3091 Copyright 2009-2024 HashBackup, LLC
Backup directory: /volume1/@hashbackup

               2,162 backups
               13 PB file bytes checked since initial backup
               18 TB file bytes saved since initial backup
                2941 total backup hours
              7.1 TB average file bytes checked per backup in last 5 backups
              1.9 GB average file bytes saved per backup in last 5 backups
               0.03% average changed data percentage per backup in last 5 backups
              38m 3s average backup time for last 5 backups
              87,338 archives
              7.9 TB archive space
              97.64% archive space utilization 7.7 TB
               682:1 industry standard dedup ratio
              1.0 GB average archive space per backup for last 5 backups
                 1:1 reduction ratio of backed up files for last 5 backups
              209 MB dedup table current size
          17,160,013 dedup table entries
                 98% dedup table utilization at current size
           1,507,794 files
             685,698 paths
               2,469 extended metadata entries
           500 bytes largest extended metadata entry
              4.0 GB database size
              157 MB space used by files (3%)
               77 MB space used by paths (1%)
              3.3 GB space used by blocks (80%)
              196 MB space used by refs (4%)
              516 KB space used by extended metadata (0%)
               10 MB space used by other (0%)
                 91% database space utilization
          50,092,637 blocks
          75,126,673 block references
          25,034,036 deduped blocks
               1.5:1 block dedup ratio
              158 KB average stored block size
              3.9 TB backup space saved by dedup
               48 KB average variable-block length

Maybe some more statistics?

HashBackup #3091 Copyright 2009-2024 HashBackup, LLC
Backup directory: /volume1/@hashbackup
Showing active versions

Version | Owner  |     Backup Start    |      Backup End     |        Time       | Files  |  File, Backup Space | HB #
--------+--------+---------------------+---------------------+-------------------+--------+---------------------+-----
   0    | 0 root | 2017-07-24 18:51:21 | 2017-07-25 01:07:18 |     6h 15m 57s    |   4    |              241 KB | 1926
  ...
  2161  | 0 root | 2024-04-14 11:00:51 | 2024-04-14 11:13:15 |      12m 24s      |   90   |    804 MB    207 MB | 3083

I run the backup job on a daily basis with --maxtime 16h. If it takes several days, it would not be a problem. As I understand it, I should not use --no-ino. The inodes have changed once but are stable from now on. It may take some time to catch up, and dedup will help.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Monday, May 13, 2024 7:00 PM

Hi Thomas!

On Mon, May 13, 2024, at 3:30 PM, Thomas Neuber wrote:

Hi Jim!

#1. Almost all files have the same pathnames.

This is good: HB can dedup better if the pathname is the same.

#2. The dedup memory was configured to 200m. It was set with the config command. I did not use the -D option.

This was a good size for your old system, but according to hb stats, your dedup table is full, and not all blocks are in the dedup table. You have 50M blocks total, but only 17M are in the dedup table.

On your new system, you have a lot more RAM available: 1.2GB is completely free, and 2.4GB is available, in total, if needed. This is because 2.4 - 1.2 = 1.2 GB is being used for file system buffering, also shown as "Cached". You want to have as many blocks as possible in the dedup table, so I would increase dedup-mem to 2GB. When you do that, your next backup should say "Expanding dedup table". To test this, just back up 1 file.
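Sketched as commands, this would look like the following (the backup directory is the one shown in the stats output in this thread; the test file path is a hypothetical placeholder, and the exact syntax should be checked against your HB version):

```shell
# Raise the dedup table limit (set with the config command, as before)
hb config -c /volume1/@hashbackup dedup-mem 2g

# Back up one small file; the output should report "Expanding dedup table"
hb backup -c /volume1/@hashbackup /volume1/test-file.txt
```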

Then I would back up a few of your oldest files / photos and see how HB does at deduping them. If it seems to do well, do a directory of files. You might want to monitor your NAS during the backup to see if it is paging. If so, 2GB is too big and you might need to decrease it to 1.5GB. Once your small test backups are working well, do your whole backup.

The other thing I will mention is that you probably have block-size set to 32K since that was the default for a long time. The default for new backups is "auto", which means HB will select the block size for each file, and it will most likely be much larger. The advantage of using a larger block size is that you will have fewer blocks. So instead of having 50M blocks in hb.db, you might only have 5-10M. Also, all of the blocks will fit into a smaller dedup table.

HOWEVER: the big disadvantage of changing the block size on an existing backup is that it is like starting over, because you cannot dedup against previous versions if the block size changes. To go this route, I would probably start completely over, and that may not be something you want to do. I’m only mentioning it because it would be more efficient going forward.
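For anyone starting a fresh backup, the block-size setting mentioned above is also set with the config command; a sketch (the backup directory path is a placeholder, and per the caveat above this only makes sense for a brand-new backup):

```shell
# Only for a new backup directory: let HB choose a block size per file.
# Changing this on an existing backup prevents dedup against old versions.
hb config -c /path/to/new-backup block-size auto
```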

If you have any questions, just ask! Let me know if this works for you.

Jim

PS: you are right, --no-ino should not be used. It doesn’t hurt much, but one of the side effects of --no-ino is that hard links are backed up as separate files instead of being saved as hard links. By definition, a hard link is when two files have the same inode number, and if inode numbers are being ignored or randomly generated, it’s not possible to determine hard link relationships.


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 10:12 AM

Hey Thomas, I was thinking today that 2GB is probably a bit too big for dedup-mem. You have ~50M blocks in your backup, and were storing 17M blocks with dedup-mem at 200M. So you need about 600M to store all 50M blocks. If you set dedup-mem to 1GB, you can store all your blocks with room for future growth, and you will have more RAM available for the OS to cache pages from the hb.db database during the backup.
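The arithmetic behind this estimate can be reproduced from the hb stats numbers; a rough sketch (the ~12 bytes/entry figure is inferred from this particular backup, not an official HashBackup constant):

```shell
# Rough dedup-mem sizing from the stats shown earlier in this thread
entries=17160013      # dedup table entries that fit in 200 MB
table_mb=200
blocks=50092637       # total blocks in the backup
bytes_per_entry=$(( table_mb * 1024 * 1024 / entries ))    # ~12 bytes
need_mb=$(( blocks * bytes_per_entry / 1024 / 1024 ))      # ~573 MB
echo "~${bytes_per_entry} bytes/entry, ~${need_mb} MB for all ${blocks} blocks"
```

With dedup-mem at 1GB there is comfortable headroom over this estimate.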

Let me know how it goes!

Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 10:22 AM

Hi Jim, I read your message this morning, came back from work a few minutes ago, and wanted to start on your suggestions. I will let you know about the results. I am pretty sure that it will work. As I have learned in the last seven or eight years, you have created a pretty good piece of software.

Thomas


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 11:49 AM

Hi Jim! I changed dedup-mem to 1g and left the block size as it was. The first backup run expanded the dedup table. I specified a really old file, and it was saved along with inex.conf; I was not aware of changes in inex.conf. The specified file is 25 bytes and inex.conf is 296 bytes, and the backup size increased by 288 bytes. The second backup saved a complete directory containing 30 files with 27 MB; the second backup's size was 240 bytes. That means it works perfectly.

You did a great job with the tool and thank you for the great support.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 1:19 PM

Hi Thomas! Thanks for letting me know it all worked out okay.

One other thing you might want to think about: when you finish the backup of your new server, it shouldn’t create much new backup data, as you have seen. But it will create a lot of new data in your hb.db database because your file attributes like ctime and mtime were changed during the copy from the old server. It will probably double your database size if every file is backed up again.

For things like photos, I’m guessing there is no history of previous versions that you want to keep (unlike with documents), so having a version 0 of a photo and now having the same file stored in version 2000 (or whatever) is not useful. What you could do is run retain after your new backup is completed, either on specific directories like photos or on the whole backup, and use the -m1 option to say "I only want 1 copy of a pathname in the backup". HB will keep the most recent copy and delete the older copies. Files marked as deleted in the backup (because they were deleted from your NAS) are still kept; only the -x option controls when those are removed from the backup.

Example:

$ hb retain -c blah -m1 /my/photos1 /my/photos2

Without pathnames, retain will remove the version history for all files in the backup.

Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 1:40 PM

Hi Jim! Thank you for your advice. I use a cycle of backup, retain, and test, but I was not aware of the option to keep only the latest copy. That sounds really interesting, especially for things like photos. This is something I should really consider: keeping only one or two copies of certain files.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 2:03 PM

Hey Thomas! After thinking about it some more, I realized I goofed a bit. In your hb.db, 80% is "blocks", 4% is "refs", and 3% is "files". When you backup the new server and it dedups everything, "files" and "refs" might double, but "blocks" will not. So your hb.db should only grow about 7%.

Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Tuesday, May 14, 2024 2:28 PM

Hi Jim! I have already started my backup procedure. It has been running for 20 minutes. Following the log output, it looks pretty good. The CPU and memory consumption seem moderate.

115M dest.db
1.1G hash.db
3.8G hb.db

A 7% bigger hb.db would not hurt. My original concern was about the amount of new arc files.

I will let you know the result. My backup has a maxtime of 16h. Let us see how far we get within 16 hours and/or how many days we will need to get in sync. The last backup was a month ago, and a lot of new files have been added in the last month.

Thomas


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Wednesday, May 15, 2024 12:47 AM

Hi Jim! The backup is still running. The process of collecting and backing up changed files seems to be complete, but the upload is in progress. The latest version created 43 GB of arc files. That looks reasonable for the changes during the last month without a backup. The deduplication works really well.

Thank you for that great tool.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Wednesday, May 15, 2024 9:56 AM

Hi Thomas! I’m glad the backup of your new NAS went smoothly! Thanks for letting me know!

I was thinking, it might be interesting to publish our email exchange on the HB website, to show potential new HashBackup users an example of customer support. I’d remove your personal information (use just your first name) and would have to reformat things. Would you mind if I tried that? I can let you take a look first before publishing anything.

Thanks! Jim


From: Thomas Neuber
To: Jim Wilcoxson
Subject: AW: Question regarding a special backup option --no-inode
Date: Wednesday, May 15, 2024 11:25 AM

Hi Jim!

Yes, you can publish this conversation on your website. It does not contain any confidential information. And yes, please remove my personal information, like my private email address.

The backup log file says that the backup failed, but I would say it completed successfully. The upload to Backblaze was stopped after reaching the time limit.

2024-05-14 Tue 19:59:36| $ /var/services/homes/admin/bin/hb backup -c XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --maxtime 16h
2024-05-14 Tue 19:59:36| HashBackup #3091 Copyright 2009-2024 HashBackup, LLC
2024-05-14 Tue 19:59:37| Backup directory: /volume1/@hashbackup
2024-05-14 Tue 19:59:38| Backup start: 2024-05-14 19:59:38
2024-05-14 Tue 19:59:38| Backup stop goal: 2024-05-15 11:59:38
2024-05-14 Tue 19:59:38| Using destinations in database
2024-05-14 Tue 19:59:57| This is backup version: 2165
2024-05-14 Tue 20:00:06| Dedup enabled, 53% of current size, 53% of max size
2024-05-14 Tue 20:00:06| Updating dedup information
2024-05-14 Tue 20:00:14| /
2024-05-14 Tue 20:00:14| /volume1
2024-05-14 Tue 20:00:14| /volume1/@hashbackup
...
2024-05-15 Wed 15:04:47| Copied hb.db.5667 to backblaze (2.0 GB 3h 2m 24s 185 KB/s)
2024-05-15 Wed 15:04:47| Interrupted before all archives were copied
2024-05-15 Wed 15:04:49| Copied dest.db to jupiter (120 MB 2s 50 MB/s)
2024-05-15 Wed 15:14:55| Copied dest.db to backblaze (120 MB 10m 8s 197 KB/s)
2024-05-15 Wed 15:14:55| Removed hb.db.5666 from backblaze
2024-05-15 Wed 15:14:55| Removed hb.db.5666 from jupiter
2024-05-15 Wed 15:14:57|
2024-05-15 Wed 15:14:57| Time: 35237.3s, 9h 47m 17s
2024-05-15 Wed 15:14:57| CPU:  21317.9s, 5h 55m 17s, 60%
2024-05-15 Wed 15:14:57| Wait: 33846.2s, 9h 24m 6s
2024-05-15 Wed 15:14:57| Mem:  1.5 GB
2024-05-15 Wed 15:14:57| Checked: 523453 paths, 7238541501385 bytes, 7.2 TB
2024-05-15 Wed 15:14:57| Saved: 523368 paths, 7237868943907 bytes, 7.2 TB
2024-05-15 Wed 15:14:57| Excluded: 10129
2024-05-15 Wed 15:14:57| Dupbytes: 7191083716072, 7.1 TB, 99%
2024-05-15 Wed 15:14:57| Compression: 99%, 158.2:1
2024-05-15 Wed 15:14:57| Efficiency: 321.75 MB reduced/cpusec
2024-05-15 Wed 15:14:57| Space: +45 GB, 7.9 TB total
...
2024-05-15 Wed 15:14:57| Recent errors (log has all errors):
2024-05-15 Wed 15:14:57|   20:00:14 Unable to stat file: No such file or directory: /volume1/Calendar
2024-05-15 Wed 15:14:57|   15:04:47 Interrupted before all archives were copied
2024-05-15 Wed 15:14:57| Errors: 2
2024-05-15 Wed 15:14:57| Exit 1: Fail

The retain log

2024-05-15 Wed 15:14:58| $ /var/services/homes/admin/bin/hb retain -c /volume1/@hashbackup -s safe -x 7d
2024-05-15 Wed 15:14:58| HashBackup #3091 Copyright 2009-2024 HashBackup, LLC
2024-05-15 Wed 15:14:58| Backup directory: /volume1/@hashbackup
2024-05-15 Wed 15:14:58| Most recent backup version: 2165
2024-05-15 Wed 15:14:58| Using destinations in database
2024-05-15 Wed 15:15:08| Dedup loaded, 67% of current size
2024-05-15 Wed 15:15:10| Backup finished at: 2024-05-15 05:44:49
2024-05-15 Wed 15:15:10| Retention schedule: 7d4w3m4q5y retain-extra-versions on
2024-05-15 Wed 15:15:10| Retention period: 7y 8m 12d 13h 30m (since 2016-09-01 10:14:49)
2024-05-15 Wed 15:15:10| Remove files deleted before 2024-05-08 05:44:49
2024-05-15 Wed 15:15:10| Checking files
2024-05-15 Wed 15:17:40| Checked 1941999 files
2024-05-15 Wed 15:17:40| Checking 40779 directories
...
2024-05-15 Wed 16:13:22| Mem: 1.2 GB
2024-05-15 Wed 16:13:22| Downloaded: 44 MB, 4% of download quota 950 MB
2024-05-15 Wed 16:13:22| Removed: 789 GB, 60874 files, 0 arc files
2024-05-15 Wed 16:13:22| Space: -153 MB, 7.9 TB total
2024-05-15 Wed 16:13:22| 1941999 files, 1881125 96% kept, 60874 3% deleted
2024-05-15 Wed 16:13:22| Exit 0: Success

A test run is still in progress.

Thomas


From: Jim Wilcoxson
To: Thomas Neuber
Subject: Re: Question regarding a special backup option --no-inode
Date: Wednesday, May 15, 2024 2:21 PM

Hi Thomas! Thanks for sending the results. Your backup seems to have worked very well. It also pointed out a very small buglet. Where it says:

2024-05-15 Wed 15:04:47| Interrupted before all archives were copied

The message should have been:

Time limit on --maxwait before all archives were copied.

But since you used --maxtime instead of --maxwait, it gave the other error message, which should only occur on ctrl-c. Easy to fix though.

Thanks for permission to use the email. I’ll send a link to review the web page when I get it formatted.

Jim