Hard links

Nick Duffek nick@duffek.com
Tue, 12 Mar 2002 21:31:21 -0500 (EST)


On 12-Mar-2002, Ben Escoto wrote:

>  ND> If there were long-term plans for rdiff-backup to do that for
>  ND> all files, then the hard-link space savings question could be
>  ND> ignored, since eventually it would become irrelevant.

>Hmm, I'm not sure I understand..  Could you explain what you mean?

Many files in rdiff-backup-data might be identical, even though their
counterparts on the source system are not hard-linked.

(For example, user accounts added with the useradd command all start out
with the same set of files from /etc/skel.  When such accounts are
deleted, many of those files might not have changed since they were copied
from /etc/skel, so there could be many identical .missing files -- if I've
got the extension right -- in rdiff-backup-data.)

Suppose rdiff-backup saved space by hard-linking files in
rdiff-backup-data and maintaining a database of those links.  In that
case, maybe it could do something similar with identical non-hard-linked
files: save only one copy of each group of identical files, and store
pointers to the other copies in a database.
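
A minimal sketch of that whole-file idea, for illustration only. It is not
how rdiff-backup works; the function names and the in-memory "database"
(a plain dict) are hypothetical, and it assumes that matching SHA-256
digests mean identical contents:

```python
import hashlib
import os
import tempfile


def file_digest(path, block_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def group_identical(paths):
    """Keep one copy per group of identical files.

    Returns (kept, pointers). `kept` lists the paths whose contents
    would actually be stored; `pointers` is the hypothetical database
    mapping every other path to the kept copy that stands in for it.
    """
    first_seen = {}          # digest -> first path with that content
    kept, pointers = [], {}
    for path in paths:
        digest = file_digest(path)
        if digest in first_seen:
            pointers[path] = first_seen[digest]
        else:
            first_seen[digest] = path
            kept.append(path)
    return kept, pointers


def demo():
    """Three made-up files, two of them identical (as if from /etc/skel)."""
    contents = [
        ("alice.bashrc", b"skel default\n"),
        ("bob.bashrc", b"skel default\n"),
        ("carol.bashrc", b"edited by carol\n"),
    ]
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name, data in contents:
            p = os.path.join(d, name)
            with open(p, "wb") as f:
                f.write(data)
            paths.append(p)
        kept, pointers = group_identical(paths)
        base = os.path.basename
        return [base(p) for p in kept], {base(k): base(v) for k, v in pointers.items()}
```

Here demo() keeps alice.bashrc and carol.bashrc and records bob.bashrc as a
pointer to alice.bashrc.  A cautious implementation would probably compare
the bytes of candidate files before trusting a digest match.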

And if it could do that, then maybe it could also maintain a database of
identical file sub-parts, saving even more space.
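
The sub-part variant could be sketched the same way.  This is again only an
illustration under stated assumptions: fixed-size chunks, a dict standing
in for the database, and hypothetical names throughout:

```python
import hashlib


def store_chunks(data, chunk_store, chunk_size=4096):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns the "recipe": the ordered list of chunk digests needed to
    rebuild the data from chunk_store.
    """
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(key, chunk)   # no-op if already stored
        recipe.append(key)
    return recipe


def restore(recipe, chunk_store):
    """Rebuild the original bytes from a recipe."""
    return b"".join(chunk_store[key] for key in recipe)


def demo_chunks():
    """Two made-up files that share their first 8192 bytes."""
    store = {}
    file_a = b"A" * 8192 + b"B" * 4096
    file_b = b"A" * 8192 + b"C" * 4096
    recipe_a = store_chunks(file_a, store)
    recipe_b = store_chunks(file_b, store)
    stored_bytes = sum(len(chunk) for chunk in store.values())
    return file_a, file_b, recipe_a, recipe_b, store, stored_bytes
```

In this demo the two files total 24576 bytes, but only three distinct
chunks (12288 bytes) end up in the store.  Fixed-size chunks are the
simplest choice; they miss matches whenever shared data is shifted by a
few bytes, which is part of the extra complexity alluded to below.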

I'm not suggesting that you do any of this; I'd be surprised if the space
savings would be worth the extra complexity and running time.

My point was that if you planned to save space by using hard links in
rdiff-backup-data, then you'd be just one step along a continuum of
compression.  Each step on that continuum subsumes the ones before it, so
if you opted for a later one, then you wouldn't need to implement the
earlier ones.

Nick