Hard links
Nick Duffek
nick@duffek.com
Tue, 12 Mar 2002 21:31:21 -0500 (EST)
On 12-Mar-2002, Ben Escoto wrote:
> ND> If there were long-term plans for rdiff-backup to do that for
> ND> all files, then the hard-link space savings question could be
> ND> ignored, since eventually it would become irrelevant.
> Hmm, I'm not sure I understand. Could you explain what you mean?
Many files in rdiff-backup-data might be identical, even though their
counterparts on the source system are not hard-linked.
(For example, user accounts added with the useradd command all start out
with the same set of files from /etc/skel. When such accounts are
deleted, many of those files might not have changed since they were copied
from /etc/skel, so there could be many identical .missing files -- if I've
got the extension right -- in rdiff-backup-data.)
Suppose rdiff-backup saved space by hard-linking files in
rdiff-backup-data and maintaining a database of those links. In that
case, maybe it could do something similar with identical non-hard-linked
files: save only one copy of each group of identical files, and store
pointers to the other copies in a database.
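
To make the idea in the paragraph above concrete, here is a minimal
Python sketch. It is not how rdiff-backup works, and every name in it
(file_digest, group_identical, build_pointer_db) is hypothetical. It
assumes that two files with the same SHA-256 digest are identical, and
it uses a plain dict as a stand-in for the "database" of pointers; a
careful implementation would probably also compare bytes before
discarding a copy.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in blocks so large files are not loaded into memory at once.
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def group_identical(root):
    """Map each content digest to the sorted list of regular files under root."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


def build_pointer_db(groups):
    """Keep the first path of each group; point every other path at it."""
    pointers = {}
    for paths in groups.values():
        keeper = paths[0]
        for duplicate in paths[1:]:
            pointers[duplicate] = keeper
    return pointers
```

Calling build_pointer_db(group_identical(some_directory)) yields a
mapping from each redundant path to the one copy that would be kept.
Nothing is deleted or linked here; the sketch only shows what the
database would need to record.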
And if it could do that, then maybe it could also maintain a database of
identical file sub-parts, saving even more space.
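
A rough sketch of what a database of identical sub-parts might look
like, again in Python and again purely illustrative. It assumes
fixed-size chunks and SHA-256 digests, neither of which is specified
above, and the names store_chunks and restore are hypothetical. Each
distinct chunk is kept once, and each file is reduced to a list of
chunk digests.

```python
import hashlib


def store_chunks(data, store, chunk_size=4096):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    store maps a chunk digest to the chunk bytes. The return value is the
    ordered list of digests needed to rebuild data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Only the first occurrence of a chunk takes up space in the store.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from a list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)
```

Two files that share a 4096-byte block at the same chunk boundary
would then share one stored chunk. Fixed-size chunking misses shared
data that is shifted by even one byte, which is one reason the extra
complexity may not pay off.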
I'm not suggesting that you do any of this; I'd be surprised if the space
savings would be worth the extra complexity and running time.
My point was that if you planned to save space by using hard links in
rdiff-backup-data, then you'd be just one step along a continuum of
compression. Each step on that continuum is a superset of its
predecessors, so if you opted for a later one, then you wouldn't need to
implement the earlier ones.
Nick