reversing order of diffs.
Ben Escoto
bescoto@stanford.edu
Thu, 14 Mar 2002 02:16:00 -0800
>>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
>>>>> wrote the following on Thu, 14 Mar 2002 14:37:00 +1100
DB> As I understand it, rdiff-backup currently uses a full copy of
DB> the most recent backup, with diffs for older backups. It is also
DB> capable of efficiently updating a remote backup by sending only
DB> deltas over the wire.
DB> This makes it nice and easy to restore the latest backup, and a
DB> bit slower to restore older backups. This "full-latest +
DB> old-deltas" architecture at first glance looks like rdiff would
DB> be less efficient than xdelta, which can calculate optimal
DB> deltas better than rsync's block-aligned matching algorithm. Also,
DB> xdelta2 would give you all the "get-a-particular-version" and
DB> ACID for free.
DB> However, xdelta alone can't do efficient over-the-wire
DB> transfers, because it requires access to full copies of both
DB> versions to calculate the delta. But as I understand it,
DB> rdiff-backup's efficient over-the-wire transfers must involve
DB> calculating forward deltas to transmit over the wire,
DB> generating the latest version for the archive, then calculating
DB> backwards deltas to record older versions in the archive. This
DB> looks to me like you could still benefit from using xdelta as
DB> the archive store, and use rdiff for the efficient over-the-wire
DB> transfers.
Yes, I think this is all correct. There are a few things to be said
for sticking with the current system though:
1. It IS the current system - unbeatable ease of implementation :-)
2. xdelta would be yet another requirement
3. xdelta, last time I looked at it, was less stable than rdiff
4. xdelta uses much more memory, and I think is slower, than rdiff
5. Where is xdelta development going? It seems to be getting really
   complicated and/or being submerged into some larger project.
6. Benefits from xdelta are so far theoretical. In some cases xdelta
   can be much better, but it isn't clear these cases occur in real
   life enough to justify the change.
DB> But... I question the whole full-latest+old-deltas archive. My
DB> problem is that it doesn't allow you to make backups that you
DB> can store offline. You cannot make a full backup, store it
DB> offline, then make small incremental backups that you also keep
DB> offline. I know that people are going to say "that is not what
DB> rdiff-backup is for", but I think it is pretty close and a small
DB> change or two could add this.
DB> All you need is to (optionally) reverse things so you have a
DB> full-oldest+new-deltas archive. For each backup you keep a full
DB> list of file signatures online. The beauty of keeping this
DB> signature list online is you can calculate new diffs against any
DB> backup, without having the full backup online.
DB> Storing a signature list online saves recalculating it for
DB> remote updates. Keeping latest deltas saves the forward+reverse
DB> delta calculation needed when doing efficient over-the-wire
DB> transfers, as you just keep the transferred delta. This brings
DB> the whole thing more in line with traditional full+incremental
DB> backup tools, with the added benefit that _any_ previous backup,
DB> full or incremental, can be used as a basis for an incremental
DB> backup. Note that using offline backups with only online
DB> signatures means you can't use xdelta as the store.
DB> I'm going to look at rsync-backup code in more detail to
DB> implement something like this soon, as I _need_ offline
DB> backups. I actually have a significant amount of Python code
DB> already written towards this end, including things like
DB> rsync-style include/exclude lists with efficient directory
DB> pruning. I never quite finished it, and now that rsync-backup is
DB> here, I'm more interested in "molding/extending" it to my needs
DB> than releasing Yet Another Backup Tool.
That is an interesting suggestion, and I can see how it could solve some
people's problems, but "that is not what rdiff-backup is for" does
come to mind. I would have thought Another Backup Tool would already
do this. I suppose what's missing from other tools is the diff'ing
ability? Why is that a requirement? If diffs weren't required, I'd
assume you could use just about any backup program?
I think some of rdiff-backup's code could be useful to you in this
project. The basic idea behind rdiff-backup is to make a big
SIGNATURE of the whole mirror directory, and use that on the source
directory to make a big DIFF, and then bring that back to patch the
mirror directory. So really in the main part of rdiff-backup only two
(big) files get sent, SIGNATURE from the mirror directory, and DIFF
from the source. rdiff-backup processes these "files" lazily, so
that neither of them ever has to be generated or loaded into memory
all at once.
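
To make the SIGNATURE -> DIFF -> patch round trip concrete, here is a
minimal Python sketch. It is not rdiff-backup's actual code: the
function names, the in-memory dicts standing in for the mirror and
source directories, and the tiny block size are all assumptions made
for illustration. It also hashes every window in full instead of using
a rolling checksum as rsync/rdiff do, and it ignores new or deleted
files. The point is only that the diff is computed from the signature
alone, and that generators let both "big files" be produced piece by
piece.

```python
import hashlib

BLOCK = 4  # deliberately tiny block size so the example is easy to trace


def signature(basis: bytes) -> dict:
    """Map the hash of each block-aligned chunk of basis to (offset, length)."""
    sig = {}
    for off in range(0, len(basis), BLOCK):
        chunk = basis[off:off + BLOCK]
        sig.setdefault(hashlib.sha256(chunk).digest(), (off, len(chunk)))
    return sig


def delta(sig: dict, new: bytes):
    """Lazily yield ('copy', offset, length) or ('data', bytes) operations.

    Only the signature is consulted; the basis file itself is not needed.
    """
    pos = 0
    literal = bytearray()
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        hit = sig.get(hashlib.sha256(window).digest())
        if hit is not None:
            if literal:  # flush pending unmatched bytes first
                yield ("data", bytes(literal))
                literal.clear()
            yield ("copy", hit[0], hit[1])
            pos += len(window)
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        yield ("data", bytes(literal))


def patch(basis: bytes, ops) -> bytes:
    """Rebuild the new file from the basis plus the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += basis[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


def mirror_signature(mirror: dict):
    """The big SIGNATURE: one (name, signature) pair at a time."""
    for name in sorted(mirror):
        yield name, signature(mirror[name])


def source_diff(sig_stream, source: dict):
    """The big DIFF: one (name, operations) pair at a time."""
    for name, sig in sig_stream:
        if name in source:
            yield name, list(delta(sig, source[name]))


# Round trip: the mirror side sends SIGNATURE, the source side sends DIFF.
mirror = {"a.txt": b"hello world, this is the old text"}
source = {"a.txt": b"hello brave world, this is the new text"}
for name, ops in source_diff(mirror_signature(mirror), source):
    mirror[name] = patch(mirror[name], ops)
assert mirror == source
```

Because delta() needs only the signature, the same shape would also
cover the case where the basis is kept offline and only its signature
stays online.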
But all the code is pretty much there if you just wanted to, say,
write a utility that extended rdiff to directories instead of just
regular files, and also saved permissions, ownership, etc. Call this
rdiff+. Then each of your incremental backups could just be an rdiff+
delta, and the stored signature information for each backup could be
an rdiff+ signature.
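
As a rough idea of what the metadata half of an rdiff+ signature might
hold, here is a small Python sketch. Everything in it is hypothetical:
the record layout, the names dir_signature and changed_paths, and the
use of a plain SHA-256 of the file contents as a stand-in for a real
rdiff signature. It only shows that a directory tree can be walked
lazily and summarized as one record per entry, including permissions
and ownership.

```python
import hashlib
import os
import stat


def dir_signature(root):
    """Lazily yield one metadata record per entry under root.

    The record format is made up for this example.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, do not follow them
            record = {
                "path": os.path.relpath(full, root),
                "type": stat.S_IFMT(st.st_mode),
                "perms": stat.S_IMODE(st.st_mode),
                "uid": st.st_uid,
                "gid": st.st_gid,
                "mtime": int(st.st_mtime),
            }
            if stat.S_ISREG(st.st_mode):
                # Stand-in for a per-file rdiff signature.
                with open(full, "rb") as f:
                    record["sha256"] = hashlib.sha256(f.read()).hexdigest()
            yield record


def changed_paths(old_records, new_records):
    """Return the sorted paths whose records differ, appear, or disappear."""
    old = {r["path"]: r for r in old_records}
    new = {r["path"]: r for r in new_records}
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

An incremental backup would then only need to carry deltas for the
paths that changed_paths() reports, plus the new signature to keep
online for the next run.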
--
Ben Escoto