xdelta vs. rdiff

Ben Escoto bescoto@stanford.edu
Tue, 12 Feb 2002 10:27:25 -0800



>>>>> "Dan" == Dan Sturtevant <dsturtev@plogic.com>
>>>>> wrote the following on Tue, 12 Feb 2002 12:12:07 -0500 (EST)

  Dan> The ridiculously low number I was getting before is way off.
  Dan> The actual size of the delta using xdelta on the tar (or iso)
  Dan> files is 115 Megs.  This is still substantially lower than the
  Dan> 150 Megs created by rdiffs of each individual file and has the
  Dan> added advantage of working well with a large single file with
  Dan> lots of binary data.  The algorithm works better under my set
  Dan> of constraints.  The limiting factor others might run into is
  Dan> that it requires a large amount of memory.  Ideally, you would
  Dan> have enough memory to hold both files in memory at the same
  Dan> time during delta generation.  In my case this size is 1.3
  Dan> Gigs.  However, if the xdelta algorithm were used instead of
  Dan> rdiff in a system like rdiff-backup, the size of each
  Dan> individual file in memory would be much smaller than my 650 Meg
  Dan> file.  Backups would take longer, but binary data size would be
  Dan> substantially smaller.

Interesting, thanks for the benchmark data.  I was told by MacDonald
(xdelta author) that xdelta would work well comparing two tarballs,
and I guess he was right.  It can't replace rdiff in a system like
rdiff-backup, though.

    Firstly, it would be silly for a project called "rdiff-backup" to
have no relation to rdiff.  Secondly, rdiff can generate diffs without
having access to both files simultaneously.  So if there is a new file
A' on the source computer and an older file A on the mirror, the
mirror makes a signature of A (pretty small) and sends that over, the
source computer makes a diff using only the signature of A and the new
file A', and that diff gets sent back to the mirror.  In theory, this
saves a lot of network traffic compared to actually copying one of the
files to the other side.
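
    A minimal sketch of that signature/diff/patch exchange, in Python.
This is a toy, not rdiff's actual algorithm or file format: it hashes
fixed-size blocks with MD5 and rehashes at every offset, where a real
tool would use a rolling checksum.  The function names and the tiny
block size are made up for illustration.

```python
import hashlib

BLOCK = 4  # toy block size (an assumption); real tools use far larger blocks


def _digest(chunk: bytes) -> bytes:
    return hashlib.md5(chunk).digest()


def make_signature(old: bytes, block: int = BLOCK) -> dict:
    """Mirror side: map the digest of each full block of A to its offset."""
    sig = {}
    for off in range(0, len(old) - block + 1, block):
        sig.setdefault(_digest(old[off:off + block]), off)
    return sig


def make_delta(sig: dict, new: bytes, block: int = BLOCK) -> list:
    """Source side: sees only the signature of A and the new file A'."""
    delta, literal, i = [], bytearray(), 0
    while i < len(new):
        # Look for a block of A that matches the bytes starting at offset i.
        off = sig.get(_digest(new[i:i + block])) if i + block <= len(new) else None
        if off is not None:
            if literal:  # flush pending unmatched bytes first
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", off, block))
            i += block
        else:
            literal.append(new[i])
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Mirror side: rebuild A' from A plus the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Only the signature and the delta would cross the network here; the
source never sees A itself, and the mirror never receives A' whole.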

    But it's interesting that xdelta really is much better at finding
all the binary similarities.  The rdiff people are really smart, so
the problem is probably that rdiff's architectural constraints (it
only ever sees a signature of the old file, not the file itself) don't
let it use all the tricks that xdelta does.


--
Ben Escoto
