xdelta vs. rdiff
Ben Escoto
bescoto@stanford.edu
Tue, 12 Feb 2002 10:27:25 -0800
>>>>> "Dan" == Dan Sturtevant <dsturtev@plogic.com>
>>>>> wrote the following on Tue, 12 Feb 2002 12:12:07 -0500 (EST)
Dan> The ridiculously low number I was getting before was way off.
Dan> The actual size of the delta using xdelta on the tar (or ISO)
Dan> files is 115 Megs. This is still substantially lower than the
Dan> 150 Megs created by rdiffs of each individual file, and it has
Dan> the added advantage of working well with a single large file
Dan> with lots of binary data. The algorithm works better under my
Dan> set of constraints. The limiting factor others might run into
Dan> is that it requires a large amount of memory. Ideally, you
Dan> would have enough memory to hold both files at the same time
Dan> during delta generation. In my case this is 1.3 Gigs. However,
Dan> if the xdelta algorithm were used instead of rdiff in a system
Dan> like rdiff-backup, each individual file held in memory would be
Dan> much smaller than my 650 Meg file. Backups would take longer,
Dan> but the stored binary deltas would be substantially smaller.
Interesting, thanks for the benchmark data. MacDonald (the xdelta
author) told me that xdelta would work well for comparing two
tarballs, and I guess he was right. It can't replace rdiff in a
system like rdiff-backup, though.
First, it would be silly for a project called "rdiff-backup" to have
no relation to rdiff. Second, rdiff can generate diffs without having
access to both files at once. So if there is a new file A' on the
source computer and an older file A on the mirror, rdiff-backup makes
a signature of A (pretty small) and sends it over; the source
computer makes a diff using only that signature and the new file A',
and the diff gets sent back to the mirror. In theory, this saves a
lot of network traffic compared to copying either whole file to the
other side.
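A minimal sketch of that signature/diff/patch exchange, assuming an rsync-style scheme: the mirror checksums fixed-size, block-aligned chunks of A with a cheap rolling checksum plus a strong hash, and the source slides a window over A' looking for chunks the mirror already has. The tiny block size, the Adler-style weak checksum, SHA-256 as the strong hash, and the names `signature`, `delta`, and `patch` are all illustrative choices, not rdiff's actual parameters, API, or file format.

```python
import hashlib

BLOCK = 4          # tiny block size for the demo; real tools use far larger blocks
MOD = 1 << 16


def weak_sum(data: bytes) -> tuple:
    """Adler-style checksum (a, b) of a window; cheap to roll forward."""
    n = len(data)
    a = sum(data) % MOD
    b = sum((n - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def strong_sum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def signature(old: bytes, block: int = BLOCK) -> dict:
    """Mirror side: checksum each full, block-aligned chunk of the old file A."""
    sig: dict = {}
    for idx in range(len(old) // block):
        chunk = old[idx * block:(idx + 1) * block]
        sig.setdefault(weak_sum(chunk), []).append((strong_sum(chunk), idx))
    return sig


def delta(sig: dict, new: bytes, block: int = BLOCK) -> list:
    """Source side: describe A' as copies of A's blocks plus literal bytes,
    using only the signature (A itself is never read here)."""
    ops, literal = [], bytearray()
    i = 0
    weak = weak_sum(new[:block])
    while i + block <= len(new):
        window = new[i:i + block]
        # Weak checksum narrows the candidates; strong hash confirms the match.
        match = next((idx for strong, idx in sig.get(weak, [])
                      if strong == strong_sum(window)), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block
            weak = weak_sum(new[i:i + block])
        else:
            literal.append(new[i])
            if i + block < len(new):
                weak = roll(*weak, new[i], new[i + block], block)
            i += 1
    literal += new[i:]  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Mirror side: rebuild A' from the old file A and the received delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * block:(arg + 1) * block] if kind == "copy" else arg
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = b"the quick red fox jumps over the lazy dog!"
    ops = delta(signature(old), new)
    print(ops)
    assert patch(old, ops) == new
```

Only the signature and the delta cross the network; the whole of A never has to leave the mirror, which is the property that an approach needing both files in memory at once gives up.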
But it's interesting that xdelta really is much better at finding
all the binary similarities. The rdiff people are really smart, so
the problem is probably that rdiff's architectural constraints don't
let it use all the tricks that xdelta does.
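One constraint of that kind, offered as an illustration rather than a diagnosis from this thread: a signature only describes whole, block-aligned chunks of the old file, so a match must cover an entire block that starts at a block boundary in A. A differ that holds both files in memory can instead index A at every offset and extend each match byte by byte. The sketch below contrasts the two on contrived data; both function names are hypothetical, and the greedy matcher is a generic technique, not xdelta's actual algorithm.

```python
def aligned_copied_bytes(old: bytes, new: bytes, block: int) -> int:
    """Signature-style limit: only whole chunks starting at multiples of
    `block` in OLD can be copied, although NEW is scanned at every offset."""
    blocks = {old[j:j + block] for j in range(0, len(old) - block + 1, block)}
    i = copied = 0
    while i + block <= len(new):
        if new[i:i + block] in blocks:
            copied += block
            i += block
        else:
            i += 1
    return copied


def greedy_copied_bytes(old: bytes, new: bytes, seed: int = 4) -> int:
    """Both files in memory: index OLD at every offset, then extend each
    seed match forward for as long as the two files agree."""
    index: dict = {}
    for j in range(len(old) - seed + 1):
        index.setdefault(old[j:j + seed], j)  # keep the first occurrence
    i = copied = 0
    while i + seed <= len(new):
        j = index.get(new[i:i + seed])
        if j is None:
            i += 1
            continue
        n = seed
        while i + n < len(new) and j + n < len(old) and new[i + n] == old[j + n]:
            n += 1
        copied += n
        i += n
    return copied


if __name__ == "__main__":
    new = b"abcdefghijklmnopqrstuvwxyz0123456789"
    # OLD is NEW with a '#' after every 6 bytes, so no 8-byte block of OLD
    # appears unchanged in NEW.
    old = b"#".join(new[k:k + 6] for k in range(0, len(new), 6))
    print(aligned_copied_bytes(old, new, 8), greedy_copied_bytes(old, new))
    # prints: 0 36
```

This input is deliberately extreme: every old block is touched by an edit, so the block-aligned matcher copies nothing while the greedy matcher recovers all 36 bytes. The price is random access to the whole old file during delta generation, which matches Dan's observation about memory.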
--
Ben Escoto