xdelta vs. rdiff

Dan Sturtevant dsturtev@plogic.com
Mon, 11 Feb 2002 11:52:32 -0500 (EST)


Ben, I used xdelta to create a diff of the distributions.
Approximately 50 RPMs differed between the two distros.

distro 1: 610 Megs
distro 2: 623 Megs

Delta file generated by rdiff on the two tarballs was ~650 Megs.

Delta directory generated by rdiff-backup ~150 Megs.

Delta directory created by rsync+ ~150 Megs.  (rsync+ is still in beta
and very broken, though.)

All of the above systems are based on the rdiff algorithm.  The reason
rdiff-backup and rsync+ got down to 150 Megs is that they traverse
directories and make deltas against individual files.  The compression
within each file is still based on the inefficient rdiff algorithm.
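
To illustrate the per-file approach described above, here is a minimal
sketch of the traversal step only. It is not how rdiff-backup or rsync+
are actually implemented; the function name changed_files is
hypothetical, and it uses only find, sort and cmp to list the files that
exist in both trees and differ, which are the files a per-file delta
tool would need to process.

```shell
# List files that exist in both trees but differ in content.
# Usage: changed_files OLD_DIR NEW_DIR
# (hypothetical helper, for illustration only)
changed_files() {
    old=$1
    new=$2
    # Walk the new tree; find prints paths relative to its root.
    (cd "$new" && find . -type f) | sort | while IFS= read -r rel; do
        # Only files present in both trees are delta candidates.
        if [ -f "$old/$rel" ] && ! cmp -s "$old/$rel" "$new/$rel"; then
            printf '%s\n' "${rel#./}"
        fi
    done
}

# Each listed path could then get its own delta, for example with the
# xdelta 1.x syntax used later in this message (patches/ is an assumed
# output directory that must already contain the needed subdirectories):
#   xdelta delta "old/$rel" "new/$rel" "patches/$rel.patch"
```

Files that appear only in the new tree have nothing to diff against and
would have to be stored whole.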


Here is the impressive part.

running:
xdelta delta tar1.tar tar2.tar tar.patch
produced a patch file of 87K

I couldn't believe it.
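
For scale, the arithmetic can be sketched as below. This assumes "K"
means 1024 bytes and "Megs" means 1024*1024 bytes, which the figures
above do not state; pct_of is a hypothetical helper name.

```shell
# Print SIZE as a percentage of TOTAL (both in bytes), to four decimals.
# (hypothetical helper, for illustration only)
pct_of() {
    awk -v size="$1" -v total="$2" \
        'BEGIN { printf "%.4f\n", 100 * size / total }'
}

# 87K patch relative to the 623 Meg distro (units are an assumption):
pct_of $((87 * 1024)) $((623 * 1024 * 1024))
```

Under those assumptions the patch is roughly 0.014 percent of the size
of the second distro, while a 150 Meg delta directory is about 24
percent.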

I moved tar1.tar and tar.patch to a different directory and ran:
xdelta patch tar.patch tar1.tar tar2-2.tar

I then ran:
diff tar2.tar tar2-2.tar

No difference.
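
The same round-trip check can also be done with cmp, which compares
files byte by byte and is a natural fit for binary files such as
tarballs. A minimal sketch follows; same_bytes is a hypothetical helper
name, and the file names in the usage line are the ones from the
commands above.

```shell
# Report whether two files are byte-for-byte identical.
# Returns 0 when they match and 1 when they do not.
# (hypothetical helper, for illustration only)
same_bytes() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# Usage with the files from this test:
#   same_bytes tar2.tar tar2-2.tar
```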

xdelta is very computationally intensive.  I don't have any hard numbers
thus far, but my system, running a 2.4.3-12 Red Hat kernel with 512 Megs
of memory, was swapping.

Needless to say, I recommend looking into xdelta in any case where
binary data makes up the majority of the data you are trying to
compress.

Thanks,
Dan


On Thu, 7 Feb 2002, Ben Escoto wrote:

> >>>>> "DS" == Dan Sturtevant <dsturtev@plogic.com>
> >>>>> wrote the following on Thu, 7 Feb 2002 14:05:45 -0500 (EST)
>
>   DS> 4. This was a no-go.  rdiff's output (from the diff of the 2
>   DS> tarballs) was ~650Megs.  Each of the distros was approximately
>   DS> the same size.  I assume that this was because of file offsets
>   DS> within the tar file and because lots of binary info was present.
>   DS> The algorithm just didn't work for this case.
>
> This is a bit disappointing...  It seems rdiff isn't as good at
> finding binary similarities as I thought.  Just for my curiosity
> though, if you still have the tarballs around, could you try the same
> thing with xdelta v1.x.x?  You can find RPMs of it with rpmfind.  I'm
> wondering if it is superior to rdiff for this kind of thing.