xdelta vs. rdiff

Dan Sturtevant dsturtev@plogic.com
Mon, 11 Feb 2002 14:40:28 -0500 (EST)


OK, for more realistic numbers: my patch file between the different
distributions is now


On Mon, 11 Feb 2002, Dan Sturtevant wrote:

>
> Ben, I used xdelta to create a diff of the distributions.
> There were approx 50 rpms that were different between the two distros.
>
> distro 1: 610 Megs
> distro 2: 623 Megs
>
> Delta file generated by rdiff on the two tarballs was ~650 Megs.
>
> Delta directory generated by rdiff-backup ~150 Megs.
>
> Delta directory created by rsync+ ~150 Megs.  (although this system is
> still beta and very broken.)
>
> All of the above systems were based on the rdiff algorithm.  The reason
> rdiff-backup and rsync+ got down to 150 Megs is that they traverse
> directories and make deltas against individual files.  The compression
> within each file is still based upon the inefficient rdiff algorithm.
>
>
> Here is the impressive part.
>
> running:
> xdelta delta tar1.tar tar2.tar tar.patch
> produced a patch file of 87K
>
> I couldn't believe it.
>
> I moved tar1.tar and tar.patch to a different directory and ran:
> xdelta patch tar.patch tar1.tar tar2-2.tar
>
> I then ran
> diff tar2.tar tar2-2.tar.
>
> No difference.
>
> xdelta is very computationally intensive.  I don't have any hard numbers
> thus far, but my system running a 2.4.3-12 Red Hat kernel with 512 Megs of
> memory was swapping.
>
> Needless to say, I recommend looking into using this system in any case
> where binary data represents the majority of the data you are trying to
> compress.
>
> Thanks,
> Dan
>
>
> On Thu, 7 Feb 2002, Ben Escoto wrote:
>
> > >>>>> "DS" == Dan Sturtevant <dsturtev@plogic.com>
> > >>>>> wrote the following on Thu, 7 Feb 2002 14:05:45 -0500 (EST)
> >
> >   DS> 4. This was a no-go.  rdiff's output (from the diff of the 2
> >   DS> tarballs) was ~650 Megs.  Each of the distros was approximately
> >   DS> the same size.  I assume that this was because of file offsets
> >   DS> within the tar file and because lots of binary info was present.
> >   DS> The algorithm just didn't work for this case.
> >
> > This is a bit disappointing...  It seems rdiff isn't as good at
> > finding binary similarities as I thought.  Just for my curiosity
> > though, if you still have the tarballs around, could you try the same
> > thing with xdelta v1.x.x?  You can find RPMs of it with rpmfind.  I'm
> > wondering if it is superior to rdiff for this kind of thing.
>
> _______________________________________________
> Rdiff-backup mailing list
> Rdiff-backup@keywest.Stanford.EDU
> http://keywest.Stanford.EDU/mailman/listinfo/rdiff-backup
>
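
The round-trip check in the quoted message (rebuild tar2-2.tar from
tar1.tar plus tar.patch, then diff it against tar2.tar) can also be done
by comparing checksums, which avoids holding two 600 Meg tarballs side by
side on the same machine.  What follows is only a minimal sketch in
Python, not part of xdelta or rdiff: the helper names file_digest and
same_contents are made up for illustration, and SHA-256 is an assumed
choice of hash.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so that a large tarball never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same digest (hypothetical helper)."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling same_contents("tar2.tar", "tar2-2.tar") after applying the patch
should return True if the reconstruction is byte-for-byte identical,
which is the same conclusion the diff run above reached.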