rdiff deltas not very good compared to pysync, why?

Donovan Baarda abo@minkirri.apana.org.au
Thu, 18 Apr 2002 22:11:26 +1000


G'day,

Just been doing some work on librsync for a Python extension, and noticed
that it is producing deltas more than twice as big as pysync produces, using
the same block size. I'm using the released v0.9.5 code.

I'm using some test files I generated for pysync testing. These consist of a
256K random data "oldfile.bin", and a slightly larger "newfile.bin" that
includes random edits (insert,replace,delete,copy) of "oldfile.bin". Because
this is all random data, it doesn't compress.
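
For anyone wanting to reproduce this kind of test pair, here is a minimal sketch. The function name, edit count, maximum edit length and seed are all assumptions made for illustration; this is not the actual generator used for the pysync test files.

```python
import random

def make_test_pair(size=256 * 1024, edits=50, max_len=4096, seed=0):
    """Return (old, new): random bytes and a randomly edited copy.

    All parameters are hypothetical; the post only says the old file is
    256K of random data with random insert/replace/delete/copy edits.
    """
    rng = random.Random(seed)
    old = bytes(rng.getrandbits(8) for _ in range(size))
    new = bytearray(old)
    for _ in range(edits):
        op = rng.choice(("insert", "replace", "delete", "copy"))
        pos = rng.randrange(len(new))
        n = rng.randrange(1, max_len)
        if op == "insert":
            # splice n fresh random bytes in at pos
            new[pos:pos] = bytes(rng.getrandbits(8) for _ in range(n))
        elif op == "replace":
            # overwrite up to n bytes at pos with n fresh random bytes
            new[pos:pos + n] = bytes(rng.getrandbits(8) for _ in range(n))
        elif op == "delete":
            del new[pos:pos + n]
        else:
            # copy: duplicate a run taken from the old file at pos
            src = rng.randrange(len(old) - n)
            new[pos:pos] = old[src:src + n]
    return old, bytes(new)
```

Writing the two returned byte strings to "oldfile.bin" and "newfile.bin" gives inputs that can be fed to rdiff and to any other delta tool.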

pyproxy can produce both rsync and xdelta style deltas. The xdelta results
should be pretty close to optimal, so they make a good basis to compare
against.

The default block size for pyproxy is 1024, so I used "-b 1024" when running
rdiff to force the same block size. The results I got were:

Operation                size (bytes)
-------------------------------------
source oldfile.bin             262144
target newfile.bin             325316
rdiff signature                  3084
pyproxy sig                      8090
pyproxy xdelta                 103463
pyproxy rdelta                 131389
rdiff delta                    319252

As you can see, the "rdiff signature" is less than half the size of the
"pyproxy sig". This is understandable, as pysync uses a Python pickled dict
of dicts for its sigfile format.
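
As a rough sanity check on the signature size, the sketch below assumes a 12-byte header plus a 4-byte weak sum and an 8-byte strong sum per block. Those field sizes are an assumption, not something taken from the librsync source, but they do reproduce the 3084 bytes in the table.

```python
def rdiff_sig_size(file_size, block_size, header=12, weak=4, strong=8):
    """Estimated signature size in bytes.

    header/weak/strong are assumed field sizes, not verified against
    the librsync source.
    """
    blocks = -(-file_size // block_size)  # ceiling division
    return header + blocks * (weak + strong)

blocks = 262144 // 1024                 # number of blocks in oldfile.bin
estimate = rdiff_sig_size(262144, 1024)
pickled_per_block = 8090 / blocks       # bytes per block in the pickled sig
```

Under the same assumptions the pickled signature works out to a bit under 32 bytes per block, against 12 for rdiff.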

However, the "rdiff delta" is more than twice the size of the "pyproxy
rdelta" (about 2.4 times), and more than three times the near-optimal
"pyproxy xdelta". In fact it is about 98% of the size of newfile.bin itself.
Since pysync uses a pickled Python list of (offset,length) tuples and insert
strings, I find this very surprising. None of the deltas are faulty, as a
"patch" by any of the tools produces the correct result.
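
To make the delta format described above concrete, here is a minimal sketch of applying a list of (offset,length) tuples and insert strings to an old file. The function name is hypothetical and this is not pysync's actual patch code; it only illustrates the data structure.

```python
def apply_delta(old, delta):
    """Rebuild the new data from old data and a delta list.

    Each item is either an (offset, length) tuple meaning "copy this
    range from old" (a hit) or a bytes object of literal data (a miss).
    """
    out = bytearray()
    for item in delta:
        if isinstance(item, tuple):
            offset, length = item
            out += old[offset:offset + length]
        else:
            out += item
    return bytes(out)
```

For example, applying [(0, 3), b"XY", (5, 5)] to b"abcdefghij" copies "abc", inserts "XY", then copies "fghij". Only the insert strings cost real space in the delta, which is why a delta nearly as large as the target points to very few hits.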

Note that pysync does use gzip context compression (compressing the whole
data stream, including hits, but only including the compressed output of
misses), and I don't think rdiff does. However, in this case the input data
was all random so compression has no effect. Compressing any of the inputs
or outputs yields negligible change.
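
The claim that random data does not compress is easy to check. The sketch below uses Python's zlib module as a stand-in for gzip; the helper name is made up for illustration.

```python
import os
import zlib

def compressed_ratio(data, level=9):
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, level)) / len(data)

random_ratio = compressed_ratio(os.urandom(65536))   # random bytes
repeat_ratio = compressed_ratio(b"a" * 65536)        # highly repetitive
```

The ratio for random bytes stays at roughly 1.0, while the repetitive input shrinks to a tiny fraction, so compression cannot account for the size differences in the table.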

I haven't examined the librsync code to figure out why yet, but I suspect
that there might be a bug in the rolling checksums. There is certainly
something wrong.
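
One way to test a suspected rolling checksum bug is to compare the rolled value against a from-scratch recomputation at every offset. The sketch below uses the textbook rsync weak checksum (two 16-bit sums); it is an assumption that this matches what librsync does, and the real code may differ in details such as a per-byte offset.

```python
M = 1 << 16  # both sums are kept modulo 2**16

def weak_sum(block):
    """Compute (a, b) for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte: drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def check_rolling(data, n):
    """Return True if rolling matches recomputation at every offset."""
    a, b = weak_sum(data[:n])
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
        if (a, b) != weak_sum(data[k:k + n]):
            return False
    return True
```

If a rolling implementation ever disagrees with the recomputed sum, matches at unaligned offsets are missed and the delta fills up with literal data, which would be consistent with the sizes seen here.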

-- 
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------