rdiff-backup across the network

Ben Escoto bescoto@stanford.edu
Tue, 12 Mar 2002 11:48:01 -0800


>>>>> "ST" == Stephen Tan <Tan>
>>>>> wrote the following on Tue, 12 Mar 2002 13:59:47 -0000

  ST> Hi, I'm considering using rdiff-backup here at work and I'm
  ST> generally very impressed with the concept and ease of use of this
  ST> script!

Thanks!

  ST> It has a very low cpu and memory overhead which is welcome.

Well, I find that it uses lots of cpu, but I'm glad for your
experience.

  ST> There is one thing I was wondering about though, and that is the
  ST> speed across the network of rdiff-backup. I am running across a
  ST> 100mbit switched LAN, and using rsync, I can achieve a transfer
  ST> rate of about 1.5 mb/s (albeit with the CPU consumption going
  ST> very high.)

  ST> I get about 4-5 mb/min (for actual data transfer speed)
  ST> using rdiff-backup.

Well that sounds good, except for the "min" part.
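
To put the two figures on the same scale, here is a quick sketch of the
arithmetic.  It assumes "mb" means megabytes in both numbers (the
message doesn't say), and the names are just for illustration:

```python
# Figures quoted above; "mb" is assumed to mean megabytes in both cases.
RSYNC_MB_PER_SEC = 1.5
RDIFF_MB_PER_MIN = (4.0, 5.0)  # the "4-5 mb/min" range


def slowdown(rsync_mb_per_sec, rdiff_mb_per_min):
    """Return how many times slower the per-minute rate is."""
    # Convert the rsync rate to MB/min before dividing.
    return rsync_mb_per_sec * 60.0 / rdiff_mb_per_min


factors = [slowdown(RSYNC_MB_PER_SEC, rate) for rate in RDIFF_MB_PER_MIN]
print("rsync: %.0f MB/min" % (RSYNC_MB_PER_SEC * 60.0))
print("slowdown: between %.1fx and %.1fx" % (min(factors), max(factors)))
```

So on those numbers the gap would be roughly a factor of 20, not the
factor of two or so one might expect.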

  ST> You did mention that rdiff-backup was slower than rsync, but I
  ST> didn't anticipate such a large factor.

  ST> Is this because:

  ST> (i) increasing throughput would load the cpu more? (ii) ssh is a
  ST> bottleneck? (I think this is unlikely!)  (iii) the rdiff
  ST> algorithm is set for smaller bandwidths?

  ST> I'd love to be able to increase the throughput by a factor of
  ST> 2-3 if possible - I have some cpu on both ends to spare and lots
  ST> of bandwidth. Is this possible?

I'm surprised rdiff-backup is doing so poorly.  If cpu and bandwidth
aren't bottlenecks, what is?  There is nothing in rdiff-backup (except
at one small point which I don't think would be an issue) which tells
it to take it easy, so it should just run as fast as it can.

    Are you running rsync over ssh?  I agree that it is unlikely that
ssh is the problem, but comparing rsync w/ ssh to rdiff-backup would
at least remove that variable.

    If I can bother you to do some of my work for me, it would help if
you ran a few tests to try to narrow the problem down.  There are three
natural possibilities for what rdiff-backup is doing too slowly:

1.  Just transferring files.  So you could just transfer one file for
    an initial mirroring and see if that is much slower than rsync.

2.  Comparing lots of files that are the same to see if they are the
    same.  I'm sure rdiff-backup has more overhead than rsync (or at
    least it should, considering how much overhead there is, and rsync
    has that neato superpipelining stuff), so this may be part of the
    problem.

3.  Updating changed files when most of the file is the same.  So
    maybe rsync is using a better diffing algorithm than rdiff.
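
One way to run each of the three tests is to time the same transfer
with both programs.  The following is only a rough sketch of such a
harness, not anything shipped with rdiff-backup; time_command and
compare are made-up helper names, and it measures wall-clock time
only:

```python
import subprocess
import time


def time_command(argv):
    """Run argv once; return (wall-clock seconds, exit status)."""
    start = time.perf_counter()
    status = subprocess.call(argv)
    return time.perf_counter() - start, status


def compare(cases):
    """cases maps a label to an argv list; return label -> seconds."""
    results = {}
    for label, argv in cases.items():
        seconds, status = time_command(argv)
        if status != 0:
            # A failed run would make the timing meaningless.
            raise RuntimeError("%s exited with status %d" % (label, status))
        results[label] = seconds
    return results
```

For test 1 you might pass something like
{"rsync": ["rsync", "-a", "-e", "ssh", "/tmp/bigfile", "otherhost:/tmp/r1/"],
 "rdiff-backup": ["rdiff-backup", "/tmp/testdir", "otherhost::/tmp/r2"]},
where otherhost and the paths are placeholders for your own setup.
For tests 2 and 3, rerun the same pair after changing nothing, and
then after changing a small part of one large file.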

I ran some benchmarks of an earlier version (0.4.x?) against rsync.  I
found that rdiff-backup uses more memory than rsync for small file
sets, but uses A LOT less memory for large file sets (rsync wants to
load the whole filelist into memory at once).  In the local case,
rdiff-backup was 25% faster maybe.  For the remote case, I think rsync
is about twice as fast (?) for lots of small files or not much change
(but this isn't as bad as it sounds, because rsync is already
something like 1000 times faster than ftp for small files under some
conditions - read Tridgell's dissertation), but rdiff-backup
approaches equality the larger the files get.  But even if I'm
remembering this correctly, the current version is probably much
slower.


--
Ben Escoto
