rdiff-backup across the network

Tan, Stephen Tans@soe.sega.co.uk
Wed, 13 Mar 2002 13:22:07 -0000


Ben - I have done more testing on another machine. rdiff-backup is not the
problem here!

I ran my original tests on a Debian Potato machine. It seems that when I
compiled Python 2.2 and OpenSSH 3.02 on Potato (there are no debs!!), I
must have done something wrong or misconfigured something. I can't really
figure out why - but needless to say, there were many broken dependencies
in my way and I was in a hurry, so I must have "bodged" it.

When I use rdiff-backup on a Debian woody machine with Python 2.2 and ssh 3.02
debs, data is copied across the network at a rate of about 1 MB every 3
seconds or so - much better than the 4-5 MB/min on the Potato testbed!
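
To put these figures on one scale, here is a minimal sketch of the
arithmetic. It assumes "mb" means megabytes throughout, and it takes the
rsync figure (about 1.5 per second) from the original message quoted
below. The function and variable names are made up for illustration.

```python
def mb_per_min(megabytes, seconds):
    """Convert 'megabytes moved in this many seconds' to MB per minute."""
    return megabytes * 60.0 / seconds

# Figures as reported in this thread (assumption: "mb" = megabytes).
woody = mb_per_min(1, 3)             # "1mb every 3 seconds" on woody
potato_low, potato_high = 4.0, 5.0   # "4/5 mb/min" on the Potato testbed
rsync = mb_per_min(1.5, 1)           # "about 1.5 mb/s" with rsync

print("woody: %.0f MB/min" % woody)
print("woody vs potato: %.1fx to %.1fx faster"
      % (woody / potato_high, woody / potato_low))
print("rsync: %.0f MB/min" % rsync)
```

On these numbers the woody machine is roughly 4 to 5 times faster than
the Potato testbed, while the reported rsync rate is still several times
higher again.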

This pretty much wraps up all of the issues I was having with rdiff-backup -
user error, eh? What would we do without it?

Anyway - I'd like to give the seal of approval to this excellent application
of rdiff - again, many thanks! 

This'll save me many disk space headaches and avoid the pitfalls of
incremental backups with rsync - I'll be recommending this util to everyone
I know who wants to use a disk-based backup.

regards

Stephen

-----Original Message-----
From: Ben Escoto [mailto:bescoto@stanford.edu]
Sent: 12 March 2002 19:48
To: Tan, Stephen; rdiff-backup@keywest.Stanford.EDU
Subject: Re: rdiff-backup across the network 


>>>>> "ST" == Stephen Tan <Tan>
>>>>> wrote the following on Tue, 12 Mar 2002 13:59:47 -0000

  ST> Hi, I'm considering using rdiff-backup here at work and I'm
  ST> generally very impressed with the concept and ease of use of this
  ST> script!

Thanks!

  ST> It has a very low cpu and memory overhead which is welcome.

Well, I find that it uses lots of cpu, but I'm glad for your
experience.

  ST> There is one thing I was wondering about though, and that is the
  ST> speed across the network of rdiff-backup. I am running across a
  ST> 100 Mbit switched LAN, and using rsync, I can achieve a transfer
  ST> rate of about 1.5 MB/s (albeit with the CPU consumption going
  ST> very high.)

  ST> I get about 4-5 MB/min (for actual data transfer speed)
  ST> using rdiff-backup.

Well that sounds good, except for the "min" part.

  ST> You did mention that rdiff-backup was slower than rsync, but I
  ST> did not anticipate such a large factor.

  ST> Is this because:

  ST> (i) increasing throughput would load the cpu more? (ii) ssh is a
  ST> bottleneck? (I think this is unlikely!)  (iii) the rdiff
  ST> algorithm is set for smaller bandwidths?

  ST> I'd love to be able to increase the throughput by a factor of
  ST> 2-3 if possible - I have some cpu on both ends to spare and lots
  ST> of bandwidth. Is this possible?

I'm surprised rdiff-backup is doing so poorly.  If cpu and bandwidth
aren't bottlenecks, what is?  There is nothing in rdiff-backup (except
at one small point which I don't think would be an issue) which tells
it to take it easy, so it should just run as fast as it can.

    Are you running rsync over ssh?  I agree that it is unlikely that
ssh is the problem, but comparing rsync w/ ssh to rdiff-backup would
at least remove that variable.

    If I can bother you to do some of my work for me, it would help if
you ran a few tests to try to narrow the problem down.  There are three
natural possibilities for what rdiff-backup is doing too slowly:

1.  Just transferring files.  So you could just transfer one file for
    an initial mirroring and see if that is much slower than rsync.

2.  Comparing lots of files that are the same to see if they are the
    same.  I'm sure rdiff-backup has more overhead than rsync (or at
    least it should, considering how much overhead there is, and rsync
    has that neato superpipelining stuff), so this may be part of the
    problem.

3.  Updating changed files when most of the file is the same.  So
    maybe rsync is using a better diffing algorithm than rdiff.
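
One way to run the three tests above is to time each command with the
same small harness. The following is only a sketch, written for a
current Python 3 rather than the Python 2.2 discussed in this thread.
The helper names are invented here, and the host and paths in the
comments are placeholders, not anything from the original setup.

```python
import subprocess
import sys
import time

def time_command(argv):
    """Run argv to completion; return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(argv,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode

def throughput_mb_per_s(num_bytes, seconds):
    """Average rate in MB/s for num_bytes moved in the given time."""
    return num_bytes / (1024.0 * 1024.0) / seconds

# Hypothetical commands to compare (host and paths are placeholders):
#   ["rsync", "-a", "-e", "ssh", "testdir/", "backuphost:/backup/testdir/"]
#   ["rdiff-backup", "testdir", "backuphost::/backup/testdir"]
# A harmless stand-in is timed here so the sketch runs anywhere.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print("exit code %d after %.3f s" % (code, elapsed))
```

Running each tool once against a single large file, once against an
unchanged tree, and once against a tree with small edits would separate
the three cases. Dividing the size of the data actually transferred by
the elapsed time with throughput_mb_per_s() gives comparable rates.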

I ran some benchmarks of an earlier version (0.4.x?) against rsync.  I
found that rdiff-backup uses more memory than rsync for small file
sets, but uses A LOT less memory for large file sets (rsync wants to
load the whole filelist into memory at once).  In the local case,
rdiff-backup was 25% faster maybe.  For the remote case, I think rsync
is about twice as fast (?) for lots of small files or not much change
(but this isn't as bad as it sounds, because rsync is already
something like 1000 times faster than ftp for small files under some
conditions - read Tridgell's dissertation), but rdiff-backup
approaches equality the larger the files get.  But even if I'm
remembering this correctly, the current version is probably much
slower.


--
Ben Escoto