rdiff-backup optimization
Ben Escoto
bescoto@stanford.edu
Thu, 16 May 2002 10:44:08 -0700
>>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
>>>>> wrote the following on Thu, 16 May 2002 19:30:34 +1000
DB> I have a cleaner version of the rolling checksum code that is
DB> 2~3x faster, for a start.
DB> I posted a list of things that could be fixed to the rproxy list
DB> a while ago. I'm looking at implementing them now. Depending on
DB> when/if I get developer access on SF, I'll either post it all as
DB> a patch, or release a new version of librsync.
Anything that makes rdiff faster will help with rdiff-backup, of
course, but I think the main problem with rdiff-backup is that it uses
too much CPU time. For instance, if out/ doesn't exist and manyfiles/
is a directory containing 10000 one-byte files:
~/prog/python/rdiff-backup/src $ time rsync -a manyfiles/ out
real 0m19.684s
user 0m1.300s
sys 0m5.260s
~/prog/python/rdiff-backup/src $ time rdiff-backup manyfiles out
real 1m32.337s
user 0m59.870s
sys 0m7.980s
Running it again (so no files are changed, and they all just need
to be checked):
~/prog/python/rdiff-backup/src $ time rsync -a --delete manyfiles/ out
real 0m1.598s
user 0m0.990s
sys 0m0.530s
~/prog/python/rdiff-backup/src $ time rdiff-backup manyfiles/ out
real 0m31.987s
user 0m31.340s
sys 0m0.630s
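For anyone who wants to repeat the timings, here is a minimal sketch of
how a directory like manyfiles/ could be generated. It assumes Python 3;
the helper name make_many_files, the file names, and the single byte
written are all illustrative choices, since only the count and size of
the files are given above.

```python
import os


def make_many_files(root, count=10000, payload=b"x"):
    """Create `count` small files under `root` and return `root`.

    Hypothetical helper: only "10000 one-byte files" comes from the
    message; names and contents are assumptions made for illustration.
    """
    os.makedirs(root, exist_ok=True)
    for i in range(count):
        # Zero-padded names keep the directory listing in a stable order.
        path = os.path.join(root, "file%05d" % i)
        with open(path, "wb") as f:
            f.write(payload)
    return root


if __name__ == "__main__":
    make_many_files("manyfiles")
```

Running the script once in an empty working directory should leave a
manyfiles/ directory that can be passed to either command above.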
The directory in question is kind of a worst-case test for
rdiff-backup (for copying large files locally, it is actually faster
than rsync), but I think at least the second case is typical, where
rdiff-backup spends a lot of time realizing that nothing has changed.
So rdiff-backup may waste more system calls (and maybe this
would be a bigger deal under Solaris) but at least on my system the
main reason it is much slower than rsync in these cases is its CPU
time. Also, it seems that a lot of rdiff-backup's code is in the
"inner loop" (profiler says top 10 functions total account for less
than 50% of cpu time) so it won't be easy to get any miracle
increases.
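As a rough illustration of the "top 10 functions" measurement, here is a
sketch using Python's standard cProfile and pstats modules. It is not the
profiling setup actually used for rdiff-backup; the function name
top_n_share and the choice to rank by each function's own (internal) time
rather than cumulative time are assumptions.

```python
import cProfile
import pstats


def top_n_share(profile, n=10):
    """Fraction of total internal time spent in the n costliest functions.

    Assumption: functions are ranked by "tottime" (time spent in the
    function itself, excluding the functions it calls).
    """
    stats = pstats.Stats(profile)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    tottimes = sorted((entry[2] for entry in stats.stats.values()), reverse=True)
    total = sum(tottimes)
    return sum(tottimes[:n]) / total if total else 0.0


def workload():
    # Hypothetical stand-in for the code being profiled.
    return sum(i * i for i in range(100000))


if __name__ == "__main__":
    profile = cProfile.Profile()
    profile.runcall(workload)
    print("top 10 share of CPU time: %.1f%%" % (100 * top_n_share(profile)))
```

A result well under 100% for a small n means the time is spread across
many functions, which is the situation described above: no single hot
spot to optimize.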
Unless I'm missing something, there are three options as far as
rdiff-backup optimization goes:
1. Leave it the way it is.
2. Conceptually rejigger the architecture so it somehow comes out
much faster.
3. Rewrite substantial portions of it in C.
Probably (1) is the only likely one in the near future.
--
Ben Escoto