script for determining space consumed by increments

dean gaudet dean-list-rdiff-backup@arctic.org
Sat, 11 May 2002 18:43:47 -0700 (PDT)


On Sun, 12 May 2002, Spicer, Kevin wrote:

> (my whole rdiff-backup run takes about 10hours per night).

it could easily be your OS.  solaris 2.6 has lots of lameness in this
area.  (such as an inability to cache any metadata for files with names
longer than 31 chars... not including the path).  i dealt with 2.6 on
multi-million-inode filesystems in the past, and i had to go to extreme
effort -- running parallel processes for any filesystem recursion -- just to
get reasonable completion times.
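
(the general idea, as an untested python sketch -- fan the top-level
directories out to a few worker processes so the slow per-file metadata
lookups overlap.  the path and worker count here are made up:)

    import os
    from multiprocessing import Pool

    ROOT = "/export/home"          # hypothetical filesystem to scan

    def scan(subdir):
        # stat every file under one top-level directory
        count = 0
        for dirpath, dirnames, filenames in os.walk(subdir):
            for name in filenames:
                try:
                    os.lstat(os.path.join(dirpath, name))
                    count += 1
                except OSError:
                    pass
        return count

    if __name__ == "__main__":
        tops = [os.path.join(ROOT, d) for d in os.listdir(ROOT)
                if os.path.isdir(os.path.join(ROOT, d))]
        with Pool(processes=4) as pool:    # a few walkers in parallel
            print("%d files stat'd" % sum(pool.map(scan, tops)))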

i do a "rdiff-backup remote::/ root" where the remote::/ filesystem has
over 650k inodes / 50GB of data (a ~150-user shell/web/mail server), and it
takes under 3h for a daily increment.

the backup box is a dual piii 500 / 512MB; the remote is a dual athlon
1.4GHz / 1GB ram.  both run linux 2.4.x.

the disks are somewhat irrelevant -- from what i can see they're not maxed
out by rdiff-backup at all.

i suspect my bottleneck is sometimes my network -- the backup server lives
on my pacbell home dsl which is 1.5Mbit down, but only 128kbit up ... and
the checksums are transmitted in the direction backuphost => master.
(i'm pretty sure this is the case regardless of the command line args.)
i regularly find my outbound link choked while rdiff-backup is running.
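
(for scale: 128kbit/s is only about 16KB/s, so a 3h run can push at most
roughly 170MB out that pipe -- a trivial sanity check in python:)

    # what a 128kbit uplink can move during a 3h nightly run
    uplink_bits_per_sec = 128 * 1000
    window_seconds = 3 * 3600
    total_bytes = uplink_bits_per_sec / 8.0 * window_seconds
    print("%.0f MB outbound, tops" % (total_bytes / 1e6))   # ~173 MB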

before i had live data, i wasn't sure if rdiff-backup could scale even to
my 650k inodes... all things being equal, i expect scaling problems in my
setup if i grow closer to 2M inodes (because the backup will start
encroaching on the busy time of day, when i don't want to be paying the
price for the outbound bandwidth on my server).
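
(naively extrapolating the 3h / 650k-inode figure above, and assuming run
time scales roughly linearly with inode count -- which is probably
optimistic:)

    # naive linear extrapolation of nightly run time with inode count
    hours_now, inodes_now = 3.0, 650e3
    inodes_future = 2e6
    print("%.1f hours" % (hours_now * inodes_future / inodes_now))   # ~9.2 hours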

i suspect rdiff-backup is highly synchronous -- source and dest remain in
lockstep, considering only 1 file at a time.  with only 20ms between my
servers i don't care so much... but i bet it starts to break down when
there's 100ms+ between servers :)
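
(to put numbers on that: if the lockstep costs at least one network round
trip per file -- which is only my guess at the protocol -- then latency
alone puts a floor under the run:)

    # hypothetical: one round trip per file, files handled strictly one at a time
    files = 650000
    for rtt_ms in (20, 100):
        hours = files * (rtt_ms / 1000.0) / 3600.0
        print("%3dms rtt -> at least %.1f hours" % (rtt_ms, hours))
    # 20ms -> ~3.6 hours, 100ms -> ~18 hours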

-dean