rdiff-backup optimization

dean gaudet dean-list-rdiff-backup@arctic.org
Sun, 19 May 2002 20:03:19 -0700 (PDT)


On Mon, 20 May 2002, Donovan Baarda wrote:

> 10000 files in one dir is a lot. Many filesystems slow down exponentially as
> the number of files increases, which is why things like squid use three
> levels of directories rather than one directory full of all the objects.
>
> Perhaps rdiff-backup is accessing files by name (causing directory lookups
> which is what hurts) more than rsync is (ie, stat, then open)?
>
> Not sure if these lookups would register as sys time or user time though...

the lookup counts as sys time... i worried about the same thing until i
noticed that most of the time in the second test was user time.
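
a rough way to see the user/sys split for name lookups is sketched below.
it is only an illustration, not anything from rdiff-backup: the function
name time_lookups and the file count are made up, and it just stats a
directory full of empty files by name and reports the cpu time deltas
from os.times().

```python
import os
import tempfile


def time_lookups(n_files=1000):
    """Return (user, sys) CPU seconds spent stat-ing n_files by name.

    Hypothetical helper for illustration; not part of rdiff-backup.
    """
    with tempfile.TemporaryDirectory() as d:
        names = [os.path.join(d, "f%05d" % i) for i in range(n_files)]
        for name in names:
            open(name, "w").close()  # create an empty file

        before = os.times()
        for name in names:
            os.stat(name)  # each call is a lookup by name in the kernel
        after = os.times()

    # user = time spent in the interpreter, system = time spent in the kernel
    return after.user - before.user, after.system - before.system


if __name__ == "__main__":
    user, system = time_lookups(10000)
    print("user %.3fs  sys %.3fs" % (user, system))
```

if the directory lookups were the bottleneck, the sys figure would be
expected to dominate as n_files grows.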

the reason squid and such have multi-layered directory hashing is because
they need to create and delete files -- which requires modifying the
directory contents themselves... and with a linear directory that is
O(N^2) for N files.  but rdiff-backup generally only modifies inodes.
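
where the O(N^2) comes from can be shown with a toy model. this assumes a
directory stored as an unsorted linear list (as in filesystems without
directory indexing), where each create first scans every existing entry
to check for a duplicate name. create_cost is a made-up name and the
model ignores block caching, deletes and everything else a real
filesystem does.

```python
def create_cost(n_files):
    """Count name comparisons needed to create n_files in a toy
    linear-list directory (an assumed model, not a real filesystem)."""
    directory = []
    comparisons = 0
    for i in range(n_files):
        name = "f%d" % i
        # a create must scan all existing entries to rule out a duplicate
        for entry in directory:
            comparisons += 1
            if entry == name:
                break
        else:
            directory.append(name)
    return comparisons


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, create_cost(n))
```

in this model the total is N*(N-1)/2 comparisons, so doubling the number
of files roughly quadruples the work.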

also -- on linux (2.2 and later) with the dcache, a series of operations
on the same filename generally only pays the directory lookup once.
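
as a sketch of the kind of access pattern meant here (an assumption about
what a backup tool might do, not rdiff-backup's actual code; the function
name touch_inode_only is invented): several calls in a row that all name
the same path and only touch the inode, never the directory contents.

```python
import os


def touch_inode_only(path):
    """Run a series of by-name operations on one file.

    None of these add or remove directory entries; they only read or
    update the file's inode. Returns True if the opened file is the
    same inode that stat() reported.
    """
    st = os.stat(path)                           # lookup by name
    os.chmod(path, st.st_mode & 0o7777)          # by name again, inode-only change
    os.utime(path, (st.st_atime, st.st_mtime))   # by name again, inode-only change
    with open(path, "rb") as f:                  # by name one more time
        # from here on the descriptor is used, so no further name lookup
        return os.fstat(f.fileno()).st_ino == st.st_ino
```

with a cached directory entry, only the first of these calls should need
to search the directory itself; the rest can be resolved from the cache.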

-dean