script for determining space consumed by increments

dean gaudet dean-list-rdiff-backup@arctic.org
Sat, 11 May 2002 11:23:45 -0700 (PDT)


here's a script i wrote which can find out what directories are taking up
space in an increment... useful for determining what you may want to
exclude from future backups.

it's similar to du, except not sorted in any particular order... i usually
run it something like:

incdu /home/backup/twinlark/root 2002-05-09T01:38:39-07:00 | sort -nr
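
if only the biggest few directories matter, the sort step could also be done in perl, with readable sizes. the sketch below is only an illustration and not part of incdu: the names human and top_dirs are made up here, and it assumes the "size<TAB>dir" lines that incdu prints.

```perl
#!/usr/bin/perl -w
use strict;

# hypothetical helper, not part of incdu: turn a byte count into a short
# human-readable string using powers of 1024.
sub human {
	my ($bytes) = @_;
	my @units = ('B', 'K', 'M', 'G', 'T');
	my $i = 0;
	while ($bytes >= 1024 && $i < $#units) {
		$bytes /= 1024;
		$i++;
	}
	return sprintf("%.1f%s", $bytes, $units[$i]);
}

# hypothetical helper: take incdu-style lines ("size<TAB>dir") and return
# the $n largest as "human-size<TAB>dir" strings, biggest first.
# lines that do not look like incdu output are skipped.
sub top_dirs {
	my ($n, @lines) = @_;
	my @rows;
	foreach my $line (@lines) {
		chomp $line;
		my ($size, $dir) = $line =~ m#^(\d+)\t(.*)$# or next;
		push @rows, [$size, $dir];
	}
	# largest first; ties are broken by directory name
	@rows = sort { $b->[0] <=> $a->[0] || $a->[1] cmp $b->[1] } @rows;
	splice(@rows, $n) if @rows > $n;
	return map { human($_->[0]) . "\t" . $_->[1] } @rows;
}

# to use it on real output, something like this could go at the end:
#   print "$_\n" foreach top_dirs(10, <STDIN>);
```

with that last line enabled, the incdu output would be piped into this script instead of into sort -nr.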

-dean


#!/usr/bin/perl -w

# hack of a script to print the amount of disk space used
# by a particular rdiff-backup increment
#
# -- dean gaudet <dean@arctic.org>

use strict;

# XXX: could get this from the filesystem
my $blocksize = 4096;

if ($#ARGV != 1) {
	die "usage: $0 rdiff-backup-mirror-dir timestamp\n";
}

my $mirror = shift;
my $timestamp = shift;

-d "$mirror/rdiff-backup-data/increments" or die "$mirror/rdiff-backup-data/increments does not exist\n";

-f "$mirror/rdiff-backup-data/increments.$timestamp.dir" or die "$mirror/rdiff-backup-data/increments.$timestamp.dir does not exist\n";


# i could do this more efficiently -- it's only necessary to
# keep sizes for the ancestors of the current directory...
# but i was lazy.
my %du;

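# escape the characters in the timestamp that are special in find's
# default (emacs-style) regex syntax: '.' and a possible '+' in the
# timezone offset
(my $ts_find = $timestamp) =~ s/([.+])/\\$1/g;

open(FIND, '-|', 'find', "$mirror/rdiff-backup-data/increments",
		'-depth',
		'-regex', ".*$ts_find\\.[a-z.]+",
		'-printf', '%s %P\n')
	or die "cannot run find: $!\n";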

while (<FIND>) {
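	# \Q...\E keeps '.' and '+' in the timestamp from acting as regex operators
	my ($size, $dir) = m#^(\d+) (.*)/[^/]+\.\Q$timestamp\E\.[a-z.]+$#;

	# skip anything directly under increments/ (no directory component),
	# such as the .dir files in the toplevel
	next unless defined($dir);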

	# round size up to the next block (int() is needed because perl's /
	# is floating-point division)
	$size = $blocksize * int(($size + ($blocksize-1)) / $blocksize);

	$dir = "/$dir";

	do {
		if (!defined($du{$dir})) {
			$du{$dir} = 0;
		}
		$du{$dir} += $size;
		($dir) = ($dir =~ m#^(.*)/[^/]+$#);
	} while (defined($dir));
}

# nicer view of the top-level component (absent if nothing matched)
$du{'/'} = delete $du{''} if exists $du{''};

foreach my $dir (keys %du) {
	print "$du{$dir}\t$dir\n";
}
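
the XXX note about the block size and the round-up step can be tried in isolation. the sketch below is one possible approach and not part of the script above: fs_blocksize and round_up are made-up names, and it assumes that field 11 of perl's stat (st_blksize, documented as the preferred i/o size) is close enough to the filesystem's allocation unit, which is not guaranteed on every system.

```perl
#!/usr/bin/perl -w
use strict;

# hypothetical helper: ask stat() for the block size of $path and fall
# back to $default when stat fails or the field is empty.
# assumption: st_blksize is a usable stand-in for the allocation unit.
sub fs_blocksize {
	my ($path, $default) = @_;
	my $bs = (stat($path))[11];
	return ($bs && $bs =~ /^\d+$/) ? $bs : $default;
}

# hypothetical helper: round $size up to a whole number of blocks.
# int() is required because perl's / is floating-point division.
sub round_up {
	my ($size, $blocksize) = @_;
	return $blocksize * int(($size + $blocksize - 1) / $blocksize);
}

# example: block size of the current directory, 4096 as the fallback
my $blocksize = fs_blocksize('.', 4096);
print "blocksize $blocksize, 1 byte rounds up to ",
	round_up(1, $blocksize), "\n";
```

in the script itself the path passed to fs_blocksize would presumably be the increments directory rather than '.'.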