rdiff-backup file format

This web page describes how rdiff-backup stores backup information, at least as of version 0.6.x. This is a pretty boring document; it should only be useful to people who want to write utilities that automatically process or create rdiff-backup compatible files. Normal people can use rdiff-backup easily without reading any of this (I hope).

1. Overview

When rdiff-backup is run, it copies the source directory (and all the source files, i.e. files in the source directory) to the mirror directory, and writes to a special data directory. For instance, when "rdiff-backup foo bar" is run, foo is the source directory, bar is the mirror directory, and bar/rdiff-backup-data is the data directory.

Each source file is associated with a mirror file, and possibly one or more increment files. If the source file is named path/to/some_file relative to the source directory, then its associated mirror file is also path/to/some_file, but relative to the mirror directory. The associated increments are all named increments/path/to/some_file.EXT relative to the data directory. The exception is the increments associated with the source directory itself, which are just named increments.EXT. The extensions are described later.
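The naming rules above can be sketched as two small path functions. This is a minimal illustration, not rdiff-backup's own API: the function names are hypothetical, paths use "/" separators, and the source directory itself is assumed to be represented by the empty relative path "".

```python
import posixpath

DATA_DIR = "rdiff-backup-data"

def mirror_path(mirror_dir: str, rel_path: str) -> str:
    """Hypothetical helper: mirror file for a source file at rel_path."""
    # Same relative path, rooted at the mirror directory instead of the source.
    return posixpath.join(mirror_dir, rel_path) if rel_path else mirror_dir

def increment_path(mirror_dir: str, rel_path: str, ext: str) -> str:
    """Hypothetical helper: increment file for rel_path with extension ext."""
    data_dir = posixpath.join(mirror_dir, DATA_DIR)
    if not rel_path:
        # The source directory itself: <data dir>/increments.EXT
        return posixpath.join(data_dir, "increments." + ext)
    # Any other source file: <data dir>/increments/path/to/some_file.EXT
    return posixpath.join(data_dir, "increments", rel_path + "." + ext)
```

For "rdiff-backup foo bar", increment_path("bar", "path/to/some_file", "2001-12-05T18:18:57-07:00.diff") gives bar/rdiff-backup-data/increments/path/to/some_file.2001-12-05T18:18:57-07:00.diff.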

The purpose of all this is to provide transparent incremental backup. Each mirror file is an exact duplicate of its source file counterpart (one exception: the mirror directory contains the data directory; the source directory doesn't), while each increment file represents the state of its corresponding source file counterpart at some time in the past.

2. The mirror directory

After rdiff-backup is run, the contents of the mirror directory are the same as the contents of the source directory. This is why it is called the mirror directory. But there is one exception - the mirror directory will contain the data directory while the source directory won't. (If the source directory contains a directory called rdiff-backup-data, it will be ignored.)
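A rough sketch of which source paths end up mirrored, assuming (as the sentence above says) that only a directory named rdiff-backup-data at the top of the source directory is skipped, and assuming POSIX path separators. The names is_mirrored and files_to_mirror are hypothetical and do not come from rdiff-backup.

```python
import os
import posixpath
from pathlib import PurePosixPath
from typing import List

DATA_DIR = "rdiff-backup-data"

def is_mirrored(rel_path: str) -> bool:
    """False for a top-level rdiff-backup-data directory and anything under it."""
    parts = PurePosixPath(rel_path).parts
    return not (parts and parts[0] == DATA_DIR)

def files_to_mirror(source_dir: str) -> List[str]:
    """Sorted source-relative paths that would be copied to the mirror."""
    result = []
    for root, dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        if rel_root == os.curdir:
            rel_root = ""
        # Prune ignored directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if is_mirrored(posixpath.join(rel_root, d))]
        for name in dirs + files:
            result.append(posixpath.join(rel_root, name))
    return sorted(result)
```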

What constitutes sameness here could vary between instances. For example, a non-root user running rdiff-backup will not be able to change the ownership of the mirror files, so in this case two files could count as the same even if they have different ownership. But the closer, the better.

3. The data directory

rdiff-backup may write to log files in this directory, like backup.log and restore.log. rdiff-backup never reads these, so they can be deleted if convenient.

More importantly, all the increment files are stored in the data directory.

3a. Increment file names

As mentioned earlier, each increment file is associated with a source file. The source directory itself is associated with the increment files increments.EXT; source files of the form path/to/some_file are associated with increment files increments/path/to/some_file.EXT.

The extension has the form [timestring].[suffix]. The timestring is in w3 datetime format, described at http://www.w3.org/TR/NOTE-datetime. This format was chosen because it seemed semi-standard, it is not too hard for humans to read, chronological order and ASCII sort order are the same for timestrings with the same UTC offset (so 'ls' gives you the increments in order), and it doesn't contain characters which usually require quoting when typed into a shell. An example of a w3 datetime timestring is 2001-12-05T18:18:57-07:00, meaning December 5th, 2001, 6:18:57 PM in a time zone 7 hours behind UTC. The increment file represents the state of the source file at the indicated time.
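Converting between Python datetimes and timestrings of this form can be done with the standard library alone. This is a minimal sketch, not rdiff-backup's code; the helper names are hypothetical, and it only handles numeric offsets such as -07:00 (not the "Z" form the W3C note also allows, which datetime.fromisoformat accepts only from Python 3.11).

```python
from datetime import datetime, timedelta, timezone

def fixed_offset(hours: int) -> timezone:
    """A tzinfo for a fixed UTC offset, e.g. fixed_offset(-7) for -07:00."""
    return timezone(timedelta(hours=hours))

def to_timestring(dt: datetime) -> str:
    """Format an aware datetime like 2001-12-05T18:18:57-07:00."""
    if dt.tzinfo is None:
        raise ValueError("need a timezone-aware datetime")
    # isoformat() yields YYYY-MM-DDTHH:MM:SS+HH:MM once microseconds are dropped.
    return dt.replace(microsecond=0).isoformat()

def from_timestring(s: str) -> datetime:
    """Parse a timestring with a numeric UTC offset (Python 3.7+)."""
    return datetime.fromisoformat(s)
```

Sorting a list of timestrings that share an offset with plain sorted() gives the same order as sorting them by from_timestring, which is the property that makes 'ls' output chronological.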

The suffix is one of snapshot, diff, dir, or missing, indicating a snapshot increment, a diff increment, a directory marker, or a missing marker, respectively.
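Because the timestring itself contains no dots, an increment file name can be split from the right: the suffix follows the last dot and the timestring sits between the last two. A minimal parsing sketch under that assumption; parse_increment_name is a hypothetical name, and the base part may itself contain dots (e.g. notes.txt).

```python
from datetime import datetime

SUFFIXES = ("snapshot", "diff", "dir", "missing")

def parse_increment_name(name: str):
    """Split 'base.<timestring>.<suffix>' into (base, datetime, suffix), else None."""
    # The suffix is everything after the last dot.
    head, dot, suffix = name.rpartition(".")
    if not dot or suffix not in SUFFIXES:
        return None
    # The timestring has no dots, so it lies between the last two dots.
    base, dot, timestring = head.rpartition(".")
    if not dot or not base:
        return None
    try:
        when = datetime.fromisoformat(timestring)
    except ValueError:
        return None
    # A w3 timestring here is expected to carry a UTC offset.
    if when.tzinfo is None:
        return None
    return base, when, suffix
```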

3b. Increment file types

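There are four increment file types:

snapshot: An exact copy of the source file as it was at the indicated time.
diff: A reverse delta in rdiff format. Applied to the next later version of the file (the mirror file, or the result of applying later increments), it produces the file as it was at the indicated time.
dir: A marker indicating that the source file was a directory at the indicated time.
missing: A marker indicating that the source file did not exist at the indicated time.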

4. Basic restoration procedure

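The above more or less determines the basic restore strategy. Suppose we want to restore a file back to time T. First, make the mirror file the restoration candidate. If there are no increments dated at or after T, we are done: the mirror file is what we want. Otherwise, consider the latest such increment. If it is a:

snapshot: the snapshot becomes the new restoration candidate.
diff: apply the diff to the restoration candidate; the result becomes the new restoration candidate.
dir: the restoration candidate becomes a directory.
missing: the restoration candidate becomes nothing, because the file did not exist at that time.

Then repeat this procedure with the next earlier increment, moving backward in time and applying earlier and earlier diffs, until no increments dated at or after T remain. The final restoration candidate is the source file as it was at time T.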
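The restore procedure described above can be sketched in Python on an in-memory model. This is an illustration under stated assumptions, not rdiff-backup's implementation: a file's state is its bytes, a hypothetical DIR sentinel, or None for "missing"; T is assumed to be one of the backup times; and apply_delta is a hypothetical caller-supplied function standing in for real rdiff patching (for example, running "rdiff patch" on the candidate and the diff).

```python
from datetime import datetime
from typing import Callable, List, Tuple, Union

class _Dir:
    """Hypothetical sentinel type meaning 'this path is a directory'."""
    def __repr__(self) -> str:
        return "DIR"

DIR = _Dir()
State = Union[bytes, _Dir, None]  # file contents, DIR, or None (missing)

def restore(mirror: State,
            increments: List[Tuple[datetime, str, bytes]],
            t: datetime,
            apply_delta: Callable[[bytes, bytes], bytes]) -> State:
    """Reconstruct a file's state at backup time t from its mirror and increments."""
    candidate = mirror
    # Walk increments newest first; stop at the first one older than t.
    for when, suffix, data in sorted(increments, key=lambda inc: inc[0], reverse=True):
        if when < t:
            break
        if suffix == "snapshot":
            candidate = data            # exact copy replaces the candidate
        elif suffix == "diff":
            if not isinstance(candidate, bytes):
                raise ValueError("diff increment needs a regular file to patch")
            candidate = apply_delta(candidate, data)  # reverse delta
        elif suffix == "dir":
            candidate = DIR
        elif suffix == "missing":
            candidate = None
        else:
            raise ValueError(f"unknown increment suffix: {suffix!r}")
    return candidate
```

Passing apply_delta as a parameter keeps the sketch independent of any particular rdiff binding; only the ordering and the per-type rules come from the procedure above.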

5. Concluding remarks

Well, that's all. I realize that the above falls short of mathematical rigor, but hopefully it is enough for the readers' purposes. Please mail me or post to the mailing list if something is unclear or too brief.
Last modified: Sat Feb 1 11:59:44 PST 2003