inner vs user-level complexity of duplicity

Ben Escoto bescoto@stanford.edu
Fri, 20 Sep 2002 10:38:06 -0700


--==_Exmh_-779137044P
Content-Type: text/plain; charset=us-ascii

>>>>> "PE" == Peter Ehrenberg <pe@dipe.de>
>>>>> wrote the following on 20 Sep 2002 15:10:09 +0200

  PE> IMHO there is not quite enough documentation on how it
  PE> works. Examples:

  PE>     o The importance of the local archive directory.

This is something I've had trouble with, so suggestions are welcome.
The basic problem is that, although duplicity is internally simpler
than rdiff-backup, and should probably consume less bandwidth and CPU
(once it has been profiled; probably not yet), it seems more
complicated to the user.

    With rdiff-backup the user can just run "rdiff-backup foo bar",
and think of it as "copying foo to bar".  An "ls bar" will confirm
these suspicions and relieve the user.  With duplicity the user has to
worry about the local archive directory...

    Sorry, the local archive directory just holds the signatures of
the files on the remote side, along with manifest information.  Since
the signatures are stored locally, duplicity can simply generate the
diffs (the incremental archive) and store them on the remote side
without ever reading anything from the remote side, or recalculating
the signatures as rsync/rdiff-backup do.  The manifest file is also
stored remotely, but if a local copy is available it will be used.
Since SHA1 checksums of the backup volumes are stored in the (local)
manifest file, any tampering with the data by the remote side will be
apparent to the user.  This could also be done by signing the backup
volumes (you can do that too), but computing a checksum, unlike
signing, doesn't require a passphrase.  Also, if you trust the remote
side somewhat, you can run "sha1sum"/"md5sum" there and make sure your
data was transferred correctly.
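
    To make the checksum idea concrete, here is a minimal Python
sketch (not duplicity's actual code) of checking volumes against
recorded SHA1 values.  The function names and the expected_hashes
mapping are made up for illustration; the real manifest format is not
shown here.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at path, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tampered(volume_paths, expected_hashes):
    """Return the volume paths whose SHA1 differs from the recorded value.

    expected_hashes is a hypothetical {path: hex digest} mapping standing
    in for whatever the manifest records; a missing entry counts as a
    mismatch.
    """
    return [p for p in volume_paths
            if expected_hashes.get(p) != sha1_of_file(p)]
```

An empty list from find_tampered() would mean every volume still
matches the value recorded locally.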

    But now that I think about it, maybe I should get rid of the local
archive directory.  That doesn't make much sense from a programming
perspective, but it would make duplicity easier for users to pick up
and use.  Then maybe there could be an option "--archive-dir" to
re-enable it, which users who want more speed or less network traffic
(or the checksum benefits explained above) could use...

  PE>     o If the backed up machine completely crashes, can I get my
  PE> data back by hand without duplicity? Knowing this would relieve
  PE> me.

Yes, it is possible, but laborious to do manually.  Basically you
would have to download the tar files from the remote side and decrypt
them.  Then you would untar the full archive.  Then untar each
increment tarball, oldest first, and look at each filename.  If it is
in the snapshot/ directory, copy it over to the full directory.  If it
is in the diff/ directory, use rdiff to patch what's in the full
directory with the diff.  If there's something in the deleted/
directory, delete the corresponding thing in the full directory.
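
    The per-file decision above can be sketched in Python roughly as
follows.  This is only an illustration under the layout just
described, not duplicity's own restore code; plan_increment and the
ACTIONS table are hypothetical names, and it only plans the steps
rather than touching any files.

```python
import posixpath

# Top-level directory inside an increment tarball -> action to take on
# the restored full directory, following the description above.
ACTIONS = {
    "snapshot": "copy",    # copy the file over the restored copy
    "diff": "patch",       # patch the restored copy using rdiff
    "deleted": "delete",   # remove the corresponding path
}


def plan_increment(member_names):
    """Map tar member names to (action, relative_path) pairs.

    Names that are not under snapshot/, diff/ or deleted/ are skipped,
    as are the bare top-level directory entries themselves.
    """
    plan = []
    for name in member_names:
        prefix, _, rest = posixpath.normpath(name).partition("/")
        if prefix in ACTIONS and rest:
            plan.append((ACTIONS[prefix], rest))
    return plan
```

The member names could come from tarfile.open(path).getnames() on a
decrypted increment, and a "patch" step would correspond to something
like "rdiff patch OLDFILE DELTA NEWFILE".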

    Of course, at some point I could claim that every backup system's
format could be recovered by hand, just by duplicating exactly what
the program did ("Ok, now move 0xAF34D381 into the AX register...").
But this isn't quite as bad as all that.

  PE> Which verbosity level do I have to set to get the directory name
  PE> (I've tried -v9 but got gigabytes of debug output)?

-v7 is enough for it to mention every file, but I'm not sure that you
won't get about the same output.  (In rdiff-backup -v9 would give you
network traffic, but since duplicity doesn't have its own network
traffic probably nothing is at that level.)


-- 
Ben Escoto

--==_Exmh_-779137044P--