Directory statistics questions...

Ben Escoto bescoto@stanford.edu
Thu, 23 May 2002 03:23:24 -0700



>>>>> "JP" == Jason Piterak <Jason>
>>>>> wrote the following on Wed, 22 May 2002 21:20:05 -0400

  JP> Hi Ben, Some more ideas from a lazy admin...

Great, keep them coming!

  JP> o How long did it take?

What would be the most convenient format to parse for absolute times
and time intervals?  Everything in seconds?

  JP> o How does this compare to yesterday?
  JP> o How does this compare to an average of the last week?

Are you suggesting that rdiff-backup itself calculate these, or would
it be sufficient to provide information from which these could be
calculated?
 
  JP>   But I've got some questions... The information in the
  JP> directory statistics files is perfect, but they don't seem to
  JP> work as I would expect:

I'm not sure this will answer your questions, but the way things are
currently set up, TotalFiles and the like refer to what was in the
mirror directory at the start of the session.  For example, suppose
empty_dir is empty and 10files contains ten files.  Then:

~ $ rdiff-backup empty_dir/ out
~ $ cat out/rdiff-backup-data/increments/directory_stat*
cat: No such file or directory
~ $ rdiff-backup 10files/ out
~ $ cat out/rdiff-backup-data/increments/directory_statistics.2002-05-23T00\:05\:57-07\:00.data 
TotalFiles 1
TotalFileSize 4096
ChangedFiles 11
ChangedFileSize 4096
IncrementFileSize 0
~ $ rdiff-backup empty_dir/ out
~ $ cat out/rdiff-backup-data/increments/directory_statistics.2002-05-23T00\:06\:03-07\:00.data 
TotalFiles 11
TotalFileSize 4106
ChangedFiles 11
ChangedFileSize 4106
IncrementFileSize 732

So as you can see, there can be more ChangedFiles than there are
TotalFiles if new files are added.  Also, if a file inside (directly
or indirectly) a directory changes, then the directory is considered
changed, and the ChangedFiles count is incremented.  So maybe this
accounts for the unexpected ChangedFiles result.
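
For anyone who wants to check their own numbers, here is a rough
Python sketch that reads a statistics file in the "Name value" layout
shown above and tests whether ChangedFiles exceeds TotalFiles.  The
function name is made up for illustration and is not part of
rdiff-backup; the sample text is the first statistics file from the
session above.

```python
def parse_directory_statistics(text):
    """Parse 'Name value' lines into a dict mapping name -> int.

    Hypothetical helper, not an rdiff-backup function. It assumes one
    whitespace-separated name/value pair per line, as in the transcript.
    """
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, value = line.split()
        stats[name] = int(value)
    return stats


# Contents of the first statistics file from the session above.
sample = """\
TotalFiles 1
TotalFileSize 4096
ChangedFiles 11
ChangedFileSize 4096
IncrementFileSize 0
"""

stats = parse_directory_statistics(sample)

# True here: the mirror held one entry at the start of the session,
# but eleven entries were counted as changed during it.
changed_exceeds_total = stats["ChangedFiles"] > stats["TotalFiles"]
print(stats, changed_exceeds_total)
```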

    But I can see how more useful and less confusing statistics could
be provided.  How about:

SourceFiles
SourceFileSize
MirrorFiles
MirrorFileSize
NewFiles
NewFileSize
DeletedFiles
DeletedFileSize
ChangedFiles
ChangedSourceSize
ChangedMirrorSize
IncrementFileSize

Would these categories be unambiguous enough?  Finally, what is the
right way to count directories?  Should their reported sizes be added
to the Size statistics (currently they are)?  And when should a
directory be considered changed so that it is included in the
ChangedFiles count?
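
To make the proposed categories concrete, here is a minimal Python
sketch of how they might be computed from two listings.  Everything
in it is an assumption for illustration: the function name, the idea
of representing each side as a dict from relative path to size, and
treating a file as changed when its size differs (a real comparison
would presumably also look at modification time and other metadata).
It sidesteps the directory questions by handling regular files only,
and it leaves out IncrementFileSize, which depends on the increments
actually written.

```python
def proposed_statistics(source, mirror):
    """Compute the proposed counters from two path -> size mappings.

    Hypothetical sketch, not rdiff-backup code. 'source' and 'mirror'
    map relative file paths to sizes in bytes, regular files only.
    """
    new = source.keys() - mirror.keys()        # only in source
    deleted = mirror.keys() - source.keys()    # only in mirror
    # Simplification: "changed" means present on both sides with
    # different sizes.
    changed = {p for p in source.keys() & mirror.keys()
               if source[p] != mirror[p]}
    return {
        "SourceFiles": len(source),
        "SourceFileSize": sum(source.values()),
        "MirrorFiles": len(mirror),
        "MirrorFileSize": sum(mirror.values()),
        "NewFiles": len(new),
        "NewFileSize": sum(source[p] for p in new),
        "DeletedFiles": len(deleted),
        "DeletedFileSize": sum(mirror[p] for p in deleted),
        "ChangedFiles": len(changed),
        "ChangedSourceSize": sum(source[p] for p in changed),
        "ChangedMirrorSize": sum(mirror[p] for p in changed),
    }


# Made-up listings: "b" changed size, "c" is new, "d" was deleted.
source = {"a": 10, "b": 25, "c": 7}
mirror = {"a": 10, "b": 20, "d": 3}

stats = proposed_statistics(source, mirror)
for name, value in stats.items():
    print(name, value)
```

With counters split this way, NewFiles and ChangedFiles are reported
separately, so neither can exceed the file count on the side it is
measured against.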


--
Ben Escoto
