An Interesting thought-maybe...

Spicer, Kevin Kevin.Spicer@bmrb.co.uk
Wed, 19 Jun 2002 23:09:05 +0100


First some background...
I use rdiff-backup to make a remote backup of a data directory from a remote
site.  Yesterday the source disk was nearing capacity, which triggered an
email alert to the group of users who 'own' the data.  Helpfully, they
responded by logging into the machine and gzipping huge quantities of files.
At 9.15pm rdiff-backup kicked off.  At 8.45pm today I logged onto the remote
machine to upgrade it to 0.9.1 in readiness for tonight's backup, quite
reasonably thinking that the rdiff-backup job would have finished some 14
hours earlier.  It hadn't.  Not surprisingly, upgrading it whilst it was
running didn't go down too well and rdiff-backup crashed.

Enough rambling, my point is this...
When people zip files they a) change the filename and b) change (in a binary
sense) the content, but they don't change (in a human sense) the file's real
content.  I don't think it would be unreasonable to guess that rdiff-backup
is transferring the entire file again.
However, 99% of zipped files keep the original filename and add a suffix.  If
it were possible to surmise which file had been zipped and how, and just
replicate this action on the remote machine, this would likely be much
quicker.
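
To make the "surmise which file had been zipped" step concrete, here is a minimal sketch in Python. It is not how rdiff-backup works; the suffix table and the function name find_compressed_candidates are made up for illustration. It assumes we have the list of paths from the previous backup and the list of paths on the source now, and it flags a file only when the plain name has vanished and the same name plus a known suffix has appeared.

```python
# Hypothetical suffix table for illustration; not taken from rdiff-backup.
SUFFIXES = {".gz": "gzip", ".bz2": "bzip2", ".zip": "zip"}


def find_compressed_candidates(previous, current):
    """Return (original, compressed, format) tuples for files that have
    vanished from the source while 'original + suffix' has appeared."""
    previous, current = set(previous), set(current)
    candidates = []
    for path in sorted(current - previous):
        for suffix, fmt in SUFFIXES.items():
            if not path.endswith(suffix):
                continue
            original = path[: -len(suffix)]
            # Only a guess: the old name must be known to the last backup
            # and must no longer exist on the source.
            if original in previous and original not in current:
                candidates.append((original, path, fmt))
    return candidates
```

A name match is only a hint. The content still has to be checked before anything is replicated on the remote side, and tools that replace the extension instead of appending to it would be missed by this sketch.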
In theory a neat idea, I think (even if I do say so myself).  I suspect
actually implementing it would be much more difficult.  Several
complicating factors I can think of...
- different zip formats (zip, gzip, bz2 etc.)
- different levels of compression (do zip files record this in their
  metadata?)
- the file could have changed between the previous backup and zipping
  (either unzip and compare, or zip on the remote side and then compare?)
- it doesn't account for moving zipped files (but then it doesn't account
  for files moving anyway)
- what about tar too?
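
The "unzip and compare" option could be sketched roughly as below, for gzip only. This is an assumption-laden illustration, not an existing rdiff-backup feature, and the function names are hypothetical. The source side hashes the decompressed stream of the new .gz file, the remote side hashes its existing uncompressed copy, and only the two digests need to cross the wire. If they match, the file did not change before it was zipped.

```python
import gzip
import hashlib


def sha256_of_gzip_payload(gz_path, chunk_size=1 << 20):
    """Digest of the decompressed contents of a gzip file."""
    digest = hashlib.sha256()
    with gzip.open(gz_path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Digest of a plain file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_before_compression(source_gz, mirror_original):
    """True if the .gz on the source holds exactly the bytes that the
    mirror already has under the uncompressed name."""
    return sha256_of_gzip_payload(source_gz) == sha256_of_file(mirror_original)
```

Even when the digests match, compressing the mirror copy locally will not necessarily give a byte-identical .gz. As far as I know the gzip header carries a timestamp and only a coarse hint about the compression level (fastest or best), not the exact level, so the mirror would end up with an equivalent file, not the same one.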

I guess there would be a lot of work involved, but I just wondered whether
others think there might be gains to be made here?


Kevin



BMRB International 
http://www.bmrb.co.uk +44 (0)20 8566 5000 

