Expanding the include/exclude options

Donovan Baarda abo@minkirri.apana.org.au
Sat, 30 Mar 2002 20:35:01 +1100


On Fri, Mar 29, 2002 at 12:31:43PM -0800, Ben Escoto wrote:
> >>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
> >>>>> wrote the following on Fri, 29 Mar 2002 21:21:44 +1100
[...]
> The implicit including sounds like a good idea, and definitely would
> be good to use with file lists.  It may be compatible with regular
> expressions, or at least the most common ones, I'll have to think
> about it.  Why doesn't rsync use that system?  Did they have a reason
> or did it just turn out that way?

It's probably a good idea to pin down some invented terminology first.

Let's define "implicit including" to mean implicit matching of parent
directories of matched files. For example, "+/home/abo/** -**" would match
the following:

/home/
/home/abo/
/home/abo/fred
/home/abo/...etc
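
As a rough illustration of the definition above, here is a minimal Python
sketch of implicit including applied to a file list. It is not the dirscan.py
code; the names (parse_rules, selected, parents, implicit_include) are
invented for the example. It assumes first-match-wins rules in which * stops
at "/" and ** does not, and it leaves the root directory out of the implied
parents to agree with the listing above.

```python
import re

def parse_rules(spec):
    """Turn '+/home/abo/** -**' into a list of (sign, compiled regex) pairs."""
    rules = []
    for token in spec.split():
        # Assumed glob semantics: ** crosses '/', * does not.
        parts = re.findall(r"\*\*|\*|[^*]", token[1:])
        regex = "".join(
            ".*" if p == "**" else "[^/]*" if p == "*" else re.escape(p)
            for p in parts
        )
        rules.append((token[0], re.compile(regex)))
    return rules

def selected(rules, path):
    """First matching rule wins; a path matching no rule is excluded."""
    for sign, regex in rules:
        if regex.fullmatch(path):
            return sign == "+"
    return False

def parents(path):
    """Yield the parent directories of path (root excluded), with trailing '/'."""
    parts = path.rstrip("/").split("/")
    for i in range(2, len(parts)):
        yield "/".join(parts[:i]) + "/"

def implicit_include(rules, paths):
    """Explicit matches plus every parent directory of an explicit match."""
    matched = {p for p in paths if selected(rules, p)}
    implied = {d for p in matched for d in parents(p)}
    return sorted(matched | implied)

rules = parse_rules("+/home/abo/** -**")
filelist = ["/etc/passwd", "/home/abo/fred", "/home/bob/notes"]
print(implicit_include(rules, filelist))
```

Only /home/abo/fred matches a rule directly here; /home/ and /home/abo/ come
along purely because they are its parents.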

Let's define "implicit scanning" to mean not pruning parent directories of
matched files, but not including them either unless they are explicitly
included. For example, "+/home/abo/** -**" would _not_ match the following:

/home/

but the scan would still descend through it, and would find and match the
following (note: ** matches an empty string, so /home/abo/ itself matches):

/home/abo/
/home/abo/fred
/home/abo/...etc
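
To make the difference concrete, here is a small Python sketch of implicit
scanning over an in-memory tree. It is only an illustration, not dirscan.py:
the function names and the nested-dict tree are made up, rules are assumed to
be first-match-wins with * stopping at "/" and ** crossing it, and for
simplicity it never prunes anything at all.

```python
import re

def parse_rules(spec):
    """Turn '+/home/abo/** -**' into a list of (sign, compiled regex) pairs."""
    rules = []
    for token in spec.split():
        # Assumed glob semantics: ** crosses '/', * does not.
        parts = re.findall(r"\*\*|\*|[^*]", token[1:])
        regex = "".join(
            ".*" if p == "**" else "[^/]*" if p == "*" else re.escape(p)
            for p in parts
        )
        rules.append((token[0], re.compile(regex)))
    return rules

def selected(rules, path):
    """First matching rule wins; a path matching no rule is excluded."""
    for sign, regex in rules:
        if regex.fullmatch(path):
            return sign == "+"
    return False

def scan(tree, rules, prefix="/"):
    """Walk a nested-dict tree (dict = directory, None = file).

    Every directory is descended into, but a path is only reported
    when a rule explicitly includes it.
    """
    for name, child in sorted(tree.items()):
        is_dir = child is not None
        path = prefix + name + ("/" if is_dir else "")
        if selected(rules, path):
            yield path
        if is_dir:
            yield from scan(child, rules, path)

tree = {"etc": {"passwd": None},
        "home": {"abo": {"fred": None}, "bob": {"notes": None}}}
print(list(scan(tree, parse_rules("+/home/abo/** -**"))))
```

With this tree the scan walks through /home/ without reporting it, and
reports only /home/abo/ and /home/abo/fred.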

I liked "implicit scanning", which is why dirscan.py uses it. You can just do
"+/home/abo/** -**" and have it do what you expect. When you use rsync,
which doesn't do it, you have to do "+/home/ +/home/abo/ +/home/abo/** -**"
to ensure that all the required parent directories are included. Implicit
scanning also lets you do weird things like exclude a directory but include
its contents, i.e. "-/home/ +/home/**".
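
To see why the extra parent includes are needed without implicit scanning,
here is a simplified Python model of a scanner that prunes every directory
that is not included. It is a sketch of the behaviour described above, not
rsync's actual implementation; the names and the nested-dict tree are
invented, and the glob semantics (first match wins, * stops at "/", **
crosses it) are assumptions.

```python
import re

def parse_rules(spec):
    """Turn '+/home/ -**' into a list of (sign, compiled regex) pairs."""
    rules = []
    for token in spec.split():
        # Assumed glob semantics: ** crosses '/', * does not.
        parts = re.findall(r"\*\*|\*|[^*]", token[1:])
        regex = "".join(
            ".*" if p == "**" else "[^/]*" if p == "*" else re.escape(p)
            for p in parts
        )
        rules.append((token[0], re.compile(regex)))
    return rules

def selected(rules, path):
    """First matching rule wins; a path matching no rule is excluded."""
    for sign, regex in rules:
        if regex.fullmatch(path):
            return sign == "+"
    return False

def scan_pruning(tree, rules, prefix="/"):
    """Walk a nested-dict tree (dict = directory, None = file).

    An excluded directory is pruned: nothing inside it is ever examined.
    """
    for name, child in sorted(tree.items()):
        is_dir = child is not None
        path = prefix + name + ("/" if is_dir else "")
        if not selected(rules, path):
            continue
        yield path
        if is_dir:
            yield from scan_pruning(child, rules, path)

tree = {"etc": {"passwd": None},
        "home": {"abo": {"fred": None}, "bob": {"notes": None}}}
short = parse_rules("+/home/abo/** -**")
full = parse_rules("+/home/ +/home/abo/ +/home/abo/** -**")
print(list(scan_pruning(tree, short)))
print(list(scan_pruning(tree, full)))
```

In this model the short form finds nothing, because /home/ is excluded by
"-**" and pruned before /home/abo/ is ever looked at; the long form reaches
/home/abo/fred by including each parent on the way down.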

After implementing it, I know why rsync doesn't do it. It's hard to
implement! It's particularly hard to prune directories efficiently when
scanning. When you don't implicitly scan, pruning is easy: if a directory
doesn't match, you prune it. When you implicitly scan or implicitly include,
you can't prune a directory if it might be a parent of anything explicitly
included. For example, "+/home/**/Mail/** -**" means you can't prune anything
that matches:

/home/
/home/**

even though they are not explicitly included.
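
One conservative way to make that pruning decision is sketched below in
Python: a directory is kept if it fully matches some leading portion of an
include pattern, since the rest of the pattern could then match something
beneath it. This is an illustration rather than the dirscan.py approach.
The names may_contain_match and can_prune are made up, directories are
assumed to be written with a trailing "/", and exclude rules are ignored
entirely, so it may keep directories that a smarter check would prune. It
also rebuilds regexes on every call, so it says nothing about doing this
efficiently.

```python
import re

def glob_regex(tokens):
    """Regex source for a token list; ** crosses '/', * does not (assumed)."""
    return "".join(
        ".*" if t == "**" else "[^/]*" if t == "*" else re.escape(t)
        for t in tokens
    )

def may_contain_match(glob, directory):
    """True if something at or below `directory` could match `glob`.

    `directory` must end with '/'. If it fully matches a leading part of
    the glob, the remainder might still match a path underneath it.
    """
    tokens = re.findall(r"\*\*|\*|[^*]", glob)
    return any(
        re.fullmatch(glob_regex(tokens[:k]), directory)
        for k in range(1, len(tokens) + 1)
    )

def can_prune(spec, directory):
    """Prune only when no include pattern could match at or below `directory`."""
    includes = [t[1:] for t in spec.split() if t.startswith("+")]
    return not any(may_contain_match(g, directory) for g in includes)

spec = "+/home/**/Mail/** -**"
for d in ["/home/", "/home/abo/", "/home/abo/Mail/", "/etc/"]:
    print(d, "prune" if can_prune(spec, d) else "keep")
```

For this pattern everything under /home/ is kept and /etc/ is pruned, which
is the "/home/ and /home/**" set listed above.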

Note that although avoiding implicit including and scanning makes pruning
easy, it actually complicates the other end: checking whether an arbitrary
filename from a filelist matches. For example, "-**/Mail/ +**" without
explicit includes should _not_ match anything with /Mail/ in its path,
because it would be pruned. This means you must check each arbitrary
filename by checking all its parent directories to see if they would be
pruned.
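
The parent check can be sketched in Python as follows. This is a minimal
example under assumed semantics (first match wins, * stops at "/", ** crosses
it); the function names are hypothetical, and the root directory is assumed
to always be scanned, so it is not checked.

```python
import re

def parse_rules(spec):
    """Turn '-**/Mail/ +**' into a list of (sign, compiled regex) pairs."""
    rules = []
    for token in spec.split():
        # Assumed glob semantics: ** crosses '/', * does not.
        parts = re.findall(r"\*\*|\*|[^*]", token[1:])
        regex = "".join(
            ".*" if p == "**" else "[^/]*" if p == "*" else re.escape(p)
            for p in parts
        )
        rules.append((token[0], re.compile(regex)))
    return rules

def selected(rules, path):
    """First matching rule wins; a path matching no rule is excluded."""
    for sign, regex in rules:
        if regex.fullmatch(path):
            return sign == "+"
    return False

def parents(path):
    """Yield the parent directories of path (root excluded), with trailing '/'."""
    parts = path.rstrip("/").split("/")
    for i in range(2, len(parts)):
        yield "/".join(parts[:i]) + "/"

def filelist_match(rules, path):
    """A path counts only if it and every parent directory are included,
    because a pruning scanner would never have reached it otherwise."""
    return selected(rules, path) and all(
        selected(rules, d) for d in parents(path)
    )

rules = parse_rules("-**/Mail/ +**")
for p in ["/home/abo/notes", "/home/abo/Mail/inbox"]:
    print(p, selected(rules, p), filelist_match(rules, p))
```

Taken on its own, /home/abo/Mail/inbox matches "+**", but its parent
/home/abo/Mail/ is excluded, so the combined check rejects it.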

>   DB> The future of this code is up in the air. I would like to
>   DB> maintain and make them publicly available under GPL. I have a few
[...]
> Perhaps in some kind of python repository?  (Vaults of
> Parnassus(sp?)?)  I never searched through any of them, but it seems
> something like this would probably best be distributed as a python
> module, so should be wherever it is people look for python modules.

In the past I've used Parnassus as a secondary search point for my project
modules, and used Freshmeat as the "project page". I guess Parnassus alone
would be OK for something really small...

-- 
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------