Expanding the include/exclude options

Donovan Baarda abo@minkirri.apana.org.au
Tue, 2 Apr 2002 16:37:41 +1000


On Sun, Mar 31, 2002 at 04:47:42AM -0800, Ben Escoto wrote:
> >>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
> >>>>> wrote the following on Sat, 30 Mar 2002 20:35:01 +1100
> 
>   DB> Let's define "implicit including" to mean implicit matching of
>   DB> parent directories of matched files. For example, "+/home/abo/**
>   DB> -**" would match the following:
> 
>   DB> Let's define "implicit scanning" to mean not pruning parent
>   DB> directories of matched files, but not including them either
>   DB> unless they are explicitly included. For example, "+/home/abo/**
>   DB> -**" would _not_ match the following:
> 
> Ok, this sounds useful, but I'm not sure why we wouldn't want to match
> all parent directories of matched files.  Intuitively, it seems that

Some tools, like tar, automatically include all the files in a directory
when you include the directory itself. My dirscan.py returns a Python list
of paths that could be used for anything, including building file lists
for tar. The returned list includes only paths that explicitly match the
include/exclude list. For example, "+/home/abo/mail/* -**" will return the
following file list:

/home/abo/mail/
/home/abo/mail/sent-mail
/home/abo/mail/drafts
/home/abo/mail/lists
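A minimal sketch of that behaviour, assuming simplified rsync-like globs
(`*` stays within one path component, `**` crosses `/`), directories
spelled with a trailing "/" (so "/home/abo/mail/" matches
"/home/abo/mail/*" with `*` matching the empty string), and a hard-coded
path list standing in for a real directory walk. The names
`glob_to_regex`, `parse_rules` and `explicit_matches` are hypothetical;
this is not dirscan.py's actual code.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a simplified rsync-style glob into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # '?' is one non-'/' character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def parse_rules(spec: str) -> list[tuple[bool, re.Pattern[str]]]:
    """Turn '+pat -pat ...' into ordered (is_include, regex) pairs."""
    return [(tok[0] == "+", glob_to_regex(tok[1:])) for tok in spec.split()]


def explicit_matches(paths: list[str], spec: str) -> list[str]:
    """Keep paths whose first matching rule is '+'; parents are NOT added."""
    rules = parse_rules(spec)
    kept = []
    for path in paths:
        for include, regex in rules:
            if regex.fullmatch(path):
                if include:
                    kept.append(path)
                break
        else:
            kept.append(path)  # no rule matched: included by default
    return kept


# Hypothetical tree; directories end in '/'.
tree = [
    "/", "/home/", "/home/abo/", "/home/abo/notes.txt",
    "/home/abo/mail/", "/home/abo/mail/sent-mail",
    "/home/abo/mail/drafts", "/home/abo/mail/lists",
]
print(explicit_matches(tree, "+/home/abo/mail/* -**"))
```

Every path is tested (nothing is pruned), but "/", "/home/" and
"/home/abo/" are dropped because "-**" is the first rule they match.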

> --include **.txt
> 
> should match /home iff somewhere in home there was a .txt file.  But
> if the only .txt files in /home were in /home/abo, then abo is the
> only file in /home that should get matched and backed up.

Most backup tools automatically do whatever is needed for the parent
directories of explicitly included files. Trying to return a file list that
does what you are suggesting certainly adds a level of complexity: for
"+/home/abo/mail/** -**" it means you want to return:

/
/home/
/home/abo/
/home/abo/mail/
/home/abo/mail/sent-mail
/home/abo/mail/drafts
/home/abo/mail/lists
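A rough sketch of the "implicit including" step, assuming it is applied as
a post-processing pass over the explicit matches and that directories are
spelled with a trailing "/". `parent_dirs` and `implicit_include` are
hypothetical helpers, not part of dirscan.py or rsync.

```python
from typing import List


def parent_dirs(path: str) -> List[str]:
    """Ancestors of path, top-down: '/home/abo/mail/' -> ['/', '/home/', '/home/abo/']."""
    if path == "/":
        return []  # the root has no parent
    parts = path.strip("/").split("/")
    return ["/" + "".join(p + "/" for p in parts[:i]) for i in range(len(parts))]


def implicit_include(matched: List[str]) -> List[str]:
    """Add every ancestor of each matched path, keeping first-seen order."""
    seen = set()
    out = []
    for path in matched:
        for p in parent_dirs(path) + [path]:
            if p not in seen:
                seen.add(p)
                out.append(p)
    return out


explicit = [
    "/home/abo/mail/", "/home/abo/mail/sent-mail",
    "/home/abo/mail/drafts", "/home/abo/mail/lists",
]
print(implicit_include(explicit))
```

Note that "/", "/home/" and "/home/abo/" come back even though the "-**"
rule matched them explicitly, which is exactly the objection raised below.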

The biggest problem is that you are including paths that are explicitly
excluded, just because they happen to be parent directories of a file that
is explicitly included. This is the opposite of rsync, which excludes files
that are explicitly included just because they are under a directory that
is explicitly excluded.

However, it is worse than rsync, because with rsync you can at least test
an arbitrary path against the criteria and determine whether it is included
or excluded. With your scheme, you can't make that determination without
knowing more about the whole filesystem. For example, with "+**.c -**", is
/home/ included or excluded? You can't tell until you search the whole of
/home/ for *.c files.
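To make the contrast concrete, here is a small sketch, assuming a plain
predicate `is_c` stands in for the explicit "+**.c -**" test and that
directories end in "/". `included_implicitly` is a hypothetical helper: it
answers the implicit-including question and must scan every path below the
directory, whereas the rsync-style answer is just `is_c(path)`.

```python
from typing import Callable, Iterable


def included_implicitly(path: str, tree: Iterable[str],
                        explicit: Callable[[str], bool]) -> bool:
    """Included if it matches explicitly, or if anything below it does."""
    if explicit(path):
        return True
    if not path.endswith("/"):
        return False  # a plain file has no descendants
    # This is the expensive part: the whole subtree has to be searched.
    return any(p.startswith(path) and explicit(p) for p in tree)


def is_c(p: str) -> bool:
    """Stand-in for '+**.c -**': only paths ending in '.c' match."""
    return p.endswith(".c")


tree_a = ["/", "/home/", "/home/abo/", "/home/abo/notes.txt"]
tree_b = tree_a + ["/home/abo/hello.c"]
print(included_implicitly("/home/", tree_a, is_c))  # False
print(included_implicitly("/home/", tree_b, is_c))  # True
```

Same path, same rules, different answer: the result for /home/ depends on
what happens to exist underneath it.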

>     So maybe this is a combination of the two, where you look through
> directories not knowing beforehand whether or not anything inside
> will get matched, but if anything does, then all the parent
> directories get matched.  (I guess literally this system would do
> "implicit including" but not "implicit scanning".)

Yeah, it is "implicit including", with the serious problem of having
implicit includes overriding explicit excludes.

>     For exclude lists, it seems that we should do neither implicit
> including nor implicit scanning.

The rsync way doesn't have separate "exclude lists". It has one long list,
with each entry specifying an include or an exclude. Each path is compared
against the entries in order, and the first match determines the result.
If a path falls off the end of the list, it is included by default.
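A minimal sketch of that first-match rule. It uses Python's
`fnmatch.fnmatchcase` as a stand-in for rsync's pattern syntax, which is a
simplification: here `*` also matches "/" and `**` has no special meaning.
The `first_match` name and the `(sign, pattern)` rule tuples are
hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # ('+' or '-', glob pattern)


def first_match(path: str, rules: List[Rule]) -> bool:
    """Return True if path is included: the first matching rule decides."""
    for sign, pattern in rules:
        if fnmatchcase(path, pattern):
            return sign == "+"
    return True  # fell off the end of the list: included by default


rules = [("+", "/home/abo/*.txt"), ("-", "/home/abo/*")]
print(first_match("/home/abo/notes.txt", rules))  # True (first rule wins)
print(first_match("/home/abo/a.out", rules))      # False
print(first_match("/etc/passwd", rules))          # True (no rule matched)
```

Because only the first match counts, reversing the two rules above would
exclude /home/abo/notes.txt as well.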

-- 
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------