[PATCH] Check case-folding clashes during addremove

Alexis S. L. Carvalho alexis at cecm.usp.br
Sun Apr 15 23:08:36 UTC 2007


Thus spake Andrei Vermel:
> >I think we should optimize for the common case of no collisions. So
> >instead we should do:
> >
> >  fl = file.lower()
> >  if fl in self.foldmap:
> >     issue error/warning
> >     self.foldmap[fl] += 1
> >  else:
> >     self.foldmap[fl] = 1
> >
> 
> Yes, this optimisation makes sense. Attached is an updated patch.
> 
> After a bit of thinking, it seems to me that dirstate.update is not
> a good place to call the check function, due to the fact that it gets called
> with only a single file.

We really want to do these checks only when a file is added (by hg add,
addremove, copy, move or somebody using python directly).
dirstate.update feels like the right place for this...

But we could change the callers to call dirstate.update with all the
files available.  Doing this for localrepo.add shouldn't be hard; for
copies/moves we'd probably need a new method in localrepo.

Unfortunately I think the current initdirs/updatedirs/checkinterfering
check would stop working in some cases (e.g. hg add link link/file where
link is a symbolic link), since it doesn't update its data while doing
the checks.  But fixing this should be doable.

>                          Some of the problems:
> 
> * On a casefolding file system abort happens before displaying
>  all clashes.
> * When a file is renamed from aaa.txt to AAA.TXT, it gets copied
>   first and an unnecessary clash warning appears.
>   (On a casefolding file system this can't be done anyway, so one has to
>   rename aaa.txt to tmp.txt and then tmp.txt to AAA.TXT, and this works)

Allowing this would open an interesting way for the user to shoot his
foot by doing

 hg mv aaa AAA
 hg commit -m 'case clash' AAA

Not sure how to handle this.  (Truthfully, we already have a similar
problem when somebody does hg merge and then commits only a few files...)

> * A case changing rename of a folder with many files would have a
>   performance overhead. If the check is done where all the file names
>   to be added are available, the new files detected as clashing can be
>   put to a map so that finding the corresponding clashes would require
>   only a single iteration over dirstate.map.

Not ending up with quadratic behaviour here would be nice, but I don't
really think this is a place where performance is critical.

Some comments about the patch:

> # HG changeset patch
> # User Andrei Vermel <avermel at mail.ru>
> # Date 1175241778 -14400
> # Node ID 8593076e366928df2401a33f3f1e0187262ecd94
> # Parent  a1406a50ca83e9d3388f7273c8f6ee9283d83c5c
> Check case-folding clashes during addremove
> 
> diff -r a1406a50ca83 -r 8593076e3669 mercurial/dirstate.py
> --- a/mercurial/dirstate.py  Tue Mar 13 13:17:26 2007 +0100
> +++ b/mercurial/dirstate.py  Fri Mar 30 12:02:58 2007 +0400
> @@ -20,6 +20,7 @@ class dirstate(object):
>         self.dirty = 0
>         self.ui = ui
>         self.map = None
> +        self.foldmap = None
>         self.pl = None
>         self.dirs = None
>         self.copymap = {}
> @@ -173,6 +174,17 @@ class dirstate(object):
>         if self.map is None:
>             self.read()
> 
> +    def add_foldmap(self, file):
> +        fl = file.lower()
> +        if fl in self.foldmap:
> +            self.foldmap[fl]+=1
> +        else:
> +            self.foldmap[fl]=1
> +
> +    def del_foldmap(self, file):
> +        fl = file.lower()
> +        self.foldmap[fl]-=1
> +
>     def parse(self, st):
>         self.pl = [st[:20], st[20: 40]]
> 
> @@ -196,10 +208,12 @@ class dirstate(object):
>                 f, c = f.split('\0')
>                 copymap[f] = c
>             map[f] = e[:4]
> +            self.add_foldmap(f)
>             pos = newpos
> 
>     def read(self):
>         self.map = {}
> +        self.foldmap = {}
>         self.pl = [nullid, nullid]
>         try:
>             st = self.opener("dirstate").read()

Instead of populating the foldmap every time we read the dirstate file,
it'd be better to do it only when we're going to use it, just like the
initdirs/checkinterfering calls in dirstate.update below.  (Maybe it
even makes sense to rename initdirs/updatedirs -> initchecks/updatechecks,
add the foldmap management to these functions and make checkinterfering
call check_add_case_clashes)

> @@ -268,9 +282,11 @@ class dirstate(object):
>         if state == "a":
>             self.initdirs()
>             self.checkinterfering(files)
> +            self.check_add_case_clashes(files)
>         for f in files:
>             if state == "r":
>                 self.map[f] = ('r', 0, 0, 0)
> +                self.add_foldmap(f)
>                 self.updatedirs(f, -1)

add_foldmap or del_foldmap?

(and this is not from your patch, but this call to updatedirs doesn't
look very consistent.  Or effective.  hmm...)

>             else:
>                 if state == "a":
> @@ -279,6 +295,7 @@ class dirstate(object):
>                 st_size = kw.get('st_size', s.st_size)
>                 st_mtime = kw.get('st_mtime', s.st_mtime)
>                 self.map[f] = (state, s.st_mode, st_size, st_mtime)
> +                self.add_foldmap(f)
>             if self.copymap.has_key(f):
>                 del self.copymap[f]
> 
> @@ -290,6 +307,7 @@ class dirstate(object):
>         for f in files:
>             try:
>                 del self.map[f]
> +                self.del_foldmap(f)
>                 self.updatedirs(f, -1)
>             except KeyError:
>                 self.ui.warn(_("not in dirstate: %s!\n") % f)
> @@ -297,6 +315,7 @@ class dirstate(object):
> 
>     def clear(self):
>         self.map = {}
> +        self.foldmap = {}
>         self.copymap = {}
>         self.dirs = None
>         self.markdirty()
> @@ -308,6 +327,7 @@ class dirstate(object):
>                 self.map[f] = ('n', 0777, -1, 0)
>             else:
>                 self.map[f] = ('n', 0666, -1, 0)
> +            self.add_foldmap(f)
>         self.pl = (parent, nullid)
>         self.markdirty()
> 
> @@ -552,3 +572,37 @@ class dirstate(object):
> 
>         return (lookup, modified, added, removed, deleted, unknown,
> ignored,
>                 clean)
> +
> +    def check_add_case_clashes(self, add_files):
> +        self.lazyread()
> +        foldmap = self.foldmap
> +        fold_clash = None
> +        for fn in add_files:
> +            fold=fn.lower()
> +            if fold in foldmap:
> +                nmb = foldmap[fold]
> +                if nmb == 0:
> +                    continue
> +                if fn in self.map: # add existing file is valid,
> +                    continue       #  e.g. qrefresh does it
> +
> +                coll=self.report_case_collisions(fn)
> +
> +
> +    def report_case_collisions(self, file):
> +        fold = file.lower()
> +        clashes=[]
> +        for fn in self.map.iterkeys():
> +            fl = fn.lower()
> +            if fl == fold and fn != file and self.map[fn][0] != 'r':
> +                clashes.append(fn)
> +
> +        if clashes != []:
> +           self.ui.warn(_('\nname case fold clashes found!\n'))
> +           self.ui.warn(_('    was %s') % clashes[0])
> +           for fn in clashes[1:]:
> +               self.ui.warn(_(',\n        %s') % fn)
> +           self.ui.warn(_(', now %s\n') % file)
> +
> +           if not util.checkfolding(self.root):
> +               raise util.Abort(_('add aborted due to case folding
> clashes'))

Alexis



More information about the Mercurial-devel mailing list