[PATCH STABLE] largefiles: check whether specified patterns are related to largefiles strictly
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Thu Feb 16 08:46:04 UTC 2012
At Wed, 15 Feb 2012 15:08:38 +0100,
Na'Tosha Bard wrote:
> > largefiles: check whether specified patterns are related to largefiles
> > strictly
> >
> > current 'lfiles_repo.status()' implementation examines whether
> > specified patterns are related to largefiles in working directory (not
> > to STANDIN) or not by NOT-EMPTY-NESS of below list:
> >
> > [f for f in match.files() if f in lfdirstate]
> >
> > but it can not be assumed that all in 'match.files()' are file itself
> > exactly, because user may only specify part of path to match whole
> > under subdirectories recursively.
> >
> > above examination will mis-recognize such pattern as 'not related to
> > largefiles', and executes normal 'status()' procedure. so, 'hg status'
> > shows '?'(unknown) status for largefiles in working directory unexpectedly.
> >
> > this patch examines relation of pattern to largefiles by applying
> > 'match()' on each entry in lfdirstate and checking whether there is
> > no matched entry.
> >
> > it may increase cost of examination, because it causes of full scan of
> > entries in lfdirstate.
> >
> > so this patch uses normal for-loop instead of list comprehensions, to
> > decrease cost when matching is found.
> >
>
> Did you do any performance testing before and after this patch? What is
> the difference in performance? What sort of repository did you test it on?
>
> Na'Tosha
I have not tested it on a real repository yet; I have only considered
the ORDER of processing.
before:
(b1) lookup in 'lfdirstate' => O(1)
(b2) loop by 'match.files()' => O(N_matchfiles) : N of 'match.files()'
after:
(a1) loop by 'lfdirstate' => O(N_lfiles) : N of lfdirstate
(a2) examination by 'match(f)' => O(N_matchfiles)
If 'N_matchfiles' can be assumed to be small enough (and I think it can),
the main performance difference is between (b1) and (a1).
'N_lfiles' is not small in ordinary cases, so the patched code will
clearly increase execution cost.
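
To make the comparison above concrete, here is a minimal sketch of both checks. It is not Mercurial code: a plain set stands in for 'lfdirstate', and 'make_match()' is a hypothetical stand-in for the matcher that accepts an exact path or anything below a named directory.

```python
# Stand-ins only: a plain set replaces lfdirstate and make_match() replaces
# Mercurial's matcher. Neither reflects the real Mercurial API.

def make_match(patterns):
    """Build a predicate accepting an exact path or anything below a named directory."""
    def match(path):
        return any(path == p or path.startswith(p + "/") for p in patterns)
    return match


def related_before(patterns, lfdirstate):
    # (b2) loop over the patterns, (b1) O(1) membership test for each one.
    return bool([f for f in patterns if f in lfdirstate])


def related_after(patterns, lfdirstate):
    match = make_match(patterns)
    # (a1) loop over lfdirstate, (a2) match(f) for each entry; stop at the first hit.
    for f in lfdirstate:
        if match(f):
            return True
    return False


lfdirstate = {"data/big1.bin", "data/sub/big2.bin", "other/big3.bin"}

print(related_before(["data"], lfdirstate))  # the directory pattern is missed
print(related_after(["data"], lfdirstate))   # the directory pattern is found
```

With a directory pattern, the first check finds nothing although largefiles exist below it, while the second one scans the entries and finds them, at the price of the loop over 'lfdirstate'.
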
I don't have any other good ideas for fixing this problem (= showing '?'
for the largefile itself) under the current policy for choosing the
'performance boost' route, so I posted this patch even though it
increases execution cost.
Of course, there are other choices:
- fix this problem in some other way, or
- change the policy for choosing the 'performance boost' route itself,
for example: choose the 'slow' route when a non-file pattern is specified
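
The second choice could look roughly like the sketch below. This is only an illustration under assumptions: 'take_boost_route()' and the 'is_plain_file' callback are hypothetical names, and how Mercurial would actually decide that a pattern is a plain file is left open.

```python
def take_boost_route(patterns, lfdirstate, is_plain_file):
    """Return True when the normal status() may be used directly.

    is_plain_file is a hypothetical callback telling whether a pattern
    names a single file; it is an assumption, not part of Mercurial.
    """
    for p in patterns:
        if not is_plain_file(p):
            # A non-file pattern may cover largefiles below it: go the slow way.
            return False
        if p in lfdirstate:
            # The pattern names a largefile directly: go the slow way.
            return False
    return True


# Toy data: known_files stands in for the files present in the working directory.
lfdirstate = {"data/big1.bin"}
known_files = {"data/big1.bin", "README"}

print(take_boost_route(["README"], lfdirstate, known_files.__contains__))
print(take_boost_route(["data"], lfdirstate, known_files.__contains__))
```

This keeps the check at one lookup per pattern, so the cost stays in the order of 'N_matchfiles', but directory patterns always lose the performance boost.
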
By the way, the current check against lfdirstate does not work as expected
(= showing the status of the largefile itself, not of the STANDIN) when
"hg status" against a rev tracking largefiles is invoked on a working
context that does not track largefiles.
# I hit this situation after posting the patch ....
Which of the following ways should be chosen?
(1) check on both contexts whether there are any tracked files such that:
- it is a STANDIN, and
- its non-STANDIN part matches the specified pattern
(2) choose the 'slow' route, if both of the specified revisions are not
'working dir'
The latter seems better in terms of the scope of its performance impact.
> > diff -r f7e0d95d0a0b -r c0a0446aaa86 hgext/largefiles/reposetup.py
> > --- a/hgext/largefiles/reposetup.py Fri Feb 10 16:52:32 2012 -0600
> > +++ b/hgext/largefiles/reposetup.py Wed Feb 15 23:01:09 2012 +0900
> > @@ -118,8 +118,10 @@
> > # handle it -- thus gaining a big performance boost.
> > lfdirstate = lfutil.openlfdirstate(ui, self)
> > if match.files() and not match.anypats():
> > - matchedfiles = [f for f in match.files() if f in lfdirstate]
> > - if not matchedfiles:
> > + for f in lfdirstate:
> > + if match(f):
> > + break
> > + else:
> > return super(lfiles_repo, self).status(node1,
> > node2,
> > match, listignored, listclean,
> > listunknown, listsubrepos)
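
The effect of using a for-loop with 'else' instead of a list comprehension can be seen in the small sketch below. It counts how often the matcher is called; the list and the lambda are toy stand-ins for 'lfdirstate' and 'match', not Mercurial objects.

```python
def check_with_comprehension(entries, match):
    """Return (found, calls): the comprehension always visits every entry."""
    calls = 0

    def counted(f):
        nonlocal calls
        calls += 1
        return match(f)

    matched = [f for f in entries if counted(f)]
    return bool(matched), calls


def check_with_for_else(entries, match):
    """Return (found, calls): the loop stops at the first matching entry."""
    calls = 0
    for f in entries:
        calls += 1
        if match(f):
            break
    else:
        # The loop ended without 'break': nothing matched.
        return False, calls
    return True, calls


entries = ["data/big%d.bin" % i for i in range(1000)]

print(check_with_comprehension(entries, lambda f: f.startswith("data/")))
print(check_with_for_else(entries, lambda f: f.startswith("data/")))
```

When an entry matches early, the loop stops after that entry; when nothing matches, both forms scan all entries, so the worst case is unchanged.
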
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp