[PATCH STABLE] largefiles: check whether specified patterns are related to largefiles strictly

Na'Tosha Bard natosha at unity3d.com
Thu Feb 16 15:48:51 UTC 2012


2012/2/16 FUJIWARA Katsunori <foozy at lares.dti.ne.jp>

>
> At Wed, 15 Feb 2012 15:08:38 +0100,
> Na'Tosha Bard wrote:
>
> > > largefiles: check whether specified patterns are related to largefiles
> > > strictly
> > >
> > > The current 'lfiles_repo.status()' implementation decides whether the
> > > specified patterns are related to largefiles in the working directory
> > > (not to their STANDINs) by checking whether the list below is non-empty:
> > >
> > >    [f for f in match.files() if f in lfdirstate]
> > >
> > > However, it cannot be assumed that every entry in 'match.files()' is
> > > an exact file name, because the user may specify only part of a path
> > > to match everything under that subdirectory recursively.
> > >
> > > The check above misjudges such a pattern as 'not related to
> > > largefiles' and runs the normal 'status()' procedure, so 'hg status'
> > > unexpectedly shows '?' (unknown) status for largefiles in the working
> > > directory.
> > >
> > > This patch determines whether a pattern is related to largefiles by
> > > applying 'match()' to each entry in lfdirstate and checking whether
> > > there is no matching entry.
> > >
> > > This may increase the cost of the check, because it causes a full
> > > scan of the entries in lfdirstate.
> > >
> > > So this patch uses a plain for-loop instead of a list comprehension,
> > > to reduce the cost when a match is found.
> > >
> >
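The mismatch described above can be reproduced outside Mercurial. This is only a minimal sketch: the set standing in for lfdirstate, the make_matcher() helper, and the file names are all hypothetical, and the helper merely imitates a matcher that treats a plain path as "this file or anything under this directory".

```python
# Hypothetical stand-ins (not Mercurial APIs): a set plays the role of
# lfdirstate, and make_matcher() imitates a matcher built from plain paths.
def make_matcher(patterns):
    """Return (files, match); match(f) is true if f equals a pattern
    or lies somewhere under a pattern taken as a directory."""
    files = list(patterns)

    def match(f):
        return any(f == p or f.startswith(p + "/") for p in files)

    return files, match


lfdirstate = {"assets/big.bin", "assets/textures/wall.psd"}
files, match = make_matcher(["assets"])

# Old check: is any of the given patterns literally an lfdirstate entry?
old_related = bool([f for f in files if f in lfdirstate])

# New check: does the matcher accept any lfdirstate entry?
new_related = any(match(f) for f in lfdirstate)

print(old_related, new_related)  # prints: False True
```

With a directory pattern such as "assets", the membership test finds nothing even though both largefiles live under that directory, which is the case the patch is meant to catch.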
> > Did you do any performance testing before and after this patch?  What is
> > the difference in performance?  What sort of repository did you test it
> > on?
> >
> > Na'Tosha
>
> Not yet tested on a real repo; I have only considered the ORDER of
> processing.
>
>    before:
>
>        (b1) lookup in 'lfdirstate' => O(1)
>        (b2) loop over 'match.files()' => O(N_matchfiles)
>             (N_matchfiles: number of entries in 'match.files()')
>
>    after:
>
>        (a1) loop over 'lfdirstate' => O(N_lfiles)
>             (N_lfiles: number of entries in lfdirstate)
>        (a2) examination by 'match(f)' => O(N_matchfiles)
>
> If 'N_matchfiles' can be assumed to be small enough (and I think it can),
>

See, this is a tough one.  People using largefiles usually fall into one of
two categories:

1) People with a small number of extremely large binaries
2) People with a huge number of
not-so-large-but-too-large-to-version-directly binaries

From bug reports and talking to people, I know there are plenty of users in
both of these categories.  Ideally we'd optimize in a way that won't leave
either side out in the cold, but generally I think group (1) is probably
bigger than (2).

In any case, I'd like to run some performance tests on both our real
repository and some generated test repositories of various sizes before
this patch is applied.  I hope to get to this tomorrow.


> the main performance difference is between (b1) and (a1).
>
> 'N_lfiles' is not small in ordinary cases, so the patched code will
> clearly increase execution cost.
>
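The order-of-growth argument can be made concrete by counting operations. This is a rough sketch under stated assumptions, not a measurement of Mercurial itself: count_old() and count_new() are hypothetical helpers, a set stands in for lfdirstate, and matching is reduced to a simple path-prefix comparison.

```python
def count_old(match_files, lfdirstate):
    """Pre-patch shape: one O(1) lookup (b1) per entry of match_files (b2)."""
    lookups = 0
    related = False
    for f in match_files:
        lookups += 1
        if f in lfdirstate:
            related = True
    return related, lookups


def count_new(match_files, lfdirstate):
    """Post-patch shape: scan lfdirstate (a1), testing each entry
    against every pattern (a2), stopping at the first match."""
    comparisons = 0
    for f in lfdirstate:
        for p in match_files:
            comparisons += 1
            if f == p or f.startswith(p + "/"):
                return True, comparisons
    return False, comparisons


# 1000 hypothetical largefiles, and one pattern unrelated to any of them.
lfdirstate = {"data/%04d.bin" % i for i in range(1000)}

print(count_old(["docs"], lfdirstate))  # prints: (False, 1)
print(count_new(["docs"], lfdirstate))  # prints: (False, 1000)
```

The unrelated pattern is the worst case for the new check, since the whole of lfdirstate has to be scanned before the fast route can be taken.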
> I don't have any other good idea for fixing this problem (= showing '?'
> for the largefile itself) under the current policy for choosing the
> 'performance boost' route, so I posted this patch even though it
> increases execution cost.
>
> Of course, there are other choices:
>
>    - fix this problem in some other way, or
>
>    - change the policy for choosing the 'performance boost' route itself
>
>      for example: choose the 'slow' route when a non-file pattern is specified
>
>
> By the way, the current check via lfdirstate does not work as expected
> (= showing the status of the largefile itself, not of its STANDIN) when
> "hg status" against a rev tracking largefiles is invoked on a working
> context that does not track largefiles.
>
> # I hit on this situation after posting the patch ....
>
> Here, which of these ways should be chosen?
>
>    (1) check in both contexts whether there are any tracked files for which:
>        - the file is a STANDIN, and
>        - its non-STANDIN part matches the specified pattern
>
>    (2) choose the 'slow' route, if both of the specified revisions are not
>        'working dir'
>
> The latter seems better, because of the scope of its performance impact.
>

I agree that (2) is more appropriate here.

Cheers,
Na'Tosha


>
> > > diff -r f7e0d95d0a0b -r c0a0446aaa86 hgext/largefiles/reposetup.py
> > > --- a/hgext/largefiles/reposetup.py     Fri Feb 10 16:52:32 2012 -0600
> > > +++ b/hgext/largefiles/reposetup.py     Wed Feb 15 23:01:09 2012 +0900
> > > @@ -118,8 +118,10 @@
> > >                 # handle it -- thus gaining a big performance boost.
> > >                 lfdirstate = lfutil.openlfdirstate(ui, self)
> > >                 if match.files() and not match.anypats():
> > > -                    matchedfiles = [f for f in match.files() if f in lfdirstate]
> > > -                    if not matchedfiles:
> > > +                    for f in lfdirstate:
> > > +                        if match(f):
> > > +                            break
> > > +                    else:
> > >                         return super(lfiles_repo, self).status(node1, node2,
> > >                                 match, listignored, listclean,
> > >                                 listunknown, listsubrepos)
>
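The for/else shape used in the diff can be tried in isolation. A minimal sketch, assuming a plain list in place of lfdirstate and a hypothetical callable in place of the matcher: the else branch runs only when the loop finishes without hitting break, so it corresponds to "no entry matched".

```python
def related_for_else(entries, match):
    """Same control flow as the patched code: break on the first match,
    fall into the else branch only if nothing matched."""
    for f in entries:
        if match(f):
            break
    else:
        return False  # loop ran to completion: no entry matched
    return True


def related_any(entries, match):
    """Equivalent short-circuiting form using the any() builtin."""
    return any(match(f) for f in entries)


entries = ["assets/big.bin", "video/intro.mov"]  # hypothetical largefiles


def under_assets(f):
    return f.startswith("assets/")


def under_docs(f):
    return f.startswith("docs/")


print(related_for_else(entries, under_assets), related_any(entries, under_assets))
print(related_for_else(entries, under_docs), related_any(entries, under_docs))
```

Both forms stop at the first matching entry, unlike a list comprehension, which always walks the whole input before its result can be tested.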
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp
>



-- 
*Na'Tosha Bard*
Build & Infrastructure Developer | Unity Technologies - Copenhagen

*E-Mail:* natosha at unity3d.com
*Skype:* natosha.bard