Revision Sets Performance

robin clowers robin.clowers at gmail.com
Wed Aug 18 22:54:38 UTC 2010


Great, thanks Matt.  We will test the perf without the first branch command.

Thanks,
Robin

On Wed, Aug 18, 2010 at 1:52 PM, Matt Mackall <mpm at selenic.com> wrote:

> On Tue, 2010-08-17 at 15:34 -0700, robin clowers wrote:
> > I’m a developer on CodePlex.com, and we recently started looking at
> > using rev sets to allow our users to view the revisions in a branch.
> > To get paging support, the query became quite complex, so we ran some
> > performance testing and found it was pretty cpu intensive (compared to
> > other mercurial commands we use).  I was wondering if there is a
> > better way to get this information, and if there is any plan to
> > optimize these functions?
> >
> >
> > I realize this is never going to be on par with simpler commands, I
> > just want to make sure I'm not missing something.
> >
> >
> > For example, this is how we are getting the second set of 10 revisions
> > for a given branch:
> >
> >
> >
> > log -r "limit(reverse(limit(branch('branchName') or
> > ancestors(branch('branchName ')), 20)), 10)"
>
> ancestors() is inclusive, so this can be simplified:
>
> limit(reverse(limit(ancestors(branch('branchname'))...
>
> That should be twice as fast.
>
> The expensive operation here is branch. It's current implementation is
> not very smart:
>
> def branch(repo, subset, x):
>    s = getset(repo, range(len(repo)), x)
>    b = set()
>    for r in s:
>        b.add(repo[r].branch())
>    s = set(s)
>    return [r for r in subset if r in s or repo[r].branch() in b]
>
> In other words, it basically visits every cset and checks if it's got a
> matching branch name. We can do slightly better than that (visiting only
> ancestors of branch heads) but not much. Unfortunately branch() is just
> expensive.
>
> Also note the return here: revsets currently work on ordered lists of
> revisions, so we're scanning the entire repo, stripping it down to 20
> items, reversing it again, then stripping it down to 10. Expensive.
> So each 'page' operations has to unpack O(N) changelog descriptions.
>
> Switching to a generator model could make some of this faster at least
> for early pages, but some predicates like reverse operate on entire
> sets. We'd need a new predicate like limit to get bits from the middle
> of a list. But the last page would be just as slow as the current model,
> as we'd still need to scan all of history to discard all of it.
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial/attachments/20100818/a696cc59/attachment.html>


More information about the Mercurial mailing list