Revset-based implementation of log command

Matt Mackall mpm at selenic.com
Mon Apr 16 18:06:44 UTC 2012


On Fri, 2012-04-13 at 22:15 +0200, Patrick Mézard wrote:
> Le 12/04/12 22:49, Matt Mackall a écrit :
> > On Thu, 2012-04-12 at 21:52 +0200, Patrick Mézard wrote:
> >> [3] could be fixed by turning revset.match() into a generator. It is
> >> not clear which revset functions can operate on a "subset" generator
> >> rather than a list but the simple ones like user() definitely can.
> >> Also, I have no idea how much overhead this will add compared to the
> >> list version.
> > 
> > Any predicate can choose to turn its args from generators into lists, if
> > need be, so it shouldn't be all that hard architecturally to get some of
> > this working.
> 
> To make this valuable for log command we need to turn the revsets used
> by graphlog._makelogrevset() into pure "filters", that is generator
> loops over the "subset" variable. It looks difficult for two of them:
> notset() and orset().
> 
> notset() is used for --no-merge and --prune. One solution is to
> compute the negated expression set not over "subset" but over all
> revisions. This would not affect --prune much. And if we really want
> speed for --no-merge, we can write a dedicated revset implementing it
> directly instead of negating a revision set.

I agree that notset is tricky to turn into an iterator. The primary
difficulty is that given an iterator i for a predicate x, returning
things not in i without consuming all of i is hard unless x has helpful
properties like being sorted. So if we have a revset:

not(2 or 1)

we can't look at the first element from the subexpression and conclude
that everything before 2 is not included. On the other hand, we can
often break the problem down by building a bunch of iterators over
subsets of length 1:

for r in subset:
  g = getset(repo, [r], x)
  if r not in g:
    yield r

This doesn't work for stuff like this:

not(first(keyword(foo)))

..because calculating the first element across a subset gives different
results than calculating the first elements of subsets of subsets.

So let's throw a definition in here: an 'independent revset' is one
where we can evaluate the membership of each element in the set without
considering the others.

In general, "not" can't be fast on non-independent sets, because we must
calculate the inner set in full to calculate its inverse. In fact, time
to first line for any non-independent set, with or without not, is going
to include the time for calculating the full set.

Things like this are orthogonal if their subsets are independent:

 and or not - : merge() user() sort() tag() closed() <revision ids>

(yes, sort: we don't know the order of elements, but we can
independently calculate membership, and thus invert)

Things like this are not:

 heads() roots() first() limit() children()

And things like '::' are on the edge: we can do it, but it's not
terribly efficient.

Which all suggests that we could teach the optimizer to look at the
parse tree of not's subexpression and decide that a) it's going to be
expensive to evaluate and b) it's independent, so we use
independentnotset on it.

Which brings us back to not(merge()). Evaluating this in the current
revset code means visiting the entire index and checking for second
parents, but that's quite cheap, and it seems worth doing all the work
up front in any case. But for not(user(x)), we want to go the
independent route because the subexpression is expensive.

> orset() cannot be turned into a filter in the general case. Here, we
> often have an additional property: input revisions are sorted by
> value. In this case, we can write an orset() version computing both
> "or" branches on "subset" (instead of subset and (subset - "first
> branch output")) and use the revision order to merge back the
> iterable, then preserving the order again. Again, this would be a
> dedicated revset.

I don't see why this can't be an iterable. If we do 'x or y', we simply
need to yield all the elements of x before we start yielding element in
'y - x'. And that just means we need to remember the elements of we've
seen of x as we go. Even if either set is a non-independent revset.

> Is it worth the pain? I do not know but to me the situation is the following:
> 1- We do nothing and have a big regression on time to first byte. This is blocking for using revsets in log command.

Again, I don't think this is a blocker. But we're now at the point where
this has to wait for 2.3 anyway.

-- 
Mathematics is the supreme nostalgia of our time.





More information about the Mercurial-devel mailing list