[PATCH 1 of 7] revset: introduce a _revinterval type and revsubset function
Bryan O'Sullivan
bos at serpentine.com
Fri Jun 22 21:04:02 UTC 2012
On Mon, Jun 18, 2012 at 6:34 PM, Matt Mackall <mpm at selenic.com> wrote:
> > + def __getitem__(self, i):
> > + l = len(self)
> > + if isinstance(i, slice):
>
> This method looks pretty heavy for a function we'll be calling in an
> inner loop. Do we actually use slices?
>
I would have sworn the answer was yes, as I have a strong memory of adding
the slice check due to some obscure test failing. And yet when I try it now
with that block disabled, nothing is failing. Go figure.
> If we take something like:
>
> r = [x for x in a if x in b]
>
> ..I'm rather skeptical that this sort of implementation can beat the
> native set implementation unless a is much smaller than b. That might
> happen a lot, but it's worrisome.
>
It is very common that b is set(range(len(repo))), in which case merely
constructing the set is nastily expensive in time and memory on even a
moderately large repo.
> +def revsubset(subset):
>
> Hiding in here is the idea of a single hybrid list/set object. We're
> often converting lists into sets, I suspect it might be better to cache
> the set objects somehow.
Maybe?
Honestly, this does all really really want to be a custom extension type
written in C, using a tree of contiguous ranges. This would preserve order,
maintain good cache performance, and keep memory use as low as possible.
Would such a patch be palatable? The existing patchset makes a big
difference to performance, but I'd be fine with spending some time on doing
things the Right Way instead.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-devel/attachments/20120622/ffe9a996/attachment-0002.html>
More information about the Mercurial-devel
mailing list