[WIP] lazy revlog index parsing

Matt Mackall mpm at selenic.com
Thu Mar 15 18:59:24 UTC 2012


On Thu, 2012-03-15 at 00:19 -0700, Bryan O'Sullivan wrote:
> On Wed, Mar 14, 2012 at 11:50 PM, Matt Mackall <mpm at selenic.com> wrote:
> 
> >
> > [I'll just note in passing the irony of the author of patchbomb using a
> > pastebin (on github of all places).]
> >
> 
> Indeed. I just sent a refreshed version of the patch with all of the
> WIP-ness cleaned up. It passes all the tests.
> 
> 
> > But it doesn't quite actually work for me, something is screwy with the
> > offset handling that's causing a seek to get an invalid arg.
> 
> 
> Could you try with the patch I just sent, please? Note that I just updated
> it (after I sent it) to include <arpa/inet.h>, so you might need to re-add
> that.

Yeah, that's probably not enough to be portable. I copied the bits from
the top of parsers.c. But now I have a slightly different thought...

It turns out to be a bit of a bother to bump our extension APIs due to
the need for everyone running from source to do rebuilds/repackaging.
It's especially annoying when bisecting across the change. The last time
we did it was for parse_index2 and I think we should aim to avoid it
when it's easy to do so. Ok, so how would we do that? Simple: roll the
index.c code into parsers.c and have parse_index2 return our new index
class rather than a Python list object.

> > Ok, now tip is still not much faster: it goes from 0.401s to 0.320s on
> > my test repo. Turns out the bulk of the time here is spent checking that
> > 270 tags point to valid nodes. And doing node->rev mapping is the other
> > thing that could really stand to be optimized. Disable the valid node
> > check and it now takes .069s (compared to perfstartup of .025s).
> >
> 
> I could synthesize a repo and look into that, or perhaps you could send me
> a pointer to your carefully tweaked one?

I'm testing on my linux-kernel mirror: 

http://selenic.com/repos/linux-2.6

But really all that matters is that you have a tag that points to a node
early in history. Discovering the revision of an early node will cause
the nodemap to be nearly fully populated:

http://www.selenic.com/hg/file/7b9bf72430ba/mercurial/revlog.py#l291

And we look up each tag node to sanity-filter the tag list, so any time
we do anything with tags (including tip!), we'll look for that old node:

http://www.selenic.com/hg/file/7b9bf72430ba/mercurial/localrepo.py#l432

Comment out that line and it's fast.

> As far as approach is concerned, the obvious thing to do is build a dict
> and look it up, but that has poor performance if there are lots of revs.

That's mostly because there are tons of Python objects that need to be
dynamically allocated and the table ends up getting resized a few times
too. If we created our own pre-sized hash table that was just an array
of ints (on lookup, check the key against index[hash[f(node)]]), we
could build it in one allocation.
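A minimal sketch of that idea (names like nm_insert/index_node are my
invention here; a real version would read nodes out of the revlog index
and size the table from the rev count):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODELEN 20

/* Hypothetical sketch of the "array of ints" nodemap: slots hold rev
 * numbers (-1 = empty), and collisions are resolved by comparing the
 * queried node against the node the index records for that rev, so
 * the table never copies nodes or allocates Python objects. */

typedef struct {
	int *revs;	/* table of rev numbers, -1 = empty */
	size_t mask;	/* table size - 1, table size a power of two */
} nodemap;

/* stand-in for reading a node out of the revlog index */
static const char *nodes;	/* nrevs * NODELEN bytes */

static const char *index_node(int rev)
{
	return nodes + (size_t)rev * NODELEN;
}

static size_t hashnode(const char *node)
{
	size_t h;

	/* node hashes are already uniformly random bytes */
	memcpy(&h, node, sizeof(h));
	return h;
}

static void nm_insert(nodemap *nm, int rev)
{
	size_t i = hashnode(index_node(rev)) & nm->mask;

	while (nm->revs[i] != -1)	/* linear probing */
		i = (i + 1) & nm->mask;
	nm->revs[i] = rev;
}

static int nm_find(nodemap *nm, const char *node)
{
	size_t i = hashnode(node) & nm->mask;

	while (nm->revs[i] != -1) {
		if (memcmp(index_node(nm->revs[i]), node, NODELEN) == 0)
			return nm->revs[i];
		i = (i + 1) & nm->mask;
	}
	return -1;
}
```

Since the index already stores every node, the table itself is just one
malloc of nrevs ints.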

But it turns out a dict isn't really the data structure we want at all,
because it can't do prefix matching. That means looking up "7b9bf72"
degrades to linear search!

> The next obvious approach is to dump all the (node ID,rev) pairs into a
> single big malloced buffer, sort, and binary search for lookups. That's
> almost certain to be faster, but it's still a lot of work up front, so the
> startup cost might be unpleasant. If it was found to matter, cost of
> sorting could be amortized over multiple lookups (as the current revlog.rev
> implementation does) using a gapped insertion sort, maybe.

I think the ideal solution is a base-16 radix tree:
        
        struct nodetree {
        	struct nodetree *children[16];
        };

        #define leafnode(ptr) (((uintptr_t)(ptr)) & 1)
        #define torev(ptr) ((int)((uintptr_t)(ptr) >> 1))
        #define toptr(rev) ((struct nodetree *)(((uintptr_t)(rev) << 1) | 1))
        
With 1M nodes, it'll have a depth of ~6 and a search has a cache
footprint of about 6 cachelines. And it's obviously a perfect match for
prefix search. Downside is allocating/freeing ~N/16 nodetree structures
of up to 128 bytes.
  
Alternately, we could use a heap in a pre-allocated index-length array
of revs, using node(rev) as the heap key. Rather than actually doing
a full heap-sort, we can leave it heapified and do prefix search. Search
depth is more like 20 for 1M nodes and a search has a cache footprint of
more like 40 cachelines (ouch!).
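The prefix probe over the heapified array might look roughly like this
(slot indices stand in for revs, and strings for binary nodes; the
function name is mine). The heap property is what allows pruning:
everything under slot i compares >= nodes[i], so once nodes[i] sorts
after the prefix range the whole subtree can be skipped:

```c
#include <assert.h>
#include <string.h>

/* Find any slot whose node starts with the given prefix, in an array
 * arranged as a lexicographic min-heap (children of i at 2i+1, 2i+2). */
static int heap_prefix_find(const char **nodes, int n,
			    const char *prefix, int plen)
{
	int stack[64], top = 0;	/* DFS stack; max depth ~log2(n) */

	stack[top++] = 0;
	while (top) {
		int i = stack[--top];
		int c;

		if (i >= n)
			continue;
		c = strncmp(nodes[i], prefix, plen);
		if (c == 0)
			return i;	/* prefix match */
		if (c > 0)
			continue;	/* whole subtree is past the prefix */
		stack[top++] = 2 * i + 1;
		stack[top++] = 2 * i + 2;
	}
	return -1;
}
```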

But either way, we can build it incrementally as we search back from tip
for a node (much like the current revlog.rev() code).

-- 
Mathematics is the supreme nostalgia of our time.




