Tag caching, at last
Greg Ward
greg at gerg.ca
Mon Jul 13 02:18:16 UTC 2009
On Sun, Jul 12, 2009 at 8:34 PM, Matt Mackall<mpm at selenic.com> wrote:
> On Fri, 2009-07-10 at 11:34 -0400, Greg Ward wrote:
>> I finally have a working tag cache implementation. The "ah-ha" moment
>> came when I realized we don't *have* to cache .hgtags contents, since
>> that's not the most expensive part of reading tags.
>
> I think you're still making that bit too hard. Simply write out the tags
> as calculated and use them if and only if the cache is determined to be
> fully up to date. No need to be clever. Remember: no change is the
> common case.
OK, sure, my *previous* implementations may have made life too hard by
trying to be clever with .hgtags content. But giving up on caching
.hgtags content makes things as simple as I can make them.
Algorithmically, I can't think of how to make things simpler than in
the patch I posted tonight. (Which is essentially the same as the
code you reviewed.)
Here's the rub: strip can modify tag info without changing tip. Sure,
the rev num of tip changes, but then I have to detect that, which
starts getting complicated again. So it's not enough simply to
compare tips; we'd have to compare the whole head list, and there goes
the "skip repo.heads()" optimization. Argh: round and round in
circles we go.
> If we want to be slightly clever we can notice when tip moves without
> changing .hgtags (the second most common case) and make that fast too.
> But that can wait till later.
Yes. And I argue that *any* caching of .hgtags content can wait until
later. Caching the head->filenode mapping is the big win, and I
actually got that working (unlike all my other attempts, which
foundered on strip, rollback, and tag rank). I would just like to get
the current cache design into the mainline, and then we can see what
other optimizations are worth the trouble.
>> __slots__ = ['ui',
>> 'repo',
>> 'shouldwrite',
>> ]
>
> We don't generally use __slots__ around here unless there's significant
> performance or memory at stake.
OK, sure. Reflex coding on my part.
>> cacheversion = 0 # format version
>
> I'm not much of a fan of version numbers in file formats. Here we don't
> need one, because it's just a cache. If we decide to change the format,
> it's actually simplest to just change the filename at the same time.
Hey, good idea. Well, I was just toying with the version idea to see
what people would think. Guess you answered that question. I'll
remove it.
> (ignoring for the moment that I think a class is overkill here)
Not gonna fight that one either. Like I said before, having it in a
class was a godsend when I was juggling 3 or 4 implementations.
Having settled on one that I like, though, readcache() and
writecache() could just become more methods of localrepository.
(Although having a tags module definitely appeals. This is a big
chunk of complexity that I'm about to make more complex, and localrepo
is big and hairy enough already.)
>> # Determine which heads have been destroyed by strip or
>> # rollback. That'll ensure that writecache() writes accurate
>> # data, and it makes life easier for
>> # localrepo._findglobaltags().
>> goodheads = []
>> for head in cacheheads:
>> try:
>> repo.changelog.rev(head)
>> goodheads.append(head)
>> except error.LookupError:
>> pass
>
> It'd be nice to find a better pattern for that. Perhaps:
>
> goodheads = [h for h in cacheheads if h in repo]
>
> which means adding a __contains__ method to repo that doesn't raise
> LookupError.
I don't know the code well enough to say if the above is a common
pattern. I wouldn't add __contains__() just for this case, though.
Please ignore the patch series I sent tonight. (Although any comments
you may have on the runup refactorings are welcome.) I'll resend
tomorrow.
Greg
More information about the Mercurial-devel
mailing list