overwriting invalid caches and cache races
Michael O'Connor
mkoconnor at gmail.com
Wed Sep 9 15:48:57 UTC 2015
We recently had an issue where people were experiencing long push
times to a central repo. These long push times were unrelated to the
size of the push, were random and rare, and started happening suddenly
(i.e., before a certain date they didn't happen to anybody).
I have a theory about the cause, and I'm curious whether anyone has an
opinion on its plausibility:
I can't reproduce it yet, but I suspect that if two people push at the
same time, then with some low probability one of them may see an invalid
cache (i.e., a cache that references a revision that isn't in their
view of the repository). I suspect this at least for hg 3.0.2, which is
the version the central repo runs.
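The suspected race can be sketched with a toy Python model. It assumes, loosely, that an on-disk cache records the tip revision number and node it was computed for, and that a reader treats the cache as invalid when that tip is not present in its own view of the changelog. `CacheHeader`, `valid_for`, and the node strings are hypothetical names for illustration, not Mercurial's API, and the interleaving shown is the hypothesized one, not a reproduction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CacheHeader:
    """Hypothetical cache header: the tip the cache was computed for."""
    tiprev: int
    tipnode: str


def valid_for(cache: Optional[CacheHeader], view: List[str]) -> bool:
    """Hypothetical check: the cache is usable only if its tip revision
    exists in this reader's view and has the same node there."""
    if cache is None:
        return False
    if cache.tiprev >= len(view):
        return False  # references a revision this reader cannot see
    return view[cache.tiprev] == cache.tipnode


# Shared "repository": node hashes indexed by revision number.
repo = ["n0", "n1", "n2"]
disk = {"base": CacheHeader(tiprev=2, tipnode="n2")}

# Push A takes a snapshot of its view of the changelog.
view_a = list(repo)

# Push B lands a changeset and rewrites the cache for its own, newer view.
repo.append("n3")
disk["base"] = CacheHeader(tiprev=3, tipnode="n3")

# Push A now reads the cache that B just wrote.
print(valid_for(disk["base"], view_a))      # False: rev 3 is not in A's view
print(valid_for(disk["base"], list(repo)))  # True for a fresh view
```

In this model nothing is corrupt on disk; the cache is simply newer than the reader's snapshot, which is exactly the "references a revision they don't have" case.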
The central repo has two caches on disk: the served cache and the base
cache. (This repo has no mutable changesets, so if the caches were
up-to-date they would be the same.) Normally, push races don't affect
us, because if an "hg push" sees an invalid served cache, it falls back
to the base cache, which isn't too out of date.
However, on the date when we started seeing random long push times,
the served cache in the central repo became corrupt for reasons
unrelated to hg. From then on, every push fell back to the base cache
whenever it wanted the served cache. Furthermore, because no
changesets were mutable, hg never rewrote the served cache on disk, so
it stayed corrupt. As a result, there was only one usable cache, and
when a push randomly saw an invalid base cache, it had to recreate it
from scratch.
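The failure mode can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Mercurial's branchmap code: `get_cache` and `FALLBACK` are hypothetical, each cache is reduced to the integer tip revision it was computed for (or `None` if the file is corrupt), a cache computed for a revision newer than the reader's tip counts as invalid, and a cache is written back only when revisions were actually applied to it, which matches the observation that the corrupt served cache was never rewritten.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical two-level hierarchy: "served" falls back to "base",
# and "base" has nothing below it.
FALLBACK: Dict[str, Optional[str]] = {"served": "base", "base": None}


def get_cache(name: str, disk: Dict[str, Optional[int]], tip: int,
              writes: List[str]) -> Tuple[int, int]:
    """Bring cache `name` up to the reader's `tip`.

    disk[name] is the tip revision the on-disk cache was computed for,
    or None if the file is corrupt. A cache computed for a revision newer
    than `tip` is invalid for this reader (the suspected push race).
    Returns (tip of the in-memory cache, revisions processed).
    """
    on_disk = disk.get(name)
    processed = 0
    if on_disk is not None and on_disk <= tip:
        start = on_disk + 1  # valid: only apply the newer revisions
    elif FALLBACK[name] is not None:
        # Invalid: build on an in-memory copy of the subset's cache.
        subset_tip, processed = get_cache(FALLBACK[name], disk, tip, writes)
        start = subset_tip + 1
    else:
        start = 0  # nothing usable: recreate from scratch
    new_revs = max(0, tip - start + 1)
    processed += new_revs
    if new_revs:
        # Assumed policy: write back only when revisions were applied.
        disk[name] = tip
        writes.append(name)
    return tip, processed


TIP = 1000

# Healthy repo: a concurrent push already wrote "served" for rev 1001.
disk: Dict[str, Optional[int]] = {"served": 1001, "base": 990}
writes: List[str] = []
print(get_cache("served", disk, TIP, writes), writes)
# -> (1000, 10) ['base']

# Served cache corrupt, and a concurrent push wrote "base" for rev 1001.
disk = {"served": None, "base": 1001}
writes = []
print(get_cache("served", disk, TIP, writes), writes, disk["served"])
# -> (1000, 1001) ['base'] None
```

In the first case the invalid served cache costs only the ten revisions the base cache was behind. In the second, the only remaining cache is invalid, so all 1001 revisions are reprocessed, and because nothing new was applied on top of the fallback copy, the served cache is still left corrupt.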
I have two questions:
1. I initially thought that hg not overwriting the invalid served cache
was an oversight. On further reflection, it seems likely to be
intentional. For example, if branchmap.updatecache always wrote out a
new cache, work might be duplicated: if the served and base caches are
equal and someone updates the served cache, then anyone who later
accesses the base cache will have to redo that work. Is that correct?
2. How plausible is the hypothesis that a race between simultaneous
"hg push" operations might cause a push to see an invalid cache?
Thanks!