overwriting invalid caches and cache races

Gregory Szorc gregory.szorc at gmail.com
Wed Sep 9 18:18:41 UTC 2015


On Wed, Sep 9, 2015 at 8:48 AM, Michael O'Connor <mkoconnor at gmail.com>
wrote:

> We recently had an issue where people were experiencing long push
> times to a central repo.  These long push times were unrelated to the
> size of the push, were random and rare, and started happening suddenly
> (i.e., before a certain date they didn't happen to anybody).
>
> I have a story for this, and I'm curious if anyone has an opinion on
> the plausibility:
>
> I can't reproduce it yet, but I suspect that if two people push at the
> same time, with some low probability one of them may see an invalid
> cache (i.e., a cache that references a revision that they don't have
> in their view of the repository).  At least in hg 3.0.2, which is the
> version the central repo runs.
>
> The central repo has two caches on disk: the served cache and the base
> cache.  (This repo has no mutable changesets, so if the caches were
> up-to-date they would be the same.)  Normally, push races don't affect
> us because if an "hg push" sees an invalid served cache, it drops down
> to the base cache which isn't too out-of-date.
>
> However, on the date when we started seeing random long push times,
> the served cache in the central repo became corrupt for reasons
> unrelated to hg.  Now, every push always dropped down to using the
> base cache when it wanted the served cache.  Furthermore, because no
> changesets were mutable, it never updated the served cache on disk and
> it stayed corrupt.  However, this meant that because there was only
> one valid cache, when a push randomly saw an invalid base cache it had
> to recreate it from scratch.
>
> I have two questions:
>
> I initially thought that the fact that hg wasn't overwriting the
> invalid served cache was an oversight.  Upon further reflection, it
> seems like this is probably intentional.  For example, if
> branchmap.updatecache always wrote out a new cache, then work might be
> duplicated: if the served and base caches are equal, and someone
> updates the served cache, then someone who later accesses the base
> cache will have to redo that work.  Is that correct?
>
> How plausible is the hypothesis that there's a race on "hg push"ing
> that might cause a push to see an invalid cache?
>

This sounds exactly like problems we had on hg.mozilla.org with our very
large Firefox repositories.

To echo what Matt said, these problems are largely addressed in modern
Mercurial versions, and cache performance is mostly a non-issue for us
now. Modern versions are also a bit more intelligent about writing caches
inside repo locks, so there should be fewer race conditions. Things aren't
perfect yet, but they are much better than before.

You may also find that enabling the blackbox extension sheds some light on
what is going on. It records times for many operations, including branch
and tags cache updates. I find its logs from pushes invaluable when
debugging performance problems.
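
As a rough illustration of how those logs could be mined: blackbox is
enabled by adding "blackbox =" under [extensions] in an hgrc, and it writes
to .hg/blackbox.log in the repo. The sketch below scans log lines for slow
branch cache updates. The message pattern it matches ("updated <name>
branch cache in <N> seconds") is an assumption about the log wording, which
may differ between Mercurial versions, and the function name, threshold and
sample lines are hypothetical, made up for the example.

```python
import re

# ASSUMPTION: cache updates are logged with a message shaped like
# "updated served branch cache in 0.0030 seconds". Check a real
# blackbox.log from your Mercurial version and adjust the pattern.
CACHE_LINE = re.compile(
    r"updated (?P<name>\S+) branch cache in (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_cache_updates(lines, threshold=1.0):
    """Return (cache name, seconds, raw line) for updates at or above threshold."""
    hits = []
    for line in lines:
        match = CACHE_LINE.search(line)
        if match is None:
            continue  # not a branch cache timing line
        secs = float(match.group("secs"))
        if secs >= threshold:
            hits.append((match.group("name"), secs, line.rstrip("\n")))
    return hits


# Hypothetical sample lines for illustration; not real log output.
sample = [
    "2015/09/09 18:18:41 alice (1234)> updated served branch cache in 0.0030 seconds",
    "2015/09/09 18:19:02 bob (1240)> updated base branch cache in 42.1734 seconds",
    "2015/09/09 18:19:05 bob (1240)> some unrelated message",
]

for name, secs, _raw in slow_cache_updates(sample):
    print(f"{name} cache update took {secs:.1f}s")
```

To run it against a real repo, pass an open file instead of the sample
list, e.g. slow_cache_updates(open(".hg/blackbox.log")). A cluster of slow
"base" updates around the slow pushes would be consistent with the
hypothesis above, though it would not prove the race by itself.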