overwriting invalid caches and cache races

Michael O'Connor mkoconnor at gmail.com
Wed Sep 9 18:58:04 UTC 2015


Thanks, Greg and Matt. It sounds like just upgrading Mercurial has a good
chance of resolving these issues.

On Wed, Sep 9, 2015 at 1:48 PM, Matt Mackall <mpm at selenic.com> wrote:

> On Wed, 2015-09-09 at 11:48 -0400, Michael O'Connor wrote:
> > We recently had an issue where people were experiencing long push
> > times to a central repo.  These long push times were unrelated to the
> > size of the push, were random and rare, and started happening suddenly
> > (i.e., before a certain date they didn't happen to anybody).
> >
> > I have a story for this, and I'm curious if anyone has an opinion on
> > the plausibility:
> >
> > I can't reproduce it yet, but I suspect that if two people push at the
> > same time, with some low probability one of them may see an invalid
> > cache (i.e., a cache that references a revision that they don't have
> > in their view of the repository), at least in hg 3.0.2, which is the
> > version the central repo runs.
>
> We've substantially improved the performance of both the branch cache
> and the tag cache since this release.
>
> > The central repo has two caches on disk: the served cache and the base
> > cache.  (This repo has no mutable changesets, so if the caches were
> > up-to-date they would be the same.)  Normally, push races don't affect
> > us because if an "hg push" sees an invalid served cache, it drops down
> > to the base cache which isn't too out-of-date.
> >
> > However, on the date when we started seeing random long push times,
> > the served cache in the central repo became corrupt for reasons
> > unrelated to hg.  From then on, every push dropped down to using the
> > base cache when it wanted the served cache.
>
> The serving process will update/correct the cache if it has write
> access. But it's not unheard of for people to accidentally set up hgweb
> so that it doesn't have write access (perhaps because someone else
> touched the cache and thus put hostile permissions on it), so every
> access has to rebuild the cache.
>

I looked for a permissions issue, and I think in our case the served cache
wasn't being written because of this logic
<https://selenic.com/repo/hg/file/269c80ee5b3c/mercurial/branchmap.py#l96>
in branchmap.py rather than because of permissions: it doesn't write a
cache that was produced from another cache when there are no added
revisions.
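
To make the scenario concrete, here is a toy model of that rule in Python.
It is only a sketch of the behavior as described in this thread, not
Mercurial's actual code: `Cache`, `usable`, `refresh_served`, and the `disk`
dictionary are all hypothetical names, a cache is reduced to a single tip
revision number, and `None` stands in for a missing or corrupt file.

```python
# Toy model only; every name here is hypothetical, not Mercurial's API.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cache:
    tip_rev: int  # newest revision number this cache covers


def usable(cache: Optional[Cache], repo_tip: int) -> bool:
    # None models a missing or corrupt file. A cache that names a revision
    # beyond the reader's view of the repo is also treated as invalid.
    return cache is not None and cache.tip_rev <= repo_tip


def refresh_served(disk: Dict[str, Optional[Cache]], repo_tip: int) -> str:
    """Simulate one read of the served cache and report what happened."""
    if usable(disk.get("served"), repo_tip):
        source, start = "served", disk["served"].tip_rev
    elif usable(disk.get("base"), repo_tip):
        source, start = "base", disk["base"].tip_rev
    else:
        source, start = "scratch", -1  # full rebuild from revision 0

    added = repo_tip - start
    if source != "scratch" and added == 0:
        # Modeled rule: the result came from a cache and no revisions
        # were added, so nothing is written back to disk.
        return f"{source}: not written"
    disk["served"] = Cache(repo_tip)
    return f"{source}: wrote served after {added} added revisions"


disk = {"served": None, "base": Cache(tip_rev=100)}
print(refresh_served(disk, repo_tip=100))  # falls back to base, writes nothing
print(refresh_served(disk, repo_tip=100))  # same fallback on the next read
```

In this model a corrupt served cache is never repaired as long as the base
cache is already current, so every read keeps dropping down to the base
cache, which matches what we saw.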


>
> > How plausible is the hypothesis that there's a race on "hg push"ing
> > that might cause a push to see an invalid cache?
>
> Could happen. Should be mostly harmless, especially with more recent hg
> where we have a secondary cache that makes rebuilding the main cache
> much faster.
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>