[RFC] Interaction between strip and caches
Joerg Sonnenberger
joerg at bec.de
Sat Feb 27 23:13:02 UTC 2021
On Fri, Feb 26, 2021 at 10:52:52PM -0500, Augie Fackler wrote:
>
>
> > On Dec 14, 2020, at 5:03 PM, Joerg Sonnenberger <joerg at bec.de> wrote:
> >
> > Hello all,
> > while looking at the revbranchcache, I noticed that it is doing quite an
> > expensive probabilistic invalidation dance. It essentially looks up
> > the revision in the changelog again and compares the first 32 bits of
> > the node to see if they (still) match. Other caches do cheaper checks,
> > like remembering the head revision and node and checking that they
> > still match.
> > The goal is in all cases to detect one of two cases:
> >
> > (1) Repository additions by a hg instance without support for the cache.
> > (2) Repository removals by strip without update support specific to the
> > cache in use.
> >
> > The first part is generally handled reasonably well and cheaply. Keeping
> > track of the number of revisions and processing all missing changesets
> > is something the code has to support anyway. The really difficult problem
> > is the second part. I would like us to adopt a more explicit way of
> > dealing with this, with opt-in support via a repository requirement.
> > Given that the strip command has finally become part of core, it looks
> > like a good time to do this now.
> >
> > The first option is to require strip to nuke all caches that it can't
> > update. This is easy to implement and by nature works reliably with all
> > existing caches. It is also the blunter option.
>
> Won’t the caches invalidate themselves when this defect happens today?
Only if the cache implementation hooks into strip and is active at the
time. As mentioned at the start, it is expensive and complex. I'd say
80% of the complexity of the new .hgtags cache version I am working on
is dealing with the current cache invalidation.
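
For illustration, the cheaper style of check mentioned at the start
(remember the head revision and node, then check that they still match)
could look roughly like the sketch below. CacheHeader, cache_is_valid and
the list-of-nodes changelog are hypothetical stand-ins, not Mercurial's
actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheHeader:
    """Hypothetical cache header: the tip seen when the cache was written."""
    tip_rev: int     # revision number of the recorded tip, -1 for an empty cache
    tip_node: bytes  # node hash of the recorded tip


def cache_is_valid(header: CacheHeader, changelog: List[bytes]) -> bool:
    """Return True if the recorded tip still exists unchanged.

    changelog[i] stands in for the node of revision i.
    """
    if header.tip_rev < 0:
        return True  # empty cache: nothing recorded that could be stale
    if header.tip_rev >= len(changelog):
        return False  # the recorded tip is gone, e.g. removed by strip
    # Same revision number but a different node means history was rewritten.
    return changelog[header.tip_rev] == header.tip_node
```

Revisions added after the recorded tip leave the cache valid under this
check, so they can be processed incrementally. It is the removal case that
forces a rebuild.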
> > The second option is to keep a journal of strips. This can be a single
> > monotonically increasing counter; every cache just reads the counter
> > and rebuilds itself when the counter has changed. Alternatively, it
> > could be a full journal that lists the revisions and associated nodes
> > removed. This requires changes to existing caches but has the advantage
> > that strip can be replayed by the cache logic to avoid a full rebuild.
>
> Potentially complicated, but could be worthwhile in a large repo with
> strips. Is that something you expect to encounter? For the most part
> we’ve historically considered strip an anti-pattern of sorts and not
> worried super hard about optimizing it.
My hope is that if we can handle additions by non-cache-aware clients as
we do now, it is good enough. Replaying changes is moderately cheap if
we don't have to deal with strip.

There is also the related issue of cache invalidation for obsstore, but
the same concerns apply -- replaying changes is easy as long as we don't
have to handle removal of entries.
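
As a rough sketch of the counter variant of the second option, assuming a
strip counter that is bumped on every strip: a cache rebuilds from scratch
when the counter has changed and otherwise only replays the revisions added
since its last update. The Repo and Cache classes and the per-revision
value are hypothetical, for illustration only.

```python
from typing import List


class Repo:
    """Hypothetical stand-in: nodes[i] is the node of revision i."""

    def __init__(self) -> None:
        self.nodes: List[bytes] = []
        self.strip_counter = 0  # bumped once per strip, never decreases

    def commit(self, node: bytes) -> None:
        self.nodes.append(node)

    def strip(self, rev: int) -> None:
        """Remove revision rev and everything after it."""
        del self.nodes[rev:]
        self.strip_counter += 1


class Cache:
    """Hypothetical cache holding one derived value per revision."""

    def __init__(self) -> None:
        self.seen_strips = 0
        self.entries: List[int] = []

    def update(self, repo: Repo) -> None:
        if self.seen_strips != repo.strip_counter:
            # A strip happened since the last update: drop everything.
            self.entries = []
            self.seen_strips = repo.strip_counter
        # Replay only the revisions this cache has not processed yet.
        for node in repo.nodes[len(self.entries):]:
            self.entries.append(len(node))  # placeholder for the real value
```

With this shape the cache never has to inspect the changelog to detect
removals. The cost is a full rebuild after every strip, which the full
journal variant would avoid.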
Joerg