New revlog format, plan page
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Wed Jan 13 14:23:33 UTC 2021
On 1/11/21 4:14 PM, Joerg Sonnenberger wrote:
> On Mon, Jan 11, 2021 at 01:12:30PM +0100, Pierre-Yves David wrote:
>> (1) Some of the current caches we have would fit well in such an index:
>> * the hgtagsfnodes cache: taking 4 bytes to cache the `.hgtags` revision
>> number associated with a changelog revision (this will require some
>> bookkeeping while adding/stripping),
>> * the `rbc-revs-v1`: using an integer (4 bytes) and an external list to store
>> the branch on which each revision is,
>> * (probably another 4 bytes to store the sub-branch/topic.)
>
> I'd be reluctant to move them into the revlog. If anything, it would
> call for a more variant friendly format specification.
I am not sure what you mean by "a more variant friendly format specification".
> Ultimately, we
> should figure out first how "hot" the various caches are before deciding
> to tie them more tightly to the changelog.
The branch cache is pretty hot, as we often require this information,
including to warm other caches and data. The tags cache is not accessed
that often, but it is very important for performance on large
repositories, so we will keep needing it. Having it in the changelog
index makes its lifetime much simpler.
> Also, at the very least in the case of
> rbc-revs-v1, it would also prevent some useful optimisations.
This is no different from the current situation, except that we no
longer have to deal with a separate file with non-trivial cache
validation and lifetime. If we want to speed up the "which revisions
are in this branch" question, we will need some other index anyway, and
that can come later.
> When we
> sort out the cache invalidation story, having a strict linear mapping of
> 32bit entries would make queries for all revisions of a given branch
> easier than if it is part of a more complex data structure.
The data remains linear; it just has extra fixed-size data in between.
This should not be a problem, should it?
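To illustrate why interleaved fixed-size fields keep the data linear, here is a minimal Python sketch of an index whose entries carry two extra 4-byte cache fields (a `.hgtags` filenode revision and a branch index). The layout (`ENTRY`, an 8-byte offset/flags field plus a 20-byte node, the field order) is a hypothetical simplification, not the actual revlog entry format. It only shows that the branch field sits at a constant offset in every entry, so collecting all revisions of a branch is still a plain strided scan.

```python
import struct

# Hypothetical entry layout (NOT the real revlog format):
#   8-byte offset/flags, 20-byte node, then two extra 4-byte cache fields:
#   the `.hgtags` filenode revision and the branch index.
CORE = struct.Struct(">Q20s")
ENTRY = struct.Struct(">Q20sII")
ENTRY_SIZE = ENTRY.size              # 36 bytes, no padding with ">"
BRANCH_OFFSET = CORE.size + 4        # branch field follows the tags field


def pack_entry(offset_flags, node, tags_rev, branch_idx):
    """Serialize one hypothetical index entry."""
    return ENTRY.pack(offset_flags, node, tags_rev, branch_idx)


def revs_in_branch(index_data, branch_idx):
    """Yield revisions whose branch field equals branch_idx.

    The branch field lives at the same offset inside every fixed-size
    entry, so this is a linear scan with a constant stride.
    """
    count = len(index_data) // ENTRY_SIZE
    for rev in range(count):
        start = rev * ENTRY_SIZE + BRANCH_OFFSET
        (value,) = struct.unpack_from(">I", index_data, start)
        if value == branch_idx:
            yield rev


# Usage: five revisions alternating between two branches.
branches = ["default", "stable"]
index = b"".join(
    pack_entry(0, bytes([rev]) * 20, 0, b)
    for rev, b in enumerate([0, 1, 0, 1, 1])
)
print(list(revs_in_branch(index, branches.index("stable"))))  # [1, 3, 4]
```

A dedicated per-branch index would avoid touching every entry, but that is the separate optimisation mentioned above and can be layered on later.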
>> (2) Some cache key mechanism. Right now a lot of caches validate their
>> content using a (tip-rev, tip-node) pair. That pair is fragile as it does
>> not guarantee that the content before the tip is the same. Having "some"
>> bytes that gather some kind of accumulated value from the previously added
>> nodes would help. It does not have to be many bytes, as the (tip-node,
>> tip-rev, cache-key) triplet should be good enough. We can probably build
>> it using a series of shifts and xors of the hashes we are adding.
>
> See my mail from Dec 14th. Having done a few more things in the
> meantime, I'd add phases and obslog as cache keys on top and that's
> something we don't handle well right now at all. At that point the
> current invalidation strategy just becomes way too fragile.
I can't find said message; do you have a link?
Here, I am only talking about cache content that depends on the
changelog alone. I think we both agree that for content that depends on
other data, keys for that other data need to be used as well.
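A minimal Python sketch of the accumulated cache key from point (2), assuming a 64-bit key stored per revision next to each index entry. The mixing step (rotate the previous key left by one bit, then xor in the first 8 bytes of the new node) is one hypothetical instance of the "shifts and xors" idea, not a decided format; `next_cache_key`, `cache_keys` and `cache_is_valid` are illustrative names. Since a Mercurial node hash already covers its ancestors, the case this guards against is a non-ancestor revision below tip changing (for instance, a revision on another head stripped and replaced), which the (tip-rev, tip-node) pair alone cannot detect.

```python
MASK64 = (1 << 64) - 1


def next_cache_key(prev_key, node):
    """Fold a 20-byte node into a 64-bit running key (hypothetical scheme).

    Rotating before xoring makes the key depend on the order of the
    nodes, not only on which nodes are present.
    """
    rotated = ((prev_key << 1) | (prev_key >> 63)) & MASK64
    return rotated ^ int.from_bytes(node[:8], "big")


def cache_keys(nodes):
    """Return the accumulated key for every revision, in revision order."""
    key = 0
    keys = []
    for node in nodes:
        key = next_cache_key(key, node)
        keys.append(key)
    return keys


def cache_is_valid(cached, index_nodes, index_keys):
    """Check a (tip-rev, tip-node, cache-key) triplet against the index."""
    tip_rev, tip_node, key = cached
    return (
        tip_rev < len(index_nodes)
        and index_nodes[tip_rev] == tip_node
        and index_keys[tip_rev] == key
    )


# Two histories with the same tip rev and tip node, but revision 1 differs.
history_a = [b"\x00" * 20, b"\x01" * 20, b"\x03" * 20]
history_b = [b"\x00" * 20, b"\x02" * 20, b"\x03" * 20]
keys_a = cache_keys(history_a)
keys_b = cache_keys(history_b)

cached = (2, history_a[2], keys_a[2])
print(cache_is_valid(cached, history_a, keys_a))  # True
print(cache_is_valid(cached, history_b, keys_b))  # False
```

Only 8 bytes per revision are needed, and a cache written against one history is rejected once any earlier revision differs, even though tip-rev and tip-node still match.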
--
Pierre-Yves David