ideas: chg repo preloading, and new changelog index
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Wed Jan 11 09:06:35 UTC 2017
On 12/30/2016 09:47 PM, Sean Farley wrote:
> Jun Wu <quark at fb.com> writes:
>
>> chg repo preloading
>>
>> I have been thinking about speeding up repo loading for a long time.
>> Previous ideas include a persistent radix tree, a hidden bitmap, and
>> mmap of changelog.i.
>>
>> Recently I realized that chg (after the uisetup refactoring) could be an
>> option, assuming users use read commands more frequently than writes.
>>
>> The idea is simple: the master server (the process before fork) maintains
>> a map {repo_path: {index_hash: index, marker_hash: markers, ...}}, where
>> each *_hash is a quick hash of the properties the cached value depends on
>> (file sizes, etc.) and decides whether the value can still be used. The
>> forked worker gets the map for free and uses it to quickly construct the
>> repo object if the hash matches.
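
To make the lookup concrete, here is a minimal sketch of the hash-checked
map as I read the description above. PreloadCache, quick_fingerprint and the
choice of (size, mtime) as the "quick hash" are illustrative assumptions,
not existing chg code.

```python
import os


def quick_fingerprint(paths):
    """Cheap stand-in for the "quick hash": (path, size, mtime_ns) per file.

    A missing file is recorded as (path, None, None) so that its later
    creation also invalidates the cached value.
    """
    parts = []
    for path in paths:
        try:
            st = os.stat(path)
            parts.append((path, st.st_size, st.st_mtime_ns))
        except FileNotFoundError:
            parts.append((path, None, None))
    return tuple(parts)


class PreloadCache:
    """Hypothetical {repo_path: {name: (fingerprint, value)}} map.

    The master process would fill it before forking; a forked worker
    inherits it and only trusts entries whose fingerprint still matches.
    """

    def __init__(self):
        self._map = {}

    def store(self, repo_path, name, paths, value):
        entry = (quick_fingerprint(paths), value)
        self._map.setdefault(repo_path, {})[name] = entry

    def lookup(self, repo_path, name, paths):
        entry = self._map.get(repo_path, {}).get(name)
        if entry is None:
            return None
        fingerprint, value = entry
        if fingerprint != quick_fingerprint(paths):
            return None  # files changed since preloading: do not reuse
        return value
```

A worker would call lookup() first and fall back to the normal loading code
whenever it returns None.
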
>>
>> The master server needs a background thread doing the preloading, so it is
>> no longer stateless. Hopefully that is fine because everything being
>> preloaded is low-level, self-contained and not affected by extensions.
>>
>> However, if an extension does change the behavior of something being
>> cached here, we will have compatibility issues. It's solvable if chg has
>> APIs for 3rd-party extensions to just drop some kind of cache.
>>
>> 3rd-party repo requirements can also be troublesome for things that
>> require a repo object to calculate, namely obsolete._compute*set.
>> changelog.index and obsstore._readmarkers, by contrast, can be calculated
>> without a repo object.
>>
>> Therefore I think it's still a good idea to cache those low-level pieces
>> without a repo object.
>>
>> If this direction looks promising, I will try to start with caching the C
>> index object first. Then we can think about how to deal with the obsstore.
>>
>> new changelog index
>>
>> (note: this is less related to chg, but fits nicely with the plan above)
>>
>> I would personally like to see an efficient changelog "index" object whose code
>> is immutable to extensions (i.e. extensions could not change the logic
>> inside it), reusable outside the Python eco-system (likely implemented in
>> C without Python.h or Rust), taking a minimal set of inputs (changelog.i,
>> phaseroots, obsstore, but allows customized parsers), and deals with the
>> following independently (could be implemented incrementally):
>>
>> - convert between rev number and node (and partialmatch)
>> - calculate common ancestors
>> - represent revsets as bitmaps: native ancestors / descendants
>>   construction, support for and/or/minus operations
>> - understand phases
>> - understand obsolete concepts
>>
>> If that looks promising, I'll try to work on it after the above chg change.
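
As a rough illustration of the bitmap revset item in the list above, here is
a small sketch that uses Python integers as bitmaps. The parents layout (a
list of tuples of parent revs, empty for a root) and the function names are
assumptions made for the example, not the proposed C API.

```python
def ancestors(parents, revs):
    """Return a bitmap (int) with a bit set for each rev in revs and
    every ancestor of those revs.

    parents[r] is a tuple of the parent revs of r; parents always have
    smaller rev numbers than their children.
    """
    bits = 0
    for r in revs:
        bits |= 1 << r
    # Walking from the highest rev down visits every rev after all of
    # its descendants, so a single pass is enough.
    for r in range(len(parents) - 1, -1, -1):
        if (bits >> r) & 1:
            for p in parents[r]:
                bits |= 1 << p
    return bits


def torevs(bits):
    """Expand a bitmap back into a sorted list of rev numbers."""
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]


# Tiny DAG: 0 is the root, 1 and 2 are children of 0,
# 3 merges 1 and 2, 4 is a child of 2.
parents = [(), (0,), (0,), (1, 2), (2,)]
a3 = ancestors(parents, [3])
a4 = ancestors(parents, [4])

both = a3 & a4      # "and"
either = a3 | a4    # "or"
only3 = a3 & ~a4    # "minus"
```

With bitmaps, the and/or/minus operations are single integer operations
regardless of how the sets were built.
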
>
> Now, this is something I really like (the preloading stuff looks good
> too but wanted to chime in on this part first).
>
> First of all, I completely agree with not including Python.h (nor Rust,
> though I can bend on that if needed). Not every server (or client) wants
> to spin up python (nor pypy) and I've been thinking of writing what
> you've proposed for some time now. My dream is to have this small C
> library used by core Mercurial so that there is still one canonical
> implementation and other clients / servers can link to that for whatever
> purpose.
I'm +1 on what Sean says here. Keeping a single reference implementation
is important, and having standard Mercurial take advantage of these
speed-ups seems valuable.
--
Pierre-Yves David