ideas: chg repo preloading, and new changelog index

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed Jan 11 09:06:35 UTC 2017



On 12/30/2016 09:47 PM, Sean Farley wrote:
> Jun Wu <quark at fb.com> writes:
>
>> chg repo preloading
>>
>>   I have been thinking about speeding up repo loading for a long time.
>>   Previous ideas are persistent radix tree, hidden bitmap, mmap changelog.i.
>>
>>   Recently I realized that chg (after the uisetup refactoring) could be an
>>   option, assuming users use read commands more frequently than writes.
>>
>>   The idea is simple: the master server (the process before fork) maintains
>>   a map {repo_path: {index_hash: index, marker_hash: markers, ...}}, where
>>   *_hash is a quick hash of sensitive properties like sensitive file sizes,
>>   etc. to decide whether the value can be used. The forked worker gets the
>>   map for free and uses it to quickly construct the repo object if the hash
>>   matches.
>>
>>   The master server needs a background thread doing the preloading, so it's
>>   no longer stateless. Hopefully that's fine because all the preloaded data
>>   is low-level, self-contained and not affected by extensions.
>>
>>   However, if an extension does change the behavior of something being
>>   cached here, we will have compatibility issues. It's solvable if chg has
>>   APIs for 3rd-party extensions to just drop some kind of cache.
>>
>>   3rd-party repo requirements can also be troublesome for things that
>>   require a repo object to calculate, namely obsolete._compute*set, while
>>   changelog.index and obsstore._readmarkers can be calculated without a repo.
>>
>>   Therefore I think it's still a good idea to cache that low-level data
>>   without a repo object.
>>
>>   If this direction looks promising, I will try to start with caching the C
>>   index object first. Then we can think about how to deal with the obsstore.
>>
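To make the preloading idea above concrete, here is a rough sketch of the
map and the "quick hash" check. This is not chg code: PreloadCache,
quick_hash and the loader callback are hypothetical names, and using
(size, mtime) as the fingerprint is only one guess at what "sensitive
properties" could mean. The sketch stores the fingerprint next to the value
instead of using separate *_hash keys.

```python
import os


def quick_hash(path):
    """Cheap fingerprint of a file: (size, mtime in ns), or None if missing.

    Assumption: size and mtime are enough to detect a changed file.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


class PreloadCache:
    """Hypothetical map kept by the master server (the pre-fork process).

    Layout: {repo_path: {name: (fingerprint, value)}}
    """

    def __init__(self):
        self._entries = {}

    def preload(self, repo_path, name, source_file, loader):
        """Called from the master's background thread."""
        fingerprint = quick_hash(source_file)
        if fingerprint is None:
            return
        value = loader(source_file)
        self._entries.setdefault(repo_path, {})[name] = (fingerprint, value)

    def get(self, repo_path, name, source_file, loader):
        """Called in a forked worker. Returns (value, cache_hit)."""
        entry = self._entries.get(repo_path, {}).get(name)
        if entry is not None and entry[0] == quick_hash(source_file):
            return entry[1], True
        # Fingerprint mismatch or nothing preloaded: load the slow way.
        return loader(source_file), False
```

A worker forked after preload() sees the populated map without copying it
explicitly, and falls back to the normal load path whenever the fingerprint
no longer matches.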
>> new changelog index
>>
>>   (note: this is less related to chg, but fits nicely with the plan above)
>>
>>   I would personally like to see an efficient changelog "index" object whose
>>   code is immutable to extensions (i.e. extensions could not change the logic
>>   inside it), reusable outside the Python eco-system (likely implemented in
>>   C without Python.h or Rust), taking a minimal set of inputs (changelog.i,
>>   phaseroots, obsstore, but allowing customized parsers), and dealing with
>>   the following independently (could be implemented incrementally):
>>
>>     - converting between rev number and node (and partialmatch)
>>     - calculating common ancestors
>>     - revset bitmap representation: native ancestors / descendants
>>       construction, supporting and/or/minus operations
>>     - understanding phases
>>     - understanding obsolete concepts
>>
>>   If that looks promising, I'll try to work on it after the above chg change.
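For the "revset bitmap representation" item, a minimal sketch of what the
operations could look like. It is written in Python for brevity, although
the proposal is for C without Python.h; a Python int stands in for the
bitmap, and the function names and the parents-list input are made up for
illustration. It relies on the revlog property that a parent always has a
lower rev number than its children.

```python
NULLREV = -1


def ancestors(parents, revs):
    """Bitmap of revs plus all their ancestors.

    parents[rev] is a (p1, p2) tuple, with NULLREV for a missing parent.
    """
    seen = 0
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev == NULLREV or (seen >> rev) & 1:
            continue
        seen |= 1 << rev
        stack.extend(parents[rev])
    return seen


def descendants(parents, revs):
    """Bitmap of revs plus all their descendants.

    A single forward pass is enough because parents come before children.
    """
    result = 0
    for rev in revs:
        result |= 1 << rev
    for rev in range(len(parents)):
        for p in parents[rev]:
            if p != NULLREV and (result >> p) & 1:
                result |= 1 << rev
    return result


def torevs(bitmap):
    """Expand a bitmap into a sorted list of rev numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Tiny DAG: 1 and 2 are children of 0, 3 merges 1 and 2, 4 is a child of 2.
dag = [(-1, -1), (0, -1), (0, -1), (1, 2), (2, -1)]
a3 = ancestors(dag, [3])   # revs 0, 1, 2, 3
a4 = ancestors(dag, [4])   # revs 0, 2, 4
both = a3 & a4             # "and":   revs 0, 2
either = a3 | a4           # "or":    revs 0, 1, 2, 3, 4
only3 = a3 & ~a4           # "minus": revs 1, 3
```

With this representation, and/or/minus on sets of revs are single bitwise
operations, which is the appeal of doing it natively.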
>
> Now, this is something I really like (the preloading stuff looks good
> too but wanted to chime in on this part first).
>
> First of all, I completely agree with not including Python.h (nor Rust,
> though I can bend on that if needed). Not every server (or client) wants
> to spin up python (nor pypy) and I've been thinking of writing what
> you've proposed for some time now. My dream is to have this small C
> library used by core Mercurial so that there is still one canonical
> implementation and other clients / servers can link to that for whatever
> purpose.

I'm +1 on what Sean says here. Keeping a single reference implementation
is important, and having standard Mercurial take advantage of these
speedups seems valuable.

-- 
Pierre-Yves David


