RFC: bitmap storage for hidden
Durham Goode
durham at fb.com
Wed Aug 31 20:53:17 UTC 2016
One of the performance costs that affects every command is the
computehidden function (which computes what commits are hidden based on
a combination of obsmarkers, bookmarks, workingcopy, phases, and tags).
At Facebook this function alone can add 300ms to every command a user
runs (depending on a variety of factors).
I've begun work on a storage structure for this information, so we don't
need to recompute it during every read command. Before I go through the
work to polish it enough to send it upstream, I want to run it by people
for comments.
The idea is basically:
- Have a file that acts as a bitmap, where a changelog rev number maps
to a particular bit, and if that bit is set then the node is considered
hidden. computehidden will return a class that uses this to answer
filteredrevs.__contains__ calls.
- When something is changed that would affect visibility (bookmark edit,
phase change, obsmarker creation, dirstate movement, tag edit), that
code is responsible for calling hidden.updatevisibilty(repo,
affectednodes) which reports which nodes are touched by the edit (like
the old bookmark node and the new bookmark node), and updatevisibility()
does the minimal calculation of visibility changes and propagates those
changes to the node ancestors as required.
- The file would be stored in .hg/cache/hidden, and would only be 122kb
per million commits, so we'd just use the normal copy + atomic rename
strategy for transactionality (i.e. transaction.addfilegenerator)
- In theory the file would always be up-to-date, but we can put some
metadata at the front of the file to allow verifying the repo hasn't
changed significantly out from under it (recording tip, working copy
parents, obsmarker file timestamp, etc). If the repo has changed, or if
the bitmap doesn't yet exist, it can be trivially calculated using
hidden.updatevisibility(repo.revs('not public()')) in a manner similar
to how it works today.
Notable issues:
- A number of places treat filteredrevs as a set, and do things like
'myset - repo.filteredrevs'. With a bitmap this doesn't work, so we
need to translate it to '(r for r in myset if r not in
repo.filteredrevs)'. Which is probably a better O() anyway since it
won't be affected by the size of filteredrevs.
- filteredrevs is currently a frozen set. Editing the bitmap will have
to be done in a way where the edits only become visible when the
'frozen' set is invalidated. So we maintain the existing behavior.
Follow up ideas:
- Once this ships as a cache, it might be possible to progress it to
not-a-cache, such that we could define visibility in other terms. For
instance, we could allow hiding/reviving commits without obsolescence
markers in non-evolve repos. This is a bit more of a long term thought,
but might be worth brainstorming about.
My code so far can be seen at:
https://bitbucket.org/DurhamG/hg/commits/branch/bitmap
Though it's currently missing some key features (cache validity
checking, tests don't all pass yet, file is in .hg/store instead of
.hg/cache).
More information about the Mercurial-devel
mailing list