[RFC] Designing a store format that uses fewer files
Durham Goode
durham at fb.com
Mon Nov 30 23:53:34 UTC 2015
On 11/26/15 8:14 PM, Gregory Szorc wrote:
> Issue 4889 (creating/appending thousands of files is absurdly slow on
> NTFS) is kind of a big deal at Mozilla since we have a number of
> developers and automated processes running Windows. As I've documented
> in commit messages like
> https://selenic.com/pipermail/mercurial-devel/2015-September/073788.html,
> this NTFS behavior adds up to some operations taking several minutes
> longer on Windows than on other platforms. Moving file closing to a
> background thread pool shows promise and I intend to get those patches
> landed for 3.7. However, the underlying performance issue stems from
> Mercurial's store model of 1 file per store path.
>
> While I feel like 1 file per store path is good enough for most
> people, cracks do form at scale *on all platforms*. So, I've been
> thinking of new store formats that rely on fewer files so filesystems
> won't interfere as much.
>
> I wrote up https://www.mercurial-scm.org/wiki/PackedRepoPlan
> with the beginnings of a proposal.
>
> I was hoping to keep it simple. But somewhere along the way I realized
> it overlaps significantly with reader locks, better transactions,
> ending data duplication for copies and moves, and keeping the
> performance impact of obsolete/hidden changesets in check. I may have
> accidentally scope bloated to a grand unified store format. Oops.
>
> I'm really curious what others think about the proposal. If I break 1
> file per store path to make Windows scale, I introduce a new store
> requirement. And new store requirements mean you can break the world
> because there are no backwards compatibility concerns. So I think
> designing the new format to be compatible with all the other crazy
> stuff people have talked about is justified.
>
> I'm sure there are many flaws in my proposal. Perhaps some fatal ones.
> I'd love to hear what they are.
I read through the proposal. It's a bit large in scope for me to give a
coherent response right now. I've been thinking about similar things,
though my thoughts mainly revolve around how to avoid requiring entire
revlogs.
If we introduce any GC-like concept, I think we'd want to make sure it's
possible to GC incrementally (to avoid the git problem). For instance,
imagine that instead of having filelogs, you have 256 rev pack logs to
which new file revisions are appended (just shard file revisions to the
appropriate rev pack by name). If you're constantly appending to the
end of these files, you could do an incremental GC by reordering and
compacting only the most recent N revisions every time a rev pack log
grows by N revisions. Since each compacted N-revision section then
keeps a given file's revisions together, a full filelog read for a
certain file only requires len(revpack)/N seeks. You could cheaply do
full rev pack GCs at a later time, since each of the N-length sections
is ordered nicely internally.
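
To make the sharding idea concrete, here's a rough Python sketch. The
names and the on-disk record layout are made up for illustration; a
real format would also need node hashes, parents, flags, etc.

import hashlib
import os
import struct

NUM_SHARDS = 256

def shardforpath(path):
    # Shard a store path to one of the 256 rev pack logs by hashing its
    # name, so all revisions of a given file land in the same log.
    digest = hashlib.sha1(path.encode('utf-8')).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def appendrevision(storedir, path, revdata):
    # Append a new file revision to the end of its shard's rev pack log.
    # The record layout (length-prefixed path and data) is illustrative
    # only.
    packpath = os.path.join(storedir,
                            'revpack-%02x.log' % shardforpath(path))
    pathbytes = path.encode('utf-8')
    with open(packpath, 'ab') as fp:
        fp.write(struct.pack('>I', len(pathbytes)))
        fp.write(pathbytes)
        fp.write(struct.pack('>I', len(revdata)))
        fp.write(revdata)

The incremental GC would then periodically reorder and compact just the
newest N records appended to each such log, leaving the older,
already-compacted sections untouched.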