[PATCH 4 of 5] changegroup: use filecloser for closing filelog handles
Gregory Szorc
gregory.szorc at gmail.com
Wed Sep 30 16:09:39 UTC 2015
On Wed, Sep 30, 2015 at 8:50 AM, Augie Fackler <raf at durin42.com> wrote:
> On Tue, Sep 29, 2015 at 10:36:30PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1443552606 25200
> > # Tue Sep 29 11:50:06 2015 -0700
> > # Node ID a8d8f0d70593aaa6fb2b86dc77d29c021d83277d
> > # Parent 4ade2d6126d9b0f67ea898ab0d021abb88d157dd
> > changegroup: use filecloser for closing filelog handles
>
> Overall, I like where this is headed, but I do wonder if this should
> be part of our vfs layer rather than its own thing - that would also
> give us a way to ensure close() operations finish before a file is
> subsequently reopened (eg if an inline revlog switches to two-file
> during a pull). What do you think?
>
The code purposefully only applies the delayed closing behavior on the
final handles used by revlog.addgroup(), so this won't impact inline to 2
file switch. (Future optimization: revlogs advertise their size inside the
changegroup so we don't waste time doing the 2 file switch.)
However, the code does assume that a revlog is only listed once per
changegroup/bundle. If that's ever not true, we do have a race between
closing and writing.
Moving to the VFS layer is tempting. While I was implementing this, I was
thinking that this work could be the beginning of a generic asynchronous
and multi-threaded I/O facility. That would almost certainly be in the VFS
layer.
Adding to the VFS layer does mean a little extra state tracking. There are
also questions such as "how do we guarantee these open files are closed [so
we can finalize the transition]?"
Let me think about it some more.
>
> >
> > On a large repository like mozilla-central, writing filelogs during
> > changegroup application during a clone opens (and closes) over 200,000
> > files. On Windows, file closes are slow and block the main thread while
> > they are closing.
> >
> > Moving filelog file closes to a background thread on Windows frees up
> > the main thread to process the next filelog before waiting for the close
> > to complete. The end result is faster filelog application.
> >
> > On my Windows 10 machine (not a VM) and with a modern SSD, this patch
> > has a drastic impact on unbundling a snapshot of mozilla-central:
> >
> > Before: ~910s
> > After: ~609s
> >
> > That's a 301s or 5:01 wall time reduction!
> >
> > And I'm convinced this isn't the upper limit of Windows performance: I
> > made some local changes to measure time spent waiting for the file
> > closer thread's queue to unblock and it said we still spend ~34s of main
> > thread time waiting on file closes. I also haven't really experimented
> > with the various wait intervals and limits in the file closer patch. But
> > since we're already looking at a 5 minute performance win, I'm content
> > with where things are.
> >
> > diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> > --- a/mercurial/changegroup.py
> > +++ b/mercurial/changegroup.py
> > @@ -657,12 +657,14 @@ def changegroup(repo, basenodes, source)
> > # to avoid a race we use changegroupsubset() (issue1320)
> > return changegroupsubset(repo, basenodes, repo.heads(), source)
> >
> > def addchangegroupfiles(repo, source, revmap, trp, pr, needfiles):
> > - if True:
> > + closer = util.filecloser(repo.ui)
> > + try:
> > revisions = 0
> > files = 0
> > while True:
> > + closer.verifystate()
> > chunkdata = source.filelogheader()
> > if not chunkdata:
> > break
> > f = chunkdata["filename"]
> > @@ -670,9 +672,9 @@ def addchangegroupfiles(repo, source, re
> > pr()
> > fl = repo.file(f)
> > o = len(fl)
> > try:
> > - if not fl.addgroup(source, revmap, trp):
> > + if not fl.addgroup(source, revmap, trp,
> closecb=closer.close):
> > raise util.Abort(_("received file revlog group is
> empty"))
> > except error.CensoredBaseError as e:
> > raise util.Abort(_("received delta base is censored:
> %s") % e)
> > revisions += len(fl) - o
> > @@ -687,8 +689,10 @@ def addchangegroupfiles(repo, source, re
> > raise util.Abort(
> > _("received spurious file revlog entry"))
> > if not needs:
> > del needfiles[f]
> > + finally:
> > + closer.cleanup()
> > repo.ui.progress(_('files'), None)
> >
> > for f, needs in needfiles.iteritems():
> > fl = repo.file(f)
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-devel/attachments/20150930/bc80b05f/attachment-0002.html>
More information about the Mercurial-devel
mailing list