[PATCH] util: increase the copy buffer size in shutil's copyfile
Gregory Szorc
gregory.szorc at gmail.com
Fri May 6 22:26:57 UTC 2016
On Fri, May 6, 2016 at 2:11 PM, Tony Tung via Mercurial-devel <
mercurial-devel at mercurial-scm.org> wrote:
> > On May 6, 2016, at 1:50 PM, Augie Fackler <raf at durin42.com> wrote:
> >
> > On Fri, May 06, 2016 at 01:36:41PM -0700, Tony Tung wrote:
> >> # HG changeset patch
> >> # User Tony Tung <ttung at fb.com>
> >> # Date 1462566734 25200
> >> # Fri May 06 13:32:14 2016 -0700
> >> # Branch stable
> >> # Node ID edee9b1c3a2849fcb11900959ad785b202608c4f
> >> # Parent 97811ff7964710d32cae951df1da8019b46151a2
> >> util: increase the copy buffer size in shutil's copyfile
> >>
> >> The default copy buffer size in shutil is 16KB. This is far too
> >> conservative for modern systems. Use a 4MB buffer instead to copy files
> >> more efficiently.
> >>
> >> This manifests prominently with journaling large dirstates, which can be
> >> multi-second operations.
> >>
> >> diff --git a/mercurial/util.py b/mercurial/util.py
> >> --- a/mercurial/util.py
> >> +++ b/mercurial/util.py
> >> @@ -20,6 +20,7 @@
> >> import collections
> >> import datetime
> >> import errno
> >> +import functools
> >> import gc
> >> import hashlib
> >> import imp
> >> @@ -1010,6 +1011,23 @@
> >>
> >> return check
> >>
> >> +# The default copy buffer in python's shutil is 16KB. This is far too
> >> +# conservative in modern systems. Copy more at a time so we don't
> incur so much
> >> +# system call overhead.
> >> +def wrap(container, funcname, newfunc):
> >> + origfunc = getattr(container, funcname)
> >> + wrap = functools.partial(newfunc, origfunc)
> >> + functools.update_wrapper(wrap, origfunc)
> >> + setattr(container, funcname, wrap)
> >> +
> >> +def copyfileobj_lgbuffer(orig, *args, **kwargs):
> >> + if len(args) == 2 and 'length' not in kwargs:
> >> + args = (args[0], args[1], (4 * 1024 * 1024))
> >> +
> >> + orig(*args, **kwargs)
> >> +
> >> +wrap(shutil, 'copyfileobj', copyfileobj_lgbuffer)
> >
> > Why are you monkeypatching shutil's implementation here? I'm happy to
> > have a helper method in util, but I'm deeply skeptical of
> > monkeypatching the stdlib, doubly so when it's a method we don't
> > currently call outside of a test.
>
> copyfileobj is called by copyfile. copyfile does not expose the length
> parameter. Either we copypasta copyfile from shutil or we monkey patch
> copyfileobj’s default. I’m not particularly wedded to either approach as
> each has its own downside.
>
> Is the helper method you’re suggesting the copypaste approach?
>
The implementation is trivial and is simply basic POSIX I/O primitives:
https://hg.python.org/cpython/file/4462e193f089/Lib/shutil.py#l46
If we have our wits about us, we could implement shutil.copyfile() using
fewer system calls, especially on Windows, which has CopyFileEx() to
perform file copying in a single system call. Although I'm trying to think
if we do any hardcore file copying anywhere for this to matter performance
wise.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160506/b7c86e7c/attachment-0002.html>
More information about the Mercurial-devel
mailing list