[PATCH] util: increase the copy buffer size in shutil's copyfile

Gregory Szorc gregory.szorc at gmail.com
Fri May 6 22:26:57 UTC 2016


On Fri, May 6, 2016 at 2:11 PM, Tony Tung via Mercurial-devel <
mercurial-devel at mercurial-scm.org> wrote:

> > On May 6, 2016, at 1:50 PM, Augie Fackler <raf at durin42.com> wrote:
> >
> > On Fri, May 06, 2016 at 01:36:41PM -0700, Tony Tung wrote:
> >> # HG changeset patch
> >> # User Tony Tung <ttung at fb.com>
> >> # Date 1462566734 25200
> >> #      Fri May 06 13:32:14 2016 -0700
> >> # Branch stable
> >> # Node ID edee9b1c3a2849fcb11900959ad785b202608c4f
> >> # Parent  97811ff7964710d32cae951df1da8019b46151a2
> >> util: increase the copy buffer size in shutil's copyfile
> >>
> >> The default copy buffer size in shutil is 16KB.  This is far too
> >> conservative for modern systems.  Use a 4MB buffer instead to copy files
> >> more efficiently.
> >>
> >> This manifests prominently with journaling large dirstates, which can be
> >> multi-second operations.
> >>
> >> diff --git a/mercurial/util.py b/mercurial/util.py
> >> --- a/mercurial/util.py
> >> +++ b/mercurial/util.py
> >> @@ -20,6 +20,7 @@
> >> import collections
> >> import datetime
> >> import errno
> >> +import functools
> >> import gc
> >> import hashlib
> >> import imp
> >> @@ -1010,6 +1011,23 @@
> >>
> >>     return check
> >>
> >> +# The default copy buffer in python's shutil is 16KB.  This is far too
> >> +# conservative in modern systems.  Copy more at a time so we don't
> incur so much
> >> +# system call overhead.
> >> +def wrap(container, funcname, newfunc):
> >> +    origfunc = getattr(container, funcname)
> >> +    wrap = functools.partial(newfunc, origfunc)
> >> +    functools.update_wrapper(wrap, origfunc)
> >> +    setattr(container, funcname, wrap)
> >> +
> >> +def copyfileobj_lgbuffer(orig, *args, **kwargs):
> >> +    if len(args) == 2 and 'length' not in kwargs:
> >> +        args = (args[0], args[1], (4 * 1024 * 1024))
> >> +
> >> +    orig(*args, **kwargs)
> >> +
> >> +wrap(shutil, 'copyfileobj', copyfileobj_lgbuffer)
> >
> > Why are you monkeypatching shutil's implementation here? I'm happy to
> > have a helper method in util, but I'm deeply skeptical of
> > monkeypatching the stdlib, doubly so when it's a method we don't
> > currently call outside of a test.
>
> copyfileobj is called by copyfile.  copyfile does not expose the length
> parameter.  Either we copypasta copyfile from shutil or we monkey patch
> copyfileobj’s default.  I’m not particularly wedded to either approach as
> each has its own downside.
>
> Is the helper method you’re suggesting the copypaste approach?
>

The implementation is trivial and is simply basic POSIX I/O primitives:
https://hg.python.org/cpython/file/4462e193f089/Lib/shutil.py#l46

If we have our wits about us, we could implement shutil.copyfile() using
fewer system calls, especially on Windows, which has CopyFileEx() to
perform file copying in a single system call. Although I'm trying to think
if we do any hardcore file copying anywhere for this to matter performance
wise.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160506/b7c86e7c/attachment-0002.html>


More information about the Mercurial-devel mailing list