[PATCH 1 of 3 RFC V2] scmutil: support background file closing

Yuya Nishihara yuya at tcha.org
Tue Jan 12 13:32:47 UTC 2016


On Fri, 08 Jan 2016 10:39:22 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1452278344 28800
> #      Fri Jan 08 10:39:04 2016 -0800
> # Node ID e687c551d27e079e7146aaa2e56df0b07d4f9e47
> # Parent  667ae51eaa4cf74ab456565a57d32d434349a3f4
> scmutil: support background file closing
> 
> Closing files that have been appended to is relatively slow on
> Windows/NTFS. This makes several Mercurial operations slower on
> Windows.
> 
> The workaround to this issue is conceptually simple: use multiple
> threads for I/O. Unfortunately, Python doesn't scale well to multiple
> threads because of the GIL. And, refactoring our code to use threads
> everywhere would be a huge undertaking. So, we decide to tackle this
> problem by starting small: establishing a thread pool for closing
> files.
> 
> This patch establishes a mechanism for closing file handles on separate
> threads. The coordinator object is basically a queue of file handles to
> operate on and a thread pool consuming from the queue.
> 
> When files are opened through the VFS layer, the caller can specify
> that delay closing is allowed.
> 
> A proxy class for file handles has been added. We must use a proxy
> because it isn't possible to modify __class__ on built-in types. This
> adds some overhead. But as future patches will show, this overhead
> is cancelled out by the benefit of closing file handles on background
> threads.

> +class delayclosedfile(object):
> +    """Proxy for a file object whose close is delayed.
> +
> +    Do not instantiate outside of the vfs layer.
> +    """
> +
> +    def __init__(self, fh, closer):
> +        object.__setattr__(self, '_origfh', fh)
> +        object.__setattr__(self, '_closer', closer)
> +
> +    def __getattr__(self, attr):
> +        return getattr(self._origfh, attr)
> +
> +    def __setattr__(self, attr, value):
> +        return setattr(self._origfh, attr, value)
> +
> +    def __delattr__(self, attr):
> +        return delattr(self._origfh, attr)
> +
> +    def __enter__(self):
> +        return self._origfh.__enter__()
> +
> +    def __exit__(self, exc_type, exc_value, exc_tb):
> +        self._closer.close(self._origfh)

I think fh.close() should be proxied to closer.close() as well.

> +class backgroundfilecloser(object):
> +    """Coordinates background closing of file handles on multiple threads."""
> +    def __init__(self, ui):
> +        self._running = False
> +        self._entered = False
> +        self._threads = []
> +
> +        # Only Windows/NTFS has slow file closing. So only enable by default
> +        # on that platform. But allow to be enabled elsewhere for testing.
> +        # Windows defaults to a limit of 512 open files. A buffer of 128
> +        # should give us enough headway.
> +        defaultenabled = os.name == 'nt'
> +        enabled = ui.configbool('worker', 'backgroundfileclose', defaultenabled)
> +
> +        if not enabled:
> +            return
> +
> +        maxqueue = ui.configint('worker', 'backgroundclosemaxqueue', 384)
> +        threadcount = ui.configint('worker', 'backgroundclosethreads', 4)
> +
> +        self._queue = Queue.Queue(maxsize=maxqueue)
> +        self._running = True
> +
> +        for i in range(threadcount):
> +            t = threading.Thread(target=self._worker, name='backgroundcloser')
> +            self._threads.append(t)
> +            t.start()
> +
> +    def __enter__(self):
> +        self._entered = True
> +        return self
> +
> +    def __exit__(self, exc_type, exc_value, exc_tb):
> +        self._running = False

Because threads aren't joined, delayed closing can exceed the "with" scope.
It would cause race condition and too early lock releasing.

> +    def _worker(self):
> +        """Main routine for worker thread."""
> +        while True:
> +            try:
> +                fh = self._queue.get(block=True, timeout=0.100)
> +                try:
> +                    fh.close()
> +                except Exception:
> +                    # Need to catch or the thread will terminate and
> +                    # we could orphan file descriptors.
> +                    pass

Perhaps the first (or last?) exception should be propagated to the main
thread?



More information about the Mercurial-devel mailing list