[PATCH 1 of 3 RFC V2] scmutil: support background file closing
Yuya Nishihara
yuya at tcha.org
Tue Jan 12 13:32:47 UTC 2016
On Fri, 08 Jan 2016 10:39:22 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1452278344 28800
> # Fri Jan 08 10:39:04 2016 -0800
> # Node ID e687c551d27e079e7146aaa2e56df0b07d4f9e47
> # Parent 667ae51eaa4cf74ab456565a57d32d434349a3f4
> scmutil: support background file closing
>
> Closing files that have been appended to is relatively slow on
> Windows/NTFS. This makes several Mercurial operations slower on
> Windows.
>
> The workaround to this issue is conceptually simple: use multiple
> threads for I/O. Unfortunately, Python doesn't scale well to multiple
> threads because of the GIL. And, refactoring our code to use threads
> everywhere would be a huge undertaking. So, we decided to tackle this
> problem by starting small: establishing a thread pool for closing
> files.
>
> This patch establishes a mechanism for closing file handles on separate
> threads. The coordinator object is basically a queue of file handles to
> operate on and a thread pool consuming from the queue.
>
> When files are opened through the VFS layer, the caller can specify
> that delayed closing is allowed.
>
> A proxy class for file handles has been added. We must use a proxy
> because it isn't possible to modify __class__ on built-in types. This
> adds some overhead. But as future patches will show, this overhead
> is cancelled out by the benefit of closing file handles on background
> threads.
> +class delayclosedfile(object):
> + """Proxy for a file object whose close is delayed.
> +
> + Do not instantiate outside of the vfs layer.
> + """
> +
> + def __init__(self, fh, closer):
> + object.__setattr__(self, '_origfh', fh)
> + object.__setattr__(self, '_closer', closer)
> +
> + def __getattr__(self, attr):
> + return getattr(self._origfh, attr)
> +
> + def __setattr__(self, attr, value):
> + return setattr(self._origfh, attr, value)
> +
> + def __delattr__(self, attr):
> + return delattr(self._origfh, attr)
> +
> + def __enter__(self):
> + return self._origfh.__enter__()
> +
> + def __exit__(self, exc_type, exc_value, exc_tb):
> + self._closer.close(self._origfh)
I think fh.close() should be proxied to closer.close() as well.
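Something along these lines, reusing the existing _closer (sketch; only the relevant methods shown, other dunder forwarding omitted):

```python
class delayclosedfile(object):
    """Proxy whose close() and __exit__ both defer to the background closer."""

    def __init__(self, fh, closer):
        object.__setattr__(self, '_origfh', fh)
        object.__setattr__(self, '_closer', closer)

    def __getattr__(self, attr):
        # Everything except close()/__exit__ goes straight to the real file.
        return getattr(self._origfh, attr)

    def __exit__(self, exc_type, exc_value, exc_tb):
        self._closer.close(self._origfh)

    def close(self):
        # An explicit close() should be deferred as well; otherwise callers
        # that don't use "with" bypass background closing entirely.
        self._closer.close(self._origfh)
```

Otherwise only context-manager users get the benefit.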
> +class backgroundfilecloser(object):
> + """Coordinates background closing of file handles on multiple threads."""
> + def __init__(self, ui):
> + self._running = False
> + self._entered = False
> + self._threads = []
> +
> + # Only Windows/NTFS has slow file closing. So only enable by default
> + # on that platform. But allow to be enabled elsewhere for testing.
> + # Windows defaults to a limit of 512 open files. A buffer of 128
> + # should give us enough headway.
> + defaultenabled = os.name == 'nt'
> + enabled = ui.configbool('worker', 'backgroundfileclose', defaultenabled)
> +
> + if not enabled:
> + return
> +
> + maxqueue = ui.configint('worker', 'backgroundclosemaxqueue', 384)
> + threadcount = ui.configint('worker', 'backgroundclosethreads', 4)
> +
> + self._queue = Queue.Queue(maxsize=maxqueue)
> + self._running = True
> +
> + for i in range(threadcount):
> + t = threading.Thread(target=self._worker, name='backgroundcloser')
> + self._threads.append(t)
> + t.start()
> +
> + def __enter__(self):
> + self._entered = True
> + return self
> +
> + def __exit__(self, exc_type, exc_value, exc_tb):
> + self._running = False
Because the threads aren't joined, delayed closes can outlive the "with"
scope. That would cause a race condition and premature lock release.
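For example, __exit__ could join the workers so nothing outlives the context (sketch in modern Python; ui/config plumbing omitted, default values invented for illustration):

```python
import queue
import threading

class backgroundfilecloser(object):
    """Sketch: join worker threads on exit so no close outlives 'with'."""

    def __init__(self, threadcount=4, maxqueue=384):
        self._queue = queue.Queue(maxsize=maxqueue)
        self._running = True
        self._threads = []
        for _ in range(threadcount):
            t = threading.Thread(target=self._worker, name='backgroundcloser')
            self._threads.append(t)
            t.start()

    def _worker(self):
        while True:
            try:
                fh = self._queue.get(block=True, timeout=0.100)
                try:
                    fh.close()
                except Exception:
                    pass
            except queue.Empty:
                # Only exit once we've been told to stop AND the queue
                # has drained, so every queued close actually happens.
                if not self._running:
                    return

    def close(self, fh):
        self._queue.put(fh, block=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self._running = False
        # Wait for the workers, so locks released after the context
        # cannot race against still-pending closes.
        for t in self._threads:
            t.join()
```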
> + def _worker(self):
> + """Main routine for worker thread."""
> + while True:
> + try:
> + fh = self._queue.get(block=True, timeout=0.100)
> + try:
> + fh.close()
> + except Exception:
> + # Need to catch or the thread will terminate and
> + # we could orphan file descriptors.
> + pass
Perhaps the first (or last?) exception should be propagated to the main
thread?
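One possible shape: remember the first exception in the worker and re-raise it in the caller's thread on the next queue operation or on exit (sketch, heavily simplified from the patch):

```python
import queue
import threading

class backgroundfilecloser(object):
    """Sketch: surface the first worker exception in the main thread."""

    def __init__(self, threadcount=2, maxqueue=64):
        self._queue = queue.Queue(maxsize=maxqueue)
        self._running = True
        self._exc = None  # first exception seen by any worker
        self._lock = threading.Lock()
        self._threads = []
        for _ in range(threadcount):
            t = threading.Thread(target=self._worker, name='backgroundcloser')
            self._threads.append(t)
            t.start()

    def _worker(self):
        while True:
            try:
                fh = self._queue.get(block=True, timeout=0.100)
                try:
                    fh.close()
                except Exception as e:
                    # Keep only the first failure; later ones are dropped.
                    with self._lock:
                        if self._exc is None:
                            self._exc = e
            except queue.Empty:
                if not self._running:
                    return

    def close(self, fh):
        # Re-raise a pending worker failure in the caller's thread.
        if self._exc is not None:
            e, self._exc = self._exc, None
            raise e
        self._queue.put(fh, block=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self._running = False
        for t in self._threads:
            t.join()
        if self._exc is not None:
            e, self._exc = self._exc, None
            raise e
```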