[PATCH] worker: ensure a posix worker exits with os._exit

Jun Wu quark at fb.com
Mon Aug 8 23:20:47 UTC 2016


# HG changeset patch
# User Jun Wu <quark at fb.com>
# Date 1470697491 -3600
#      Tue Aug 09 00:04:51 2016 +0100
# Node ID 6c1ce8f56180337cce7d67a77a71c41a4207b6b4
# Parent  3dbc95f3eb31874ab4633f936acff4609714dc41
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6c1ce8f56180
worker: ensure a posix worker exits with os._exit

Like commandserver, the worker should never run other resource cleanup logic.

Previously this is not true for workers if they have exceptions other than
KeyboardInterrupt.

This actually causes a deadlock with remotefilelog in production:

1. remotefilelog/fileserverclient creates a sshpeer. pipei/o/e get created.
2. worker inherits that sshpeer's pipei/o/e.
3. worker runs sshpeer.cleanup (only happens without os._exit)
4. worker closes pipeo/i, which will normally make the sshpeer read EOF from
   its stdin and exit. But the master process still have pipeo, so no EOF.
5. worker reads pipee (stderr of sshpeer), which never completes because
   the ssh process does not exit, does not close its stderr.
6. master waits for all workers, which never completes because they never
   complete sshpeer.cleanup.

This could also be addressed by closing these fds after fork, which is not
easy because Python 2.x does not have an official "afterfork" hook. Hacking
os.fork is also ugly. Besides, sshpeer is probably not the only troublemarker.

The patch changes _posixworker so all its code paths will use os._exit to
avoid running unwanted resource clean-ups.

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -97,8 +97,11 @@ def _posixworker(ui, func, staticargs, a
                 os._exit(0)
             except KeyboardInterrupt:
                 os._exit(255)
-                # other exceptions are allowed to propagate, we rely
-                # on lock.py's pid checks to avoid release callbacks
+            except: # never return, hence no re-raises
+                try:
+                    ui.traceback()
+                finally:
+                    os._exit(255)
         pids.append(pid)
     pids.reverse()
     os.close(wfd)


More information about the Mercurial-devel mailing list