HgWeb.cgi Hanging During Push

Matt Mackall mpm at selenic.com
Thu Apr 19 21:34:54 UTC 2012


On Thu, 2012-04-19 at 15:05 -0400, Jensen, Aaron wrote:
> We're running Mercurial 2.1.1 under IIS using CGI on Windows 2008.  We
> have two server-side hooks (written in PowerShell) that run on
> pretxnchangegroup.  One checks for case collisions if a file was added
> in any of the incoming changesets, the other checks for merge
> direction when pushing merges (we have some internal rules about what
> branches can be merged with other branches).  These hooks can take up
> to a couple minutes for pushes with a lot of changesets.  Our
> repository has over 70k files, and is 17+ GB in size.
> 
> We're noticing that if developer #2 pushes while developer #1 is
> pushing (his python.exe CGI process has locked the repo and our hooks
> are running), as expected, developer #2's CGI process sits and waits
> for developer #1's push to finish.  However, once developer #1's push
> succeeds, developer #2's CGI process doesn't detect that the repo is
> available/unlocked, and never locks the repo or runs any hooks.  It
> just hangs, using no CPU and not growing in memory.

We would actually expect the second push to fail. The push carries the
set of heads the client saw on the server before it started spooling,
and if that set doesn't match the server's heads afterward, we assume a
race occurred.
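The head check described above can be sketched roughly as follows. This is an illustration, not Mercurial's code: `heads_fingerprint` and `check_push_race` are hypothetical names, heads are assumed to be 20-byte binary node IDs, and hashing the sorted heads is just one order-independent way to compare the two sets.

```python
import hashlib

def heads_fingerprint(heads):
    # Order-independent digest of a collection of binary node IDs.
    h = hashlib.sha1()
    for node in sorted(heads):
        h.update(node)
    return h.hexdigest()

def check_push_race(heads_before, heads_now):
    """Return True if the push may proceed, False if a race is assumed.

    heads_before: heads the client saw before it started spooling.
    heads_now:    heads the server has when it is about to apply the push.
    """
    return heads_fingerprint(heads_before) == heads_fingerprint(heads_now)
```

In the scenario above, developer #2's client recorded the heads before developer #1's changesets landed, so once #1's push completes the fingerprints differ and #2's push should be refused rather than applied.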

> I would expect developer #2 to get a "waiting for lock" message, but
> the last message Mercurial outputs is "searching for changes".
> Hitting CTRL+C doesn't stop the push.
> Developer #2 has to kill hg.exe, or I have to log into our Mercurial
> server and kill developer #2's CGI process.  No repository corruption
> occurs on either the client or the server.

The Ctrl-C thing is odd. It's probably some quirk of how "signal
handling" interacts with blocked sockets on Windows. Windows runs the
Ctrl-C handler in a separate thread, so handling in the main thread may
be deferred until the read() that never finishes actually finishes. Or
something.

It's actually good to know that this is happening: it might explain why
some people are force-killing Mercurial on Windows and thereby
interrupting transactions.

> How can I go about debugging this problem?  Does it look familiar to anyone?

You probably want to add some instrumentation in mercurial/wireproto.py
around here:

http://www.selenic.com/hg/file/09dd707b522a/mercurial/wireproto.py#l574

Sprinkle around some lines like:

 sys.stderr.write("%d: at step X\n" % os.getpid())
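A slightly more self-contained version of that one-liner might look like the sketch below. The `trace` helper and its `log_path` parameter are hypothetical, not part of Mercurial. The timestamp makes it easier to see where developer #2's process stops making progress, and the optional log file is there in case the CGI process's stderr doesn't end up anywhere convenient under IIS (an assumption worth checking on your setup).

```python
import os
import sys
import time

def trace(step, log_path=None):
    """Emit one time- and pid-tagged trace line and return it."""
    line = "%.3f %d: at step %s\n" % (time.time(), os.getpid(), step)
    if log_path:
        # Append mode, so lines from several CGI processes accumulate in
        # one file; the pid on each line tells the processes apart.
        with open(log_path, "a") as f:
            f.write(line)
    else:
        sys.stderr.write(line)
        sys.stderr.flush()
    return line
```

Placing calls such as `trace("before lock")` and `trace("after lock")` (the step labels are arbitrary) around the steps you suspect should show whether the second process ever gets past acquiring the lock.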

I'd also recommend trying to create a simpler/faster test case, possibly
by making a hook that just sleeps for 10 seconds.
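A minimal sketch of such a hook, assuming an in-process Python hook written against the Python 2 era API that Mercurial 2.1 uses (newer Mercurial releases expect bytes for `ui.warn`). The file name `sleephook.py` and the function name are hypothetical. It would be wired up in the server repository's hgrc with something like `pretxnchangegroup.sleep = python:C:\path\to\sleephook.py:sleephook` under `[hooks]`, in place of the PowerShell hooks while testing.

```python
# sleephook.py: hypothetical test hook that just stalls the push.
import os
import time

SLEEP_SECONDS = 10  # the "sleeps for 10 seconds" suggestion

def sleephook(ui, repo, **kwargs):
    # Mercurial passes the hook type, node, source, etc. as keyword args.
    ui.warn("%d: sleephook entered\n" % os.getpid())
    time.sleep(SLEEP_SECONDS)
    ui.warn("%d: sleephook leaving\n" % os.getpid())
    # A false return value means success; a true one would abort the push.
    return False
```

Pushing from a second client while the first push sits in this hook should reproduce the lock contention in seconds rather than minutes, and the pid-tagged messages show which CGI process got where.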

-- 
Mathematics is the supreme nostalgia of our time.