Windows test suite timeouts
Matt Harbison
matt_harbison at yahoo.com
Tue Jan 8 05:33:45 UTC 2013
Matt Mackall wrote:
> On Sat, 2013-01-05 at 21:35 -0500, Matt Harbison wrote:
>> Mads Kiilerich wrote:
>>> Dirkjan Ochtman wrote, On 01/05/2013 10:40 AM:
>>>> On Fri, Jan 4, 2013 at 4:06 AM, Matt Harbison
>>>> <matt_harbison at yahoo.com> wrote:
>>>>> Every so often, I've noticed that some tests on Windows will take a
>>>>> really long time, and then time out:
>>>>>
>>>>> $ python run-tests.py -i test-largefiles.t
>>>>>
>>>>> ERROR: c:\Users\Matt\Projects\hg\tests\test-largefiles.t timed out
>>>>> t
>>>>> Failed test-largefiles.t: timed out
>>>>> # Ran 1 tests, 0 skipped, 1 failed.
>>>> Actually, on Gentoo, test-largefiles.t and test-mq.t have been timing
>>>> out for a bunch of users, so I'm guessing those tests' problems aren't
>>>> limited to Windows.
>>> These tests _are_ big and will reach the timeout limit on slow or
>>> overloaded hardware. The timeout had to be increased on some of the
>>> buildbots.
>>>
>>> In both cases: Are you sure the tests really are hanging, or are they
>>> just too slow?
>>>
>>> /Mads
>> It looks like that was the problem:
>>
>> $ time python run-tests.py -t 600 test-largefiles.t
>> .
>> # Ran 1 tests, 0 skipped, 0 failed.
>>
>> real 3m58.038s
>> user 0m0.000s
>> sys 0m0.062s
>>
>> I'm really surprised it's that far above the 3-minute default timeout,
>> since it works so consistently until it starts failing consistently.
>
> Perhaps you could time 20 runs in a row from a cold boot to look for a
> trend.
>
>> But this pretty much tracks with what I've seen: it happens after a long
>> uptime, clears on a reboot, and the test ran 3 times in a row with the
>> raised timeout (and both of your patches applied). I'm not sure why it
>> works in a Linux VirtualBox VM on the same machine, or why the full
>> .t.err is generated, but at least we got to the bottom of it.
>
> Have you looked at how long it takes to run this test on Linux?
>
> $ time ./run-tests.py -l test-largefiles.t
> .
> # Ran 1 tests, 0 skipped, 0 failed.
>
> real 0m32.142s
> user 0m19.401s
> sys 0m7.680s
>
> That's on a virtual machine on the same box that serves
> mercurial.selenic.com, which is under a steady load of mail and web
> traffic. It's running on a $400 machine I built in 2008:
>
> model name : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
>
> That's the slowest thing I've got convenient access to. If I had my
> Raspberry Pi plugged in, it would probably also smoke Windows.
>
> People assume that Windows and OS X and Linux are roughly comparable in
> performance. It just ain't so, folks. Linux absolutely murders the other
> two on fork/exec, syscall, and filesystem lookup intensive benchmarks,
> which is what our test-suite amounts to.
>
For comparison purposes, I ran the test 15 or so times each on Windows
and on Linux before rebooting. The Linux times were tightly clustered
(all between 58s and 60s, except two runs at 64s and one at 57s), like
so:
real 0m58.617s
user 0m42.814s
sys 0m12.179s
The Windows times were scattered between 3m21.422s and 3m28.857s, with
an outlier at 3m36.583s, before the reboot. The only difference from the
3m58 run above is that I killed Firefox before these tests; it had been
using 1GB+ of memory (though 4GB of physical memory was still free).
After rebooting, Windows was quicker, but still all over the place (and
the first run _did_ time out when I forgot to raise the timeout value):
real 3m17.855s
real 3m7.154s
real 3m4.096s
real 3m8.339s
real 3m11.101s
real 3m5.750s
real 3m8.136s
real 3m17.137s
real 3m14.486s
real 3m17.247s
real 3m15.250s
real 3m8.901s
real 3m19.150s
real 3m14.126s
real 3m10.976s
real 3m12.504s
real 3m12.349s
real 3m10.663s
real 3m12.894s
real 3m13.784s
Linux after the reboot was slightly (~4s) quicker and just as tightly
clustered (between 54.7s and 55.8s). I then rebooted the Linux
VirtualBox VM (usually I just save its state when quitting), and the
times fell further (between 41.2s and 42.3s, except two runs at ~44s).
I think this is consistent with your VM results, since I have a Core i7
Q840 @ 1.67GHz, and the difference from your numbers is ~10s.
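For anyone who wants to repeat this, something like the following Python
loop works in place of rerunning "time" by hand (the test name and the
-t value are just the ones I used here; adjust as needed):

    import subprocess
    import time

    # Time 20 consecutive runs of one test to look for a trend.
    for i in range(20):
        start = time.time()
        subprocess.call(['python', 'run-tests.py', '-t', '600',
                         'test-largefiles.t'])
        print('run %2d: %.1fs' % (i + 1, time.time() - start))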
So I'm not sure there's any conclusion to be drawn about the test suite
itself, other than that maybe the default timeout should be raised
(perhaps conditionally for Windows?). I know the HGTEST_TIMEOUT
environment variable can be set to override it, but others likely won't
know about it or set it until they run into this and spend time
investigating.
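To illustrate what I mean by a conditional default, a minimal sketch
(assuming run-tests.py keeps its current 180s default and the
HGTEST_TIMEOUT override; the 360 is an arbitrary value comfortably above
the ~3m30s runs I'm seeing):

    import os

    # Sketch only, not what run-tests.py does today: bump the default
    # timeout on Windows, while HGTEST_TIMEOUT still wins if set.
    default_timeout = 360 if os.name == 'nt' else 180
    timeout = int(os.environ.get('HGTEST_TIMEOUT', default_timeout))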
--Matt