run-tests.py refactor

Gregory Szorc gregory.szorc at gmail.com
Wed May 14 00:15:20 UTC 2014


On 5/10/2014 12:14 PM, Gregory Szorc wrote:
> On 4/25/2014 9:30 PM, Yuya Nishihara wrote:
>> On Fri, 25 Apr 2014 14:46:01 -0700, Gregory Szorc wrote:
>>> On 4/25/2014 6:10 AM, Martin Geisler wrote:
>>>> Gregory Szorc <gregory.szorc at gmail.com> writes:
>>>>
>>>>> On my i7-2600K mech HD w/ -l -j8:
>>>>>
>>>>> Original:
>>>>>
>>>>> real    6m40.558s user    31m31.405s sys     17m48.862s
>>>>
>>>> I normally run with -j 20 and get a real time of 4m24s. I find that
>>>> there's so much IO going on that I can start that many processes in
>>>> parallel to really saturate the CPU (maybe -j 15 would be the
>>>> same).
>>>
>>> I created a tmpfs to isolate the impact of I/O from my mechanical HD.
>>> My 4+4 cores were largely saturated during -j8. There were only brief
>>> periods (maybe ~20s total) where idle CPU was below 5%. Raising it to
>>> -j10 made the lulls go away.
>>>
>>> Total execution time on the tmpfs was about the same as the mechanical
>>> HD. I can only conclude that the tests are CPU bottlenecked. My
>>> previous measurements indicate the bulk of that is inside `hg`.
>>>
>>> I may very well write a feature that invokes the Python profiler on
>>> all the `hg` invocations and sums the results. Could be interesting.
>>
>> IIRC, slow module lookup of Python contributes to the startup time of `hg`
>> process.  It might be partly an I/O bottleneck.
> 
> Further measurement indicates that spawning Python processes and
> importing the mercurial modules introduces significant overhead to
> running tests.
> 
> I ran `time python -c '1'` and `time python hg version` to measure the
> overhead of new Python processes and the most simple Mercurial command,
> respectively. I tested things on my desktop Sandy Bridge i7-2600K (few
> years old now) running Ubuntu 14.04 and a more modern 2013 Haswell MBP.
> I ran things several times and obtained the median. I didn't use an
> hgrc, so no extensions were loaded.
> 
> Machine   Distribution      Python Time     HG Time
> desktop   cpython 2.7.6         40ms          87ms
> desktop   cpython 3.3.5         60ms          N/A
> MBP       cpython 2.7.6         16ms          44ms
> MBP       cpython 3.3.5         27ms          N/A
> 
> (Note that 3.3.5 is slower than 2.7.6!)
> 
> We have nearly 13,000 `hg` invocations in our .t tests. If you
> extrapolate things out, the overhead for new Python processes is thus:
> 
> Machine   Distributions   Python     HG
> desktop   cpython 2.7.6     520s    1,131s
> desktop   cpython 3.3.5     780s     N/A
> MBP       cpython 2.7.6     208s     572s
> MBP       cpython 3.3.5     351s     N/A
> 
> We can then compare these numbers to the total Mercurial test suite
> execution time and see what the overhead of new Python processes and new
> HG processes are.
> 
> Machine  Test CPU Time   Python %    HG %
> desktop     2,960s         17.6      38.2
> MBP         1,855s         11.2      30.8
> 
> It certainly appears that cpython process overhead and the overhead of
> "initializing" the Mercurial environment within a fresh Python
> interpreter are contributing a significant amount of overhead to the
> Mercurial test suite.
> 
> Inlining tests into Python instead of shell like my patch series is
> attempting or using chg should significantly speed up the test suite.
> 
> Activating my regular hgrc file slowed down `hg version` by ~40ms on my
> desktop machine. That is statistically significant.
> 
> On my desktop machine, Perl 5.18.2 runs in 15ms, Ruby 1.9.3p484 in 27ms,
> PHP 5.5.9 in 50ms, and Git 1.8.1.2 runs in 12ms.
> 
> The Python new process overhead could very well be contributing to the
> perception that Mercurial is slow[er than Git]. You need <25ms latency
> for commands to feel instantaneous. We are already beyond that for the
> most trivial Mercurial commands.

I sent these observations/complaints to python-dev [1]. The IMO most
useful response so far is [2], with its reference to PEP 432 [3] and its
hopeful incorporation into Python 3.5.

python -S does drop process overhead significantly (by disabling site.py
processing). We could theoretically add that to hg's shebang. But we'd
likely have to reimplement a lot of site.py's functionality. It would
make a good experiment. chg is arguably a better solution.

[1] https://mail.python.org/pipermail/python-dev/2014-May/134528.html
[2] https://mail.python.org/pipermail/python-dev/2014-May/134566.html
[3] http://legacy.python.org/dev/peps/pep-0432/



More information about the Mercurial-devel mailing list