[PATCH 1 of 2 resend] keyword: compile regexes on demand
Martin Geisler
mg at aragost.com
Thu Nov 4 17:27:13 UTC 2010
Mads Kiilerich <mads at kiilerich.com> writes:
> On 11/04/2010 04:55 PM, Martin Geisler wrote:
>> Christian Ebert <blacktrash at gmx.net> writes:
>>
>>> # HG changeset patch
>>> # User Christian Ebert<blacktrash at gmx.net>
>>> # Date 1288791461 -3600
>>> # Node ID 2ce1ff53e29f4b775ed550c13beb42da3942523e
>>> # Parent 0e0a52bd58f941c00b2a1d57f23676fa486e58c3
>>> keyword: compile regexes on demand
>>
>> Are you sure this is faster? I tried to see how long the old code took
>> and here it's very fast:
>>
>> % python -m timeit \
>> -s "import re" \
>> -s "escaped = 'RCSfile|Author|Header|Source|Date|RCSFile|Id|Revision'" \
>> "kw = re.compile(r'\$(%s)\$' % escaped)" \
>> "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
>> 100000 loops, best of 3: 2.52 usec per loop
>
> Beware of the caching of compiled expressions inside the re module:
Ah, yeah, I always forget about that... the re module automatically
caches the last 100 compiled expressions, so my timing above mostly
measured cache lookups rather than compilation.
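
A small sketch of the cache effect, assuming CPython's current behaviour
(the identity checks rely on an implementation detail of the re module,
not on a documented guarantee; the variable names are only for
illustration):

```python
import re

escaped = 'RCSfile|Author|Header|Source|Date|RCSFile|Id|Revision'
pattern = r'\$(%s)\$' % escaped

first = re.compile(pattern)
second = re.compile(pattern)     # same pattern string: served from the cache
cached_hit = first is second     # assumption: CPython returns the cached object

re.purge()                       # empty the re module's cache
third = re.compile(pattern)      # has to be compiled from scratch again
recompiled = third is not first

print(cached_hit, recompiled)
```

This is why a timeit loop without re.purge() mostly times a dictionary
lookup instead of the regex compiler.
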
> $ python -m timeit \
> > -s "import re" \
> > -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> > "kw = re.compile(r'\$(%s)\$' % escaped)" \
> > "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
> 100000 loops, best of 3: 8.16 usec per loop
>
> $ python -m timeit \
> > -s "import re" \
> > -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> > "re.purge()"
> 1000000 loops, best of 3: 1.17 usec per loop
>
> $ python -m timeit \
> > -s "import re" \
> > -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> > "re.purge()" \
> > "kw = re.compile(r'\$(%s)\$' % escaped)" \
> > "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
> 1000 loops, best of 3: 1.93 msec per loop
>
> These numbers are so much higher that they might justify the change.
>
> I'm not sure if we should rely on the re cache or always should
> pre-compile everywhere, but unnecessary compilation is unfortunate. A
> new general util function or pattern could perhaps be nice.
Not for this case -- the keyword extension compiles these expressions
only once anyway.
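
For reference, a minimal sketch of what "compile on demand" can look
like. This is not the code from the patch: the class name
KeywordPatterns, its attributes, and the sample keywords are
hypothetical, and only the two pattern templates come from the timings
quoted above.

```python
import re


class KeywordPatterns(object):
    """Hypothetical holder that compiles each regex on first access."""

    def __init__(self, keywords):
        # Escape the keywords once; building this string is cheap.
        self.escaped = '|'.join(re.escape(k) for k in keywords)
        self._kw = None
        self._kwexp = None

    @property
    def kw(self):
        # Matches an unexpanded keyword such as $Id$.
        if self._kw is None:
            self._kw = re.compile(r'\$(%s)\$' % self.escaped)
        return self._kw

    @property
    def kwexp(self):
        # Matches an expanded keyword such as $Id: ... $.
        if self._kwexp is None:
            self._kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % self.escaped)
        return self._kwexp


patterns = KeywordPatterns(['Id', 'Author', 'Date'])
nothing_compiled_yet = patterns._kw is None and patterns._kwexp is None
match = patterns.kwexp.search('$Id: demo.py,v 1.1 $')
```

Only the expression that is actually used gets compiled; a command that
never touches kw never pays for it.
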
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://aragost.com/mercurial/