Bugzilla Extension custom regexp

Glfk4rt glfk4rt at gmail.com
Tue Oct 25 22:51:56 UTC 2011


Thank you for your reply. It's been very helpful as I start my journey into
Python. My replies are inline below.

On Tue, Oct 25, 2011 at 6:05 AM, Greg Ward <greg-hg at gerg.ca> wrote:

> On Tue, Oct 25, 2011 at 12:21 AM, Glfk4rt <glfk4rt at gmail.com> wrote:
> > Has anyone used the custom regexp option? My attempts at adding :? failed,
> > but copying and pasting the default regular expression fails too.
>
> Define "fails": explain what you are trying to do, and how your
> attempted solution doesn't cut it. I don't think the bugzilla
> extension is widely used, based on the amount of traffic it generates
> on this list (basically none). So it would be helpful to explain what
> "the custom regexp option" is.
>

It failed to make a match; the details are mixed in below.
My goal was to allow a ":" after "bug" or "bugs". That would be just two
characters added to the existing regex: ":?".

This is from the source for the Bugzilla extension. My reading was that this
option would replace the default regex; your reply below and the code tell
another story.

bugzilla.regexp
  Regular expression to match bug IDs in changeset commit message.
  Must contain one "()" group. The default expression matches ``Bug
  1234``, ``Bug no. 1234``, ``Bug number 1234``, ``Bugs 1234,5678``,
  ``Bug 1234 and 5678`` and variations thereof. Matching is case
  insensitive.
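A quick way to check the documented examples is to run the default pattern against each of them. This is a minimal sketch. It assumes the pattern is compiled with re.IGNORECASE, as the docs state, and applied with search(); the search() call is an assumption. The helper bug_ids() and its digit-splitting step are hypothetical and only for illustration.

```python
import re

# Default pattern, copied from the bugzilla.py declaration quoted in this thread.
DEFAULT_BUG_RE = (r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
                  r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')


def bug_ids(message, pattern=DEFAULT_BUG_RE):
    """Hypothetical helper: return the bug numbers captured by group 1.

    Compiles case-insensitively (per the docs) and uses search() (assumed).
    """
    m = re.compile(pattern, re.IGNORECASE).search(message)
    if m is None:
        return []
    # Group 1 holds text such as "1234,5678" or "1234 and 5678";
    # pull out just the digit runs.
    return [int(n) for n in re.findall(r'\d+', m.group(1))]


for msg in ['Bug 1234', 'Bug no. 1234', 'Bug number 1234',
            'Bugs 1234,5678', 'Bug 1234 and 5678']:
    print(msg, '->', bug_ids(msg))
```

Each "Bug ..." form yields [1234], and the last two yield [1234, 5678].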



> > The simplest of regexes has also been tried, and it failed. I am happy to
> > blame the failing changes on my inexperience with Python.
>
> This is easily addressed in an interactive session:
>
>  $ python
>  [...]
>  >>> import re
>  >>> re.compile(r'foo').match('foo')          # returned something, which
>  <_sre.SRE_Match object at 0x7ff9477c9510>    # means it matched
>  >>> re.compile(r'foo').match('fooo')
>  <_sre.SRE_Match object at 0x7ff9477c9578>
>  >>> re.compile(r'foo').match('fo')           # no return value: not a match
>  >>> re.compile(r'fo+').match('fo')
>  <_sre.SRE_Match object at 0x7ff9477c95e0>
>  >>> re.compile(r'fo+').match('fooooo')
>  <_sre.SRE_Match object at 0x7ff9477c9578>
>
> Play around there until the little light bulb over your head switches on.
> ;-)
>

The interactive session is fantastic. While I cannot yet figure out how to
achieve my goal, it is certainly helpful. Even fun.



> Also keep in mind that it very much matters whether the bugzilla
> extension uses 'match()' or 'search()'. match() implicitly anchors
> your regex at start of string. Compare:
>
>  >>> re.compile(r'fo+').match('foo bar baz')        # matches
>  <_sre.SRE_Match object at 0x7ff9477c95e0>
>  >>> re.compile(r'fo+').match('ping foo bar baz')   # does not match
>  >>> re.compile(r'fo+').search('ping foo bar baz')  # matches
>  <_sre.SRE_Match object at 0x7ff9477c9578>
>
> If in doubt, assume match() and prefix your regex with .* .
>
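The following sketch makes Greg's tip concrete. It compares match(), search(), and a .*-prefixed pattern on a commit message where the bug reference does not come first. The pattern bugs?\s*(\d+) is a simplified stand-in chosen for illustration. It is not the extension's default.

```python
import re

# Simplified stand-in pattern (not the extension's default).
plain = re.compile(r'bugs?\s*(\d+)', re.IGNORECASE)
text = 'Fix crash on startup, see bug 3333'

# match() is anchored at the start of the string: 'Fix' is not 'bug'.
print(plain.match(text))              # None
# search() scans forward until the pattern matches somewhere.
print(plain.search(text).group(1))    # 3333

# Greg's workaround: prefix '.*' so match() can skip leading text.
prefixed = re.compile(r'.*' + plain.pattern, re.IGNORECASE)
print(prefixed.match(text).group(1))  # 3333

# Caveat: '.' does not match a newline unless re.DOTALL is given.
multi = 'Fix crash on startup\nbug 42'
print(prefixed.match(multi))          # None
dotall = re.compile(prefixed.pattern, re.IGNORECASE | re.DOTALL)
print(dotall.match(multi).group(1))   # 42
```

The .* trick has two caveats. First, without re.DOTALL it can only skip text on the first line of a multi-line message. Second, the greedy .* makes match() settle on the last matching "bug" it can reach, not the first.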
> > Failure when using the default regex puzzles me. Does the code below XOR
> > the two regexes, or does it join them?
> >
> > Or worse yet, am I missing a fundamental of Python?
> >
> > In bugzilla.py
> > Default declaration
> >  _default_bug_re = (r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
> >                        r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')
>
> Adjacent strings are simply concatenated. (This is one of the few odd
> conventions Python took from C.) E.g.
>
>  >>> r'foo' r'bar'
>  'foobar'
>
>  >>> re.compile(r'fo+' r'ba+').search('fooooobar')
>  <_sre.SRE_Match object at 0x7ff9477c95e0>
>  >>> re.compile(r'fo+' r'ba+').search('bar')      # no match
>  >>> re.compile(r'fo+' r'ba+').search('foo')      # no match
>
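Greg's point about adjacent literals applies directly to _default_bug_re. Python joins the two pieces into one pattern string before the re module ever sees them, so nothing is XORed. The check below copies the pattern from the declaration quoted above. It also confirms the single capturing group that the bugzilla.regexp documentation asks for.

```python
import re

# Two adjacent raw-string literals, as in the quoted declaration;
# Python joins them into one string at compile time.
DEFAULT_BUG_RE = (r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
                  r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')

# The same text written as an explicit concatenation with '+'.
explicit = (r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
            + r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')

print(DEFAULT_BUG_RE == explicit)          # True: one pattern, no XOR
print(re.compile(DEFAULT_BUG_RE).groups)   # 1 capturing group
```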
Original regex without a colon: success
>>> re.compile((r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
... r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')).search('bug 3333')
<_sre.SRE_Match object at 0x2b3e01b3aea0>

Original regex with a colon: failure (nothing returned)
>>> re.compile((r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
... r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')).search('bug: 3333')
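Here is why "bug: 3333" fails. After bugs?, the default pattern can consume only whitespace, an optional comma, and an optional "#", "no.", or "number". It then requires digits, so a colon stops it. search() does retry at every offset, but "bug" occurs only once in that text. The sketch below assumes case-insensitive compilation, as the docs state.

```python
import re

DEFAULT_BUG_RE = (r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
                  r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')
# Case-insensitive, as the bugzilla.regexp documentation states.
default = re.compile(DEFAULT_BUG_RE, re.IGNORECASE)

# Separators the default pattern does allow between 'bug' and the number:
allowed = ['bug 3333', 'bug, 3333', 'bug #3333', 'bug no. 3333']
for msg in allowed:
    print(repr(msg), default.search(msg).group(1))   # 3333 each time

# ':' is not among them, so the only 'bug' in the text cannot start a
# match, and search() returns None.
print(default.search('bug: 3333'))                   # None
```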

The change I would love to make directly in the source:
>>> re.compile((r'bugs?:?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
... r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')).search('bug: 3333')
<_sre.SRE_Match object at 0x2b3e01b3ae48>
>>> re.compile((r'bugs?:?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*'
... r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)')).search('bugs 3333')
<_sre.SRE_Match object at 0x2b3e01b3ae48>
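The sketch below checks the proposed change. Adding ":?" right after bugs? keeps every documented example working and also accepts a colon. PATCHED_BUG_RE is simply the name used here for the modified pattern. bug_ids() is a hypothetical helper for illustration and is not part of the extension.

```python
import re

TAIL = r'((?:\d+\s*(?:,?\s*(?:and)?)?\s*)+)'
DEFAULT_BUG_RE = r'bugs?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*' + TAIL
# The proposed change: an optional ':' right after 'bug'/'bugs'.
PATCHED_BUG_RE = r'bugs?:?\s*,?\s*(?:#|nos?\.?|num(?:ber)?s?)?\s*' + TAIL


def bug_ids(message, pattern):
    """Hypothetical helper: digits captured by group 1, or [] if no match."""
    m = re.compile(pattern, re.IGNORECASE).search(message)
    return [int(n) for n in re.findall(r'\d+', m.group(1))] if m else []


samples = ['Bug 1234', 'Bug no. 1234', 'Bug number 1234', 'Bugs 1234,5678',
           'Bug 1234 and 5678', 'bug: 3333', 'Bugs: 1234, 5678']
for msg in samples:
    print(repr(msg), bug_ids(msg, DEFAULT_BUG_RE), bug_ids(msg, PATCHED_BUG_RE))
```

This version accepts the colon only immediately after "bug" or "bugs". "Bug : 3333", with a space before the colon, still does not match.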

What I cannot figure out is how to achieve the same result by concatenating
regular expressions. There is a mechanism related to "first to succeed" that
prevents my addition, and I do not fully understand it.

>>> re.compile(r'bug' r'bug').search('bug')
>>> re.compile(r'bug' r'').search('bug')
<_sre.SRE_Match object at 0x2b3e05582120>
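The 'bug' 'bug' experiment shows that concatenation means "this, then that": the two literals become the single pattern bugbug. The "first to succeed" behaviour belongs to alternation with |, which Greg mentions. At a given position, the branches are tried left to right, and the first branch that lets the whole pattern match wins. The sketch below shows both behaviours, plus a precedence pitfall when | is concatenated with more pattern.

```python
import re

# Concatenation: r'bug' r'bug' is the pattern 'bugbug', which needs six
# characters and so cannot match 'bug'.
print(re.compile(r'bug' r'bug').search('bug'))                 # None
print(re.compile(r'bug' r'bug').search('bugbug') is not None)  # True

# Alternation: '|' means "this or that"; branches are tried left to right.
either = re.compile(r'bugs?:|bugs?')
print(either.search('bug: 3333').group(0))   # bug:
print(either.search('bug 3333').group(0))    # bug

# '|' binds more loosely than concatenation, so r'bug:|bug' r' \d+' means
# 'bug:' OR 'bug \d+'. A non-capturing group scopes the choice.
loose = re.compile(r'bug:|bug' r' \d+')
scoped = re.compile(r'(?:bug:|bug)' r' \d+')
print(loose.search('bug: 3333').group(0))    # bug:
print(scoped.search('bug: 3333').group(0))   # bug: 3333
```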

As bugzilla.regexp does not replace the default but instead prepends to it,
either I must use some syntax unknown to me, or I should just alter the source.

Heck, perhaps I will even submit it as a patch, along with a clarification
to the documentation stating that bugzilla.regexp is prepended.


Thanks,
Rodney


> Regex alternation is specified the same way as in every other regex engine
> on the planet: | .
>
> Greg
>


More information about the Mercurial mailing list