[PATCH 3 of 3 STABLE] check-code: detect incorrect format operation on the string not including "%"
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Sun Apr 13 13:35:45 UTC 2014
At Sun, 13 Apr 2014 22:32:19 +0900,
FUJIWARA Katsunori wrote:
>
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1397394948 -32400
> # Sun Apr 13 22:15:48 2014 +0900
> # Branch stable
> # Node ID 49c3352c68df5e29ad05db9d2087bc73e2d84372
> # Parent 98b776640d7939e3a8d6ac6601d5c7cd173c0539
> check-code: detect incorrect format operation on the string not including "%"
>
> This patch introduces new patterns to detect format operations
> applied to strings that contain no "%" specifiers: such operations
> raise TypeError at runtime.
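For reference, here is a minimal sketch of the runtime failure described
above. The helper name try_format and the sample values are made up for
illustration; the sketch relies only on the behavior of the built-in "%"
operator when the right-hand side is a plain string.

```python
def try_format(fmt, value):
    """Apply fmt % value and report a TypeError instead of raising it."""
    try:
        return fmt % value
    except TypeError as exc:
        return "TypeError: %s" % exc


# No "%" specifier at all: the value cannot be consumed.
print(try_format("no percent character", "v"))
# "%%" is only an escaped percent sign, not a specifier.
print(try_format("%% doesn't work as percent", "v"))
# A real specifier consumes the value.
print(try_format("%s works", "v"))
```

Note that an empty tuple or a dict on the right-hand side would not
raise, so the error depends on the value as well as on the string.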
>
> The patterns introduced by this patch handle format operations
> written in "format-string % value" style.
This patch should perhaps be posted for "default", because it
doesn't fix any existing problem.
But I chose to post it as part of the series for "stable", because
the preceding patch (#2) seems to make it easier to understand.
> This patch uses "[ xnopq]" instead of "[^%']" to match against
> 'characters other than "%" in string' correctly.
>
> For example, the Python interpreter (and programmers) can recognize
> two strings 'preceding string' and '%s format' in the code below:
>
> func('preceding string', '%s format')
>
> But regexp matching may unexpectedly mis-recognize this code as the
> string ', ' (which contains no "%" specifiers) with the format
> operation "%s" (from '%s format') applied to it.
>
> After normalization by "repquote()", strings in Python code contain
> no characters other than ' ', 'x', 'n', 'o', 'p', 'q' or '%'. For
> example, the code above is normalized as below:
>
> func('xxxxxxxxx xxxxxx', '%x xxxxxx')
>
> Then "'[ xnopq]+'" avoids the unexpected match against ', ', but
> "'[^%']+'" still can't. This is why this patch uses "[ xnopq]"
> instead of "[^%']".
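The difference can be checked with a small sketch. STRICT below is the
single-quote pattern added by this patch; LOOSE is a hypothetical
variant with "[^%']" substituted, written only for comparison and not
part of the patch. The input is the normalized line shown above.

```python
import re

# Single-quote pattern added by this patch.
STRICT = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%")
# Hypothetical variant using "[^%']" (NOT part of the patch).
LOOSE = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+%")

# The example call after normalization, copied from the text above.
normalized = "func('xxxxxxxxx xxxxxx', '%x xxxxxx')"

strict_hit = STRICT.search(normalized)
loose_hit = LOOSE.search(normalized)

# Show what each pattern matched, or None when it did not match.
print("strict:", strict_hit.group(0) if strict_hit else None)
print("loose: ", loose_hit.group(0) if loose_hit else None)
```

The loose variant treats the text between the two real strings as if
it were a string literal, which is the mis-recognition described above.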
>
> "[^%']" is simple enough for the "(format-string) % value" style
> handled in the preceding patch, because anchoring to "(" and ")"
> avoids the unexpected matching described above.
>
> The newly added patterns mis-recognize format operation '%x' as
> being applied to string ' ' in the example code below (after
> normalization by "repquote()"), because "[ xnopq]" matches it: in the
> example above, the comma between the strings prevents this
> mis-recognition.
>
> print 'xxxxxxxxx xxxxxx' '%x xxxxxx'
>
> Changing the normalization result for ' ' in "repquote()" from ' '
> to 'x' (or similar) would resolve this problem, but it would break
> the checks for rst syntax.
>
> This patch instead shows the hint "(or use '+' for string
> concatenation to avoid ambiguous cases)", because this is a very
> rare case and the current Mercurial implementation has no such code.
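The ambiguous case and the effect of the suggested '+' can be sketched
in the same way. PATTERN is the single-quote pattern from this patch;
both inputs are hand-normalized lines, and the second one (with '+') is
only a variation on the example above, added for illustration.

```python
import re

# Single-quote pattern added by this patch.
PATTERN = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%")

# Adjacent string literals: the ' ' between them looks like a string.
adjacent = "print 'xxxxxxxxx xxxxxx' '%x xxxxxx'"
# Same literals joined with '+': ' + ' is not matched by "[ xnopq]".
with_plus = "print 'xxxxxxxxx xxxxxx' + '%x xxxxxx'"

for line in (adjacent, with_plus):
    hit = PATTERN.search(line)
    print("%-42s -> %s" % (line, "flagged" if hit else "ok"))
```

This mirrors the last two lines of the new test case, where only the
line without '+' is reported.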
>
> diff --git a/contrib/check-code.py b/contrib/check-code.py
> --- a/contrib/check-code.py
> +++ b/contrib/check-code.py
> @@ -204,6 +204,12 @@
> "don't apply % on non-format string"),
> (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
> "don't apply % on non-format string"),
> + (r'[^"\t ][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%',
> + "don't apply % on non-format string\n" +
> + " (or use '+' for string concatenation to avoid ambiguous cases)"),
> + (r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%",
> + "don't apply % on non-format string\n" +
> + " (or use '+' for string concatenation to avoid ambiguous cases)"),
> (r'(\w|\)),\w', "missing whitespace after ,"),
> (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
> (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -279,3 +279,40 @@
> > print ("%% doesn't work as percent") % v
> don't apply % on non-format string
> [1]
> +
> + $ cat > ./invalid-formatting2.py <<EOF
> + > print "no percent character" % v
> + > print "no " "percent " "character" % v
> + > print "no " + "percent " + "character" % v
> + >
> + > print "%% doesn't work as percent" % v
> + >
> + > print "%s " "ambiguous " "case" % v
> + > print "ambiguous case, %% and %s" % v
> + > print "ambiguous case (", " %Y-%M-%D"
> + >
> + > print " ambiguous case" " %Y-%M-%D"
> + > print "non-ambiguous case" + " %Y-%M-%D"
> + > EOF
> + $ "$check_code" ./invalid-formatting2.py
> + ./invalid-formatting2.py:1:
> + > print "no percent character" % v
> + don't apply % on non-format string
> + (or use '+' for string concatenation to avoid ambiguous cases)
> + ./invalid-formatting2.py:2:
> + > print "no " "percent " "character" % v
> + don't apply % on non-format string
> + (or use '+' for string concatenation to avoid ambiguous cases)
> + ./invalid-formatting2.py:3:
> + > print "no " + "percent " + "character" % v
> + don't apply % on non-format string
> + (or use '+' for string concatenation to avoid ambiguous cases)
> + ./invalid-formatting2.py:5:
> + > print "%% doesn't work as percent" % v
> + don't apply % on non-format string
> + (or use '+' for string concatenation to avoid ambiguous cases)
> + ./invalid-formatting2.py:11:
> + > print " ambiguous case" " %Y-%M-%D"
> + don't apply % on non-format string
> + (or use '+' for string concatenation to avoid ambiguous cases)
> + [1]
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp