[PATCH 3 of 3 STABLE] check-code: detect incorrect format operation on the string not including "%"

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Sun Apr 13 13:35:45 UTC 2014


At Sun, 13 Apr 2014 22:32:19 +0900,
FUJIWARA Katsunori wrote:
> 
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1397394948 -32400
> #      Sun Apr 13 22:15:48 2014 +0900
> # Branch stable
> # Node ID 49c3352c68df5e29ad05db9d2087bc73e2d84372
> # Parent  98b776640d7939e3a8d6ac6601d5c7cd173c0539
> check-code: detect incorrect format operation on the string not including "%"
> 
> This patch introduces new patterns to detect incorrect format
> operations on strings that don't include any "%" specifiers: such
> operations cause TypeError at runtime.
> 
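
For illustration, a minimal sketch (not part of the patch) of the runtime failure described above. The helper name try_format is hypothetical; it only wraps the "%" operator so that the error can be printed instead of aborting the script.

```python
def try_format(fmt, value):
    """Apply ``fmt % value`` and report a TypeError instead of raising it."""
    try:
        return fmt % value
    except TypeError as exc:
        return "TypeError: %s" % exc


# No "%" specifier at all: the value can't be consumed.
print(try_format("no percent character", "v"))
# "%%" is only an escaped percent sign, so the value is still left over.
print(try_format("%% doesn't work as percent", "v"))
# A real specifier consumes the value.
print(try_format("%s is consumed", "v"))
```

The first two calls report a TypeError; the last one returns 'v is consumed'.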
> The patterns newly introduced by this patch handle format operations
> in "format-string % value" style.

This patch should arguably be posted for "default", because it
doesn't fix any existing problem.

But I chose to post it as part of this series for "stable", because
the preceding patch (#2) seems to make it easier to understand.




> This patch uses "[ xnopq]" instead of "[^%']" to correctly match
> 'characters other than "%" in a string'.
> 
> For example, the Python interpreter (and programmers) recognize two
> strings, 'preceding string' and '%s format', in the code below:
> 
>     func('preceding string', '%s format')
> 
> But regexp matching may unexpectedly (mis-)recognize the string ', '
> (which doesn't include any "%" specifiers) as having the formatting
> operation "%s" (from '%s format') applied to it in this code.
> 
> After normalization by "repquote()", strings in Python code don't
> contain characters other than ' ', 'x', 'n', 'o', 'p', 'q' or '%'. For
> example, the code above is normalized as below:
> 
>     func('xxxxxxxxx xxxxxx', '%x xxxxxx')
> 
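
The normalization step can be pictured with the sketch below. It is a simplified stand-in for "repquote()", not the actual check-code.py implementation: the names normalize_char, repquote_sketch and normalize_line, the literal-matching regexp, and the exact mapping (letters to 'x', digits to 'n', '.' to 'p', ':' to 'q', everything else to 'o') are assumptions chosen only to reproduce the example above.

```python
import re

# Matches a single- or double-quoted literal on one line (no escape handling).
QUOTED = re.compile(r"""(?P<quote>['"])(?P<text>(?:(?!(?P=quote)).)*)(?P=quote)""")


def normalize_char(c):
    """Map one character of a string literal to its normalized form (assumed mapping)."""
    if c in " %":
        return c  # spaces and "%" are kept as they are
    if c.isalpha():
        return "x"
    if c.isdigit():
        return "n"
    if c == ".":
        return "p"
    if c == ":":
        return "q"
    return "o"


def repquote_sketch(match):
    """Replace the body of a matched literal, keeping its quote characters."""
    text = "".join(normalize_char(c) for c in match.group("text"))
    return match.group("quote") + text + match.group("quote")


def normalize_line(line):
    return QUOTED.sub(repquote_sketch, line)


print(normalize_line("func('preceding string', '%s format')"))
# prints: func('xxxxxxxxx xxxxxx', '%x xxxxxx')
```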
> Then "'[ xnopq]+'" avoids the unexpected match against ', ', but
> "'[^%']+'" still can't. This is why this patch uses "[ xnopq]"
> instead of "[^%']".
> 
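
The difference between the two character classes can be checked directly on the normalized line. In this sketch, STRICT is the single-quote pattern added by this patch, while LOOSE is a hypothetical variant that uses "[^%']" instead; LOOSE is not in check-code.py and exists only for comparison.

```python
import re

# Single-quote pattern as added by this patch.
STRICT = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%")
# Hypothetical variant using "[^%']" (for comparison only).
LOOSE = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+%")

normalized = "func('xxxxxxxxx xxxxxx', '%x xxxxxx')"

loose_hit = LOOSE.search(normalized)
strict_hit = STRICT.search(normalized)

# LOOSE treats "', '" as a string literal followed by "%".
print("loose :", loose_hit.group(0) if loose_hit else None)
# STRICT rejects it because "," is not in "[ xnopq]".
print("strict:", strict_hit.group(0) if strict_hit else None)
```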
> "[^%']" is sufficient for the "(format-string) % value" style handled
> in the preceding patch, because anchoring to "(" and ")" avoids the
> unexpected matching described above.
> 
> The newly added patterns mis-recognize format operation '%x' as being
> applied to the string ' ' in the example code below (after
> normalization by "repquote()"), because "[ xnopq]" matches it. In the
> earlier example, the comma between the strings prevents this
> mis-recognition.
> 
>     print 'xxxxxxxxx xxxxxx' '%x xxxxxx'
> 
> Changing the normalization result for ' ' in "repquote()" from ' ' to
> 'x' (or similar) would resolve this problem, but it would break the
> checks for rst syntax.
> 
> Instead, this patch shows the hint "(or use '+' for string
> concatenation to avoid ambiguous cases)", because this is a very rare
> case and the current Mercurial implementation doesn't have such code.
> 
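
The ambiguous case and the effect of the suggested '+' can be reproduced on already normalized lines. This is a sketch using the single-quote pattern from the diff below; the name PATTERN and the helper flagged are hypothetical.

```python
import re

# Single-quote pattern as added by this patch.
PATTERN = re.compile(r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%")


def flagged(line):
    """Return True if the pattern reports the (normalized) line."""
    return PATTERN.search(line) is not None


# Adjacent literals: the ' ' between the closing and opening quotes
# looks like a string literal followed by "%".
adjacent = "print 'xxxxxxxxx xxxxxx' '%x xxxxxx'"
# Explicit concatenation: "+" is not in "[ xnopq]", so nothing matches.
joined = "print 'xxxxxxxxx xxxxxx' + '%x xxxxxx'"

print(flagged(adjacent))  # True
print(flagged(joined))    # False
```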
> diff --git a/contrib/check-code.py b/contrib/check-code.py
> --- a/contrib/check-code.py
> +++ b/contrib/check-code.py
> @@ -204,6 +204,12 @@
>       "don't apply % on non-format string"),
>      (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
>       "don't apply % on non-format string"),
> +    (r'[^"\t ][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%',
> +     "don't apply % on non-format string\n" +
> +     " (or use '+' for string concatenation to avoid ambiguous cases)"),
> +    (r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%",
> +     "don't apply % on non-format string\n" +
> +     " (or use '+' for string concatenation to avoid ambiguous cases)"),
>      (r'(\w|\)),\w', "missing whitespace after ,"),
>      (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
>      (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -279,3 +279,40 @@
>     > print ("%% doesn't work as percent") % v
>     don't apply % on non-format string
>    [1]
> +
> +  $ cat > ./invalid-formatting2.py <<EOF
> +  > print "no percent character" % v
> +  > print "no " "percent " "character" % v
> +  > print "no " + "percent " + "character" % v
> +  > 
> +  > print "%% doesn't work as percent" % v
> +  > 
> +  > print "%s " "ambiguous " "case" % v
> +  > print "ambiguous case, %% and %s" % v
> +  > print "ambiguous case (", " %Y-%M-%D"
> +  > 
> +  > print "    ambiguous case"   " %Y-%M-%D"
> +  > print "non-ambiguous case" + " %Y-%M-%D"
> +  > EOF
> +  $ "$check_code" ./invalid-formatting2.py
> +  ./invalid-formatting2.py:1:
> +   > print "no percent character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:2:
> +   > print "no " "percent " "character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:3:
> +   > print "no " + "percent " + "character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:5:
> +   > print "%% doesn't work as percent" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:11:
> +   > print "    ambiguous case"   " %Y-%M-%D"
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  [1]
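
As a rough cross-check of the expected output above, the double-quote pattern can be applied line by line to the test input. This is only a sketch: normalize_line is a simplified stand-in for "repquote()" (its character mapping is an assumption), and the other filters that the real check-code.py applies are omitted here.

```python
import re

# Double-quote pattern as added by this patch.
DQ_PATTERN = re.compile(r'[^"\t ][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%')
# Simplified matcher for double-quoted literals (no escape handling).
DQ_LITERAL = re.compile(r'"([^"]*)"')


def normalize_line(line):
    """Normalize the bodies of double-quoted literals (assumed mapping)."""
    def repl(m):
        out = []
        for c in m.group(1):
            if c in " %":
                out.append(c)
            elif c.isalpha():
                out.append("x")
            elif c.isdigit():
                out.append("n")
            elif c == ".":
                out.append("p")
            elif c == ":":
                out.append("q")
            else:
                out.append("o")
        return '"' + "".join(out) + '"'
    return DQ_LITERAL.sub(repl, line)


# Contents of invalid-formatting2.py from the test above.
SOURCE = '''\
print "no percent character" % v
print "no " "percent " "character" % v
print "no " + "percent " + "character" % v

print "%% doesn't work as percent" % v

print "%s " "ambiguous " "case" % v
print "ambiguous case, %% and %s" % v
print "ambiguous case (", " %Y-%M-%D"

print "    ambiguous case"   " %Y-%M-%D"
print "non-ambiguous case" + " %Y-%M-%D"
'''

flagged = [
    number
    for number, line in enumerate(SOURCE.splitlines(), 1)
    if DQ_PATTERN.search(normalize_line(line))
]
print(flagged)  # [1, 2, 3, 5, 11]
```

Under these assumptions, the reported line numbers agree with the ones in the expected test output.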

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


