[PATCH 5 of 5 STABLE V2] check-code: detect incorrect format operation on the string not including "%"
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Tue Apr 15 18:12:33 UTC 2014
# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1397585100 -32400
# Wed Apr 16 03:05:00 2014 +0900
# Branch stable
# Node ID 6b72aa5a88ead264ff6a6b17e9b4f14270198903
# Parent 565620d1b409dcd0c46c462e1ad66c622b2a8b85
check-code: detect incorrect format operation on the string not including "%"
This patch introduces new patterns to detect incorrect format
operations on the string not including any "%" specifiers: it causes
TypeError at runtime.
Patterns newly introduced by this patch treat format operations in
the style below:
"format-string" % value
Newly introduced patterns base on the ones introduced by the preceding
patch for the similar problems in '("format string") % value' style.
Main difference between them is that this patch uses "[ xnopq]"
instead of "[^%']" to match against 'characters other than "%" in
string' correctly.
For example, Python interpreter (and programmers) can recognize two
strings (1) 'preceding string' and (2) '%s format' in the code below:
func('preceding string', '%s format')
But regexp matching may (mis-)recognize that a string ', ' (not
including any "%" specifiers) and a formatting operation "%s" (from
'%s format') on it in this code unexpectedly.
After normalizing by "repquote()" in check-code.py, strings in Python
code doesn't contain characters other than ' ', 'x', 'n', 'o', 'p',
'q' or '%'. For example, the example code above is normalized like as
below:
func('xxxxxxxxx xxxxxx', '%x xxxxxx')
Then , "'[^%']+'" still can't avoid unexpected matching against ', ',
but "'[ xnopq]+'" can. This is because of using "[ xnopq]" instead of
"[^%']" in this patch.
"[^%']" is simple enough for '("format-string") % value' style treated
in preceding patch, because anchoring to "(" and ")" avoids unexpected
matching described above.
Newly introduced patterns mis-recognize that format operation '%x' is
applied on the string ' ' in the example code below (after normalizing
by "repquote()"), because "[ xnopq]" matches against it: in the
example above, comma (",") between strings prevents from
mis-recognizing.
print 'xxxxxxxxx xxxxxx' '%x xxxxxx'
Changing normalization result for ' ' in "repquote()" from ' ' to 'x'
(or so) can resolve this problem, but it breaks examinations for rst
syntax.
This patch chooses to show hint information "(or use '+' for string
concatenation to avoid ambiguous cases)", because this is very rare
case and current Mercurial implementation doesn't have such code.
diff --git a/contrib/check-code.py b/contrib/check-code.py
--- a/contrib/check-code.py
+++ b/contrib/check-code.py
@@ -204,6 +204,12 @@
"don't apply % on non-format string"),
(r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
"don't apply % on non-format string"),
+ (r'[^"\t +][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%',
+ "don't apply % on non-format string\n" +
+ " (or use '+' for string concatenation to avoid ambiguous cases)"),
+ (r"[^'\t +][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%",
+ "don't apply % on non-format string\n" +
+ " (or use '+' for string concatenation to avoid ambiguous cases)"),
(r'(\w|\)),\w', "missing whitespace after ,"),
(r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
(r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
diff --git a/tests/test-check-code.t b/tests/test-check-code.t
--- a/tests/test-check-code.t
+++ b/tests/test-check-code.t
@@ -313,4 +313,54 @@
./invalid-formatting.py:11:
> print ("%% doesn't work as percent") % v
don't apply % on non-format string
- [1]
+ ./invalid-formatting.py:14:
+ > print ("ambiguous case (" " %Y-%M-%D")
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ [1]
+
+ $ cat > ./invalid-formatting2.py <<EOF
+ > print "no percent character" % v
+ > print "no " "percent " "character" % v
+ > print "no " + "percent " + "character" % v
+ >
+ > print "%% doesn't work as percent" % v
+ >
+ > print "%s " "ambiguous " "case" % v
+ > print "ambiguous case, %% and %s" % v
+ > print "ambiguous case (", " %Y-%M-%D"
+ >
+ > print " ambiguous case" " %Y-%M-%D"
+ > print "non-ambiguous case" + " %Y-%M-%D"
+ >
+ > func("preceding string", "%s "
+ > " valid"
+ > " case"
+ > % v,
+ > "%s " +
+ > "valid" +
+ > "case"
+ > % v)
+ > EOF
+ $ "$check_code" ./invalid-formatting2.py
+ ./invalid-formatting2.py:1:
+ > print "no percent character" % v
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ ./invalid-formatting2.py:2:
+ > print "no " "percent " "character" % v
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ ./invalid-formatting2.py:3:
+ > print "no " + "percent " + "character" % v
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ ./invalid-formatting2.py:5:
+ > print "%% doesn't work as percent" % v
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ ./invalid-formatting2.py:11:
+ > print " ambiguous case" " %Y-%M-%D"
+ don't apply % on non-format string
+ (or use '+' for string concatenation to avoid ambiguous cases)
+ [1]
More information about the Mercurial-devel
mailing list