xml style doesn't generate valid xml

Stanimir Stamenkov s7an10 at netscape.net
Tue Nov 23 21:50:14 UTC 2010


Tue, 23 Nov 2010 21:19:34 +0000, /Haszlakiewicz, Eric/:
> From: Matt Mackall [mailto:mpm at selenic.com]
>
>>> However, for the issue with xmlescape turning things into spaces, I
>>> think that's because there's an explicit line of code in xmlescape
>>> that does that!  In templatefilters.py, the last line of xmlescape is:
>>>      return re.sub('[\x00-\x08\x0B\x0C\x0E-\x1F]', ' ', text)
>>
>> Ugh, who ordered that.
> (...)
>      def fixupcontrols(matchobj):
>          return "&#" + "%d" % ord(matchobj.group(0)) + ";"
>      return re.sub('[\x00-\x08\x0B\x0C\x0E-\x1F]', fixupcontrols, new_s)

I'm not sure I understand this Python code enough but note it is a 
syntax (fatal, well-formedness) error to include control characters 
other than tab, carriage return and line feed, even as numeric 
character references <http://www.w3.org/TR/xml/#dt-charref>:

> Characters referred to using character references MUST match the
> production for Char.  <http://www.w3.org/TR/xml/#charsets>

The XML 1.1 specification <http://www.w3.org/TR/xml11/#charsets> is 
more relaxed on this but still doesn't allow the 0 (zero) character 
code.

-- 
Stanimir



More information about the Mercurial mailing list