[PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion

Manuel Jacob me at manueljacob.de
Tue Jun 30 13:25:14 UTC 2020


On 2020-06-30 14:24, Yuya Nishihara wrote:
> On Tue, 30 Jun 2020 08:45:47 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <me at manueljacob.de>
>> # Date 1593494609 -7200
>> #      Tue Jun 30 07:23:29 2020 +0200
>> # Branch stable
>> # Node ID 9915fdff8d1732ce62b6df69b50106384d4ad4d1
>> # Parent  e1a4c7f23e804f37c3848fc408607af916d619d1
>> # EXP-Topic svn_encoding
>> convert: handle percent-encoded bytes in file URLs like Subversion
> 
>>  def issvnurl(ui, url):
>>      try:
>>          proto, path = url.split(b'://', 1)
>> @@ -361,7 +387,7 @@
>>              ):
>>                  path = path[:2] + b':/' + path[6:]
>>              try:
>> -                path.decode(fsencoding)
>> +                unicodepath = path.decode(fsencoding)
>>              except UnicodeDecodeError:
>>                  ui.warn(
>>                      _(
>> @@ -371,28 +397,17 @@
>>                      % pycompat.sysbytes(fsencoding)
>>                  )
>>                  return False
>> -            # FIXME: The following reasoning and logic is wrong and 
>> will be
>> -            # fixed in a following changeset.
>> -            # pycompat.fsdecode() / pycompat.fsencode() are used so 
>> that bytes
>> -            # in the URL roundtrip correctly on Unix. 
>> urlreq.url2pathname() on
>> -            # py3 will decode percent-encoded bytes using the utf-8 
>> encoding
>> -            # and the "replace" error handler. This means that it 
>> will not
>> -            # preserve non-UTF-8 bytes 
>> (https://bugs.python.org/issue40983).
>> -            # url.open() uses the reverse function 
>> (urlreq.pathname2url()) and
>> -            # has a similar problem
>> -            # (https://bz.mercurial-scm.org/show_bug.cgi?id=6357). It 
>> makes
>> -            # sense to solve both problems together and handle all 
>> file URLs
>> -            # consistently. For now, we warn.
>> -            unicodepath = 
>> urlreq.url2pathname(pycompat.fsdecode(path))
>> -            if pycompat.ispy3 and u'\N{REPLACEMENT CHARACTER}' in 
>> unicodepath:
>> +            try:
>> +                unicodepath = 
>> url2pathname_like_subversion(unicodepath)
>> +            except NonUtf8PercentEncodedBytes:
>>                  ui.warn(
>>                      _(
>> -                        b'on Python 3, we currently do not support 
>> non-UTF-8 '
>> -                        b'percent-encoded bytes in file URLs for 
>> Subversion '
>> -                        b'repositories\n'
>> +                        b'Subversion does not support non-UTF-8 '
>> +                        b'percent-encoded bytes in file URLs\n'
>>                      )
>>                  )
>> -            path = pycompat.fsencode(unicodepath)
>> +                return False
>> +            path = unicodepath.encode(fsencoding)
> 
> I think pycompat.fsencode() is more correct since the path will be 
> later
> tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
> encoding.unitolocal() is okay.

It was deliberate to use something that fails when it can’t be encoded. 
However, I thought about failing in case of a logic error, not about 
that the URL could contain a (UTF-8-encoded) code point that is not 
available in `fsencoding`.

On Windows, Subversion will be able to handle all code points. We’re 
unnecessarily restricting ourselves by doing the rest of the function on 
bytes. Although not entirely Mercurial-idiomatic, we should maybe 
consider doing the rest of the function in Unicode. We can also bail 
out, but we have to make sure the warning mentions that it’s not the 
user’s fault.

On Unix, Subversion will try later to convert the path to the locale 
encoding, which fails if the code point is not encodable. We should bail 
out in this case.



More information about the Mercurial-devel mailing list