[PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion
Yuya Nishihara
yuya at tcha.org
Tue Jun 30 12:24:24 UTC 2020
On Tue, 30 Jun 2020 08:45:47 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me at manueljacob.de>
> # Date 1593494609 -7200
> # Tue Jun 30 07:23:29 2020 +0200
> # Branch stable
> # Node ID 9915fdff8d1732ce62b6df69b50106384d4ad4d1
> # Parent e1a4c7f23e804f37c3848fc408607af916d619d1
> # EXP-Topic svn_encoding
> convert: handle percent-encoded bytes in file URLs like Subversion
> def issvnurl(ui, url):
> try:
> proto, path = url.split(b'://', 1)
> @@ -361,7 +387,7 @@
> ):
> path = path[:2] + b':/' + path[6:]
> try:
> - path.decode(fsencoding)
> + unicodepath = path.decode(fsencoding)
> except UnicodeDecodeError:
> ui.warn(
> _(
> @@ -371,28 +397,17 @@
> % pycompat.sysbytes(fsencoding)
> )
> return False
> - # FIXME: The following reasoning and logic is wrong and will be
> - # fixed in a following changeset.
> - # pycompat.fsdecode() / pycompat.fsencode() are used so that bytes
> - # in the URL roundtrip correctly on Unix. urlreq.url2pathname() on
> - # py3 will decode percent-encoded bytes using the utf-8 encoding
> - # and the "replace" error handler. This means that it will not
> - # preserve non-UTF-8 bytes (https://bugs.python.org/issue40983).
> - # url.open() uses the reverse function (urlreq.pathname2url()) and
> - # has a similar problem
> - # (https://bz.mercurial-scm.org/show_bug.cgi?id=6357). It makes
> - # sense to solve both problems together and handle all file URLs
> - # consistently. For now, we warn.
> - unicodepath = urlreq.url2pathname(pycompat.fsdecode(path))
> - if pycompat.ispy3 and u'\N{REPLACEMENT CHARACTER}' in unicodepath:
> + try:
> + unicodepath = url2pathname_like_subversion(unicodepath)
> + except NonUtf8PercentEncodedBytes:
> ui.warn(
> _(
> - b'on Python 3, we currently do not support non-UTF-8 '
> - b'percent-encoded bytes in file URLs for Subversion '
> - b'repositories\n'
> + b'Subversion does not support non-UTF-8 '
> + b'percent-encoded bytes in file URLs\n'
> )
> )
> - path = pycompat.fsencode(unicodepath)
> + return False
> + path = unicodepath.encode(fsencoding)
I think pycompat.fsencode() is more correct, since the path will later be
tested by os.path.*() functions. On Python 2, I'm not sure; maybe our
encoding.unitolocal() is okay.
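
To illustrate one way the two calls can differ on Python 3, here is a minimal sketch. It assumes that pycompat.fsencode() is os.fsencode() there, which on POSIX encodes with the filesystem encoding and the "surrogateescape" error handler, whereas a plain .encode(fsencoding) is strict. The helper names and the sample path are hypothetical, and the encoding is pinned to UTF-8 only so that the result does not depend on the locale.

```python
# Hypothetical helpers, not part of Mercurial. They mimic the two
# candidate encodings of `unicodepath` with an explicit codec.

def encode_strict(unicodepath, fsencoding):
    # What `unicodepath.encode(fsencoding)` does: strict error handling.
    return unicodepath.encode(fsencoding)


def encode_surrogateescape(unicodepath, fsencoding):
    # Assumption: this approximates os.fsencode() on POSIX, which uses
    # the filesystem encoding together with 'surrogateescape'.
    return unicodepath.encode(fsencoding, 'surrogateescape')


# A byte path with a byte (0xE9) that is not valid UTF-8, decoded the
# way os.fsdecode() would decode it on a UTF-8 POSIX system.
raw = b'/repo/caf\xe9'
decoded = raw.decode('utf-8', 'surrogateescape')

# The escaped byte survives the surrogateescape round trip ...
roundtrip = encode_surrogateescape(decoded, 'utf-8')

# ... but the strict encode rejects the lone surrogate.
try:
    encode_strict(decoded, 'utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Whether a surrogate-escaped character can actually reach this point in issvnurl() depends on the earlier strict path.decode(fsencoding) and on what url2pathname_like_subversion() returns, so this only shows the difference in error handling, not a concrete failure in the patch.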