[PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion

Yuya Nishihara yuya at tcha.org
Tue Jun 30 12:24:24 UTC 2020


On Tue, 30 Jun 2020 08:45:47 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me at manueljacob.de>
> # Date 1593494609 -7200
> #      Tue Jun 30 07:23:29 2020 +0200
> # Branch stable
> # Node ID 9915fdff8d1732ce62b6df69b50106384d4ad4d1
> # Parent  e1a4c7f23e804f37c3848fc408607af916d619d1
> # EXP-Topic svn_encoding
> convert: handle percent-encoded bytes in file URLs like Subversion

>  def issvnurl(ui, url):
>      try:
>          proto, path = url.split(b'://', 1)
> @@ -361,7 +387,7 @@
>              ):
>                  path = path[:2] + b':/' + path[6:]
>              try:
> -                path.decode(fsencoding)
> +                unicodepath = path.decode(fsencoding)
>              except UnicodeDecodeError:
>                  ui.warn(
>                      _(
> @@ -371,28 +397,17 @@
>                      % pycompat.sysbytes(fsencoding)
>                  )
>                  return False
> -            # FIXME: The following reasoning and logic is wrong and will be
> -            # fixed in a following changeset.
> -            # pycompat.fsdecode() / pycompat.fsencode() are used so that bytes
> -            # in the URL roundtrip correctly on Unix. urlreq.url2pathname() on
> -            # py3 will decode percent-encoded bytes using the utf-8 encoding
> -            # and the "replace" error handler. This means that it will not
> -            # preserve non-UTF-8 bytes (https://bugs.python.org/issue40983).
> -            # url.open() uses the reverse function (urlreq.pathname2url()) and
> -            # has a similar problem
> -            # (https://bz.mercurial-scm.org/show_bug.cgi?id=6357). It makes
> -            # sense to solve both problems together and handle all file URLs
> -            # consistently. For now, we warn.
> -            unicodepath = urlreq.url2pathname(pycompat.fsdecode(path))
> -            if pycompat.ispy3 and u'\N{REPLACEMENT CHARACTER}' in unicodepath:
> +            try:
> +                unicodepath = url2pathname_like_subversion(unicodepath)
> +            except NonUtf8PercentEncodedBytes:
>                  ui.warn(
>                      _(
> -                        b'on Python 3, we currently do not support non-UTF-8 '
> -                        b'percent-encoded bytes in file URLs for Subversion '
> -                        b'repositories\n'
> +                        b'Subversion does not support non-UTF-8 '
> +                        b'percent-encoded bytes in file URLs\n'
>                      )
>                  )
> -            path = pycompat.fsencode(unicodepath)
> +                return False
> +            path = unicodepath.encode(fsencoding)

I think pycompat.fsencode() is more correct since the path will be later
tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
encoding.unitolocal() is okay.



More information about the Mercurial-devel mailing list