[PATCH 2 of 6] py3: pass native string to urlreq.url2pathname()
Augie Fackler
raf at durin42.com
Thu Jun 18 14:41:42 UTC 2020
> On Jun 17, 2020, at 07:26, Yuya Nishihara <yuya at tcha.org> wrote:
>
> On Wed, 17 Jun 2020 03:51:29 +0200, Manuel Jacob wrote:
>> In the following situation, the behavior is problematic:
>>
>> - We’re on Python 3.
>> - The URL path contains a percent-encoded valid UTF-8 byte sequence.
>> urlreq.url2pathname()’s return value is unicode and will contain the
>> corresponding code point.
>> - pycompat.fsencode() uses a different encoding than UTF-8 (e.g.
>> ISO-8859-1). It will encode the code point to a different byte sequence.
>> - The file will not be found and the warning introduced in this patch is
>> not shown.
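A minimal Python 3 sketch of the failure described in the list above. It assumes urllib.parse.unquote() (which decodes percent-escapes as UTF-8 by default) as a stand-in for urlreq.url2pathname() on Unix, and str.encode("iso-8859-1") as a stand-in for pycompat.fsencode() under a non-UTF-8 filesystem encoding. Both substitutions and the example path are illustrative assumptions, not Mercurial's code.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical URL path: "é" (U+00E9) percent-encoded as UTF-8 (0xC3 0xA9).
url_path = "/repo/caf%C3%A9"

# Stand-in for urlreq.url2pathname() on Python 3/Unix: percent-decode,
# then decode the bytes as UTF-8 (unquote()'s default), yielding str.
as_text = unquote(url_path)

# Stand-in for pycompat.fsencode() when the filesystem encoding is
# ISO-8859-1 rather than UTF-8.
as_fs_bytes = as_text.encode("iso-8859-1")

# The bytes that were actually percent-encoded in the URL; per the thread,
# Python 2 preserved these (at least on Linux).
as_url_bytes = unquote_to_bytes(url_path)

print(as_fs_bytes)                  # b'/repo/caf\xe9'
print(as_url_bytes)                 # b'/repo/caf\xc3\xa9'
print(as_fs_bytes == as_url_bytes)  # False: the file lookup misses
```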
>>
>> On Python 2, the percent-decoded bytes are preserved (at least on Linux,
>> I don’t have access to a Windows machine to verify).
>>
>> A proper fix would be to have our own implementation for
>> urlreq.url2pathname() that works with bytes. This is the right thing to
>> do on Unix. On Windows, I think that we should assume that the
>> percent-decoded bytes are UTF-8 (see
>> https://en.wikipedia.org/wiki/File_URI_scheme#Windows_2). But it seems
>> like that would be a change from how it works on Python 2 (again, I
>> don’t have a Windows machine to verify) and therefore should be changed
>> in the default branch.
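The bytes-based replacement suggested above could start from urllib.parse.unquote_to_bytes(), which never goes through a text encoding. The name url2pathname_posix is hypothetical, and the sketch covers only percent-decoding on Unix; other url2pathname() behaviour and the Windows side (drive letters, separators, the UTF-8 assumption) are not reproduced.

```python
from urllib.parse import unquote_to_bytes


def url2pathname_posix(path: bytes) -> bytes:
    """Hypothetical bytes-only counterpart of urlreq.url2pathname() for Unix.

    Percent-escapes are decoded straight to bytes; no text encoding is
    involved, so the result is exactly the byte sequence from the URL.
    """
    return unquote_to_bytes(path)


if __name__ == "__main__":
    # Valid UTF-8 is passed through unchanged ...
    print(url2pathname_posix(b"/repo/caf%C3%A9"))  # b'/repo/caf\xc3\xa9'
    # ... and so are bytes that are not valid UTF-8 at all.
    print(url2pathname_posix(b"/repo/%FF"))  # b'/repo/\xff'
```

Because the function is bytes in, bytes out, it matches the Python 2 behaviour the thread describes on Linux without depending on the filesystem encoding.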
>
> What encoding is expected as a subversion URL? It might be UTF-8 since
> it is Subversion. Encoding handling in the convert extension is sometimes
> wrong. It's probably better to fix things rather than copying the Py2
> behavior.
All pathnames in Subversion are UTF-8.
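Since Subversion pathnames are UTF-8, a converter could decode the percent-escapes explicitly and strictly as UTF-8 instead of relying on whatever encoding pycompat.fsencode() applies later. The helper name below is hypothetical, and how the resulting text is then mapped to local filesystem bytes is left open.

```python
from urllib.parse import unquote


def svn_url_path_to_text(url_path: str) -> str:
    """Hypothetical helper: percent-decode a Subversion URL path as UTF-8.

    errors="strict" makes malformed input raise UnicodeDecodeError instead
    of silently substituting U+FFFD (unquote()'s default is "replace").
    """
    return unquote(url_path, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    print(ascii(svn_url_path_to_text("/trunk/caf%C3%A9")))  # '/trunk/caf\xe9'
    try:
        svn_url_path_to_text("/trunk/%FF")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)  # rejected: invalid start byte
```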