[PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion
Manuel Jacob
me at manueljacob.de
Tue Jun 30 14:28:55 UTC 2020
On 2020-06-30 16:03, Yuya Nishihara wrote:
> On Tue, 30 Jun 2020 15:43:29 +0200, Manuel Jacob wrote:
>> >>> - path = pycompat.fsencode(unicodepath)
>> >>> + return False
>> >>> + path = unicodepath.encode(fsencoding)
>> >>
>> >> I think pycompat.fsencode() is more correct since the path will be
>> >> later
>> >> tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
>> >> encoding.unitolocal() is okay.
>> >
>> > It was deliberate to use something that fails when it can’t be
>> > encoded. However, I thought about failing in case of a logic error,
>> > not about that the URL could contain a (UTF-8-encoded) code point that
>> > is not available in `fsencoding`.
>> >
>> > On Windows, Subversion will be able to handle all code points. We’re
>> > unnecessarily restricting ourselves by doing the rest of the function
>> > on bytes. Although not entirely Mercurial-idiomatic, we should maybe
>> > consider doing the rest of the function in Unicode. We can also bail
>> > out, but we have to make sure the warning mentions that it’s not the
>> > user’s fault.
>> >
>> > On Unix, Subversion will try later to convert the path to the locale
>> > encoding, which fails if the code point is not encodable. We should
>> > bail out in this case.
>>
>> Let me add that with "bail out", I included that a warning describing
>> the problem is shown.
>>
>> Since what I described is mostly about showing nicer messages, should
>> they be added in an amended patch or as follow-ups?
>
> I'm not sure if I understood what you mean.
>
> I just pointed out that the fsencoding is the encoding in which
> Subversion
> will convert unicode to bytes, whereas fsencode() is the function for
> Python
> layer. And the path should be a bytes encoded in Python way.
Subversion will do a similar check as in the rest of the function to
determine whether the repo is actually a SVN repo.
If we use a different encoding, we have a higher chance of running into
cases where our function thinks it’s a SVN repo, while Subversion thinks
it’s not (and vice versa). Therefore, I think we should use
`fsencoding`.
More information about the Mercurial-devel
mailing list