[PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion

Yuya Nishihara yuya at tcha.org
Tue Jun 30 14:48:30 UTC 2020


On Tue, 30 Jun 2020 16:28:55 +0200, Manuel Jacob wrote:
> On 2020-06-30 16:03, Yuya Nishihara wrote:
> > On Tue, 30 Jun 2020 15:43:29 +0200, Manuel Jacob wrote:
> >> >>> -            path = pycompat.fsencode(unicodepath)
> >> >>> +                return False
> >> >>> +            path = unicodepath.encode(fsencoding)
> >> >>
> >> >> I think pycompat.fsencode() is more correct since the path will be
> >> >> later
> >> >> tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
> >> >> encoding.unitolocal() is okay.
> >> >
> >> > It was deliberate to use something that fails when it can’t be
> >> > encoded. However, I thought about failing in case of a logic error,
> >> > not about that the URL could contain a (UTF-8-encoded) code point that
> >> > is not available in `fsencoding`.
> >> >
> >> > On Windows, Subversion will be able to handle all code points. We’re
> >> > unnecessarily restricting ourselves by doing the rest of the function
> >> > on bytes. Although not entirely Mercurial-idiomatic, we should maybe
> >> > consider doing the rest of the function in Unicode. We can also bail
> >> > out, but we have to make sure the warning mentions that it’s not the
> >> > user’s fault.
> >> >
> >> > On Unix, Subversion will try later to convert the path to the locale
> >> > encoding, which fails if the code point is not encodable. We should
> >> > bail out in this case.
> >> 
> >> Let me add that with "bail out", I included that a warning describing
> >> the problem is shown.
> >> 
> >> Since what I described is mostly about showing nicer messages, should
> >> they be added in an amended patch or as follow-ups?
> > 
> > I'm not sure if I understood what you mean.
> > 
> > I just pointed out that the fsencoding is the encoding in which 
> > Subversion
> > will convert unicode to bytes, whereas fsencode() is the function for 
> > Python
> > layer. And the path should be a bytes encoded in Python way.
> 
> Subversion will do a similar check as in the rest of the function to 
> determine whether the repo is actually a SVN repo.
> 
> If we use a different encoding, we have a higher chance of running into 
> cases where our function thinks it’s a SVN repo, while Subversion thinks 
> it’s not (and vice versa). Therefore, I think we should use 
> `fsencoding`.

Then, can you adjust the documentation of fsencoding? Without that, I would
see there are at least three types of path variables:

 - py_fs_bytes_t (or maybe hg_fs_bytes_t)
 - svn_fs_bytes_t
 - unicode_t (or svn_url_unicode_t)

Be aware that the path will be converted back to UCS2-ish encoding on
Windows by either Python 3 or Win32 ANSI API.



More information about the Mercurial-devel mailing list