Can't commit unicode filenames on Windows

Martin Geisler martin at geisler.net
Mon Oct 19 12:34:44 UTC 2015


On Sat, Oct 17, 2015 at 7:42 AM, anatoly techtonik <techtonik at gmail.com>
wrote:

> On Mon, Oct 12, 2015 at 8:19 PM, Matt Mackall <mpm at selenic.com> wrote:
>
>> On Mon, 2015-10-12 at 16:32 +0300, anatoly techtonik wrote:
>>
>> And also not break people who are using local codepage filenames in their
>> Windows-only repos today.
>
>
> And how do local codepage filenames not break Mac and Linux? They also
> support non-UTF-8 filenames, right? So I don't think this is a OS specific
> problem.
>

This is an old discussion, so you should try searching the mailing list
archive to catch up.

One thing that might be surprising is that Matt (as far as I understand)
doesn't consider it "broken" when Mercurial writes filenames in, say,
Latin-1 encoding onto a filesystem where sys.getfilesystemencoding() says
to use UTF-8.

Put differently, Mercurial is 8-bit clean with regard to filenames and will
give you the same thing back as you gave it originally. Here "you" is not
really you, it is the Python filesystem function used on bytestrings, not
Unicode strings.
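
To make "8-bit clean" concrete, here is a minimal sketch. It is not
Mercurial code; the helper name roundtrip_on_disk is made up for
illustration, and it assumes a POSIX filesystem that accepts arbitrary
bytes in filenames (it will likely fail on Windows and on some macOS
filesystems).

```python
import os
import tempfile

# The same human-readable name has different byte sequences per encoding.
name = "café.txt"
latin1_bytes = name.encode("latin-1")  # b'caf\xe9.txt'
utf8_bytes = name.encode("utf-8")      # b'caf\xc3\xa9.txt'


def roundtrip_on_disk(raw_name):
    """Create a file named by raw bytes; return the name os.listdir reports.

    Hypothetical helper. Passing bytes to the os functions makes Python
    hand the bytes to the OS untouched and return bytes from os.listdir.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        with open(os.path.join(tmp_bytes, raw_name), "wb"):
            pass  # an empty file is enough
        (found,) = os.listdir(tmp_bytes)
        return found
```

On such a filesystem, roundtrip_on_disk(latin1_bytes) should hand back
the Latin-1 bytes unchanged, even when sys.getfilesystemencoding()
reports UTF-8.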

For better or worse, that's how Mercurial was designed to work and it is
considered a feature that I can clone an ancient C project with Latin-1
filenames and compile it using a Makefile that references these filenames.
On the other hand, build tools that take the current locale into account
when searching for files break because of this (I tested this with Ant many
years ago).

That is the tradeoff Mercurial has been making for more than 10 years. The
EncodingStrategy is about letting Mercurial transcode filenames between its
internal manifest and the local encoding when Mercurial sees that the
manifest is completely UTF-8 encoded. Starting from scratch, that should
also mean that committing a file with Persian characters in the filename
will save it UTF-8 encoded in the manifest. On clone and update, you'll
then get the UTF-8 transcoded to your local Windows locale and things
should work.

People with existing repositories with non-UTF-8 encodings in the filenames
will see no change -- Mercurial keeps reading/writing the files in "raw"
form.
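
As a rough sketch of the decision described above (this is my reading of
the EncodingStrategy idea, not its actual implementation; the function
names manifest_is_utf8 and names_for_checkout are invented here):

```python
def manifest_is_utf8(names):
    """Return True if every manifest filename (bytes) is valid UTF-8."""
    for raw in names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def names_for_checkout(names, local_encoding):
    """Return the filename bytes to write to disk for each manifest entry.

    Hypothetical helper: transcode only when the whole manifest is UTF-8,
    otherwise pass the stored bytes through untouched.
    """
    if not manifest_is_utf8(names):
        # Existing repo with e.g. Latin-1 names: keep the "raw" behaviour.
        return list(names)
    # All-UTF-8 manifest: transcode to the local encoding. This raises
    # UnicodeEncodeError if a name cannot be represented locally.
    return [raw.decode("utf-8").encode(local_encoding) for raw in names]
```

For example, names_for_checkout([b"caf\xc3\xa9.txt"], "cp1252") would
give [b"caf\xe9.txt"], while a manifest holding the Latin-1 bytes
b"caf\xe9.txt" is not valid UTF-8 and comes back unchanged. What to do
when a name cannot be encoded locally is left open in this sketch.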

Overall it sounds like a good plan to me when you consider backwards
compatibility and a reasonable upgrade path.

-- 
Martin Geisler