Getting http://mercurial.selenic.com/wiki/FixUtf8Extension as a part of hgsubversion

Martin Geisler mg at aragost.com
Wed Oct 19 15:44:21 UTC 2011


Tom Anderson <tom.anderson at e2x.co.uk> writes:

> 2011/10/19 Martin Geisler <mg at aragost.com>:
>
>> As an example, if you have a repository with a file called
>> "罗勇刚.txt", then I can make a clone to my Latin-1 Linux box and I
>> can see the file today. If Mercurial would try to transcode the file
>> into Latin-1, then the checkout would fail. Depending on what I need
>> to do with the file, failing might be good or bad.
>
> Hold on - at the moment, when you try to check out Luo's file, you'll
> get a file whose name is just complete gibberish. If Luo uses UTF-8,
> the bytes are e7 bd 97 e5 8b 87 e5 88 9a, which in ISO 8859-1 gives
> "ç½?å??å??", where the question marks are characters my machine
> doesn't know. Are you seriously suggesting that this is in any way
> useful, let alone correct behaviour?

Personally, I don't find such a filename useful, but the argument made is
that it's useful to allow the checkout since it gives you a chance to
fix the filename by renaming it to, say, "luo-yonggang.txt" instead.
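
To make the byte-level view concrete, here is a small Python sketch. It
is only an illustration of how the nine bytes quoted above read under
the two encodings, not anything Mercurial itself runs; the variable
names are made up for the example.

```python
# Bytes of the UTF-8 encoded name quoted above (without the ".txt" suffix).
raw = bytes.fromhex("e7bd97e58b87e5889a")

# Interpreted as UTF-8, the bytes are the original three characters.
as_utf8 = raw.decode("utf-8")

# Interpreted as Latin-1, every byte maps to some code point, so decoding
# never fails; bytes in the 0x80-0x9f range become C1 control characters.
as_latin1 = raw.decode("latin-1")

# The misread name is lossless: encoding it back gives the same bytes,
# which is why the file can still be checked out and then renamed.
roundtrip = as_latin1.encode("latin-1")

print(repr(as_utf8), repr(as_latin1), roundtrip == raw)
```

The name looks wrong on the Latin-1 box, but no information is lost, so
a rename to an ASCII name remains possible.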

If Mercurial were to abort the checkout when it meets a filename that
cannot be transcoded, then we would make it fail in cases where it runs
today. I don't find those cases very useful because of the corrupt
filenames, but others disagree.
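
The difference between the two policies can be sketched in Python. Both
functions below are hypothetical helpers written for this mail, not
Mercurial code; they assume the repository stores the name as UTF-8
bytes and that the local box uses Latin-1.

```python
name = "罗勇刚.txt"
stored = name.encode("utf-8")  # assumed: the bytes as recorded in the repository


def transcode_name(name, local_encoding):
    """Hypothetical strict policy: re-encode the name or refuse."""
    try:
        return name.encode(local_encoding)
    except UnicodeEncodeError:
        # A real tool following this policy would abort the checkout here.
        return None


def passthrough_name(stored_bytes):
    """Encoding-agnostic policy: hand the stored bytes to the OS unchanged."""
    return stored_bytes


print(transcode_name(name, "latin-1"))   # no Latin-1 form exists for this name
print(passthrough_name(stored))          # always succeeds, whatever the locale
```

With the strict policy the checkout of this file cannot proceed on a
Latin-1 system; with the pass-through policy it always proceeds, at the
cost of a name that displays as gibberish there.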

> I read the EncodingStrategy page on the wiki. It seems that the only
> real argument for treating filenames as bytes is the "makefile
> problem". The comment that "non-ASCII filenames are not reliably
> portable between systems in general" is hokum. In essence, this means
> that the Mercurial project made an early decision that it cared more
> about supporting broken unix build tools than it did about supporting
> users of non-ASCII languages. That's fine, but it's a decision that
> the project should be open about.

In Matt's defence, I think he has been very open about this: Mercurial
is *encoding agnostic* when it comes to filenames. This is how Unix
works and this is how Mercurial has worked for more than five years now.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/


