Getting http://mercurial.selenic.com/wiki/FixUtf8Extension as a part of hgsubversion
罗勇刚(Yonggang Luo)
luoyonggang at gmail.com
Wed Oct 19 14:19:54 UTC 2011
2011/10/19 Tom Anderson <tom.anderson at e2x.co.uk>
> 2011/10/19 Martin Geisler <mg at aragost.com>:
>
> > You ask why Subversion can work on Windows/Linux and the answer is
> > simple: they have chosen to transcode the filenames to and from
> > Unicode. Mercurial has chosen to not do this.
> >
> > It is a tradeoff: by transcoding we would support some filenames better
> > on the two systems, but break some build tools. We would also have to
> > deal with a lot more bug reports about encoding problems when the
> > transcoding fails.
> >
> > As an example, if you have a repository with a file called "罗勇刚.txt",
> > then I can make a clone to my Latin-1 Linux box and I can see the file
> > today. If Mercurial would try to transcode the file into Latin-1, then
> > the checkout would fail. Depending on what I need to do with the file,
> > failing might be good or bad.
>
> Hold on - at the moment, when you try to check out Luo's file, you'll
> get a file whose name is just complete gibberish. If Luo uses UTF-8,
> the bytes are e7 bd 97 e5 8b 87 e5 88 9a, which in ISO 8859-1 gives
> "ç½?å??å??", where the question marks are characters my machine
> doesn't know. Are you seriously suggesting that this is in any way
> useful, let alone correct behaviour?
>
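To make the byte-level point above concrete, here is a minimal Python
sketch (not Mercurial code; the variable names are only illustrative). It
assumes the name was committed as UTF-8 and shows what a Latin-1 system
gets when it interprets those same bytes without transcoding.

```python
# Filename stem from the example, as it would be stored by a UTF-8 system.
name = "罗勇刚"
raw = name.encode("utf-8")

# Space-separated hex dump of the stored bytes (bytes.hex(sep) needs Python 3.8+).
hex_bytes = raw.hex(" ")

# A Latin-1 system that treats the name as a plain byte string maps every
# byte to one character, so the 9 bytes become 9 unrelated characters,
# several of them unprintable C1 control codes.
misread = raw.decode("latin-1")

# Decoding with the encoding that was actually used recovers the name.
recovered = raw.decode("utf-8")

print(hex_bytes)
print(repr(misread))
print(recovered)
```

The bytes are never damaged; only their interpretation differs between
the two machines.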
> What happens if you check in some files with entirely alphabetical
> names on your Latin-1 box, and i check them out on my EBCDIC machine?
>
Definitely agreed. Using UTF-8 as an intermediate encoding for
interchange between different operating systems is the only solution
for Mercurial. (There is no real alternative: other candidates such as
UTF-16 or UTF-32 would be more difficult, and no one would agree on
them.)

The position of UTF-8 in the computing world is exactly that of English
in the real world: it is the language of international communication,
just as I am using it to talk to you now. Obviously, if I wrote this in
Chinese, very few people here would understand me.

With UTF-8, even if you cannot render a name, or cannot find the
corresponding characters on an EBCDIC or ASCII machine, at least we
store it in one unified encoding, so we can handle it in a consistent
way. Whenever I copy a Mercurial repo to a Linux machine, a Mac OS
machine, a Windows machine, an EBCDIC machine, or some "Magic" machine,
the internal encoding of the filenames is UTF-8, and we just need to
check them out in the right way when the machine supports it. When the
machine does not support it, that is the machine's problem, not
Mercurial's.
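A minimal Python sketch of what I mean, under stated assumptions: the
helper names to_stored and to_local are hypothetical and are not part of
Mercurial or the FixUtf8 extension. Names are kept as UTF-8 internally and
transcoded only at the edge, when committing from or checking out to a
machine with a different local encoding. cp037 is one of the EBCDIC code
pages shipped with Python.

```python
def to_stored(local_bytes: bytes, local_encoding: str) -> bytes:
    """Commit side: local filesystem bytes -> UTF-8 bytes kept internally."""
    return local_bytes.decode(local_encoding).encode("utf-8")


def to_local(stored: bytes, local_encoding: str) -> bytes:
    """Checkout side: UTF-8 bytes -> local filesystem bytes.

    Raises UnicodeEncodeError when the local encoding cannot represent
    the name, which is the failure case discussed in this thread.
    """
    return stored.decode("utf-8").encode(local_encoding)


# An alphabetical name committed on a Latin-1 box and checked out on EBCDIC.
stored = to_stored("readme.txt".encode("latin-1"), "latin-1")
ebcdic = to_local(stored, "cp037")
print(ebcdic.decode("cp037"))  # same characters, different bytes than `stored`

# A Chinese name: representable on a GBK machine, not on a Latin-1 one.
chinese = "罗勇刚.txt".encode("utf-8")
gbk_name = to_local(chinese, "gbk")
try:
    to_local(chinese, "latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

Whether the last case should abort the checkout or fall back to some
escaped name is a policy choice; the sketch only shows that the failure is
detectable instead of silently producing gibberish.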
> There, the names can be perfectly accurately represented at either
> end. If Mercurial treats names as byte strings, doesn't that mean i
> will get gibberish again?
>
> You are absolutely right that this is a tough problem, because not
> every filename that someone might write can be represented correctly
> on everyone else's filesystem. But there are many filenames which can
> be represented correctly on a great many peoples' filesystems which
> Mercurial gets wrong.
>
> I read the EncodingStrategy page on the wiki. It seems that the only
> real argument for treating filenames as bytes is the "makefile
> problem". The comment that "non-ASCII filenames are not reliably
> portable between systems in general" is hokum. In essence, this means
> that the Mercurial project made an early decision that it cared more
> about supporting broken unix build tools than it did about supporting
> users of non-ASCII languages. That's fine, but it's a decision that
> the project should be open about.
>
> tom
>
> --
> Tom Anderson | e2x Ltd, 1 Norton Folgate, London E1 6DB
> (e) tom at e2x.co.uk | (m) +44 (7960) 989794 | (f) +44 (20) 7100 3749
>
--
Yours sincerely,
罗勇刚 (Yonggang Luo)