Unicode support request.
罗勇刚(Yonggang Luo)
luoyonggang at gmail.com
Thu Oct 20 14:52:56 UTC 2011
On Thursday, October 20, 2011, Haszlakiewicz, Eric wrote:
> > -----Original Message-----
> > From: mercurial-bounces at selenic.com [mailto:mercurial-
> >
> > It would be great if at the end we get understanding that _everything_
> > that touches a human eye MUST be UTF-8 encoded in the repository. Once
> > we have this agreed and implemented, we can go further and translate
> > all the data (file names, user names, commit messages etc) to the
> > expected encoding during input/output. It should be no big deal because
> > at the moment of translation you always know on which platform you sit
> > and which encoding you require.
>
> Actually, this IS a big deal because then you can end up with byte
> sequences that CAN'T be translated.
>
> http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
>
> e.g. bit sequence 10101001 (169 decimal, copyright symbol in iso-8859-1) is
> invalid:
> $ perl -e 'print chr(0b10101001);' | iconv -f utf8 -t utf8
> iconv: illegal input sequence at position 0
>
> You might say this will work "fine" if we make sure to encode things before
> they get into the repository, but that's not so simple. To encode things,
> you need to
On Windows, everything is Unicode. You get wrongly encoded data only
because Mercurial calls the wrong API. When you call the Unicode API, it
gives you the right data, which converts to the right UTF-8 sequence, so
you do not need to check it. If it is wrong, then that is a problem in the
Windows operating system, and Mercurial should not have to care about it.
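For reference, the quoted iconv example can be reproduced in Python. This is
only a sketch, assuming Python 3; the variable names are made up for
illustration. It checks that the single byte 0xA9 is not valid UTF-8 on its
own, while decoding it as iso-8859-1 and re-encoding gives a valid UTF-8
sequence:

```python
# Sketch only: variable names are illustrative, not from Mercurial.
raw = bytes([0b10101001])  # 0xA9, the byte from the quoted iconv example

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # A lone 0xA9 is a continuation byte with no lead byte before it.
    valid_utf8 = False

# Interpreted as iso-8859-1, the same byte is the copyright sign.
as_latin1 = raw.decode("iso-8859-1")

# Re-encoding that character produces a well-formed two-byte UTF-8 sequence.
reencoded = as_latin1.encode("utf-8")
```

So the byte is only a problem when the source encoding is unknown or wrong.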
> know what encoding you're using, and if there happens to be a file present
> whose name contains a byte sequence that is invalid for the currently set
> encoding,
When you call the Windows Unicode API, everything is a UTF-16 wide string,
so there is nothing special to handle. The conversion is easily done in
Python: just use unicodeString.encode("utf8").
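A minimal sketch of that conversion, assuming Python 3. It only simulates
the bytes a wide-character API would return; wide_name_to_utf8 is a
hypothetical helper name and not a Mercurial function. Going from a wide
string to UTF-8 bytes is a decode from UTF-16 followed by an encode to UTF-8:

```python
# Sketch only: simulates a UTF-16-LE file name as a wide-character API
# would return it. `wide_name_to_utf8` is a hypothetical helper.

def wide_name_to_utf8(wide_bytes: bytes) -> bytes:
    """Convert a UTF-16-LE encoded file name to UTF-8 bytes."""
    text = wide_bytes.decode("utf-16-le")  # wide string -> Unicode text
    return text.encode("utf-8")            # Unicode text -> UTF-8 bytes


name = "罗勇刚.txt"
wide = name.encode("utf-16-le")  # stand-in for what the OS would return
utf8 = wide_name_to_utf8(wide)
```

The result decodes back to the same name on any platform, independent of the
locale the command was run under.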
> then it's not clear what doing something like "hg add" on that file should
> do. You also run into problems if you happen to switch encodings from one
> hg command to the next.
>
*This is the only problem with introducing UTF-8 on Windows. We should do
some automatic RENAME when checking out a historical Mercurial repository,
renaming files to their UTF-8 replacements. That needs some consideration.*
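One way the rename step might look, as a rough sketch in Python 3. Both
utf8_replacement and plan_renames are hypothetical helpers, not existing
Mercurial code, and the sketch assumes the legacy encoding of the old
repository is already known:

```python
# Hypothetical helpers, not part of Mercurial. The legacy encoding is
# assumed to be known for the repository; guessing it is a separate problem.

def utf8_replacement(name: bytes, legacy_encoding: str) -> bytes:
    """Return the UTF-8 form of a stored file name."""
    try:
        name.decode("utf-8")
        return name  # already valid UTF-8, no rename needed
    except UnicodeDecodeError:
        return name.decode(legacy_encoding).encode("utf-8")


def plan_renames(names, legacy_encoding):
    """Map each old name that needs renaming to its UTF-8 replacement."""
    plan = {}
    for old in names:
        new = utf8_replacement(old, legacy_encoding)
        if new != old:
            plan[old] = new
    return plan


stored = [b"readme.txt", b"caf\xe9.txt"]  # second name is iso-8859-1
renames = plan_renames(stored, "iso-8859-1")
```

A legacy name that happens to be valid UTF-8 already would be left alone by
this check, which is one of the cases that needs consideration.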
> There are probably ways that it could be made to work (e.g. consider the
> current filename encoding to be a property of the clone, rather than, say,
> an environment variable) but it seems like handling all of the corner cases
> could get quite complicated.
>
??? What does this mean? If we call the Windows Unicode API, there is no
encoding problem anymore.
Thanks.
>
> eric
> _______________________________________________
> Mercurial mailing list
> Mercurial at selenic.com
> http://selenic.com/mailman/listinfo/mercurial
>
--
Yours sincerely,
罗勇刚 (Yonggang Luo)