A proposal to solve the encoding problem on Windows.
罗勇刚(Yonggang Luo)
luoyonggang at gmail.com
Fri Oct 21 15:56:21 UTC 2011
2011/10/21 Andrey <py4fun at gmail.com>:
>> > The most important goal for me was actually this: 2 (3?). use utf8 as
>> > the default encoding for new commits.
>> >
>> > Now I see (thanks, Matt), that it may introduce serious regression
>> > problems. I need some time to think about a possible solution.
>> >
>> >> if all files in manifest are valid UTF-8:
>> >> # repo is already in UTF-8 mode or is pure ASCII
>> >> mode = utf8transcoding
>> >
>> > This check is just a guess. We cannot rely on it. In general, it is
>> > not possible to detect the encoding from the sequence of bytes.
>>
>> You're right in principle, that a Latin-1 encoded text with "pÃ¦re" also
>> happens to be the UTF-8 encoding of "pære". However, what Matt writes is
>> that the chance of that happening is small and so it is okay with him to
>> declare a text to be UTF-8 if it can be correctly decoded as such.
>>
>> --
>> Martin Geisler
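
To make the heuristic above concrete, here is a minimal Python sketch of a
strict decode check. It is only an illustration, not Mercurial's actual code,
and the function name looks_like_utf8 is hypothetical. It shows both why the
check usually works and why it remains a guess:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (a heuristic, not proof)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "pære" encoded as Latin-1 contains the lone byte 0xE6, which is not valid
# UTF-8 on its own, so the check rejects it.
latin1_name = "p\u00e6re".encode("latin-1")      # b'p\xe6re'

# The same name encoded as UTF-8 passes the check.
utf8_name = "p\u00e6re".encode("utf-8")          # b'p\xc3\xa6re'

# The ambiguous case: the Latin-1 text "pÃ¦re" has exactly the same bytes,
# so the check accepts it even though the author may have meant Latin-1.
ambiguous = "p\u00c3\u00a6re".encode("latin-1")  # b'p\xc3\xa6re'

for raw in (latin1_name, utf8_name, ambiguous):
    print(raw, looks_like_utf8(raw))
```

The third case is the false positive Martin describes: it is possible, but it
requires an unlikely character sequence in the Latin-1 text.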
>
> What I mean is that UTF-16 encoded text may look the same (the same bytes) as
> UTF-8 encoded text.
> Without a BOM (byte order mark) we cannot draw any conclusions about the
> content. Do you mean that the BOM _is_ stored in the repository?
A BOM is never stored in a filename; it is only stored in the content (at the
beginning) of a file, so it should not be considered here.
It's off topic.
Also, UTF-16 should not be used as a filename encoding stored in a Mercurial
repository, because it is not compatible with ASCII.
For example, the space character is represented as the two bytes 0x20
0x00 in LE, and a 0x00 byte should not appear in a filename.
UTF-32 is not an option either, for almost the same reason (4 bytes vs. 2 bytes).
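
A small Python sketch illustrates the point. The sample filename and the
helper contains_nul are hypothetical and chosen only for illustration:

```python
name = "a b.txt"  # hypothetical ASCII-only filename

utf8 = name.encode("utf-8")
utf16le = name.encode("utf-16-le")  # no BOM is written for the -le variant
utf32le = name.encode("utf-32-le")


def contains_nul(data: bytes) -> bool:
    """Return True if the byte string contains a 0x00 byte."""
    return b"\x00" in data


# For ASCII-only names, UTF-8 yields exactly the ASCII bytes.
print(utf8 == name.encode("ascii"), contains_nul(utf8))

# UTF-16 and UTF-32 pad every ASCII character with zero bytes.
print(utf16le, contains_nul(utf16le))
print(utf32le, contains_nul(utf32le))

# The space character alone becomes 0x20 0x00 in UTF-16 LE.
print(" ".encode("utf-16-le"))
```

Any code that treats a filename as a NUL-terminated or ASCII-compatible byte
string would break on the UTF-16 and UTF-32 forms, while the UTF-8 form passes
through unchanged.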
--
Yours sincerely,
罗勇刚 (Yonggang Luo)