Unicode support for non-unicode locales
Gábor Farkas
gabor at nekomancer.net
Tue Oct 9 07:08:56 UTC 2007
Matt Mackall wrote:
> On Tue, Oct 09, 2007 at 01:59:52AM +0900, Shun-ichi GOTO wrote:
>> 2007/10/9, Shun-ichi GOTO <shunichi.goto at gmail.com>:
>>> If we treat filenames as raw byte data, some filenames might be broken
>>> in path operations. So the Python code should handle filenames as unicode
>>> characters by decoding them.
>> In fact, current mercurial cannot manage some filenames.
>> For example, the filename "正規表現.txt" is such a case.
>> The 4 characters "正規表現" are Japanese for "regular expression",
>> and the 2nd byte of the 3rd character is '\' (0x5c).
>> So, hg ci -Am "test" fails on adding this file.
>>
>> {{{
>> [c:\temp\test]hg ci -Am initial
>> adding 正規・現.txt
>> removing 正規・現.txt
>> dir1/正規・現.txt not tracked!
>> 正規・現.txt not tracked!
>> nothing changed
>> }}}
>
> Yes, Mercurial will be unhappy with wide character sets in various
> situations. It's either that or be unhappy with single byte character
> sets much more often.
>
(for reference, the above-mentioned example works fine on linux with a
utf-8 locale. i assume it works well everywhere if you keep the same
locale (filesystem-encoding) everywhere)
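
a rough sketch of why the encoding matters here, in python 3. it assumes the windows side uses code page 932 (shift-jis), and `naive_split` is a made-up helper, not mercurial code; it only stands in for any byte-level path handling that treats every 0x5c as a separator:

```python
# python 3 sketch; cp932 is an assumption about the windows code page in use
name = "正規表現.txt"  # "regular expression" in japanese

sjis = name.encode("cp932")  # trail bytes of a character may fall in the ascii range
utf8 = name.encode("utf-8")  # every byte of a non-ascii character is >= 0x80


def naive_split(raw):
    """hypothetical byte-level split that treats every 0x5c as a path separator"""
    return raw.split(b"\\")


print(sjis[4:6])          # the two bytes of the 3rd character
print(naive_split(sjis))  # cut in two, in the middle of that character
print(naive_split(utf8))  # left in one piece
```

with utf-8 the name never contains a 0x5c byte, which would explain why the same file causes no trouble there.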
but i think there are problems even when you check out code on a unicode
locale, if that locale is different from the locale where the file was
added.
for example, add a file with a non-ascii name to a mercurial repository on
a utf-8 locale (most linux systems), check it out on windows nt (afaik
utf-16), and you get garbled filenames.
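
a small python 3 sketch of what i mean by garbled. it assumes the name is stored as the raw bytes it had when it was added, and that the checkout side reads those bytes in a different single-byte code page (cp1252 is picked here only as an example, it is not necessarily what windows nt would use):

```python
# python 3 sketch; the filename and the cp1252 choice are illustrative assumptions
name = "á.txt"

stored = name.encode("utf-8")      # bytes as written on the utf-8 system
garbled = stored.decode("cp1252")  # same bytes, read with another encoding

print(stored)
print(garbled)
```

the bytes are identical on both sides, only their interpretation differs, so one character turns into two unrelated ones.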
do i understand correctly that this is intentional, and there are no
plans to fix this?
is there at least a workaround for this (other than
"do not use non-ascii filenames")? :)
gabor