Unicode support for non-unicode locales
Densetsu no Ero-sennin
densetsu.no.ero.sennin at gmail.com
Tue Oct 9 06:07:50 UTC 2007
On 8 October 2007 (Mon), Matt Mackall wrote:
> Again, what happens if someone does a checkout in an ASCII/latin-1
> locale? That's most of the computing world. The answer is: your
> Russian characters are not just mangled, they're completely LOST. In
> fact, you probably won't be able to check out your project at all
> because filename "??????" will collide with filename "??????".
I believe Mercurial must raise UnicodeDecodeError in such cases instead of
silently corrupting filenames. An ASCII locale is not suitable for working
with Japanese kanji or Cyrillic. If one needs to work with non-ASCII
filenames, one needs a locale that supports them. And if one prefers to stick
with a non-Unicode locale, like Latin-1, he probably knows what he is doing
and does not want to deal with Cyrillic and kanji.
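To illustrate the difference, here is a minimal sketch in Python 3. It
assumes filenames are held as Unicode and converted to the locale's encoding
on checkout. The helper names are hypothetical and are not Mercurial's actual
API. Note that in Python 3 a failed strict conversion in this direction
raises UnicodeEncodeError, so the sketch catches the common base class
UnicodeError.

```python
# Two distinct six-letter Cyrillic filenames ("proekt" and "spisok"),
# written with escapes so the source file itself stays ASCII.
NAME_A = "\u043f\u0440\u043e\u0435\u043a\u0442"
NAME_B = "\u0441\u043f\u0438\u0441\u043e\u043a"


def to_local_lossy(name, encoding):
    """Convert to the local encoding, replacing unencodable characters with '?'."""
    return name.encode(encoding, "replace")


def to_local_strict(name, encoding):
    """Convert to the local encoding, raising UnicodeError if that is impossible."""
    return name.encode(encoding, "strict")


# Lossy conversion: both names collapse to the same bytes and collide.
print(to_local_lossy(NAME_A, "ascii") == to_local_lossy(NAME_B, "ascii"))

# Strict conversion: the problem is reported instead of being hidden.
try:
    to_local_strict(NAME_A, "ascii")
except UnicodeError as exc:
    print("cannot represent filename in this locale:", exc.reason)
```

With a UTF-8 locale both conversions succeed and the two names stay distinct,
which is the case the strict behaviour is meant to protect.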
Moreover, most modern distributions offer UTF-8 by default. Most modern file
archivers, including GNU tar in POSIX mode, whose duty is to preserve the
user's data exactly, create files in the local encoding when unpacking
archives. And most software doing network file transfers, including web
browsers and email clients, encodes filenames properly. Why should Mercurial
be different?
Yes, obviously, Unicode is much harder to deal with than good old ASCII, but
the world is large and multilingual, and we can't just shut our eyes to that
fact. Like Guido once said: "Face it. Unicode stinks (from the programmer's
POV). But we'll have to live with it."
More information about the Mercurial-devel
mailing list