Unicode support for non-unicode locales

Densetsu no Ero-sennin densetsu.no.ero.sennin at gmail.com
Tue Oct 9 06:07:50 UTC 2007


On 8 October 2007 (Mon), Matt Mackall wrote:
> Again, what happens if someone does a checkout in an ASCII/latin-1
> locale? That's most of the computing world. The answer is: your
> Russian characters are not just mangled, they're completely LOST. In
> fact, you probably won't be able to check out your project at all
> because filename "??????" will collide with filename "??????".

I believe Mercurial must raise UnicodeDecodeError in such cases instead of
silently corrupting filenames. An ASCII locale is not suitable for working with
Japanese Kanji and Cyrillic. If one needs to work with non-ASCII filenames,
he needs a locale that supports them. And if one prefers to stick with a
non-Unicode locale, like Latin-1, he probably knows what he is doing and does
not want to deal with Cyrillic and Kanji.
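
To make the difference concrete, here is a minimal Python sketch of the two
behaviours. It is not Mercurial's actual code: the function names
to_local_strict and to_local_lossy are hypothetical, and I am assuming
filenames are stored as UTF-8 bytes and converted to the locale's encoding on
checkout. Note that in Python 3 the error raised in this direction is
UnicodeEncodeError; both it and UnicodeDecodeError are subclasses of
UnicodeError.

```python
def to_local_strict(name_utf8: bytes, encoding: str) -> bytes:
    """Convert a UTF-8 filename to the local encoding, failing loudly.

    Hypothetical helper for illustration only.
    """
    text = name_utf8.decode("utf-8")
    # errors="strict" raises UnicodeEncodeError when a character
    # cannot be represented in the target encoding.
    return text.encode(encoding, errors="strict")


def to_local_lossy(name_utf8: bytes, encoding: str) -> bytes:
    """Convert a UTF-8 filename, replacing unencodable characters with '?'.

    Hypothetical helper showing the silent-corruption behaviour.
    """
    text = name_utf8.decode("utf-8")
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    # Two different six-letter Russian filenames, stored as UTF-8.
    first = "привет".encode("utf-8")
    second = "проект".encode("utf-8")

    # Lossy conversion: both names collapse to the same bytes.
    print(to_local_lossy(first, "ascii"), to_local_lossy(second, "ascii"))

    # Strict conversion: the problem is reported instead of hidden.
    try:
        to_local_strict(first, "ascii")
    except UnicodeError as exc:
        print("refusing to check out:", exc)
```

With the lossy variant, two distinct names become indistinguishable in an ASCII
locale, which is exactly the collision described in the quoted message. With
the strict variant the user gets an error and can switch to a suitable locale.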

Moreover, most modern distributions offer UTF-8 by default. Most modern
file archivers, including GNU tar in POSIX mode, whose duty is to preserve
the user's data exactly, create files in the local encoding when unpacking
archives. And most software that does network file transfers, including web
browsers and email clients, encodes filenames properly. Why should Mercurial
be different?

Yes, obviously, Unicode is much harder to deal with than good old ASCII, but
the world is large and multilingual, and we can't just shut our eyes to that
fact. Like Guido once said: "Face it. Unicode stinks (from the programmer's
POV). But we'll have to live with it."



More information about the Mercurial-devel mailing list