Unicode support for non-unicode locales

Matt Mackall mpm at selenic.com
Tue Oct 9 15:46:12 UTC 2007


On Tue, Oct 09, 2007 at 02:59:31PM +0900, Shun-ichi GOTO wrote:
> Matt, you may assume "any byte sequence can be accepted as filename in
> ASCII/latin-1 world, so it is better for users and tools", but it is
> not true in other worlds, such as UTF-8 and Shift_JIS file systems,
> because of validation and sanitization of byte sequences.

You're absolutely right about Shift_JIS. (At the same time, you're 
absolutely wrong about UTF-8.)

But we can't make it better for Shift_JIS without making it worse for
the rest of the world. Until you come up with a fix that doesn't break
Makefiles (and a thousand other tools), it's hardly worth talking
about.

> > This fix might work fine for special cases like going from one Russian
> > or Japanese encoding to another, but in general, it makes a bad
> > problem worse. It's much better overall for data to be "corrupted" by
> > "passing it through untouched".
> 
> No, "passing it through untouched" makes things worse for any
> language.  As I said at first, using the untouched byte sequence does
> not solve the Makefile issue you mentioned.

It does for most encodings, including UTF-8, which was specifically
designed with backward compatibility with ASCII in mind.
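
A rough way to check this is the Python sketch below. It is not from
the thread: the helper name ascii_bytes_in_multibyte and the sample
characters are chosen purely for illustration. It collects any
ASCII-range byte that appears inside the encoding of a non-ASCII
character. UTF-8 never produces one, so tools that scan for ASCII
delimiters see the same bytes they always did. Shift_JIS can produce
0x5C (backslash) as a trailing byte, which is where byte-oriented tools
get confused.

```python
def ascii_bytes_in_multibyte(text, encoding):
    """Return the sorted ASCII-range byte values (< 0x80) that occur
    inside the encoded form of the non-ASCII characters in `text`.

    Hypothetical helper, for illustration only.
    """
    found = set()
    for ch in text:
        if ord(ch) < 0x80:
            # Plain ASCII characters are expected to encode to ASCII bytes.
            continue
        encoded = ch.encode(encoding)
        found.update(b for b in encoded if b < 0x80)
    return sorted(found)


# Three characters whose Shift_JIS encoding ends in byte 0x5C.
sample = "表ソ能"

print(ascii_bytes_in_multibyte(sample, "utf-8"))      # []
print(ascii_bytes_in_multibyte(sample, "shift_jis"))  # [92], i.e. 0x5C
```

An empty list means that a tool which treats the name as opaque bytes
and only looks for ASCII characters such as '/', '\' or ':' will not
misread part of a multibyte character as one of them.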

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list