Unicode support for non-unicode locales

Matt Mackall mpm at selenic.com
Tue Oct 9 16:48:21 UTC 2007

On Tue, Oct 09, 2007 at 12:07:50PM +0600, Densetsu no Ero-sennin wrote:
> Moreover, most modern distributions offer UTF-8 by default. And most modern 
> file archivers, including GNU tar in POSIX mode, whose duty is to preserve 
> user's data exactly, are creating files in local encoding when unpacking 
> archives.

Oh really?

utf-8$ touch <japan>
utf-8$ tar --posix -c -f foo.tar <japan>
utf-8$ zip foo.zip <japan>

ascii$ tar --posix -x -v -f ../foo.tar 
ascii$ ls
ascii$ rm *
ascii$ unzip ../foo.zip
Archive:  ../foo.zip
 extracting: <garbage>
ascii$ ls

And frankly, I think this is the only sensible thing to do. Because if I do:

$ hg init
$ touch <japanese> <korean> <russian> <french> english
$ echo "cat <japanese> <korean> <russian> <french> english | md5sum" > check
$ chmod +x check
$ hg ci -Am "test"

Then, on a system using ASCII, latin-1, koi8, or basically any other encoding:
$ hg pull -u
$ ./check
d41d8cd98f00b204e9800998ecf8427e -

...it works.
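The reason this holds can be checked without Mercurial: on typical Linux filesystems a filename is just a byte string, and the locale only affects how it is displayed. Here is a minimal sketch, assuming a POSIX shell and a filesystem that accepts UTF-8 names. The Cyrillic name is a made-up stand-in for the <russian> placeholder, and lookup_status is a hypothetical variable.

```shell
#!/bin/sh
# Sketch only: the filename below is a made-up stand-in for <russian>.
work=$(mktemp -d) || exit 1

# UTF-8 bytes for a Cyrillic name, written as octal escapes so that
# this script itself stays pure ASCII.
name=$(printf '\321\204\320\260\320\271\320\273')

# Create the (empty) file, as `touch` does in the post.
: > "$work/$name"

# Look the file up from a child shell running in the plain C locale.
# The lookup passes the same bytes to the kernel, so it succeeds even
# though that locale cannot display the name.
if LC_ALL=C sh -c 'cat "$1" >/dev/null' sh "$work/$name"; then
    lookup_status=found
else
    lookup_status=missing
fi
echo "lookup under LC_ALL=C: $lookup_status"

rm -rf "$work"
```

On filesystems that normalize or reject some byte sequences the result may differ, which is outside what this sketch covers.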

If we start trying to transcode filenames, we will have to transcode
file contents as well (the check script above names those files inside
its own contents), and that problem is insoluble.
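
To make the dependency concrete, here is a minimal sketch of what goes wrong if only the names are transcoded, assuming a repository committed under UTF-8 and checked out on a latin-1 system. The name "café" and the helper mentions are hypothetical, and the comparison is done on raw bytes in memory rather than with real files.

```shell
#!/bin/sh
# Sketch only: we compare raw bytes, so force the C locale.
LC_ALL=C
export LC_ALL

# The name as committed and as embedded in the script:
# UTF-8 bytes for "café" (the e-acute is 0xC3 0xA9).
stored_name=$(printf 'caf\303\251')

# The name a transcoding checkout would create on a latin-1 system
# (the e-acute becomes the single byte 0xE9).
transcoded_name=$(printf 'caf\351')

# One line of a tracked script. Its contents stay untouched unless
# file contents are transcoded as well.
script_line="cat $stored_name | md5sum"

# mentions TEXT NAME: succeeds if TEXT contains NAME byte for byte.
mentions() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

if mentions "$script_line" "$stored_name"; then
    echo "names kept as bytes: the script still refers to the file on disk"
fi
if ! mentions "$script_line" "$transcoded_name"; then
    echo "names transcoded: the script now refers to a file that does not exist"
fi
```

Repairing the script would mean rewriting the bytes inside it, which requires knowing the encoding of every tracked file's contents.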

Mathematics is the supreme nostalgia of our time.

More information about the Mercurial-devel mailing list