Unicode support for non-unicode locales
Matt Mackall
mpm at selenic.com
Tue Oct 9 16:48:21 UTC 2007
On Tue, Oct 09, 2007 at 12:07:50PM +0600, Densetsu no Ero-sennin wrote:
> Moreover, most modern distributions offer UTF-8 by default. And most modern
> file archivers, including GNU tar in POSIX mode, whose duty is to preserve
> user's data exactly, are creating files in local encoding when unpacking
> archives.
Oh really?
utf-8$ touch <japan>
utf-8$ tar --posix -c -f foo.tar <japan>
utf-8$ zip foo.zip <japan>
ascii$ tar --posix -x -v -f ../foo.tar
\346\227\245\346\234\254\345\233\275
ascii$ ls
?????????
ascii$ rm *
ascii$ unzip ../foo.zip
Archive: ../foo.zip
extracting: <garbage>
ascii$ ls
?????????
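The escaped name that tar prints above is nothing more than the raw bytes of the UTF-8 filename, written as octal escapes. There are nine bytes, which is why ls in the ASCII locale shows nine question marks. The small Python sketch below decodes that escape string; the helper names `unescape_octal` and `show_like_ascii_ls` are hypothetical, and the second one only roughly imitates how ls displays non-printable bytes:

```python
# Sketch: the archived filename is just bytes; the locale only changes
# how those bytes are displayed.
ESCAPED = r"\346\227\245\346\234\254\345\233\275"  # as printed by tar above


def unescape_octal(s: str) -> bytes:
    """Convert a run of backslash + three-digit octal escapes into raw bytes."""
    return bytes(int(part, 8) for part in s.split("\\")[1:])


def show_like_ascii_ls(name: bytes) -> str:
    """Roughly mimic ls in an ASCII locale: every non-ASCII byte prints as '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in name)


raw = unescape_octal(ESCAPED)
print(len(raw))                 # 9
print(show_like_ascii_ls(raw))  # ?????????
# In a UTF-8 locale the very same bytes decode to three CJK characters.
assert raw.decode("utf-8") == "\u65e5\u672c\u56fd"
```

Decoded as UTF-8, the bytes are U+65E5 U+672C U+56FD (日本国, "Japan"). Both archivers kept the bytes unchanged, and only the display differs between locales.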
And frankly, I think this is the only sensible thing to do, because if I do this:
utf-8:
$ hg init
$ touch <japanese> <korean> <russian> <french> english
$ echo "cat <japanese> <korean> <russian> <french> english | md5sum" > check
$ chmod +x check
$ hg ci -Am "test"
ASCII, latin-1, koi8, or basically any other encoding:
$ hg pull -u
$ ./check
d41d8cd98f00b204e9800998ecf8427e -
...it works.
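The hash printed by ./check is the MD5 digest of empty input. touch creates zero-length files, so concatenating all five produces nothing, and the check succeeds only when every name written inside the script matches, byte for byte, a name in the checkout. Below is a minimal Python model of that check. It assumes a hypothetical in-memory checkout, represented as a dict from byte-string names to contents, and uses illustrative stand-in names in place of the elided <japanese>, <korean>, <russian> and <french> ones:

```python
import hashlib

# Hypothetical stand-ins for the elided names; names are handled only as bytes.
NAMES = [
    "\u65e5\u672c".encode("utf-8"),        # illustrative Japanese name
    "\ud55c\uad6d".encode("utf-8"),        # illustrative Korean name
    "\u0440\u0443\u0441".encode("utf-8"),  # illustrative Russian name
    "fran\u00e7ais".encode("utf-8"),       # illustrative French name
    b"english",
]


def check(checkout: dict[bytes, bytes]) -> str:
    """Model of ./check: cat the named files, then MD5 the result.

    Raises KeyError if a name in the script is missing from the checkout.
    """
    data = b"".join(checkout[name] for name in NAMES)
    return hashlib.md5(data).hexdigest()


# touch makes empty files; a byte-preserving checkout keeps the exact names.
checkout = {name: b"" for name in NAMES}
print(check(checkout))  # d41d8cd98f00b204e9800998ecf8427e
```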
If we start trying to transcode filenames, we will have to transcode
file contents as well, and that problem is insoluble.
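The following Python sketch shows what a checkout that transcoded names would run into. It imagines a checkout that re-encodes UTF-8 filenames for a latin-1 locale. The function `transcode_name` and the sample names are hypothetical and exist only for illustration. One name cannot be represented in latin-1 at all. Another can be transcoded, but it then no longer matches the bytes that file contents, such as the check script, still use to refer to it:

```python
def transcode_name(name: bytes, target: str) -> bytes:
    """Hypothetical transcoding checkout: re-encode a UTF-8 name for the locale."""
    return name.decode("utf-8").encode(target)


french = "fran\u00e7ais".encode("utf-8")   # illustrative name
japanese = "\u65e5\u672c".encode("utf-8")  # illustrative name

# Problem 1: some names have no representation in the target encoding.
try:
    transcode_name(japanese, "latin-1")
except UnicodeEncodeError:
    print("Japanese name cannot be represented in latin-1")

# Problem 2: a name that does transcode no longer matches the bytes that
# file contents (like the check script) still use to refer to it.
script = b"cat " + french + b" | md5sum"
local = transcode_name(french, "latin-1")
print(local in script)  # False: the script names a file that no longer exists
```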
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel mailing list