[PATCH 2 of 6] Determine default locale encoding and stdio encoding on start-up
Andrey
grooz-work at gorodok.net
Mon Nov 13 18:36:39 UTC 2006
On 14 November 2006 (Tue) 00:00, Matt Mackall wrote:
> > I actually borrowed most of the code from
> > http://www.selenic.com/mercurial/bts/issue156. :-) Having two different
> > encodings is necessary on Windows and maybe on other esoteric systems.
> > And 'stdio_encoding' option could be useful if autodetection of encoding
> > fails.
>
> Just because Windows does it doesn't mean it's useful. What are the
> scenarios where Windows needs it?
Although Windows claims to be Unicode-aware, it still uses 8-bit encodings
everywhere. :) For example, my investigation of a Windows installation with
a Russian locale showed that notepad.exe produces text files in the
Windows-1251 encoding. The same encoding is used for command line arguments
passed to Python scripts, and it is the encoding returned by
locale.getpreferredencoding(). But at the same time Windows uses CP866
(a legacy Cyrillic encoding from the DOS days) for console IO, probably for
compatibility with old DOS apps. This means that sys.stdin.read() returns
byte strings in CP866, and sys.stdout.write() requires its arguments to be
encoded in CP866. If we just use locale.getpreferredencoding() for that,
non-Latin log messages and other text will be displayed incorrectly when
written to stdout. So we really have to use a different encoding for stdio.
And it is better to make it user-overridable, because no one knows what
other quirks Windows has. :) Well, for me Windows support is not of great
importance, but it is still nice to have.
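
To make the mismatch concrete, here is a small illustration. It is only a
sketch, not code from the patch: it uses present-day Python string syntax
for brevity, and the variable names are made up for the example. It encodes
the same Russian word in both code pages and shows what happens when
Windows-1251 bytes are interpreted as CP866.

```python
# Illustration only: the same Russian text as bytes in the two code pages.
text = "Привет"

ansi_bytes = text.encode("cp1251")  # encoding used by notepad.exe and argv
oem_bytes = text.encode("cp866")    # encoding the console expects

# The byte sequences differ, so a single encoding cannot serve both purposes.
mismatch = ansi_bytes != oem_bytes

# Interpreting Windows-1251 bytes as CP866 yields the wrong characters.
garbled = ansi_bytes.decode("cp866")

# Decoding with the right code page and re-encoding for the console
# keeps the text intact.
fixed = ansi_bytes.decode("cp1251").encode("cp866").decode("cp866")
```

Here `garbled` no longer equals `text`, while `fixed` does, which is why the
bytes have to be converted before they reach stdout.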
> > It would be nice indeed to move that code to util.py, but it needs access
> > to config, and for some reason config loading is done in ui.py (I'd
> > personally prefer having separate config.py module and read config file
> > on first module import). Could someone comment on this?
>
> I'm not yet convinced locale support needs access to the config. If
> the average internationalized app needed its own config tweaks,
> everyone would just give up and use ASCII.
As I have noticed, ASCII speakers tend to underestimate the importance of
proper support for other encodings. ;) Autodetection will probably work most
of the time, but not always, and those config options could be really helpful
for manually resolving the most complex cases. :)
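
A rough sketch of the fallback order I have in mind follows. The function
name pick_stdio_encoding and its parameter are hypothetical and not part of
the patch; the idea is simply that a user-supplied 'stdio_encoding' value
wins, autodetection comes next, and ASCII is the last resort.

```python
import codecs
import locale
import sys


def pick_stdio_encoding(configured=None):
    """Return a usable stdio encoding name.

    Hypothetical helper: `configured` stands for a user-supplied
    'stdio_encoding' value, or None if the user did not set one.
    """
    candidates = (
        configured,                            # explicit user override
        getattr(sys.stdout, "encoding", None),  # what the stream reports
        locale.getpreferredencoding(False),     # locale default
    )
    for candidate in candidates:
        if not candidate:
            continue
        try:
            # codecs.lookup raises LookupError for unknown encoding names.
            return codecs.lookup(candidate).name
        except LookupError:
            continue
    return "ascii"
```

With this order, pick_stdio_encoding("cp866") returns "cp866" regardless of
what was autodetected, and a misspelled override falls through to detection
instead of failing.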
Andrey