Managing multiple encodings in one repository
David Rushby
davidrushby at gmail.com
Mon Apr 9 05:38:26 UTC 2007
On 4/6/07, David Rushby <davidrushby at gmail.com> wrote:
> > > >> 2) Be able to see encoding-normalized output from commands that
> > > >> might operate on files with different encodings.
> >
> > See the encode and decode filters:
> >
> > http://www.selenic.com/mercurial/wiki/index.cgi/EncodeDecodeFilter
>
> Thanks for the advice. I'll delve into these, because the
> _fallbackencoding hack didn't help. Unfortunately, it really is
> necessary for me to work with source files in multiple encodings.
Mercurial now works perfectly for me. The goal of normalizing all
textual input to UTF-8, so that Mercurial can show reasonable output
for all textual encodings, has been achieved (in my specific scenario,
at least).

Here's how things are currently arranged:
- The HGENCODING environment variable is set to "utf8".
- Mercurial.ini is saved in UTF-8.
- I wrote an encoding filter that converts Python files from their
external encoding (specified in the "#-*- coding: encodingname -*-"
header) to UTF-8. Even though I work with source files written in
many languages other than Python, the same sort of scheme can be
applied to them.
- I wrote a decoding filter that does the inverse conversion.
- I wrote a custom hgmerge replacement that converts from the external
encoding to UTF-8 before passing the files to kdiff3, then back to the
external encoding after kdiff3 is finished. kdiff3 can therefore be
set to use UTF-8 all the time, yet it can operate on text that is
stored externally in any encoding.
- "hg serve", hgk, and the PyGTK-based implementation of hgk all work
perfectly, because everything they consume is in UTF-8.
- cmd.exe does not work well at all with UTF-8, even with "chcp
65001". Since I only need to operate on English and Russian text, I
wrote a wrapper for the 'hg' command that replaces sys.stdout with a
Windows-1251 output stream created via
codecs.lookup('cp1251')[3]. I can therefore use cmd.exe in "chcp
1251" mode, which actually works.
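
As an illustration only (this is not the actual filter code described
above), the core of such an encode/decode filter pair might look roughly
like the following Python 3 sketch. The function names are hypothetical,
and the fallback to ASCII when no coding header is present is an
assumption. The coding header itself is plain ASCII, so it reads the same
in the UTF-8 form and can be used to recover the external encoding in
both directions.

```python
import re

# PEP 263 style coding declaration, e.g. "#-*- coding: cp1251 -*-".
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(data, default="ascii"):
    """Return the encoding named in the first two lines, or `default`.

    The "ascii" default is an assumption made for this sketch.
    """
    for line in data.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


def to_utf8(data):
    """Encode-filter direction: external encoding -> UTF-8 bytes."""
    return data.decode(declared_encoding(data)).encode("utf-8")


def from_utf8(data):
    """Decode-filter direction: UTF-8 bytes -> external encoding."""
    return data.decode("utf-8").encode(declared_encoding(data))
```

A real filter would presumably wrap these functions in a small script
that reads the file contents on stdin and writes the converted bytes to
stdout.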
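
The stdout override for cmd.exe can be sketched in the same hedged
spirit. This is a minimal Python 3 illustration, not the actual wrapper;
the helper name cp1251_writer and the choice of errors="replace" are
assumptions. codecs.getwriter('cp1251') returns the same StreamWriter
class that codecs.lookup('cp1251')[3] does.

```python
import codecs
import io


def cp1251_writer(byte_stream, errors="replace"):
    """Wrap a byte stream so that text written to it is encoded as cp1251.

    With errors="replace" (an assumption of this sketch), characters that
    cp1251 cannot represent are written as "?" instead of raising.
    """
    writer_class = codecs.getwriter("cp1251")
    return writer_class(byte_stream, errors=errors)


# Demonstration against an in-memory stream rather than the real console.
buf = io.BytesIO()
out = cp1251_writer(buf)
out.write("Привет, hg\n")
```

In an actual wrapper the byte stream would be the process's standard
output (sys.stdout.buffer on Python 3), and the result would be assigned
to sys.stdout before Mercurial's command dispatch runs.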
Thanks very much for your advice, Matt.