Status of internationalization

Kevin Smith yarcs at qualitycode.com
Sun Jun 12 14:09:22 UTC 2005


vseguip at gmail.com wrote:
> Ok, this is more of a UI issue. If we have binary adding as default,
> hg behaves as you want. Then we can add a text flag "hg add -t
> different.encoding" if we want charset transformation.

This would be ok with me. As long as the user must explicitly request 
encoding or newline transformations on specific files, it should be 
safe, and those of us who don't need or want it would be unaffected.
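As a rough illustration of that opt-in model, here is a minimal Python 
sketch. It assumes, hypothetically, that flagged files are stored as 
UTF-8 and that something like the proposed `hg add -t <encoding>` would 
just record a working-copy encoding for a path. Anything added without 
the flag passes through byte-for-byte. `ConversionTable`, `add_text`, 
`to_repo`, and `to_working` are made-up names, not Mercurial APIs.

```python
from typing import Dict


class ConversionTable:
    """Hypothetical per-file record of explicitly requested charset conversions."""

    def __init__(self) -> None:
        # path -> working-copy encoding; paths not listed are treated as binary
        self.encodings: Dict[str, str] = {}

    def add_text(self, path: str, encoding: str) -> None:
        # Roughly what a hypothetical "hg add -t <encoding>" might record.
        self.encodings[path] = encoding

    def to_repo(self, path: str, data: bytes) -> bytes:
        """Working copy -> stored form (UTF-8), only if the user asked for it."""
        enc = self.encodings.get(path)
        if enc is None:
            return data  # default: store exactly the bytes we were given
        return data.decode(enc).encode("utf-8")

    def to_working(self, path: str, data: bytes) -> bytes:
        """Stored form -> working copy, reversing to_repo for flagged files."""
        enc = self.encodings.get(path)
        if enc is None:
            return data
        return data.decode("utf-8").encode(enc)
```

The point of the design is that the conversion table starts empty, so 
nothing is ever transformed unless a user has asked for it on a 
specific file.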

> I don't know about other tools, but svn has eol conversion (optional,
> as I pointed out above).

I haven't paid much attention to svn because I'm sold on the distributed 
model. It looks like my perceptions have also been biased by watching 
SCM projects very early in their lifecycle, before they have reached 
a decision on whether or not to mangle newlines (the state mercurial is 
now in). So as of today, eol conversion support is rare, but I was wrong 
to imply that there is a strong long-term trend in that direction.

Here's a quick survey of distributed SCM eol handling:

monotone: performs both eol and charset conversions on files, triggered 
by a lua hook. See the link below for details. I think this might be a 
good model for mercurial, since they have obviously put a lot of thought 
into it.
   http://www.venge.net/monotone/docs/Internationalization.html
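
Monotone's real hooks are written in Lua and are described at the link 
above. The Python sketch below only illustrates the general shape of a 
hook-driven design, where a user-supplied function decides per path 
which conversions (if any) apply. `Conversion`, `default_hook`, 
`windows_text_hook`, and `conversion_for` are hypothetical names, and 
the patterns are just examples.

```python
import fnmatch
from typing import Callable, NamedTuple, Optional


class Conversion(NamedTuple):
    charset: Optional[str]  # working-copy encoding, or None for no charset conversion
    eol: Optional[str]      # working-copy line ending, or None for no eol conversion


Hook = Callable[[str], Conversion]

NO_CONVERSION = Conversion(charset=None, eol=None)


def default_hook(path: str) -> Conversion:
    # Safe default: never touch file contents.
    return NO_CONVERSION


def windows_text_hook(path: str) -> Conversion:
    # Example user policy: CRLF for a few text patterns, nothing otherwise.
    for pattern in ("*.txt", "*.bat", "Makefile.win"):
        if fnmatch.fnmatch(path, pattern):
            return Conversion(charset=None, eol="\r\n")
    return NO_CONVERSION


def conversion_for(path: str, hook: Hook = default_hook) -> Conversion:
    """Ask the configured hook which conversions to apply to path."""
    return hook(path)
```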

darcs: distinguishes between text and binary files, defaulting to text 
unless the file matches a mask in a config file. However, the docs don't 
make clear whether newline conversions are performed on text files. The 
primary difference is that deltas are not stored for binary files.
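
The darcs-style classification can be sketched in a few lines of 
Python. This assumes a list of regular-expression masks like the ones 
such a config file would hold; the specific patterns and the names 
`compile_masks` and `is_binary` are illustrative, not darcs' actual 
defaults. Text is the default, and a path is binary only if some mask 
matches it.

```python
import re
from typing import Iterable, List, Pattern

# Illustrative masks only; a real tool would read these from a config file.
BINARY_MASKS = [r"\.(png|jpe?g|gif|zip|gz|exe|dll)$", r"(^|/)\.DS_Store$"]


def compile_masks(masks: Iterable[str]) -> List[Pattern[str]]:
    # Case-insensitive so that e.g. "LOGO.PNG" is caught as well.
    return [re.compile(m, re.IGNORECASE) for m in masks]


def is_binary(path: str, masks: List[Pattern[str]]) -> bool:
    """Text is the default; a path is binary only if some mask matches it."""
    return any(m.search(path) for m in masks)
```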

ArX: No conversions yet. The docs mention the possibility of later 
supporting svn-like functionality.

codeville: No mention of eol conversions (yet)

bazaar-ng (bzr): No mention of eol conversions (yet)

GNU arch and git/cogito: I can't find anything in the arch docs (and 
their wiki is down right now). I don't recall seeing any discussions on 
the git list about doing newline conversions. Since both are very 
unix-oriented, I wouldn't expect them to support newline conversions, 
except possibly as an external hook.

> I think we all agree that this can be ugly from
> a theoretical pov, but I think that for many environments, ignoring
> charsets/eol will just not cut it.

I suppose you're right. These days, most editors are able to write 
linefeed-only files, and most Windows compilers (including VC++) will 
work fine with linefeed-only files. I think there are a few Windows 
tools (like nmake?) that insanely still require carriage returns.

Although most of my work is on cross-platform projects, I personally 
don't have a need for newline conversions in the SCM. But if you know of 
real-world cases where it's needed, then I suppose some kind of support 
makes sense.
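
Where conversion is needed, the core transformation itself is small. 
A minimal Python sketch, assuming LF is the stored form and CRLF is the 
Windows working-copy form: it refuses to convert files whose line 
endings are already mixed, since those could not be restored 
byte-for-byte, which is exactly the kind of silent damage that makes 
people nervous about eol mangling. The function names are hypothetical.

```python
def has_mixed_eols(data: bytes) -> bool:
    """True if data contains both CRLF and bare-LF line endings."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return crlf > 0 and lf_only > 0


def lf_to_crlf(data: bytes) -> bytes:
    """Checkout direction: stored LF -> working-copy CRLF."""
    if b"\r\n" in data:
        # Converting again would yield "\r\r\n" and break the round trip.
        raise ValueError("file already contains CRLF line endings")
    return data.replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Commit direction: working-copy CRLF -> stored LF."""
    if has_mixed_eols(data):
        raise ValueError("mixed line endings; refusing lossy conversion")
    return data.replace(b"\r\n", b"\n")
```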


By the way, I strongly support the use of gettext, or some other "po" 
file-format-compatible method of supporting translated UIs. Although 
the mechanism is somewhat confusing and bloated, it is an effective and 
ubiquitous standard. Using anything else would make it much more 
difficult for the translation community to translate mercurial to lots 
of languages.
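
Since mercurial is written in Python, the standard library's `gettext` 
module can already load the `.mo` files compiled from `.po` catalogs. 
A minimal sketch; the `"hg"` domain name and the `locale/` directory 
layout are assumptions for illustration. With `fallback=True`, a 
missing catalog just yields the untranslated English strings.

```python
import gettext
from typing import List, Optional

DOMAIN = "hg"         # assumed catalog name
LOCALEDIR = "locale"  # assumed layout: locale/<lang>/LC_MESSAGES/hg.mo


def load_translations(languages: Optional[List[str]] = None) -> gettext.NullTranslations:
    # fallback=True returns NullTranslations (identity) when no .mo file is found.
    return gettext.translation(DOMAIN, localedir=LOCALEDIR,
                               languages=languages, fallback=True)


_t = load_translations()
_ = _t.gettext  # conventional name that xgettext looks for when extracting strings


def updated_message(n: int) -> str:
    # ngettext picks the right plural form for the active language.
    return _t.ngettext("%d file updated", "%d files updated", n) % n
```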


Still another note: You (Matt) should probably think about 
case-insensitive file systems, such as are found on MS Windows. Darcs 
does a pretty good job with this, defaulting to treating filenames that 
are the same except for case as being conflicting, but allowing that to 
be overridden with a command-line switch.
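
A sketch of that darcs-like policy in Python: names that differ only in 
case are treated as conflicting unless the user overrides the check. 
`find_case_collisions`, `check_add`, and `allow_case_conflicts` are 
hypothetical names. `str.casefold()` is only an approximation; real 
case-insensitive filesystems (NTFS, HFS+, etc.) each have their own 
folding rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(paths: Iterable[str]) -> List[List[str]]:
    """Group paths that would name the same file on a case-insensitive filesystem."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in set(paths):  # ignore exact duplicates
        groups[p.casefold()].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def check_add(paths: Iterable[str], allow_case_conflicts: bool = False) -> None:
    """Refuse case-only collisions by default; a switch overrides the check."""
    collisions = find_case_collisions(paths)
    if collisions and not allow_case_conflicts:
        names = "; ".join(", ".join(g) for g in collisions)
        raise ValueError("case-folding collision: " + names)
```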

Kevin
