Status of internationalization
Kevin Smith
yarcs at qualitycode.com
Sun Jun 12 14:09:22 UTC 2005
vseguip at gmail.com wrote:
> Ok, this is more of a UI issue. If we have binary adding as default,
> hg behaves as you want. Then we can add a text flag "hg add -t
> different.encoding" if we want charset transformation.
This would be ok with me. As long as the user must explicitly request
encoding or newline transformations on specific files, it should be
safe, and those of us who don't need or want it would be unaffected.
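
A minimal sketch of that opt-in rule, to make the idea concrete. Every name
here (to_repo_form, is_text, encoding) is hypothetical and not part of
mercurial; the only point is that content is stored byte-for-byte unless the
user explicitly flagged the file as text.

```python
from typing import Optional

# Hypothetical sketch: none of these names come from mercurial.


def to_repo_form(data: bytes, is_text: bool = False,
                 encoding: Optional[str] = None) -> bytes:
    """Return the bytes to store for one file.

    Binary is the default, so data passes through untouched. Only files
    the user explicitly flagged as text get any transformation.
    """
    if not is_text:
        return data
    # Normalize CRLF and lone CR line endings to LF.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if encoding is not None:
        # Re-encode from the declared charset to UTF-8 (assumed repo charset).
        data = data.decode(encoding).encode("utf-8")
    return data
```

With this shape, anyone who never passes the text flag gets exactly the bytes
they added, which is the safety property described above.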
> I don't know about other tools but svn has eol conversion (optional
> like I pointed above).
I haven't paid much attention to svn because I'm sold on the distributed
model. It looks like my perceptions have also been biased by watching
SCM projects very early in their lifecycle, before they have reached
a decision on whether or not to mangle newlines (the state mercurial is
now in). So as of today, eol conversion support is rare, but I was wrong
to imply that there is a strong long-term trend in that direction.
Here's a quick survey of distributed SCM eol handling:
monotone: performs both eol and charset conversions on files, triggered
by a lua hook. See the link below for details. I think this might be a
good model for mercurial, since they have obviously put a lot of thought
into it.
http://www.venge.net/monotone/docs/Internationalization.html
darcs: distinguishes between text and binary files, defaulting to text
unless the file matches a mask in a config file. However, it's not
actually clear whether text files have newline conversions performed.
The primary difference is that binary files do not store deltas.
ArX: No conversions yet. Documented possibility of later supporting
svn-like functionality.
codeville: does not mention eol conversion (yet)
bazaar-ng (bzr): No mention of eol conversions (yet)
GNU arch and git/cogito: I can't find anything in the arch docs (and
their wiki is down right now). I don't recall seeing any discussions on
the git list about doing newline conversions. Since both are very
unix-oriented, I wouldn't expect them to support newline conversions,
except possibly as an external hook.
> I think we all agree that this can be ugly from
> a theoretical pov, but I think that for many environments, ignoring
> charsets/eol will just not cut it.
I suppose you're right. These days, most editors are able to write
linefeed-only files, and most Windows compilers (including VC++) will
work fine with linefeed-only files. I think there are a few Windows
tools (like nmake?) that insanely still require carriage returns.
Although most of my work is on cross-platform projects, I personally
don't have a need for newline conversions in the SCM. But if you know of
real-world cases where it's needed, then I suppose some kind of support
makes sense.
By the way, I strongly support the use of gettext, or some other method
compatible with the "po" file format, for supporting translated UIs.
Although the mechanism is somewhat confusing and bloated, it is an
effective and ubiquitous standard. Using anything else would make it
much more difficult for the translation community to translate mercurial
into lots of languages.
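
For illustration, here is roughly what gettext-style message lookup looks
like with Python's standard gettext module. The domain name "hg", the
"locale" directory, and the describe_added helper are assumptions made up
for this sketch, not anything mercurial defines.

```python
import gettext

# "hg" as the message domain and "locale" as the catalog directory are
# assumptions for illustration. fallback=True means that when no compiled
# catalog (.mo file) is found, messages are returned untranslated.
translation = gettext.translation("hg", localedir="locale", fallback=True)
_ = translation.gettext


def describe_added(filename):
    # The string passed to _() is the key translators see in the .po file.
    return _("adding %s") % filename
```

Translators would then work only on the extracted "po" files with their
usual tools, without touching the source.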
Still another note: You (Matt) should probably think about
case-insensitive file systems, such as are found on MS Windows. Darcs
does a pretty good job with this, defaulting to treating filenames that
are the same except for case as being conflicting, but allowing that to
be overridden with a command-line switch.
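
A rough sketch of that kind of check, assuming filenames are compared after
simple case folding (real file systems have their own rules, so this is only
an approximation). The function name and the allow_case_conflicts flag are
hypothetical; the flag stands in for the command-line override.

```python
from collections import defaultdict


def find_case_collisions(paths, allow_case_conflicts=False):
    """Return groups of paths that are the same except for case.

    allow_case_conflicts is a hypothetical stand-in for a command-line
    override: when set, nothing is reported as conflicting.
    """
    if allow_case_conflicts:
        return []
    by_folded = defaultdict(list)
    for path in paths:
        # casefold() approximates case-insensitive comparison.
        by_folded[path.casefold()].append(path)
    # Only distinct spellings of the same folded name count as a conflict.
    return [sorted(set(group)) for group in by_folded.values()
            if len(set(group)) > 1]
```

For example, find_case_collisions(["README", "readme", "src/a.c"]) reports
"README" and "readme" as one conflicting group and leaves "src/a.c" alone.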
Kevin