Getting http://mercurial.selenic.com/wiki/FixUtf8Extension as a part of hgsubversion

Mike Meyer mwm at mired.org
Wed Oct 19 16:13:02 UTC 2011


On Wed, Oct 19, 2011 at 8:38 AM, Tom Anderson <tom.anderson at e2x.co.uk> wrote:

> On 19 October 2011 16:24, Cesar Mena <cesar.mena at gmail.com> wrote:
>
> >>> What happens if you check in some files with entirely alphabetical
> >>> names on your Latin-1 box, and i check them out on my EBCDIC machine?
> >>
> >> Definitely agreed. Using UTF-8 as an intermediate encoding for
> >> interchange between different operating systems is the only solution
> >> for Mercurial. (There is no other replacement; the alternatives, such
> >> as UTF-16 or UTF-32, would be more difficult, and no one would agree
> >> to them.) The position of UTF-8 in the computing world is exactly that
> >> of English in the real world for international communication, just as
> >> I am using it to talk to you. Obviously, if I wrote in Chinese, few
> >> people would understand me. With UTF-8, even if you cannot render a
> >> name, or cannot find the corresponding character on an EBCDIC or ASCII
> >> machine, at least we store it in a unified encoding, so we can handle
> >> it in a consistent way.
> >
> > but then what happens to those users of mercurial that choose/need to
> > work on ASCII or EBCDIC machines?
> Sadly, Luo's filename *cannot* be used on those machines. That is the
> simple truth.
>

But it's *not* true. They can be used if you treat file names as a string of
bytes instead of characters. They just don't display with any meaning. The
file name *cannot* be used if it includes a character that the underlying
platform won't allow in file names. The two cases are different.
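
To make the distinction concrete, here is a minimal Python sketch. It is
not Mercurial code: the helper names and the set of forbidden bytes are
hypothetical, and the set assumes a POSIX-like filesystem where only NUL
and "/" are rejected. It shows that a UTF-8 name stays usable as bytes even
when the local encoding cannot display it, while a name containing a
forbidden byte is unusable regardless of encoding.

```python
# Hypothetical illustration; none of these names come from Mercurial.
# Assumption: a POSIX-like filesystem that forbids only NUL and "/" in a name.
FORBIDDEN_BYTES = {0x00, 0x2F}


def usable_as_bytes(name: bytes) -> bool:
    """Usable if non-empty and free of bytes the platform forbids."""
    return bool(name) and not any(b in FORBIDDEN_BYTES for b in name)


def display(name: bytes, encoding: str) -> str:
    """Render a name for humans; undecodable bytes become escapes."""
    return name.decode(encoding, errors="backslashreplace")


utf8_name = "文件.txt".encode("utf-8")

# Case 1: usable as bytes, even though an ASCII box shows no meaning.
print(usable_as_bytes(utf8_name))
print(display(utf8_name, "ascii"))

# Case 2: not usable at all, whatever the encoding.
print(usable_as_bytes(b"a/b"))
```

On a Latin-1 machine the same bytes decode to unreadable characters, but
encoding them back yields the original bytes, so nothing is lost.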


> So, Mercurial has to tell them that. It's not a nice thing to hear,
> but it's the truth.
>
> Perhaps it could let people override that, by specifying some sort of
> mapping strategy. I don't know.
>
> What is *not* okay is for Mercurial to just splurge nonsense filenames
> over their disks.
>

That's a value judgement. Personally, I think hg makes the right choice.
There are enough situations where file names are nonsense that adding one
more is a relatively minor issue.

Since the consensus seems to be that UTF-8 support would break the ability to
push changes to an old version, possibly the project needs to fork?

    <mike