Initial support of Unicode filenames

Antoine Pitrou solipsis at pitrou.net
Thu Nov 3 12:06:53 UTC 2011


On Thu, 03 Nov 2011 10:31:28 +0100
Martin Geisler <mg at aragost.com> wrote:
> 
> That's the point: some people think it's better to checkout files with
> broken filenames instead of refusing to checkout.

The filenames weren't "broken" in the first place when they were
checked in; it's just that (IIUC) hg prefers to faithfully copy
their byte representation rather than transcode them to a
locale-agnostic representation.

The filenames may be unrepresentable in the reader's locale, though.
There may be solutions to that. Not necessarily pretty ones (they
probably involve a funky escaping algorithm), but the current
situation isn't pretty either: in both cases, the user gets a
misrepresented filename (aka mojibake).
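
To make the escaping idea concrete, here is a minimal Python 3 sketch.
It is not what hg does; it only assumes a name committed from a cp1252
machine and read back under UTF-8, and it uses Python's built-in
"backslashreplace" error handler as a stand-in for whatever escaping
scheme would really be chosen.

```python
# Assumption: the name was committed from a machine using cp1252.
name = "Sweet crêpe recipe.txt"
stored = name.encode("cp1252")  # the bytes as copied verbatim

# Strict decoding under UTF-8 fails, because the lone byte 0xEA is
# not valid UTF-8 in this position.
try:
    stored.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# An escaping error handler yields a name that is ugly but usable.
escaped = stored.decode("utf-8", errors="backslashreplace")
print(escaped)
```

The escaped name is still not what the committer typed, which is the
point: it is a different flavour of misrepresentation, not a cure.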

What seems to be the design point is that filenames stay the same in
their binary representation (so that Makefiles aren't broken, I
suppose). Of course, that prevents any proper Unicode support if
computers with different encodings are involved.
(And "Unicode support" really means "the user can read the checked-out
filenames".)
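
A small Python 3 sketch of that trade-off, under the assumption that
the bytes are copied unchanged to every machine: the binary name is
identical everywhere, but what the user sees depends on the local
encoding. The three encodings below are only illustrative choices.

```python
# Assumption: committed on a cp1252 machine, bytes stored as-is.
stored = "Sweet crêpe recipe.txt".encode("cp1252")

# The same bytes, as displayed under three different local encodings.
seen_cp1252 = stored.decode("cp1252")                 # the committer's view
seen_cp437 = stored.decode("cp437")                   # another code page
seen_utf8 = stored.decode("utf-8", errors="replace")  # a UTF-8 locale

for label, text in [("cp1252", seen_cp1252),
                    ("cp437", seen_cp437),
                    ("utf-8", seen_utf8)]:
    print(label, text)
```

Only the first view contains the "ê"; the other two are mojibake, even
though a Makefile referring to the raw bytes would keep working.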

> Today, a Windows user
> can commit a file named "Sweet crêpe recipe.txt" and I can checkout the
> file on my Linux machine. I won't get a "ê" in my filename, but I'll get
> a file I can modify and commit changes to anyway.

But proper Unicode support /would/ get you a "ê" in the latin1 filename.
(You may have to use the wide APIs under Windows if "ê" isn't in your
local code page.)
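
As a rough Python 3 sketch of what I mean by transcoding: assume the
repository kept names in a locale-agnostic form (UTF-8 here, purely as
an assumption) and re-encoded them at checkout. The helper
checkout_name is hypothetical and is not an hg function.

```python
# Assumption: names are stored in UTF-8 as the locale-agnostic form.
canonical = "Sweet crêpe recipe.txt".encode("utf-8")

def checkout_name(canonical: bytes, local_encoding: str) -> bytes:
    """Hypothetical helper: re-encode a stored name for the local system."""
    return canonical.decode("utf-8").encode(local_encoding)

latin1_name = checkout_name(canonical, "latin-1")  # "ê" becomes byte 0xEA
utf8_name = checkout_name(canonical, "utf-8")      # "ê" becomes 0xC3 0xAA

# In a code page without "ê" (cp1251 here) the byte-oriented name cannot
# be produced at all; that is the case where the wide APIs are needed.
try:
    checkout_name(canonical, "cp1251")
    representable = True
except UnicodeEncodeError:
    representable = False
```

Here the bytes differ from machine to machine, but each user reads the
same name, which is the opposite trade-off from copying bytes verbatim.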

Regards

Antoine.




