A proposal to solve the encoding problem on Windows.

Matt Mackall mpm at selenic.com
Thu Oct 20 16:49:33 UTC 2011


On Fri, 2011-10-21 at 00:04 +0800, 罗勇刚(Yonggang Luo) wrote:
> Things need to be done:
> 1. All pure-ASCII repositories won't be affected. This is easy.
> 2. All OSes except Windows won't be affected. Use sys.platform == 'win32' to
> ensure that.
> 3. Use UTF-8 as the default encoding for new commits.

This is not sufficiently backwards-compatible. If Alice, using Mercurial 2.1,
checks in a new file to the existing project, then Bob and Carl and Dave and
Erica, using Mercurial 1.8, can't check it out.

For many users, this would be a serious regression: Windows users using
the same SBCS code page can already share files just fine.

> 4. Support messed-up old Mercurial repositories. Add a new --encoding
> option to do that.

Same problem. A solution that breaks things for existing users will not
be considered.


Here's an alternate scheme:

if windows:
  find manifest of the parent commit
  if manifest is empty:
    # brand new repo
    mode = utf8transcoding
  elif all files in manifest are valid UTF-8:
    # repo is already in UTF-8 mode or is pure ASCII
    mode = utf8transcoding
  else:
    # existing repo, possibly using a Windows character set
    mode = passthrough
else:
  mode = passthrough
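
The scheme above could be sketched in Python roughly as follows. This is only an illustration, not Mercurial code: the function name choose_mode, the mode constants, and the idea of passing the parent manifest as a list of byte-string filenames are all assumptions made for the example.

```python
import sys

# Hypothetical mode names, for illustration only.
UTF8_TRANSCODING = "utf8transcoding"
PASSTHROUGH = "passthrough"


def is_valid_utf8(name):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_mode(manifest_names, platform=sys.platform):
    """Pick a filename mode from the parent commit's manifest.

    manifest_names is assumed to be a list of byte-string filenames.
    """
    if platform != "win32":
        return PASSTHROUGH
    # all() over an empty manifest is True, so a brand new repo
    # also ends up in UTF-8 transcoding mode.
    if all(is_valid_utf8(n) for n in manifest_names):
        return UTF8_TRANSCODING
    # Existing repo, possibly using a Windows character set.
    return PASSTHROUGH
```

Pure-ASCII names are valid UTF-8 by construction, so the "brand new repo" and "pure ASCII" cases need no special handling in this sketch.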

Notes:
1. We can reliably detect UTF-8 with very high probability
2. This automatically does the right thing on existing repos
3. This automatically does the right thing when working with Linux users
on UTF-8
4. Existing repos can be upgraded to UTF-8 if desired
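
To illustrate note 1, here is a small Python 3 sketch of why non-ASCII names in a single-byte Windows code page rarely pass as UTF-8. The helper name looks_like_utf8 and the sample filenames are made up for this example; the check itself is just a strict UTF-8 decode.

```python
def looks_like_utf8(raw):
    """Return True if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Non-ASCII names as they would be stored from a single-byte code page.
legacy_names = [
    "r\u00e9sum\u00e9.doc".encode("cp1252"),          # Western European
    "\u0444\u0430\u0439\u043b.txt".encode("cp1251"),  # Cyrillic
]
legacy_results = [looks_like_utf8(n) for n in legacy_names]

# The same names stored as UTF-8, plus a pure-ASCII name.
utf8_names = [
    "r\u00e9sum\u00e9.doc".encode("utf-8"),
    "\u0444\u0430\u0439\u043b.txt".encode("utf-8"),
    b"README",
]
utf8_results = [looks_like_utf8(n) for n in utf8_names]

# A false positive is possible but unusual: these two cp1252
# characters encode to the bytes C3 A9, which is also valid UTF-8.
false_positive = looks_like_utf8("\u00c3\u00a9".encode("cp1252"))
```

UTF-8 requires every lead byte to be followed by the right number of continuation bytes, so a stray high byte next to an ASCII letter fails the decode. That is why detection is "very high probability" rather than certain.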

-- 
Mathematics is the supreme nostalgia of our time.