[PATCH 1 of 6 foldmap-in-C] encoding: define an enum that specifies what normcase does to ASCII strings
Matt Mackall
mpm at selenic.com
Thu Apr 2 18:22:57 UTC 2015
On Thu, 2015-04-02 at 10:14 +0200, Adrian Buehlmann wrote:
> On 2015-04-02 04:48, Siddharth Agarwal wrote:
> > # HG changeset patch
> > # User Siddharth Agarwal <sid0 at fb.com>
> > # Date 1427872870 25200
> > # Wed Apr 01 00:21:10 2015 -0700
> > # Node ID 42a1040af0c362b38ce45fc71e065d1769902c79
> > # Parent 37a2b446985f2ef77b9690a0548c8630828b7412
> > encoding: define an enum that specifies what normcase does to ASCII strings
> >
> > For C code we don't want to pay the cost of calling into a Python function for
> > the common case of ASCII filenames. However, while on most POSIX platforms we
> > prefer to normalize filenames by lowercasing them, on Windows we uppercase
> > them. We define an enum here indicating the direction in which filenames should
> > be normalized. Some platforms (notably Cygwin) have more complicated
> > normalization behavior -- we add a case for that too.
> >
> > In upcoming patches we'll also define a fallback function that is called if the
> > string has non-ASCII bytes.
> >
> > This enum will be replicated in the C code to make foldmaps. There's
> > unfortunately no nice way to avoid that -- we can't have encoding import
> > parsers because of import cycles. One way might be to have parsers import
> > encoding, but accessing Python modules from C code is just awkward.
> >
> > The name 'normcaseasciispecs' was chosen to indicate that this is merely
> > an integer that specifies a behavior, not a function. The name was pluralized
> > since in upcoming patches we'll introduce 'normcaseasciispec' which will be one
> > of these values.
> >
> > diff --git a/mercurial/encoding.py b/mercurial/encoding.py
> > --- a/mercurial/encoding.py
> > +++ b/mercurial/encoding.py
> > @@ -354,6 +354,19 @@ def upper(s):
> >      except LookupError, k:
> >          raise error.Abort(k, hint="please check your locale settings")
> >  
> > +class normcaseasciispecs(object):
> > +    '''what a platform's normcase does to ASCII strings
> > +
> > +    This is specified per platform, and should be consistent with what normcase
> > +    on that platform actually does.
> > +
> > +    lower: normcase lowercases ASCII strings
> > +    upper: normcase uppercases ASCII strings
> > +    other: the fallback function should always be called'''
> > +    lower = -1
> > +    upper = 1
> > +    other = 0
> > +
> >  _jsonmap = {}
> >  
> >  def jsonescape(s):
>
> Ugh, this sounds ugly.
>
> I guess there is not much chance this surprising difference between
> Mercurial's util.normcase function doing uppercase when run on Windows
> and lowercase when run on other platforms could be eliminated.

No, it's a difference present in the underlying filesystems (NTFS
compares via upper(), HFS+ via lower()). And because a number of scripts
don't map 1:1 between uppercase and lowercase, the two directions give
different results. There's even a script with one uppercase letter that
maps to two lowercase forms, so it's fully 1:2.
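
A rough Python 3 sketch of the dispatch the patch describes (a cheap ASCII path chosen by a normcaseasciispecs value, with a fallback for everything else), plus one concrete case where the fold direction matters. The enum values are copied from the patch; `foldname` and `samefold` are hypothetical helpers, not Mercurial APIs, and the patch itself targets Python 2 with the fast path living in C. Greek sigma is used here as an assumed illustration of a 1:2 case; the sketch does not model any particular filesystem's exact rules.

```python
class normcaseasciispecs(object):
    '''what a platform's normcase does to ASCII strings (values from the patch)'''
    lower = -1
    upper = 1
    other = 0


def foldname(name, spec, fallback):
    '''Hypothetical helper: case-fold a bytes filename for comparison.

    Pure-ASCII names take the cheap path selected by `spec`; names with
    non-ASCII bytes, and every name when spec is `other`, go to `fallback`
    (standing in for the platform's real normcase).
    '''
    if spec != normcaseasciispecs.other:
        try:
            name.decode('ascii')
        except UnicodeDecodeError:
            pass  # non-ASCII bytes: the ASCII rule is not enough
        else:
            if spec == normcaseasciispecs.lower:
                return name.lower()  # bytes.lower() only touches A-Z
            return name.upper()      # bytes.upper() only touches a-z
    return fallback(name)


def samefold(a, b, fold):
    '''True if two names collide under the given folding function.'''
    return fold(a) == fold(b)


# Greek small sigma and final sigma share one capital (U+03A3), so they
# collide when folded to upper case but stay distinct when folded to lower.
sigma, final_sigma = '\u03c3', '\u03c2'
print(samefold(sigma, final_sigma, str.upper))  # True
print(samefold(sigma, final_sigma, str.lower))  # False
```

Under an upper-folding rule the names "σ" and "ς" count as the same file; under a lower-folding rule they do not. This is why a single platform-independent direction can't simply be chosen, and why non-ASCII names are handed to the platform's own normcase.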
--
Mathematics is the supreme nostalgia of our time.