[PATCH 1 of 3 RFC] mercurial: implement a source transforming module loader on Python 3
Simon King
simon at simonking.org.uk
Mon May 16 15:43:47 UTC 2016
I don't think that's supposed to happen, is it? Python should
automatically invalidate .pyc files based on a magic number that
changes when the format changes:
https://hg.python.org/cpython/file/2.7/Python/import.c#l31
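
For illustration, here is a minimal Python 3 sketch of that check: the first four bytes of a .pyc file are compared against the running interpreter's magic number. The helper name pyc_matches_interpreter is hypothetical; importlib.util.MAGIC_NUMBER is the standard-library constant (Python 3.4 and later). The real check lives inside the import machinery, so this only mirrors the idea.

```python
import importlib.util


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc header carries this interpreter's magic number."""
    with open(pyc_path, 'rb') as fh:
        header = fh.read(4)
    # A .pyc written for a different bytecode format starts with a different
    # magic number, so the import system ignores it and recompiles.
    return header == importlib.util.MAGIC_NUMBER
```

Note that this only covers a change of interpreter. It would not notice a change in a source-transforming loader, which is the separate case raised in the patch description.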
Simon
On Mon, May 16, 2016 at 4:31 PM, timeless <timeless at gmail.com> wrote:
> FWIW, we already need some cache invalidation. Switching between Python 2.6
> and 2.7 results in really bad outcomes. :)
>
> On May 16, 2016 12:03 AM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1463370916 25200
>> # Sun May 15 20:55:16 2016 -0700
>> # Node ID 7c5d1f8db9618f511f40bc4089145310671ca57b
>> # Parent f8b87a779c87586aa043bcd6030369715edfc9c1
>> mercurial: implement a source transforming module loader on Python 3
>>
>> The most painful part of ensuring Python code runs on both Python 2
>> and 3 is string encoding. What makes this difficult is that string
>> literals in Python 2 are bytes and string literals in Python 3 are
>> unicode. So, to ensure consistent types are used, you have to
>> use "from __future__ import unicode_literals" and/or prefix literals
>> with their type (e.g. b'foo' or u'foo').
>>
>> Nearly every string in Mercurial is bytes. So, to use the same source
>> code on both Python 2 and 3 would require prefixing nearly every
>> string literal with "b" to make it a byte literal. This is ugly and
>> not something mpm is willing to do.
>>
>> This patch implements a custom module loader on Python 3 that performs
>> source transformation to convert string literals (unicode in Python 3)
>> to byte literals. In effect, it changes Python 3's string literals to
>> behave like Python 2's.
>>
>> The module loader is only used on mercurial.* and hgext.* modules.
>>
>> The loader works by tokenizing the loaded source and replacing
>> "string" tokens if necessary. The modified token stream is
>> untokenized back to source and loaded as normal. This does add some
>> overhead. However, this all occurs before caching, so .pyc files should
>> cache the version with byte literals.
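
The tokenize round trip described above can be sketched as follows. This is a simplified illustration under Python 3, not the patch itself: the names add_bytes_prefix and transform_source are hypothetical. Like the patch, it leaves triple-quoted strings and already-prefixed literals (u'', r'', b'') untouched.

```python
import io
import token
import tokenize


def add_bytes_prefix(tok):
    """Turn a plain '...' or "..." literal token into a b'...' token."""
    if tok.type != token.STRING:
        return tok
    s = tok.string
    # Triple-quoted strings (typically docstrings) stay as str literals.
    if s[:3] in ("'''", '"""'):
        return tok
    # Anything not starting with a quote already carries a prefix.
    if s[0] not in ("'", '"'):
        return tok
    return tokenize.TokenInfo(tok.type, 'b' + s, tok.start, tok.end, tok.line)


def transform_source(data):
    """Tokenize source bytes, rewrite string tokens, and untokenize."""
    tokens = tokenize.tokenize(io.BytesIO(data).readline)
    # untokenize() returns bytes here because the token stream starts
    # with an ENCODING token.
    return tokenize.untokenize(add_bytes_prefix(t) for t in tokens)
```

Passing the result to compile() and executing it should leave a name assigned from 'foo' holding a bytes object, while names assigned from u'bar' or a triple-quoted string still hold str.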
>>
>> This patch isn't suitable for checkin. There are a few deficiencies,
>> including that changes to the loader won't result in the cache
>> being invalidated. As part of testing this, I've had to manually
>> blow away __pycache__ directories. We'll likely need to hack up
>> cache checking as well so caching is invalidated when
>> mercurial/__init__.py changes. This is going to be ugly.
>>
>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
>> --- a/mercurial/__init__.py
>> +++ b/mercurial/__init__.py
>> @@ -139,14 +139,89 @@ class hgimporter(object):
>> if not modinfo:
>> raise ImportError('could not find mercurial module %s' %
>> name)
>>
>> mod = imp.load_module(name, *modinfo)
>> sys.modules[name] = mod
>> return mod
>>
>> +if sys.version_info[0] >= 3:
>> +    from . import pure
>> +    import importlib
>> +    import io
>> +    import token
>> +    import tokenize
>> +
>> +    class hgpathentryfinder(importlib.abc.PathEntryFinder):
>> +        """A sys.meta_path finder."""
>> +        def find_spec(self, fullname, path, target=None):
>> +            # Our custom loader rewrites source code and Python code
>> +            # that doesn't belong to Mercurial doesn't expect this.
>> +            if not fullname.startswith(('mercurial.', 'hgext.')):
>> +                return None
>> +
>> +            # This assumes Python 3 doesn't support loading C modules.
>> +            if fullname in _dualmodules:
>> +                stem = fullname.split('.')[-1]
>> +                fullname = 'mercurial.pure.%s' % stem
>> +                target = pure
>> +                assert len(path) == 1
>> +                path = [os.path.join(path[0], 'pure')]
>> +
>> +            # Try to find the module using other registered finders.
>> +            spec = None
>> +            for finder in sys.meta_path:
>> +                if finder == self:
>> +                    continue
>> +
>> +                spec = finder.find_spec(fullname, path, target=target)
>> +                if spec:
>> +                    break
>> +
>> +            if not spec:
>> +                return None
>> +
>> +            if fullname.startswith('mercurial.pure.'):
>> +                spec.name = spec.name.replace('.pure.', '.')
>> +
>> +            # TODO need to support loaders from alternate specs, like zip
>> +            # loaders.
>> +            spec.loader = hgloader(spec.name, spec.origin)
>> +            return spec
>> +
>> +    def replacetoken(t):
>> +        if t.type == token.STRING:
>> +            s = t.string
>> +
>> +            # If a docstring, keep it as a string literal.
>> +            if s[0:3] in ("'''", '"""'):
>> +                return t
>> +
>> +            if s[0] not in ("'", '"'):
>> +                return t
>> +
>> +            # String literal. Prefix to make a b'' string.
>> +            return tokenize.TokenInfo(t.type, 'b%s' % s, t.start, t.end, t.line)
>> +
>> +        return t
>> +
>> +    class hgloader(importlib.machinery.SourceFileLoader):
>> +        """Custom module loader that transforms source code.
>> +
>> +        When the source code is converted to code, we first transform
>> +        string literals to byte literals using the tokenize API.
>> +        """
>> +        def source_to_code(self, data, path):
>> +            buf = io.BytesIO(data)
>> +            tokens = tokenize.tokenize(buf.readline)
>> +            data = tokenize.untokenize(replacetoken(t) for t in tokens)
>> +            return super(hgloader, self).source_to_code(data, path)
>> +
>> # We automagically register our custom importer as a side-effect of loading.
>> # This is necessary to ensure that any entry points are able to import
>> # mercurial.* modules without having to perform this registration themselves.
>> -if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>> -    # meta_path is used before any implicit finders and before sys.path.
>> -    sys.meta_path.insert(0, hgimporter())
>> +if sys.version_info[0] >= 3:
>> +    sys.meta_path.insert(0, hgpathentryfinder())
>> +else:
>> +    if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>> +        # meta_path is used before any implicit finders and before sys.path.
>> +        sys.meta_path.insert(0, hgimporter())
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
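
As a self-contained illustration of the finder/loader pairing in the quoted patch, the sketch below registers a sys.meta_path finder that delegates to the other registered finders and then swaps in a custom SourceFileLoader subclass. All names here (prefixfinder, taggingloader, the TRANSFORMED marker) are hypothetical, and the transformation is a trivial stand-in for the string-literal rewrite. Unlike the patch, the sketch subclasses importlib.abc.MetaPathFinder, the abstract base class meant for sys.meta_path entries, and it only replaces loaders that are already SourceFileLoader instances.

```python
import importlib.abc
import importlib.machinery
import sys


class taggingloader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader: appends a marker assignment before compiling."""
    def source_to_code(self, data, path):
        # Stand-in for a real source transformation.
        data = data + b"\nTRANSFORMED = True\n"
        return super().source_to_code(data, path)


class prefixfinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: handles only modules with the given prefixes."""
    def __init__(self, prefixes):
        self.prefixes = tuple(prefixes)

    def find_spec(self, fullname, path, target=None):
        if not fullname.startswith(self.prefixes):
            return None

        # Ask every other registered finder to locate the module.
        spec = None
        for finder in sys.meta_path:
            if finder is self:
                continue
            find_spec = getattr(finder, 'find_spec', None)
            if find_spec is None:
                continue
            spec = find_spec(fullname, path, target)
            if spec is not None:
                break
        if spec is None:
            return None

        # Only plain source files get the transforming loader.
        if isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            spec.loader = taggingloader(spec.name, spec.origin)
        return spec
```

After sys.meta_path.insert(0, prefixfinder(('hgdemo_',))), importing a source module whose name starts with hgdemo_ should go through taggingloader. Because the rewrite happens in source_to_code, the transformed result is what gets written to __pycache__, which is why changing the transformation alone does not invalidate existing .pyc files.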