[PATCH 1 of 3 RFC] mercurial: implement a source transforming module loader on Python 3
Gregory Szorc
gregory.szorc at gmail.com
Mon May 16 15:50:44 UTC 2016
> On May 16, 2016, at 08:43, Simon King <simon at simonking.org.uk> wrote:
>
> I don't think that's supposed to happen, is it? Python should
> automatically invalidate .pyc files based on a magic number that
> changes when the format changes:
>
> https://hg.python.org/cpython/file/2.7/Python/import.c#l31
The problem is we're inserting code between file reading and code generation - code that Python's importers don't know about. Python assumes file content gets converted into code in a deterministic manner that is defined by the Python distribution itself. We're outside of that, so we need to provide our own cache validation checking.
>
>> On Mon, May 16, 2016 at 4:31 PM, timeless <timeless at gmail.com> wrote:
>> Fwiw, We already need some cache invalidation. Switching between Python 2.6
>> and 2.7 results in really bad outcomes. :)
>>
>>> On May 16, 2016 12:03 AM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Gregory Szorc <gregory.szorc at gmail.com>
>>> # Date 1463370916 25200
>>> # Sun May 15 20:55:16 2016 -0700
>>> # Node ID 7c5d1f8db9618f511f40bc4089145310671ca57b
>>> # Parent f8b87a779c87586aa043bcd6030369715edfc9c1
>>> mercurial: implement a source transforming module loader on Python 3
>>>
>>> The most painful part of ensuring Python code runs on both Python 2
>>> and 3 is string encoding. Making this difficult is that string
>>> literals in Python 2 are bytes and string literals in Python 3 are
>>> unicode. So, to ensure consistent types are used, you have to
>>> use "from __future__ import unicode_literals" and/or prefix literals
>>> with their type (e.g. b'foo' or u'foo').
>>>
>>> Nearly every string in Mercurial is bytes. So, to use the same source
>>> code on both Python 2 and 3 would require prefixing nearly every
>>> string literal with "b" to make it a byte literal. This is ugly and
>>> not something mpm is willing to do.
>>>
>>> This patch implements a custom module loader on Python 3 that performs
>>> source transformation to convert string literals (unicode in Python 3)
>>> to byte literals. In effect, it changes Python 3's string literals to
>>> behave like Python 2's.
>>>
>>> The module loader is only used on mercurial.* and hgext.* modules.
>>>
>>> The loader works by tokenizing the loaded source and replacing
>>> "string" tokens if necessary. The modified token stream is
>>> untokenized back to source and loaded like normal. This does add some
>>> overhead. However, this all occurs before caching. So .pyc files should
>>> cache the version with byte literals.
>>>
>>> This patch isn't suitable for checkin. There are a few deficiencies,
>>> including that changes to the loader won't result in the cache
>>> being invalidated. As part of testing this, I've had to manually
>>> blow away __pycache__ directories. We'll likely need to hack up
>>> cache checking as well so caching is invalidated when
>>> mercurial/__init__.py changes. This is going to be ugly.
>>>
>>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
>>> --- a/mercurial/__init__.py
>>> +++ b/mercurial/__init__.py
>>> @@ -139,14 +139,89 @@ class hgimporter(object):
>>> if not modinfo:
>>> raise ImportError('could not find mercurial module %s' %
>>> name)
>>>
>>> mod = imp.load_module(name, *modinfo)
>>> sys.modules[name] = mod
>>> return mod
>>>
>>> +if sys.version_info[0] >= 3:
>>> + from . import pure
>>> + import importlib
>>> + import io
>>> + import token
>>> + import tokenize
>>> +
>>> + class hgpathentryfinder(importlib.abc.PathEntryFinder):
>>> + """A sys.meta_path finder."""
>>> + def find_spec(self, fullname, path, target=None):
>>> + # Our custom loader rewrites source code and Python code
>>> + # that doesn't belong to Mercurial doesn't expect this.
>>> + if not fullname.startswith(('mercurial.', 'hgext.')):
>>> + return None
>>> +
>>> + # This assumes Python 3 doesn't support loading C modules.
>>> + if fullname in _dualmodules:
>>> + stem = fullname.split('.')[-1]
>>> + fullname = 'mercurial.pure.%s' % stem
>>> + target = pure
>>> + assert len(path) == 1
>>> + path = [os.path.join(path[0], 'pure')]
>>> +
>>> + # Try to find the module using other registered finders.
>>> + spec = None
>>> + for finder in sys.meta_path:
>>> + if finder == self:
>>> + continue
>>> +
>>> + spec = finder.find_spec(fullname, path, target=target)
>>> + if spec:
>>> + break
>>> +
>>> + if not spec:
>>> + return None
>>> +
>>> + if fullname.startswith('mercurial.pure.'):
>>> + spec.name = spec.name.replace('.pure.', '.')
>>> +
>>> + # TODO need to support loaders from alternate specs, like zip
>>> + # loaders.
>>> + spec.loader = hgloader(spec.name, spec.origin)
>>> + return spec
>>> +
>>> + def replacetoken(t):
>>> + if t.type == token.STRING:
>>> + s = t.string
>>> +
>>> + # If a docstring, keep it as a string literal.
>>> + if s[0:3] in ("'''", '"""'):
>>> + return t
>>> +
>>> + if s[0] not in ("'", '"'):
>>> + return t
>>> +
>>> + # String literal. Prefix to make a b'' string.
>>> + return tokenize.TokenInfo(t.type, 'b%s' % s, t.start, t.end,
>>> t.line)
>>> +
>>> + return t
>>> +
>>> + class hgloader(importlib.machinery.SourceFileLoader):
>>> + """Custom module loader that transforms source code.
>>> +
>>> + When the source code is converted to code, we first transform
>>> + string literals to byte literals using the tokenize API.
>>> + """
>>> + def source_to_code(self, data, path):
>>> + buf = io.BytesIO(data)
>>> + tokens = tokenize.tokenize(buf.readline)
>>> + data = tokenize.untokenize(replacetoken(t) for t in tokens)
>>> + return super(hgloader, self).source_to_code(data, path)
>>> +
>>> # We automagically register our custom importer as a side-effect of
>>> loading.
>>> # This is necessary to ensure that any entry points are able to import
>>> # mercurial.* modules without having to perform this registration
>>> themselves.
>>> -if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>>> - # meta_path is used before any implicit finders and before sys.path.
>>> - sys.meta_path.insert(0, hgimporter())
>>> +if sys.version_info[0] >= 3:
>>> + sys.meta_path.insert(0, hgpathentryfinder())
>>> +else:
>>> + if not any(isinstance(x, hgimporter) for x in sys.meta_path):
>>> + # meta_path is used before any implicit finders and before
>>> sys.path.
>>> + sys.meta_path.insert(0, hgimporter())
>>> _______________________________________________
>>> Mercurial-devel mailing list
>>> Mercurial-devel at mercurial-scm.org
>>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
More information about the Mercurial-devel
mailing list