[Reviewers] [PATCH rfc] rfc: call gc at exit of mercurial

Gregory Szorc gregory.szorc at gmail.com
Wed May 4 06:51:20 UTC 2016



> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall at gmail.com> wrote:
> 
> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
>> 
>> 
>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>> 
>>> class A(object):
>>>     def __del__(self):
>>>         print "del"
>>> 
>>> class B(object):
>>>     pass
>>> 
>>> b = B()
>>> b.b = b
>>> b.a = A()
>>> 
>>> 
>>> This example does not call __del__ in CPython either.
>>> 
>>> The __del__ is not guaranteed to be called - that's why there is a
>>> painful module finalization procedure where CPython is trying to call
>>> "as much as possible", but there are still no guarantees. If you add
>>> del b; gc.collect() you will see "del" printed. Of course this
>>> involves a cycle, but cycles can come in ways that you don't expect
>>> them and PyPy simply says "everything is GCed". I think it's very much
>>> in line with what python-dev thinks.
>> 
>> 
>> Which is why we have __del__ in very few object and we deploy massive effort
>> to ensure their don't get caught in cycle and mostly succeeding at this.
>> (Kind of the same we put a lot of effort into making sure __del__ are never
>> really called but keep them as double safety).
>> 
>> So in the case we care about (no cycle) Cpython would call our __del__,
>> right?
>> 
>> --
>> Pierre-Yves David
> 
> Yes, but I would argue you can create cycles without knowing. E.g.
> 
> def f():
>    try:
>       some_stuff
>    except:
>       x = sys.exc_info()
> 
> creates a cycle. There are also ways to create cycles with passing
> global functions around etc.

This.

We have plenty of cycles in our code. We just don't notice them very often because "hg" processes are short-lived. And what's worse is we don't know we're introducing them unless we go looking for them, often after someone complains about a leak on a large repo.

If you want to create cycles and leak memory, I recommend "hg convert" on thousands of revisions with extensions and hooks installed. Or start a WSGI server.

One of the reasons I want to get Python 3 support is so we can use its tracemalloc module to help debug leaks. The Python 2 tools for finding cycles and leaks (such as guppy and heappy) are a bit harder to use and to integrate into our testing harness.


More information about the Reviewers mailing list