[Updated] D11204: hgwebdir: avoid systematic full garbage collection
av6 (Anton Shestakov)
phabricator at mercurial-scm.org
Tue Jul 20 19:48:12 UTC 2021
av6 added a comment.
Thank you for caring about hgweb, it doesn't get this treatment often.
> It is possible, though, that a value like 100 or 1000 could be a good trade-off if someone has a repository for which the GC can actually mitigate excessive memory usage.
I feel that you're downplaying the problem. The original ff2370a70fe8 <https://phab.mercurial-scm.org/rHGff2370a70fe8ef4e7943a23cc1688d5c762762b7> states that each raw-file request to e.g. the firefox repo leaks ~100 MB, and I don't think people would like to have *each* hgwebdir process get to 10-100 GB before it gets gc'd.
Here's how to check if the issue is still present in the current code on python3:
    hg serve --web-conf=foo.conf
where foo.conf contains:
    [paths]
    / = /path/to/repos/*
I'm going to use hg-committed as an example repo because it's reasonably sized and readily available. Just browsing around in a local hg-committed clone makes the hg serve process quickly grow to 1 GB RSS after practically nothing (first page of log, directory listing, tags, branches, etc). It grows by 100 to 300 MB per request.
Now, I know hgweb is supposed to serve only actual generated content from the repo and here we're making it serve static files as well, but this memory-leaking behavior depends on the way hgweb is deployed, and even in perfect setups the problem can manifest itself if e.g. the WSGI runner decides to use multiple threads or adjust gc frequency (or if a random spider starts hitting all the URLs on the server). I haven't actually figured out what makes hgweb in gunicorn leak, even though it shouldn't be multithreaded, I think? It was a long time ago, but I remember that gc.collect() at least made hgweb processes manageable for a small VPS.
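One way to watch the growth described above is to read the resident set size of the hg serve process between requests. This is only a rough sketch: it assumes Linux (/proc/<pid>/status), and the helper names and the URLs you pass to watch() are illustrative rather than anything from Mercurial.

```python
import re
import urllib.request


def rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current RSS of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return rss_kib(f.read())


def watch(pid, urls):
    """Fetch each URL once and print the server's RSS after every request."""
    for url in urls:
        with urllib.request.urlopen(url) as response:
            response.read()  # drain the body so the request fully completes
        print(f"{url}: {read_rss_kib(pid)} KiB")
```

For example, with hg serve listening on its default port 8000, something like watch(pid, ["http://localhost:8000/hg-committed/"] * 20) would show whether RSS keeps climbing; the exact URL depends on how your [paths] section maps the repos.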
> desirable long-term objects, typically in caches. This is an area of interest of mine.
When I looked at why the hg-committed repo takes so much memory, the biggest consumer by far was obsstore and its cached attributes. Not only does obsstore fit the entirety of .hg/store/obsstore (hundreds of thousands of obsmarkers) into memory in a not very memory-friendly format, it does so multiple times. obsstore.successors and obsstore.predecessors basically contain the same obsmarkers, just reorganized for different uses. They all use basic Python structures. This takes about 300 MB of memory for every instance of the hg-committed repo. And if you create an instance of that for every request, you get crazy memory consumption way before Python figures out that maybe it should collect some unused objects.
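To illustrate why a per-request object can stay in memory after the request is done, here is a small self-contained sketch. It assumes the repo instance ends up in a reference cycle (the comment above does not state this, so treat it as an assumption); FakeRepo is a hypothetical stand-in, not Mercurial code. CPython frees cyclic garbage only when its cycle collector runs, and that collector is triggered by counts of allocated container objects rather than by how many bytes they hold, so a handful of huge objects may not trigger it.

```python
import gc
import weakref


class FakeRepo:
    """Hypothetical stand-in for a large per-request repo object."""

    def __init__(self):
        self.payload = bytearray(1024 * 1024)  # pretend this is obsstore data
        self.cache = {"owner": self}           # assumed reference cycle


def handle_request():
    """Create a repo for one request and return only a weak reference to it."""
    repo = FakeRepo()
    return weakref.ref(repo)


def simulate(n):
    """Return how many repos are still alive before and after gc.collect()."""
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        refs = [handle_request() for _ in range(n)]
        alive_before = sum(ref() is not None for ref in refs)
        gc.collect()
        alive_after = sum(ref() is not None for ref in refs)
    finally:
        gc.enable()
    return alive_before, alive_after
```

Under these assumptions every FakeRepo survives its request, and only the explicit gc.collect() releases them.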
In fact, Python does eventually collect garbage on its own, but it takes like 15 requests. So a full-garbage-collection-rate of 100 (let alone 1000) doesn't change anything, since the process will either collect on its own, or it'll grow in size so much that it gets a visit from the OOM killer.
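For anyone who wants to experiment with collection rates outside of hgwebdir, a generic WSGI wrapper along these lines could be used. It is a sketch under stated assumptions, not the implementation from D11204: the PeriodicGC class and its rate parameter are made up for illustration, and the counter is not thread-safe.

```python
import gc


class PeriodicGC:
    """Hypothetical WSGI middleware: full gc.collect() every `rate` requests."""

    def __init__(self, app, rate=1):
        self.app = app
        self.rate = rate
        self.requests = 0
        self.collections = 0

    def __call__(self, environ, start_response):
        try:
            # Stream the wrapped application's response unchanged.
            yield from self.app(environ, start_response)
        finally:
            # Runs once the body is exhausted or the server closes it early.
            self.requests += 1
            if self.rate > 0 and self.requests % self.rate == 0:
                gc.collect()
                self.collections += 1
```

With rate=1 this collects after every request; with a rate larger than the roughly 15 requests mentioned above, the automatic collector would usually get there first.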
REPOSITORY
rHG Mercurial
BRANCH
default
CHANGES SINCE LAST ACTION
https://phab.mercurial-scm.org/D11204/new/
REVISION DETAIL
https://phab.mercurial-scm.org/D11204
To: gracinet, #hg-reviewers, pulkit
Cc: av6, mercurial-patches