D11204: hgwebdir: avoid systematic full garbage collection
gracinet (Georges Racinet)
phabricator at mercurial-scm.org
Tue Jul 20 16:19:29 UTC 2021
gracinet created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.
REVISION SUMMARY
Forcing a systematic full garbage collection upon each request
can serioulsy harm performance. This is reported as
https://bz.mercurial-scm.org/show_bug.cgi?id=6075
With this change we're performing the full collection according
to a new setting, `web.full-garbage-collection-rate`.
The default value is 0, which means not to force any
full garbage collection.
In my limited and biased towards Python3 experience chasing memory
leaks in Mercurial servers, the full collection never
made a difference in terms of memory footprint. It is possible that
the condition that it was mitigating when it was introduced is not
there anymore, or that it is specific to some repositories.
On the other hand, as explained in the Python developer docs [1],
frequent full collections are very harmful in terms of performance if
lots of objects survive the collection, and hence stay in the
oldest generation. Note that `gc.collect()` is indeed trying to
collect the oldest generation [2]. This happens usually in two cases:
- unwanted lingering objects (i.e., an actual memory leak that the GC cannot do anything about). Sadly, we have lots of those these days.
- desireable long-term objects, typically in caches. This is an area of interest of mine.
In short, the flat rate that this change still permits is
probably a bad idea in most cases, which is why it is set to zero
by default. It is possible, though, that a value like 100 or 1000
could be a good trade-off if someone has a repository for which the GC
can actually mitigate excessive memory usage.
The test is inspired from test-hgwebdir-paths.py
[1] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
[2] https://docs.python.org/3/library/gc.html#gc.collect
REPOSITORY
rHG Mercurial
BRANCH
default
REVISION DETAIL
https://phab.mercurial-scm.org/D11204
AFFECTED FILES
mercurial/configitems.py
mercurial/hgweb/hgwebdir_mod.py
tests/test-hgwebdir-gc.py
CHANGE DETAILS
diff --git a/tests/test-hgwebdir-gc.py b/tests/test-hgwebdir-gc.py
new file mode 100644
--- /dev/null
+++ b/tests/test-hgwebdir-gc.py
@@ -0,0 +1,44 @@
+from __future__ import absolute_import
+
+import os
+from mercurial.hgweb import hgwebdir_mod
+
+hgwebdir = hgwebdir_mod.hgwebdir
+
+os.mkdir(b'webdir')
+os.chdir(b'webdir')
+
+webdir = os.path.realpath(b'.')
+
+
+def trivial_response(req, res):
+ return []
+
+
+def make_hgwebdir(gc_rate):
+ config = os.path.join(webdir, b'hgwebdir.conf')
+ with open(config, 'wb') as configfile:
+ configfile.write(b'[web]\n')
+ configfile.write(b'full-garbage-collection-rate=%d\n' % gc_rate)
+
+ hg_wd = hgwebdir(config)
+ hg_wd._runwsgi = trivial_response
+ return hg_wd
+
+
+def process_requests(webdir_instance, number):
+ # we don't care for now about passing realistic arguments
+ for _ in range(number):
+ for chunk in webdir_instance.run_wsgi(None, None):
+ pass
+
+
+without_gc = make_hgwebdir(gc_rate=0)
+process_requests(without_gc, 5)
+assert without_gc.requests_count == 5
+assert without_gc.gc_full_collections_done == 0
+
+with_gc = make_hgwebdir(gc_rate=2)
+process_requests(with_gc, 5)
+assert with_gc.requests_count == 5
+assert with_gc.gc_full_collections_done == 2
diff --git a/mercurial/hgweb/hgwebdir_mod.py b/mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py
+++ b/mercurial/hgweb/hgwebdir_mod.py
@@ -285,6 +285,7 @@
self.lastrefresh = 0
self.motd = None
self.refresh()
+ self.requests_count = 0
if not baseui:
# set up environment for new ui
extensions.loadall(self.ui)
@@ -341,6 +342,10 @@
self.repos = repos
self.ui = u
+ self.gc_full_collect_rate = self.ui.configint(
+ b'web', b'full-garbage-collection-rate'
+ )
+ self.gc_full_collections_done = 0
encoding.encoding = self.ui.config(b'web', b'encoding')
self.style = self.ui.config(b'web', b'style')
self.templatepath = self.ui.config(
@@ -383,12 +388,25 @@
finally:
# There are known cycles in localrepository that prevent
# those objects (and tons of held references) from being
- # collected through normal refcounting. We mitigate those
- # leaks by performing an explicit GC on every request.
- # TODO remove this once leaks are fixed.
- # TODO only run this on requests that create localrepository
- # instances instead of every request.
- gc.collect()
+ # collected through normal refcounting.
+ # In some cases, the resulting memory consumption can
+ # be tamed by performing explicit garbage collections.
+ # In presence of actual leaks or big long-lived caches, the
+ # impact on performance of such collections can become a
+ # problem, hence the rate shouldn't be set too low.
+ # See "Collecting the oldest generation" in
+ # https://devguide.python.org/garbage_collector
+ # for more about such trade-offs.
+ rate = self.gc_full_collect_rate
+
+ # this is not thread safe, but the consequence (skipping
+ # a garbage collection) is arguably better than risking
+ # to have several threads perform a collection in parallel
+ # (long useless wait on all threads).
+ self.requests_count += 1
+ if rate > 0 and self.requests_count % rate == 0:
+ gc.collect()
+ self.gc_full_collections_done += 1
def _runwsgi(self, req, res):
try:
diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -2486,6 +2486,11 @@
)
coreconfigitem(
b'web',
+ b'full-garbage-collection-rate',
+ default=0, # means not to force full collections
+)
+coreconfigitem(
+ b'web',
b'hidden',
default=False,
)
To: gracinet, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
More information about the Mercurial-devel
mailing list