Heptapod CI: OSUOSL Linux runners transition and sharding

Georges Racinet georges.racinet at octobus.net
Sat Jun 18 11:22:32 UTC 2022


Dear Mercurial developers,

I want to point out that the `osuosl-xy-docker-x86_64` shared runners of
foss.heptapod.net are in the process of being replaced by vastly
different new systems at OSUOSL. I won't delve into too much detail
here; you are invited to follow the tracking issue [1].

The Mercurial project is a big user of these runners, and perhaps the
most delicate to handle, as its test suite requires many cores / virtual
CPUs for the tests to conclude in acceptable time. We are doing our
best, but we won't know how it fares until we try the new runners for
good. It is possible that the sharding strategy outlined below will
become necessary.

Performance evaluation should occur within the next week.

Please don't be surprised if there are slow runs and check what is
happening on the tracking issue [1].

= CI sharding =

The general trend in continuous integration since the likes of Travis
arose has been to favor lots of smaller running systems rather than a
few big ones. This allows for higher density and less locking of
resources. Many projects with large test suites have implemented a
sharding strategy, in which each test run is split into as many pieces
as necessary to run on small systems. The industry seems to have
settled on 2 vCPUs per job (this is the standard for GitHub Actions,
gitlab.com shared runners, Bitbucket, etc.).

For instance, I am running the Ruby tests of Heptapod with up to 28 jobs
and its functional tests with 3 jobs per variant (e.g., native
Mercurial, hg-git based…).

I highly recommend that the Mercurial project implement such a sharding
strategy. Without it, we are forced to provision runners with many
vCPUs, and that means that any single-threaded job of other projects
will needlessly block lots of cores [3].
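In GitLab CI (which Heptapod speaks), such a strategy can be expressed
with the `parallel:` keyword; the predefined `CI_NODE_INDEX` and
`CI_NODE_TOTAL` variables then tell each job which shard it is. A
minimal sketch (the job name and the `run-shard.sh` wrapper are
hypothetical, not the actual Mercurial pipeline):

```yaml
# Hypothetical sketch, not the actual Mercurial CI definition.
tests:
  parallel: 4  # spawns 4 jobs with CI_NODE_INDEX 1..4, CI_NODE_TOTAL=4
  script:
    # the test driver would select its share of the suite from these
    # variables, e.g. with a simple modulo on a numeric test ID
    - ./run-shard.sh "$CI_NODE_INDEX" "$CI_NODE_TOTAL"
```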

Some test runners [2] actually gather execution times at the finest
possible granularity, in order to compute shards that are expected to
run in roughly equal times. In my humble opinion, there is much more to
gain by starting with a simple modulo on a numeric test ID, because
statistics-based balancing is a time-consuming effort and doesn't work
so well on heterogeneous swarms of runners like ours.
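The modulo approach can be sketched in a few lines of Python (the
function and the test names are illustrative, not part of any existing
runner): shard N of M simply keeps the tests whose position in a stable
ordering satisfies i % M == N.

```python
# Hypothetical sketch of modulo-based sharding, in the spirit of
# pytest-shard. Every shard sorts the full test list the same way, so
# the shards are disjoint and together cover the whole suite.

def select_shard(tests, shard_index, shard_count):
    """Return the subset of tests assigned to shard `shard_index`."""
    return [t for i, t in enumerate(sorted(tests))
            if i % shard_count == shard_index]

tests = ["test-annotate.t", "test-bisect.t", "test-clone.t",
         "test-commit.t", "test-diff.t"]

# With 2 shards, shard 0 gets sorted positions 0, 2, 4:
print(select_shard(tests, 0, 2))
# → ['test-annotate.t', 'test-clone.t', 'test-diff.t']
```

No timing statistics are needed; balancing is only as good as the
assumption that tests cost roughly the same on average, which is why it
tolerates heterogeneous runners better than finely tuned shard plans.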

Best,

[1] https://foss.heptapod.net/heptapod/foss.heptapod.net/-/issues/137

[2] Notably, in Ruby realm, this is what Knapsack does. Advanced CI
definitions like GitLab's own pipeline have stages to update the
statistics automatically. Setting this up is a serious effort, even if a
test runner that implements it is available. By contrast, pytest-shard
is much more primitive (a simple modulo), but that is good enough for
Heptapod's functional tests.

[3] As of today, we have a maximum concurrency of 2 jobs per runner on
foss.h.n shared runners. On the machines that are about to be
decommissioned, that means reserving a whole CPU with 8 cores, even for
single-threaded jobs.

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics
