[PATCH 08 of 14] git: speed up possible head processing during indexing by ~100x

Josef 'Jeff' Sipek jeffpc at josefsipek.net
Thu Jan 2 18:58:30 UTC 2025


 hgext/git/index.py |  24 +++++++++++++++---------
 1 files changed, 15 insertions(+), 9 deletions(-)


# HG changeset patch
# User Josef 'Jeff' Sipek <jeffpc at josefsipek.net>
# Date 1727895686 14400
#      Wed Oct 02 15:01:26 2024 -0400
# Node ID c904a51a704b9f241252dbdd0bacaa3a71c135d9
# Parent  4b50801797ddbbf4cd87849a1506a6073a76e028
git: speed up possible head processing during indexing by ~100x

Benchmarking 50 iterations of indexing (see below) shows essentially no
difference for small repos (<1k commits).  Medium repos (~12k commits) see
some benefit, but other overheads completely overwhelm it.  For large repos
(~122k commits), the 80-100x speedup is clearly visible to the user.

All of the numbers are in seconds and were measured with time.time() calls
placed in _index_repo().  The times exclude the time taken by changedfiles
processing.

Small repo (guilt, 553 commits, 1 head):

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0008781 0.0009274 0.0009800 0.0012285 0.0014637 0.0024107 (before)
0.0003092 0.0003281 0.0003519 0.0003777 0.0003927 0.0006843 (after)


Medium repo (hamlib, 12k commits, 53 heads):

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.04881  0.05135  0.07632  0.06672  0.08042  0.09415  (before)
0.004249 0.004420 0.004799 0.004809 0.005051 0.006416 (after)


Large repo (qemu, 122k commits, 50 heads):

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
4.274   4.595   4.832   6.578   8.397   9.721   (before)
0.05180 0.05643 0.05865 0.06130 0.06712 0.06872 (after)
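
For anyone wanting to repeat this kind of measurement, here is a minimal
sketch of timing a callable with time.time() and reducing the samples to the
six columns used in the tables above.  The helper names (time_iterations,
six_number_summary) are hypothetical, and the quartile method is an
assumption; the commit message does not say how the summaries were computed.

```python
import statistics
import time


def time_iterations(fn, iterations=50):
    """Run fn() repeatedly and return the wall-clock duration of each call."""
    durations = []
    for _ in range(iterations):
        start = time.time()
        fn()
        durations.append(time.time() - start)
    return durations


def six_number_summary(samples):
    """Return min, quartiles, mean and max, in the same order as the tables."""
    # Assumption: method='inclusive' treats the samples as the whole
    # population, so the quartiles lie between the observed min and max.
    q1, median, q3 = statistics.quantiles(samples, n=4, method='inclusive')
    return {
        'min': min(samples),
        'q1': q1,
        'median': median,
        'mean': statistics.fmean(samples),
        'q3': q3,
        'max': max(samples),
    }
```

In practice fn would wrap only the section of interest, so that unrelated
work (such as the changedfiles processing mentioned above) stays out of the
timings.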

diff --git a/hgext/git/index.py b/hgext/git/index.py
--- a/hgext/git/index.py
+++ b/hgext/git/index.py
@@ -18,7 +18,7 @@ from . import gitutil
 
 pygit2 = gitutil.get_pygit2()
 
-_CURRENT_SCHEMA_VERSION = 1
+_CURRENT_SCHEMA_VERSION = 2
 _SCHEMA = (
     """
 CREATE TABLE refs (
@@ -35,6 +35,8 @@ CREATE TABLE possible_heads (
   node TEXT NOT NULL
 );
 
+CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
+
 -- The topological heads of the changelog, which hg depends on.
 CREATE TABLE heads (
   node TEXT NOT NULL
@@ -331,14 +333,18 @@ def _index_repo(
                 )
     db.execute('DELETE FROM heads')
     db.execute('DELETE FROM possible_heads')
-    for hid in possible_heads:
-        h = hid.hex
-        db.execute('INSERT INTO possible_heads (node) VALUES(?)', (h,))
-        haschild = db.execute(
-            'SELECT COUNT(*) FROM changelog WHERE p1 = ? OR p2 = ?', (h, h)
-        ).fetchone()[0]
-        if not haschild:
-            db.execute('INSERT INTO heads (node) VALUES(?)', (h,))
+    db.executemany('INSERT INTO possible_heads (node) VALUES(?)',
+        [ (hid.hex,) for hid in possible_heads ]
+    )
+    db.execute('''
+    INSERT INTO heads (node)
+        SELECT node FROM possible_heads WHERE
+            node NOT IN (
+                SELECT DISTINCT possible_heads.node FROM changelog, possible_heads WHERE
+                    changelog.p1 = possible_heads.node OR
+                    changelog.p2 = possible_heads.node
+            )
+    ''')
 
     db.commit()
     if prog is not None:
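
The change in the hunk above can be tried in isolation.  The following is a
minimal sketch, not the extension's code: it uses an in-memory SQLite
database with a simplified, assumed changelog table (only node, p1 and p2)
and a toy six-commit history, and checks that the old per-candidate loop and
the new single INSERT ... SELECT pick the same heads.  The function names
and the toy data are hypothetical.

```python
import sqlite3


def make_db():
    """Build a toy database; the changelog schema here is a simplification."""
    db = sqlite3.connect(':memory:')
    db.executescript('''
    CREATE TABLE changelog (node TEXT NOT NULL, p1 TEXT, p2 TEXT);
    CREATE TABLE possible_heads (node TEXT NOT NULL);
    CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
    CREATE TABLE heads (node TEXT NOT NULL);
    ''')
    # History: a <- b, b <- c, b <- d, e merges c and d, a <- f.
    db.executemany(
        'INSERT INTO changelog (node, p1, p2) VALUES (?, ?, ?)',
        [
            ('a', None, None),
            ('b', 'a', None),
            ('c', 'b', None),
            ('d', 'b', None),
            ('e', 'c', 'd'),
            ('f', 'a', None),
        ],
    )
    return db


def heads_per_row(db, candidates):
    """Old approach: one INSERT and one COUNT(*) query per candidate."""
    db.execute('DELETE FROM heads')
    db.execute('DELETE FROM possible_heads')
    for h in candidates:
        db.execute('INSERT INTO possible_heads (node) VALUES(?)', (h,))
        haschild = db.execute(
            'SELECT COUNT(*) FROM changelog WHERE p1 = ? OR p2 = ?', (h, h)
        ).fetchone()[0]
        if not haschild:
            db.execute('INSERT INTO heads (node) VALUES(?)', (h,))
    return sorted(row[0] for row in db.execute('SELECT node FROM heads'))


def heads_set_based(db, candidates):
    """New approach: bulk insert, then a single INSERT ... SELECT."""
    db.execute('DELETE FROM heads')
    db.execute('DELETE FROM possible_heads')
    # The unique index means candidates must not contain duplicates.
    db.executemany(
        'INSERT INTO possible_heads (node) VALUES(?)',
        [(h,) for h in candidates],
    )
    db.execute('''
    INSERT INTO heads (node)
        SELECT node FROM possible_heads WHERE
            node NOT IN (
                SELECT DISTINCT possible_heads.node
                FROM changelog, possible_heads WHERE
                    changelog.p1 = possible_heads.node OR
                    changelog.p2 = possible_heads.node
            )
    ''')
    return sorted(row[0] for row in db.execute('SELECT node FROM heads'))


db = make_db()
candidates = ['c', 'd', 'e', 'f']
old_heads = heads_per_row(db, candidates)
new_heads = heads_set_based(db, candidates)
```

With this toy history both functions should leave only e and f in the heads
table, since c and d are parents of the merge e.  The sketch says nothing
about timing; the speedup reported in the commit message comes from real
repositories, not from a six-row table.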


