[PATCH 3 of 4 V2] revset: speedup matching() by first matching fields that take less time to

Angel Ezquerra angel.ezquerra at gmail.com
Sat Apr 14 22:33:57 UTC 2012


# HG changeset patch
# User Angel Ezquerra <angel.ezquerra at gmail.com>
# Date 1334360463 -7200
# Node ID eb13533a7849604864f42fb7eae3d8f11f0e7ea7
# Parent  effd8cdec977cddc33654cfc1a60bfbdd8d77af4
revset: speedup matching() by first matching fields that take less time to
match

This patch sorts the fields that are passed to the matching function so that it
always starts by matching those fields that take less time to match.

Not all fields take the same amount of time to match. I've done several
measurements running the following command:

hg --time log -r "matching(1, field)"

on the mercurial repository, and where 'field' was each one of the fields
accepted by match. In order to avoid the print overhead (which could be
different for different fields, given the different number of matches) I used a
modified version of the matching() function which always returns no matches.

These tests showed that different fields take wildly different amounts of time
to match. Particulary the substate field takes up to 25 seconds to match on my
machine, compared to the 0.3 seconds that takes to match the phase field or the
2 seconds (approx) that takes to match most fields. With this patch, matching
both the phase and the substate of a revision takes the same amount of time as
matching the phase.

The field match order introduced by this patch is as follows:

phase, parents, user, date, branch, summary, files, description, substate

An extra nice thing about this patch is that it makes the match time stable.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -950,6 +950,19 @@
         fields.discard('summary')
 
     # We may want to match more than one field
+    # Not all fields take the same amount of time to be matched
+    # Sort the selected fields in order of increasing matching cost
+    fieldorder = ('phase', 'parents', 'user', 'date', 'branch', 'summary',
+        'files', 'description', 'substate',)
+    def fieldkeyfunc(f):
+        try:
+            return fieldorder.index(f)
+        except:
+            # assume an unknown field is very costly
+            return len(fieldorder)
+    fields = list(fields)
+    fields.sort(key=fieldkeyfunc)
+
     # Each field will be matched with its own "getfield" function
     # which will be added to the getfieldfuncs array of functions
     getfieldfuncs = []



More information about the Mercurial-devel mailing list