[Request] [+- ] D12089: encoding: fix trim() to be O(n) instead of O(n^2)

martinvonz (Martin von Zweigbergk) phabricator at mercurial-scm.org
Wed Jan 26 18:29:56 UTC 2022


martinvonz created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  `encoding.trim()` iterated over the possible lengths smaller than the
  input and created a slice for each. It then calculated the column
  width of the result, which is of course O(n), so the overall algorithm
  was O(n). This patch rewrites it to iterate over the unicode
  characters, keeping track of the length so far. Also, the old
  algorithm started from the end of the string, which made it much worse
  when the input is large and the limit is small (such as the typical 72
  we pass to it).
  
  You can time it by running something like this:
  
    time python3 -c 'from mercurial.utils import stringutil; print(stringutil.ellipsis(b"0123456789" * 1000, 5))'
  
  That drops from 4.05 s to 83 ms with this patch (and most of that is
  of course startup time).

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D12089

AFFECTED FILES
  mercurial/encoding.py

CHANGE DETAILS

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -511,17 +511,21 @@
     if width <= 0:  # no enough room even for ellipsis
         return ellipsis[: width + len(ellipsis)]
 
+    chars = list(u)
     if leftside:
-        uslice = lambda i: u[i:]
-        concat = lambda s: ellipsis + s
-    else:
-        uslice = lambda i: u[:-i]
-        concat = lambda s: s + ellipsis
-    for i in pycompat.xrange(1, len(u)):
-        usub = uslice(i)
-        if ucolwidth(usub) <= width:
-            return concat(usub.encode(_sysstr(encoding)))
-    return ellipsis  # no enough room for multi-column characters
+        chars.reverse()
+    width_so_far = 0
+    for i, c in enumerate(chars):
+        width_so_far += ucolwidth(c)
+        if width_so_far > width:
+            break
+    chars = chars[:i]
+    if leftside:
+        chars.reverse()
+    u = u''.join(chars).encode(_sysstr(encoding))
+    if leftside:
+        return ellipsis + u
+    return u + ellipsis
 
 
 class normcasespecs(object):



To: martinvonz, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mercurial-scm.org/pipermail/mercurial-patches/attachments/20220126/8e629e01/attachment-0001.html>


More information about the Mercurial-patches mailing list