[PATCH] diff performance: re-establish linear runtime performance

Yuya Nishihara yuya at tcha.org
Thu Apr 30 14:41:26 UTC 2020


On Thu, 30 Apr 2020 16:25:09 +0200, Pierre-Yves David wrote:
> > The previous method with sum() and list() creates a new list object
> > for every hunk. Then sum() is used to flatten out this sequence of
> > lists.  The sum() function is not "lazy", but creates a new list object
> > for every "+" operation and so this code had quadratic runtime behaviour.
> > 
> > diff -r 5c1f356b108e -r fd68848f8d55 mercurial/patch.py
> > --- a/mercurial/patch.py	Fri Apr 24 12:37:43 2020 -0700
> > +++ b/mercurial/patch.py	Thu Apr 30 15:10:05 2020 +0200
> > @@ -2558,7 +2558,7 @@ def diff(
> >                   fctx2 is not None
> >               ), b'fctx2 unexpectly None in diff hunks filtering'
> >               hunks = hunksfilterfn(fctx2, hunks)
> > -        text = b''.join(sum((list(hlines) for hrange, hlines in hunks), []))
> > +        text = b''.join((b''.join(hlines) for hrange, hlines in hunks))

If the number of intermediate objects matters, creating a list of bytes might
be slightly faster.

  lines = []
  for hr, hl in hunks:
      lines.extend(hl)
  b''.join(lines)



More information about the Mercurial-devel mailing list