[PATCH 4 of 4] dirstate: avoid use of zip on big lists
Matt Mackall
mpm at selenic.com
Sat Dec 1 00:07:33 UTC 2012
On Fri, 2012-11-30 at 14:20 -0800, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1354313955 28800
> # Node ID ab0ec24445a5402cfc3322ac515c1ab3368b833c
> # Parent 59ca9fefdb7d956cb76d04f3acc420289736957e
> dirstate: avoid use of zip on big lists
>
> In a clean working directory containing 170,000 tracked files, this
> improves performance of "hg --time diff" from 1.69 seconds to 1.43.
I'd rather see the results of perfstatus.
> This idea is due to Siddharth Agarwal.
>
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -696,8 +696,9 @@
>          # step 3: report unseen items in the dmap hash
>          if not skipstep3 and not exact:
>              visit = sorted([f for f in dmap if f not in results and matchfn(f)])
> -            for nf, st in zip(visit, util.statfiles([join(i) for i in visit])):
> -                results[nf] = st
> +            nf = iter(visit).next
> +            for st in util.statfiles([join(i) for i in visit]):
> +                results[nf()] = st
That's not pretty, is it? The parallel iteration over visit is quite
confusing. Slightly better as:
    nextstat = util.statfiles([join(i) for i in visit]).next
    for nf in visit:
        results[nf] = nextstat()
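
For comparison, here is a rough, self-contained sketch of the three loop shapes side by side. fake_statfiles and join are hypothetical stand-ins (not the real util.statfiles or dirstate join), and the sketch assumes statfiles yields one result per path, in order. It uses Python 3 syntax, so the builtin next() replaces the .next method used above.

```python
def fake_statfiles(paths):
    """Hypothetical stand-in for util.statfiles: one result per path, in order."""
    for p in paths:
        yield "stat:" + p


def join(name):
    # Hypothetical stand-in for the dirstate join helper.
    return "/repo/" + name


visit = ["a", "b", "c"]

# Original code: pair names and stat results with zip.
via_zip = {}
for nf, st in zip(visit, fake_statfiles([join(i) for i in visit])):
    via_zip[nf] = st

# Patch: loop over stat results, pull the matching name from an iterator.
via_patch = {}
names = iter(visit)
for st in fake_statfiles([join(i) for i in visit]):
    via_patch[next(names)] = st

# Suggested form: loop over names, pull the matching stat result on demand.
via_suggested = {}
stats = fake_statfiles([join(i) for i in visit])
for nf in visit:
    via_suggested[nf] = next(stats)
```

All three fill the dictionary identically; the last form keeps visit as the visible loop variable, which is the readability point being made here.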
Further, if we're motivated by space overhead, the list passed to
statfiles could be a generator. That tends to be a loss on non-gigantic
lists though.
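
A minimal sketch of the generator variant, again with hypothetical stand-ins. It assumes statfiles simply iterates its argument; whether the real util.statfiles accepts an arbitrary iterable rather than a list is not confirmed here. The consumed list exists only to show that paths are joined lazily.

```python
def fake_statfiles(paths):
    """Hypothetical stand-in for util.statfiles: iterates its argument lazily."""
    for p in paths:
        yield len(p)


consumed = []


def join(name):
    # Hypothetical stand-in for join; records each call to expose laziness.
    consumed.append(name)
    return "/repo/" + name


visit = ["a", "bb", "ccc"]

# A generator expression instead of a list: no intermediate list of paths.
stats = fake_statfiles(join(i) for i in visit)

results = {}
results[visit[0]] = next(stats)
joined_after_first = list(consumed)  # only the first name has been joined

for nf in visit[1:]:
    results[nf] = next(stats)
```

The saving is one temporary list of joined paths; the cost is generator overhead per item, which is why this tends to lose on lists that are not gigantic.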
As this is apparently the only user of statfiles, perhaps a better API
is possible.
--
Mathematics is the supreme nostalgia of our time.