[PATCH] repo.commit not finding new/deleted files
Matt Mackall
mpm at selenic.com
Mon Jun 13 22:57:10 UTC 2005
On Mon, Jun 13, 2005 at 06:10:49PM -0400, Chris Mason wrote:
> On Friday 10 June 2005 17:13, Matt Mackall wrote:
>
> >
> > I've found that the slowest bits tend to be reading/writing the
> > manifest. Doing this in an extension might be a win. But I'd recommend
> > trying out the profiler and figuring out exactly what's happening.
> >
> > Extra credit for adding a -p flag to commands.py to profile a given
> > command.
> >
> > I was earlier thinking of breaking up the manifest like git's tree
> > blobs. But experimentation with git has shown me that's exactly what I
> > don't want to do: that turns a single manifest read into 1300
> > directory reads/seeks.
>
> So, I did profiling with deltas and compression turned off, and manifest.add
> was indeed the next biggest user of CPU. This is because for each commit
> we sort and then massage the list of files into manifest form.
>
> Typically, each commit touches a small number of files and it would be faster
> to insert items into the existing manifest data. This brings the time to
> apply 300 patches (with deltas on) from 2m58s to 1m15s
> (59s user time, 12s system time). With deltas off it goes down to 48s,
> and the load is mostly IO bound.
>
> I can probably do better if I then wrap the whole thing in a single
> transaction. git needs 45s to apply the same set of patches, so we're
> in the right ballpark here.
>
> This patch has only been lightly tested, and probably has a nasty corner case
> left. Also, maybe it should switch back to the old behavior when the number
> of files in the commit is bigger than some limit (100?).
>
> -chris
>
> diff -u b/mercurial/hg.py b/mercurial/hg.py
> --- b/mercurial/hg.py Fri Jun 10 15:08:31 2005
> +++ b/mercurial/hg.py Mon Jun 13 18:05:32 2005
> @@ -6,6 +6,7 @@
> # of the GNU General Public License, incorporated herein by reference.
>
> import sys, struct, os
> +from bisect import bisect
demandload, please
> from revlog import *
> from demandload import *
> demandload(globals(), "re lock urllib urllib2 transaction time socket")
> @@ -128,13 +129,16 @@
> else:
> return mdiff.textdiff(a, b)
>
> - def add(self, map, flags, transaction, link, p1=None, p2=None):
> - files = map.keys()
> - files.sort()
> + def add(self, map, flags, transaction, link, p1=None, p2=None, addlist=None):
> + if addlist == None:
> + files = map.keys()
> + files.sort()
>
> - self.addlist = ["%s\000%s%s\n" %
> - (f, hex(map[f]), flags[f] and "x" or '')
> - for f in files]
> + self.addlist = ["%s\000%s%s\n" %
> + (f, hex(map[f]), flags[f] and "x" or '')
> + for f in files]
> + else:
> + self.addlist = addlist
> text = "".join(self.addlist)
Tabs are bad.
> n = self.addrevision(text, transaction, link, p1, p2)
> @@ -465,6 +469,7 @@
> def commit(self, files = None, text = "", user = None, date = None):
> commit = []
> remove = []
> + addlist = None
> if files:
> for f in files:
> s = self.dirstate.state(f)
> @@ -489,6 +494,8 @@
> c2 = self.changelog.read(p2)
> m1 = self.manifest.read(c1[0])
> mf1 = self.manifest.readflags(c1[0])
> + if self.manifest.listcache:
> + addlist = self.manifest.listcache[1][0:-1]
Layering violation.
> m2 = self.manifest.read(c2[0])
> lock = self.lock()
> tr = self.transaction()
> @@ -514,10 +521,29 @@
>
> # update manifest
> m1.update(new)
> + if addlist:
> + for f in new:
> + bs = bisect(addlist, f)
> + if bs < len(addlist):
> + (fn, n) = addlist[bs].split('\0')
> + else:
> + fn = None
> + l = "%s\000%s%s\n" % (f, hex(m1[f]), mf1[f] and "x" or '')
> + if fn != f:
> + addlist.insert(bs,l)
> + else:
> + addlist[bs] = l
And what's going on here? Shouldn't all these gory details be down in
manifest.add? I'd really like to keep the upper layers ignorant of
such things.
So I think this new parameter to manifest.add ought to be simply a
list of changed files, and then manifest walks the list, compares it
with the map, and does its bisect/insert/delete bits. Also, if we push
this down a bit, we can probably go directly from addlist to patch
format as a future step.
> for f in remove: del m1[f]
> - mn = self.manifest.add(m1, mf1, tr, linkrev, c1[0], c2[0])
> + if addlist:
> + for f in remove:
> + bs = bisect(addlist, f)
> + (fn, n) = addlist[bs].split('\0')
> + if fn != f:
> + print "failed to remove addlist for f %s [%s %s %s]" % (f, addlist[bs-1], addlist[bs], addlist[bs+1])
> + else:
> + del addlist[bs]
>
> - # add changeset
> + mn = self.manifest.add(m1, mf1, tr, linkrev, c1[0], c2[0], addlist)
> new = new.keys()
> new.sort()
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial
mailing list