[PATCH] repo.commit not finding new/deleted files

Matt Mackall mpm at selenic.com
Mon Jun 13 22:57:10 UTC 2005


On Mon, Jun 13, 2005 at 06:10:49PM -0400, Chris Mason wrote:
> On Friday 10 June 2005 17:13, Matt Mackall wrote:
> 
> >
> > I've found that the slowest bits tend to be reading/writing the
> > manifest. Doing this in an extension might be a win. But I'd recommend
> > trying out the profiler and figuring out exactly what's happening.
> >
> > Extra credit for adding a -p flag to commands.py to profile a given
> > command.
> >
> > I was earlier thinking of breaking up the manifest like git's tree
> > blobs. But experimentation with git has shown me that's exactly what I
> > don't want to do: that turns a single manifest read into 1300
> > directory reads/seeks.
> 
> So, I did profiling with deltas and compression turned off, and manifest.add
> was indeed the next biggest user of CPU.  This is because for each commit 
> we sort and then massage the list of files into manifest form.
> 
> Typically, each commit touches a small number of files and it would be faster 
> to insert items into the existing manifest data.  This brings the time to
> apply 300 patches (with deltas on) from 2m58s to 1m15s 
> (59s user time, 12s system time).    With deltas off it goes down to 48s, 
> and the load is mostly IO bound.
> 
> I can probably do better if I then wrap the whole thing in a single 
> transaction.  git needs 45s to apply the same set of patches, so we're 
> in the right ballpark here.
> 
> This patch has only been lightly tested, and probably has a nasty corner case
> left.  Also, maybe it should switch back to the old behavior when the number 
> of files in the commit is bigger than some limit (100?).
> 
> -chris
> 
> diff -u b/mercurial/hg.py b/mercurial/hg.py
> --- b/mercurial/hg.py	Fri Jun 10 15:08:31 2005
> +++ b/mercurial/hg.py	Mon Jun 13 18:05:32 2005
> @@ -6,6 +6,7 @@
>  # of the GNU General Public License, incorporated herein by reference.
>  
>  import sys, struct, os
> +from bisect import bisect

demandload, please

>  from revlog import *
>  from demandload import *
>  demandload(globals(), "re lock urllib urllib2 transaction time socket")
> @@ -128,13 +129,16 @@
>          else:
>              return mdiff.textdiff(a, b)
>  
> -    def add(self, map, flags, transaction, link, p1=None, p2=None):
> -        files = map.keys()
> -        files.sort()
> +    def add(self, map, flags, transaction, link, p1=None, p2=None, addlist=None):
> +	if addlist == None:
> +	    files = map.keys()
> +	    files.sort()
>  
> -        self.addlist = ["%s\000%s%s\n" %
> -                        (f, hex(map[f]), flags[f] and "x" or '')
> -                        for f in files]
> +	    self.addlist = ["%s\000%s%s\n" %
> +			    (f, hex(map[f]), flags[f] and "x" or '')
> +			    for f in files]
> +	else:
> +	    self.addlist = addlist
>          text = "".join(self.addlist)

Tabs are bad.

>          n = self.addrevision(text, transaction, link, p1, p2)
> @@ -465,6 +469,7 @@
>      def commit(self, files = None, text = "", user = None, date = None):
>          commit = []
>          remove = []
> +	addlist = None
>          if files:
>              for f in files:
>                  s = self.dirstate.state(f)
> @@ -489,6 +494,8 @@
>          c2 = self.changelog.read(p2)
>          m1 = self.manifest.read(c1[0])
>          mf1 = self.manifest.readflags(c1[0])
> +	if self.manifest.listcache:
> +	    addlist = self.manifest.listcache[1][0:-1]

Layering violation.

>          m2 = self.manifest.read(c2[0])
>          lock = self.lock()
>          tr = self.transaction()
> @@ -514,10 +521,29 @@
>  
>          # update manifest
>          m1.update(new)
> +	if addlist:
> +	    for f in new:
> +		bs = bisect(addlist, f)
> +		if bs < len(addlist):
> +		    (fn, n) = addlist[bs].split('\0')
> +		else:
> +		    fn = None
> +		l = "%s\000%s%s\n" % (f, hex(m1[f]), mf1[f] and "x" or '')
> +		if fn != f:
> +		    addlist.insert(bs,l)
> +		else:
> +		    addlist[bs] = l

And what's going on here? Shouldn't all these gory details be down in
manifest.add? I'd really like to keep the upper layers ignorant of
such things.

So I think this new parameter to manifest.add ought to be simply a
list of changed files, and then manifest walks the list, compares it
with the map, and does its bisect/insert/delete bits. Also, if we push
this down a bit, we can probably go directly from addlist to patch
format as a future step.

>          for f in remove: del m1[f]
> -        mn = self.manifest.add(m1, mf1, tr, linkrev, c1[0], c2[0])
> +	if addlist:
> +	    for f in remove:
> +		bs = bisect(addlist, f)
> +		(fn, n) = addlist[bs].split('\0')
> +		if fn != f:
> +		    print "failed to remove addlist for f %s [%s %s %s]" % (f, addlist[bs-1], addlist[bs], addlist[bs+1])
> +		else:
> +		    del addlist[bs]
>  
> -        # add changeset
> +        mn = self.manifest.add(m1, mf1, tr, linkrev, c1[0], c2[0], addlist)
>          new = new.keys()
>          new.sort()

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial mailing list