history-reordering extension merits

Benoit Boissinot benoit.boissinot at ens-lyon.org
Tue Jan 27 09:12:56 UTC 2009


On Mon, Jan 26, 2009 at 10:45:53PM +0100, Benoit Boissinot wrote:
> On Mon, Jan 26, 2009 at 10:14:03AM -0800, marlborobell wrote:
> > Benoit Boissinot wrote:
> > > 
> > > On Thu, Jan 22, 2009 at 07:47:45AM -0700, Bill Barry wrote:
> > >> [snip reordering]
> > >> 
> > >> Is there any purpose for doing this other than to make the graph look a 
> > >> little nicer (ie could it make the manifest smaller in a very branchy 
> > >> repo or anything like that)?
> > > 
> > > It can make revlogs smaller. If there is interest, I can dig up my
> > > version of that ordering (it can be used for --pull, or to reorder
> > > revlogs).
> > > 
> > > 
> > 
> > I'd be interested in that. It might help shrink our great big 2GB manifest
> > while we work on more permanent solutions...
> 
> Here we go, this is an updated rewrite-log, just call it like:
> 
> python rewrite-log .hg/00manifest
> 
> (you have to omit the suffix)
> 
> You shouldn't try to reorder the changelog that way (that is only possible
> via a real pull, not with rewrite-log).
> 
> It isn't thoroughly tested, so please do a backup, and run verify afterwards,
> etc. I will not make any guarantee it will not break anything (I don't exactly
> remember the assumptions about monotonic linkrevs).
> 
> (the results seem quite impressive: it cut the manifest from crew in
> half).

Here is an updated version; the previous one worked incorrectly for revlogs
with multiple heads. (By the way, the manifest will have the wrong permissions
after running the script; you can fix them manually afterwards.)
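
For the permissions, something along these lines should do. It is only a
sketch: it assumes the script has already run and left the originals next to
the new files as <name>.i.old and <name>.d.old (which is what the renames at
the end of the patch do), and restore_modes is a made-up helper name, not
something that exists in Mercurial.

```python
import os
import shutil


def restore_modes(base):
    """Copy permission bits from base.{i,d}.old onto base.{i,d}.

    `base` is the revlog path without suffix, e.g. ".hg/00manifest".
    Returns the list of files whose mode was updated.
    (Hypothetical helper, for illustration only.)
    """
    fixed = []
    for suffix in (".i", ".d"):
        old = base + suffix + ".old"
        new = base + suffix
        # Skip a pair if either side is missing (e.g. no separate .d file).
        if os.path.exists(old) and os.path.exists(new):
            shutil.copymode(old, new)
            fixed.append(new)
    return fixed
```

Call it with the same argument as the script, e.g.
restore_modes(".hg/00manifest"), before deleting the .old files.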

diff --git a/contrib/rewrite-log b/contrib/rewrite-log
--- a/contrib/rewrite-log
+++ b/contrib/rewrite-log
@@ -1,23 +1,67 @@
 #!/usr/bin/env python
-import sys, os
+import sys, os, tempfile
 from mercurial import revlog, transaction, node, util
 
 f = sys.argv[1]
 
-r1 = revlog.revlog(util.opener(os.getcwd(), audit=False), f + ".i", f + ".d")
-r2 = revlog.revlog(util.opener(os.getcwd(), audit=False), f + ".i2", f + ".d2")
+d, b = os.path.split(f)
+
+assert not b.startswith('00changelog'), "do not ever ever try to reorder the changelog"
+
+fd, f2 = tempfile.mkstemp(suffix='.i', prefix=b+"-rewrite-log-", dir=d)
+os.fdopen(fd).close()
+# strip .i
+f2 = f2[:-2]
+
+r1 = revlog.revlog(util.opener(os.getcwd(), audit=False), f + ".i")
+r2 = revlog.revlog(util.opener(os.getcwd(), audit=False), f2 + ".i")
 
 tr = transaction.transaction(sys.stderr.write, open, "journal")
 
-for i in xrange(r1.count()):
+def good_sort(rl, revs):
+    children = {}
+    root = []
+
+    # build children and roots
+    for i in revs:
+        children[i] = []
+        parents = [p for p in rl.parentrevs(i) if p != -1]
+        for p in parents:
+            assert p in children
+        if len(parents) == 0:
+            root.append(i)
+        else:
+            for p in parents:
+                children[p].append(i)
+
+    visit = root
+    ret = []
+    seen = dict.fromkeys(visit)
+    while visit:
+        i = visit.pop(0)
+        ret.append(i)
+        if i not in children:
+            # no child
+            continue
+        next = []
+        for c in children.pop(i):
+            parents_with_child = [p for p in rl.parentrevs(c) if p != -1 and p in children]
+            if len(parents_with_child) == 0 and c not in seen:
+                next.append(c)
+                seen[c] = None
+        visit = next + visit
+    assert len(revs) == len(ret)
+    return ret
+
+for i in good_sort(r1, range(len(r1))):
     n = r1.node(i)
     p1, p2 = r1.parents(n)
-    l = r1.linkrev(n)
+    l = r1.linkrev(i)
     t = r1.revision(n)
     n2 = r2.addrevision(t, tr, l, p1, p2)
 tr.close()
 
 os.rename(f + ".i", f + ".i.old")
 os.rename(f + ".d", f + ".d.old")
-os.rename(f + ".i2", f + ".i")
-os.rename(f + ".d2", f + ".d")
+os.rename(f2 + ".i", f + ".i")
+os.rename(f2 + ".d", f + ".d")
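
To see what good_sort does without needing a repository, here is a standalone
sketch of the same traversal on a toy history. It is not the code from the
patch: reorder and its parentrevs list are stand-ins I made up for the revlog
API, where parentrevs[rev] gives the two parent revisions and -1 means "no
parent". The idea is to follow one line of development as far as possible
before switching to another, and to emit a merge only once both of its parents
are out, so that related revisions end up next to each other.

```python
def reorder(parentrevs):
    """Return a parent-before-child order that keeps branches contiguous.

    parentrevs[rev] is a tuple of parent revision numbers, -1 for "none".
    Parents are assumed to have lower numbers than their children.
    """
    children = {}
    roots = []
    for rev, ps in enumerate(parentrevs):
        children[rev] = []
        ps = [p for p in ps if p != -1]
        if not ps:
            roots.append(rev)
        for p in ps:
            children[p].append(rev)

    visit = list(roots)
    seen = set(visit)
    order = []
    while visit:
        rev = visit.pop(0)
        order.append(rev)
        ready = []
        # Once rev is emitted it leaves `children`; a child becomes ready
        # when none of its parents is still waiting in `children`.
        for c in children.pop(rev):
            pending = [p for p in parentrevs[c] if p != -1 and p in children]
            if not pending and c not in seen:
                ready.append(c)
                seen.add(c)
        # Depth-first: continue with the children before older siblings.
        visit = ready + visit
    assert len(order) == len(parentrevs)
    return order


# Two interleaved branches off rev 0 (1-3 and 2-4), merged in rev 5.
history = [(-1, -1), (0, -1), (0, -1), (1, -1), (2, -1), (3, 4)]
print(reorder(history))  # -> [0, 1, 3, 2, 4, 5]
```

In the original order the two branches alternate (1, 2, 3, 4); after
reordering each branch is stored as one run (1, 3 then 2, 4), which is where I
would expect the size reduction to come from.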


-- 
:wq


