[PATCH] Add script to rewrite manifest to workaround lack of parent deltas

Benoit Boissinot benoit.boissinot at ens-lyon.org
Thu Aug 20 23:04:03 UTC 2009


On Thu, Aug 20, 2009 at 05:20:24PM -0400, Greg Ward wrote:
> # HG changeset patch
> # User Greg Ward <greg-hg at gerg.ca>
> # Date 1233047576 0
> # Node ID 7e0bbea3935b1044d3c5acfecae8005941dfc8ec
> # Parent  2484868cffde3893e3fafb8e515d396346b87e17
> Add script to rewrite manifest to workaround lack of parent deltas.

Thanks for cleaning it up. Some comments below.

> +
> +def good_sort(rl):

Maybe call it toposort()?

> +    write = sys.stdout.write
> +
> +    children = {}
> +    root = []
> +    # build children and roots
> +    write('reading %d revs ' % len(rl))
> +    #for i in revs:
> +    i = 0
> +    while i < len(rl):

You can iterate directly over the revs:

for i in rl:

> +        children[i] = []
> +        parents = [p for p in rl.parentrevs(i) if p != -1]
> +        for p in parents:
> +            assert p in children
> +        if len(parents) == 0:
> +            root.append(i)
> +        else:
> +            for p in parents:
> +                children[p].append(i)

The following is simpler:

        children[i] = []
        parents = [p for p in rl.parentrevs(i) if p != -1]
        for p in parents:
            assert p in children
            children[p].append(i)
        if len(parents) == 0:
            root.append(i)
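
To check that the simplified loop builds the same maps, here is a small
self-contained sketch. FakeRevlog and build_children are hypothetical names
used only for illustration; the stand-in class offers just the pieces the
loop relies on (len(), iteration over rev numbers, and parentrevs()).

```python
class FakeRevlog(object):
    """Hypothetical stand-in for a revlog, for illustration only."""

    def __init__(self, parents):
        # parents[rev] is a (p1, p2) tuple; -1 means "no parent"
        self._parents = parents

    def __len__(self):
        return len(self._parents)

    def __iter__(self):
        # iterating yields the revision numbers 0..len-1
        return iter(range(len(self._parents)))

    def parentrevs(self, rev):
        return self._parents[rev]


def build_children(rl):
    """Return (children, roots) for every rev in rl."""
    children = {}
    roots = []
    for i in rl:
        children[i] = []
        parents = [p for p in rl.parentrevs(i) if p != -1]
        for p in parents:
            # parents always have a lower rev number, so they are known
            assert p in children
            children[p].append(i)
        if len(parents) == 0:
            roots.append(i)
    return children, roots


# rev 0 is a root, revs 1 and 2 branch from it, rev 3 merges them
rl = FakeRevlog([(-1, -1), (0, -1), (0, -1), (1, 2)])
children, roots = build_children(rl)
```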

> +    # XXX this is a reimplementation of the 'branchsort' topo sort
> +    # algorithm in hgext.convert.convcmd... would be nice not to duplicate
> +    # the algorithm
> +    write('sorting ...')
> +    visit = root
> +    ret = []
> +    while visit:
> +        i = visit.pop(0)

Maybe it's cleaner (and cheaper, since pop(0) has to shift the whole list)
to pop from the end:
        i = visit.pop()

> +        ret.append(i)
> +        if i not in children:
> +            # this only happens if some node's p1 == p2, which can happen in the
> +            # manifest in certain circumstances
> +            break

break or continue? A break here abandons every rev still queued in visit.

> +        next = []
> +        for c in children.pop(i):
> +            parents_with_child = [p for p in rl.parentrevs(c) if p != -1 and p in children]

parents_unseen is maybe a better name: we don't care whether they have
children, only whether we have already visited them.

> +            if len(parents_with_child) == 0:
> +                next.append(c)
> +        visit = next + visit

If you pop from the end, then you can do:
        visit += next
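
Putting the suggestions together (pop from the end, continue instead of
break, parents_unseen, visit += ...), the sort could look roughly like the
sketch below. This is not the patch's code: toposort and its parentrevs /
nrevs arguments are hypothetical, and checking "i not in children" before
appending to ret is my own assumption about how the p1 == p2 case should be
handled.

```python
def toposort(parentrevs, nrevs):
    """Return revs 0..nrevs-1 ordered so that parents precede children."""
    children = {}
    roots = []
    for i in range(nrevs):
        children[i] = []
        parents = [p for p in parentrevs(i) if p != -1]
        for p in parents:
            children[p].append(i)
        if not parents:
            roots.append(i)

    visit = roots
    ret = []
    while visit:
        i = visit.pop()
        if i not in children:
            # already emitted; can happen when a rev has p1 == p2 and was
            # therefore queued twice
            continue
        ret.append(i)
        ready = []
        for c in children.pop(i):
            # a parent still present in children has not been visited yet
            parents_unseen = [p for p in parentrevs(c)
                              if p != -1 and p in children]
            if not parents_unseen:
                ready.append(c)
        visit += ready
    return ret


# rev 3 merges revs 1 and 2; rev 4 has p1 == p2 == 3
graph = [(-1, -1), (0, -1), (0, -1), (1, 2), (3, 3)]
order = toposort(lambda rev: graph[rev], len(graph))
```

Because revs are taken from the end of visit, the walk follows one branch
as far as it can before coming back to the other.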

> +def main():
> +
> +    # unbuffer stdout for nice progress output
> +    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
> +    write = sys.stdout.write
> +
> +    # Open the local repository
> +    ui = ui_.ui()
> +    repo = hg.repository(ui)
> +
> +    indexfn = repo.join('store/00manifest.i')
> +    datafn = indexfn[:-2] + '.d'
> +    if not os.path.exists(datafn):
> +        sys.exit('error: %s does not exist: manifest not big enough '
> +                 'to be worth shrinking' % datafn)
> +
> +    (tmpfd, tmpindexfn) = tempfile.mkstemp(
> +        dir=repo.join('store'), prefix='00manifest.', suffix='.i')

I find it a bit cleaner to break the line after at least one argument.
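
For illustration, the call could be laid out as below. This is a sketch
only: storedir is a hypothetical throwaway directory standing in for
repo.join('store') so that the snippet runs outside a repository.

```python
import os
import shutil
import tempfile

# hypothetical stand-in for repo.join('store')
storedir = tempfile.mkdtemp()

# first argument stays on the opening line, the rest align under it
(tmpfd, tmpindexfn) = tempfile.mkstemp(dir=storedir,
                                       prefix='00manifest.',
                                       suffix='.i')
os.close(tmpfd)
tmpdatafn = tmpindexfn[:-2] + '.d'

created = os.path.exists(tmpindexfn)
shutil.rmtree(storedir)  # clean up the throwaway directory
```
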
> +    tmpdatafn = tmpindexfn[:-2] + '.d'
> +    os.close(tmpfd)
> +
> +    r1 = revlog.revlog(util.opener(os.getcwd(), audit=False), indexfn)
> +    r2 = revlog.revlog(util.opener(os.getcwd(), audit=False), tmpindexfn)
> +
> +    # Don't use repo.transaction(), because then things get hairy with paths:
> +    # some need to be relative to .hg, and some need to be absolute.  Doing it
> +    # this way keeps things simple: everything is an absolute path.
> +    lock = repo.lock(wait=False)
> +    tr = transaction.transaction(
> +        sys.stderr.write, open, repo.join('store/journal'))

ditto

> +
> +    try:
> +        order = good_sort(r1)
> +        write_revs(r1, r2, order, tr)
> +        report_shrinkage(datafn, tmpdatafn)
> +        tr.close()
> +    except:
> +        # abort transaction first, so we truncate the files before deleting them
> +        tr.abort()
> +        if os.path.exists(tmpindexfn):
> +            os.unlink(tmpindexfn)
> +        if os.path.exists(tmpdatafn):
> +            os.unlink(tmpdatafn)
> +        raise
> +    finally:
> +        lock.release()

Is the non-nested try/except/finally possible with Python 2.4? As far as I
know, the combined form only arrived in Python 2.5.
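
If the combined form is not available, the portable spelling nests a
try/except inside a try/finally. A minimal sketch, where the appended
strings are stand-ins for the real steps (the sort and write, tr.abort(),
lock.release()) and shrink is a hypothetical name:

```python
events = []


def shrink(fail):
    # outer try/finally guarantees the lock is released;
    # inner try/except does the cleanup and re-raises
    try:
        try:
            events.append('work')
            if fail:
                raise ValueError('simulated failure')
            events.append('close')
        except:
            events.append('abort')
            raise
    finally:
        events.append('release')


shrink(False)
ok_events = list(events)

del events[:]
try:
    shrink(True)
except ValueError:
    pass
fail_events = list(events)
```

On failure the abort step runs before the lock is released, which is the
same order as in the patch.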

regards,

Benoit

-- 
:wq


