Thoughts on Mercurial and Git

Theodore Tso tytso at mit.edu
Tue Mar 27 16:24:14 UTC 2007


On Tue, Mar 27, 2007 at 09:08:33AM -0500, John Goerzen wrote:
> His main complaints against Mercurial seem to be:

I wouldn't call them complaints, really; rather, git has some
nice-to-have features that aren't in hg today, and which I happen
to think are important.

For example, someone asked how hard it would be for me to implement
"bk collapse", and it's something that I could probably implement in
git in 15-30 minutes.  Part of this is because git was designed to be
more easily extensible; part of this is because I'm not as up to speed
with Python (which is not hg's fault); and part of it is because hg's
data store means hg strip only works with significant limitations (if
you've since pulled some other internal branches, so that the
changesets you want to collapse are no longer at the very end of the
file where the truncation trick works, hg strip won't work).

I like having that kind of ability to easily extend the tool; the fact
that I could create the first working version of "git mergetool" in
about 30 minutes, and then spend maybe a few hours polishing it before
it got accepted into git's mainline, is really cool.

But is that a "complaint" against Mercurial?  I wouldn't call it that;
most programmers might not feel as much of a desire as I do to hook
into a DSCM and extend it.  And if hg has all of the functionality
they need, hey, go wild.

I'd much rather see a good, high-quality bi-directional gateway
mechanism so that people can use whatever SCM tool they feel most
comfortable with.

So with that in mind (and because I don't much like discussions such
as 'which is better, ext3 or reiser3'), let me quote from a recent
message I wrote to the git list:

------

At the highest architectural viewpoint, there are three levels of
difficulty of SCM conversions:

A) One-way conversion utilities.  Examples of this would be the
        hg2git, hg-fast-import scripts that convert from hg to git,
        and the convert-repo script which will convert from git to hg.

B) Single-point bidirectional conversion.  At this level, the hg/git
        gateway will run on a single machine and, with a state file,
        can recognize new git changesets, create functionally
        equivalent hg changesets, and commit them to the hg repository;
        it can also recognize new hg changesets, create functionally
        equivalent git changesets, and commit them to the git
        repository.

C) Multisite bidirectional conversion.  At this level, multiple users
        can be gatewaying between the two DSCM systems, and as long as
        they are using the same configuration parameters, if user A
        converts a changeset from hg to git, and that changeset is
        passed along via git to user B, who then, running the
        bidirectional gateway program, converts it back from git to hg,
        the resulting hg changeset is identical, so that hg recognizes
        it as the same changeset when it gets propagated back to user A.

(C) would be the ideal, given the distributed nature of hg and git.
It is also the most difficult, because it means that we need to be
able to do a lossless, one-to-one conversion of a changeset.  It is
also somewhat at odds with doing author mapping and signed-off-by
parsing, since either could prevent a reversible transformation.
However, what may very well be common is for projects to start with
(B), converting over some of the historical changesets, and then
later allow multiple users to clone from the two git/hg repositories
and do the multisite conversion.

So what that also means is that even if we only do (B) at first, it
would be useful to build in some of the characteristics needed to
eventually get to (C), even if we can't get there right away.

So, more practically, here are some of the things that we would need
to do, looking at hg-fast-export:

*) Change the index/marks file to be keyed on hg SHA1 hash IDs instead
of the small integer ordinals (which are local to each repository).
This is useful for enabling multisite conversion, but it is also
useful for tracking tag changes in .hgtags.

*) Have a mode where, instead of only checking changesets newer than
the last run, it simply iterates over all changesets in Mercurial and
checks whether each hg SHA1 commit ID is already in the marks file;
if so, it skips that changeset.

*) Have a mode where the COMMITTER id is "hg-fast-export" and the
COMMITTER_DATE is the same as the AUTHOR_DATE (so that the changelog
conversion is the same no matter where or by whom the conversion is
done).  This is mainly to enable multisite conversion.
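The three changes above can be sketched together, under some loud
assumptions: a plain dict stands in for the marks file (hg SHA1 node
-> git commit ID), each changeset already carries the git tree ID of
its converted snapshot, dates are already in git's "<seconds> <+hhmm>"
form, and the committer email is a made-up placeholder.  None of this
is hg-fast-export's actual code.  The git commit ID is computed the
way git hashes a commit object: SHA1 over "commit <size>", a NUL
byte, and the object body.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical fixed committer identity; the email part is a placeholder.
FIXED_COMMITTER = "hg-fast-export <hg-fast-export@localhost>"


@dataclass
class HgChangeset:
    node: str           # hg SHA1 changeset ID
    author: str         # "Name <email>"
    date: str           # "<unix seconds> <+hhmm>", already in git's format
    message: str
    tree: str           # git tree ID of the converted snapshot (assumed given)
    parents: List[str]  # hg node IDs of the parent changesets


def git_commit_id(tree: str, parents: List[str], author: str, author_date: str,
                  committer: str, committer_date: str, message: str) -> str:
    """SHA1 of a git commit object: header "commit <size>" plus NUL, then body."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {author_date}")
    lines.append(f"committer {committer} {committer_date}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def export_all(changesets: Iterable[HgChangeset], marks: Dict[str, str]) -> List[str]:
    """Walk every changeset (parents before children), skip hg nodes already
    present in `marks`, and record hg node -> git commit ID for the rest."""
    converted = []
    for cs in changesets:
        if cs.node in marks:
            continue  # converted on an earlier run, or at another site
        git_parents = [marks[p] for p in cs.parents]
        # COMMITTER is fixed and COMMITTER_DATE copies AUTHOR_DATE, so the
        # resulting ID depends only on the hg changeset, not on who runs this.
        marks[cs.node] = git_commit_id(cs.tree, git_parents, cs.author, cs.date,
                                       FIXED_COMMITTER, cs.date, cs.message)
        converted.append(cs.node)
    return converted


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
root = HgChangeset("a" * 40, "Ted <tytso@mit.edu>", "1175012654 +0000",
                   "first\n", EMPTY_TREE, [])
child = HgChangeset("b" * 40, "Ted <tytso@mit.edu>", "1175012700 +0000",
                    "second\n", EMPTY_TREE, ["a" * 40])

site_a: Dict[str, str] = {}
site_b: Dict[str, str] = {}
export_all([root, child], site_a)
export_all([root, child], site_b)
print(site_a == site_b)                   # True: both sites get the same git IDs
print(export_all([root, child], site_a))  # []: a rerun skips everything marked
```

Keying the marks on hg node IDs is what lets two independent runs be
compared at all; fixing the committer fields is what makes their git
IDs come out identical.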

                                                - Ted


