Migrating from Clearcase to Mercurial

Simon King simon at simonking.org.uk
Sat Dec 17 18:31:20 UTC 2011


On Fri, Dec 16, 2011 at 1:23 PM, Martin Geisler <mg at aragost.com> wrote:
> Simon King <simon at simonking.org.uk> writes:
>
> Hi Simon,
>
>> For the branching behaviour, I am heeding the warning on
>> http://mercurial.selenic.com/wiki/StandardBranching and not using
>> named branches for every change.
>
> That's good -- while we also have caches for these, many tools will
> assume that they can present all branches in a single drop-down menu.
>
>> Instead, I am going to create a new server-side clone whenever a
>> developer wants to start a new piece of work. He will push his changes
>> to that clone where they can be reviewed. Once the review is complete
>> (and as long as the clone is fully merged up with the main
>> repository), the server will log the outgoing changesets, then push
>> them from the branch repo to the main repo. The branch repo will
>> probably then be deleted to save space on the server.
>
> Depending on how busy the main repo is, the feature branch won't stay
> fully merged up for long. This means that the server cannot push it to
> the main repo. Therefore, it's normally done the opposite way: somebody
> takes care of integrating branches that are reviewed. This person will
> then do the final (tiny) merge before pushing to the main repo.

We already had this problem with Clearcase. Our web interface
prevented people from merging if any of the files they had worked on
had moved on. Mercurial is stricter (and rightly so) by versioning the
whole repository and therefore requiring a merge if *any* file has
changed, but I don't think we'll have any trouble. For our busier
projects we tend to operate an informal "merge queue". So I think we
should be OK with pushing from branches to main.

(Of course, switching from "each developer is required to merge his
branch up" to "a single integrator is responsible for merging
branches" is a simple policy decision rather than a technical one, so
we can change our minds here, or even operate different policies for
different projects)
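The "fully merged up" condition in the workflow above can be stated precisely: pushing the branch repository to main adds no new head only if main's current head is already an ancestor of the branch head. Here is a minimal sketch on a toy in-memory history. The `parents` dict and the function names are hypothetical illustrations, not Mercurial's internal API.

```python
from collections import deque


def ancestors(parents, node):
    """Return `node` plus all of its ancestors.

    `parents` maps each changeset ID to a list of its parent IDs,
    a toy stand-in for a repository's history."""
    seen = set()
    queue = deque([node])
    while queue:
        n = queue.popleft()
        if n in seen:
            continue
        seen.add(n)
        queue.extend(parents.get(n, []))
    return seen


def fully_merged_up(parents, main_head, branch_head):
    """True if main's head is already contained in the branch head,
    so pushing the branch would not leave main with a second head."""
    return main_head in ancestors(parents, branch_head)


# main: a - b - c; the branch forked at b (d), then merged c in (e)
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d", "c"]}
print(fully_merged_up(history, "c", "d"))  # d does not contain c
print(fully_merged_up(history, "c", "e"))  # e merged c
```

A server-side check like this only tells you whether the push is safe at that moment; on a busy main repo the answer can change between the review and the push.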

>
>> I'm less confident about how to deal with the tagging. From a
>> technical point of view, it's not nearly as important to tag every
>> merged branch, because the changeset ID is a perfectly good unique
>> identifier. But socially, I don't think we can do without those
>> incrementing build IDs; people are too used to referring to builds by
>> their number, and understanding that build A is more recent than build
>> B simply because it has a higher number. (We could store build IDs
> outside of Mercurial, but then developers can't use them with commands
>> like 'hg merge' and 'hg update')
>
> They could use the build IDs if an extension would "inject" them into
> the lookup chain. Extensions like mq already add extra, temporary,
> identifiers to existing changesets.
>
> The extension would query a database for the build IDs. This could be
> convenient if you need to attach more information to the ID and you
> already maintain that information in another system.
>

I did wonder about this. I think the decision will end up depending on
whether we want the tags to be available offline (independence from
the network being one of the benefits of Mercurial in the first
place). I suppose we could also cache the database locally for offline
operation, although that starts to sound like reinventing .hgtags :-)
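To make the trade-off concrete, here is a minimal sketch of the lookup logic such an extension might delegate to. It assumes a central table of incrementing build IDs and a local copy for offline use. The `builds` schema, the "build-NNNN" naming scheme and all function names are hypothetical, and `sqlite3` merely stands in for whatever database is actually used. Hooking this into Mercurial's own lookup chain is not shown.

```python
import re
import sqlite3

# Hypothetical naming scheme for build IDs, e.g. "build-1234".
BUILD_RE = re.compile(r"^build-(\d+)$")


def open_db(path=":memory:"):
    """Open (or create) the build table; the schema is an assumption."""
    db = sqlite3.connect(path)
    # AUTOINCREMENT ensures IDs only ever increase and are never reused.
    db.execute(
        "CREATE TABLE IF NOT EXISTS builds ("
        "build_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "node TEXT NOT NULL)"
    )
    return db


def record_build(db, node):
    """Allocate the next build ID for a 40-hex changeset hash."""
    with db:  # commits the transaction on success
        cur = db.execute("INSERT INTO builds (node) VALUES (?)", (node,))
    return cur.lastrowid


def resolve(db, name):
    """Map a name like 'build-42' to its changeset hash, or None."""
    m = BUILD_RE.match(name)
    if not m:
        return None
    row = db.execute(
        "SELECT node FROM builds WHERE build_id = ?", (int(m.group(1)),)
    ).fetchone()
    return row[0] if row else None


def sync(central, local):
    """Copy every build row into a local cache for offline lookups."""
    rows = central.execute("SELECT build_id, node FROM builds").fetchall()
    with local:
        local.executemany(
            "INSERT OR REPLACE INTO builds (build_id, node) VALUES (?, ?)", rows
        )


central = open_db()
first = record_build(central, "a" * 40)
second = record_build(central, "b" * 40)
local = open_db()  # in practice, a file inside the developer's clone
sync(central, local)
print(resolve(local, "build-%d" % second))  # the "b" * 40 hash
```

The local copy only knows about builds recorded before the last `sync`, which is the same staleness a pulled .hgtags file would have.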

>> Firstly, are we going to start seeing performance problems if we have
>> a few thousand tags in a repository? If so, are the performance
>> problems only caused by having thousands of tags at a head? As I
>> understand it, Mercurial examines the .hgtags file for each head in
>> the repo, so if we purge tags that are no longer interesting from
>> every head, will the performance be the same as if they had never
>> existed?
>
> We have a cache in place for the .hgtags files. This means that we
> normally don't have to consult all the heads.
>
> Also, the .hgtags file is only read from topological heads, not branch
> heads. The number of topological heads is normally quite small compared
> to the number of named branches in total.
>

Yes, I don't imagine we'll have many heads at all. If trimming the
number of tags does become necessary, doing it on all heads would be
trivial.
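As a reminder of what Mercurial actually reads at each head, each line of .hgtags is "<40-hex changeset ID> <tag name>". Later lines override earlier ones, and a tag pointing at the all-zero null ID (what `hg tag --remove` records) counts as removed. Below is a simplified sketch of that resolution, plus a hypothetical pruning helper that drops lines outright rather than recording removals. The function names are illustrative, and the sketch ignores Mercurial's ranking of tags across several heads. It does not answer whether pruning restores the original performance. Per Martin's note, any pruning would need to be committed on every topological head.

```python
NULLID = "0" * 40


def parse_hgtags(text):
    """Resolve .hgtags contents to {tag name: changeset ID}."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: tag names may contain spaces.
        node, _, name = line.partition(" ")
        name = name.strip()
        if node == NULLID:
            tags.pop(name, None)  # a removal entry
        else:
            tags[name] = node  # later lines win
    return tags


def prune_hgtags(text, keep):
    """Return .hgtags text with only the lines whose tag name passes keep()."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and keep(stripped.partition(" ")[2].strip()):
            out.append(stripped)
    return "".join(line + "\n" for line in out)


text = (
    "%s build-1\n%s build-2\n%s release-1.0\n%s build-1\n"
    % ("a" * 40, "b" * 40, "c" * 40, NULLID)
)
print(parse_hgtags(text))  # build-1 was removed by the null-ID line
pruned = prune_hgtags(text, lambda name: not name.startswith("build-"))
print(parse_hgtags(pruned))  # only release-1.0 remains
```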

>> Secondly, we will actually be creating these tags through our web
>> interface, which means it'll be the server running "hg tag". I think
>> this means that I need a working copy on the server. I could keep it
>> updated to default/tip on every push, but this seems a little wasteful
>> of disk space, requires an extra "update" step on each push, and so
>> on. I was wondering if instead I could have a named branch called
>> "tags" which exists solely for tagging.
>
> You could do that, or you could write a custom extension for this. It's
> quite easy to add a changeset from an extension. An example is this
> extension I wrote to replay history with better rename information:
>
>  https://bitbucket.org/aragost/fixrenames/src/ff8429d9bf4f/fixrenames.py#cl-142
>

Thanks for the pointer - that together with the docstring for memctx
should get me most of the way there.
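When the server adds a tagging changeset without a working copy (for example through memctx, as suggested above), the only file that changes is .hgtags. The extension therefore mainly needs the parent's .hgtags contents plus one appended line. Here is a minimal sketch of that content step. The function names and the tag-name checks are hypothetical choices for this sketch and may not match Mercurial's own rules. The memctx wiring is left out because its details depend on the Mercurial version.

```python
import re

NULLID = "0" * 40
NODE_RE = re.compile(r"^[0-9a-f]{40}$")


def current_tags(hgtags_text):
    """Resolve .hgtags lines to {name: node}; later lines win."""
    tags = {}
    for line in hgtags_text.splitlines():
        line = line.strip()
        if line:
            node, _, name = line.partition(" ")
            tags[name.strip()] = node
    return tags


def add_tag_line(hgtags_text, node, name, force=False):
    """Return the new .hgtags contents with 'node name' appended."""
    if not NODE_RE.match(node):
        raise ValueError("expected a 40-character hex changeset ID")
    # Basic name checks chosen for this sketch, not Mercurial's full rules.
    if not name or name != name.strip() or "\n" in name or name == "tip":
        raise ValueError("invalid tag name: %r" % name)
    old = current_tags(hgtags_text).get(name)
    if old not in (None, NULLID) and not force:
        raise ValueError("tag %r already exists" % name)
    if hgtags_text and not hgtags_text.endswith("\n"):
        hgtags_text += "\n"
    return hgtags_text + "%s %s\n" % (node, name)


base = "%s build-1\n" % ("a" * 40)
print(add_tag_line(base, "b" * 40, "build-2"))
```

Refusing to silently move an existing tag is a policy choice; with a server creating every tag, it guards against two builds claiming the same number.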

However, I think I have another (minor) issue with tagging commits
being on the main line. In Clearcase, when we are working on a branch
and another developer merges and labels his work, we merge that label
out to our development branch. If we did exactly that in Mercurial, we
would actually be leaving the tagging commit as an unmerged head. It's
not too bad, because when you merge up you generally want to merge the
tip anyway, so our guidelines will be 'use plain "hg merge" unless you
explicitly want an older version'.

I suppose my question is whether there is actually a downside to
having a named branch just for tags? I can understand that it would be
a pain if developers were creating tags in their own repos, but if the
server is responsible for creating tags, is there anything wrong with
a single named branch?

(We could also write an extension that hid commits on that branch from 'hg log')

If we decide not to use tags for every branch, this becomes a moot point.

>> So, apologies for the rambling email, but I wanted to give some
>> background about why I'm doing things this way. I'm really looking for
>> feedback on the tagging questions; will we have performance problems
>> with thousands of tags, and is there anything wrong with having a
>> named branch just for .hgtags?
>
> Greg Ward wrote the tag caching logic because he ended up with ~108,000
> tags after a CVS conversion and 'hg tags' took 7 seconds to run:
>
>  http://markmail.org/message/ngu4wzp25mgxryy3
>
> I did not find a mail or commit message where he gives the times after
> the patches. Another user reports in Issue548 that 'hg parents' went
> from 4 sec to 0.3 sec. Based on that, it seems that we can handle lots
> of tags now.
>
> If you want to know this for sure, then you could make a script that
> generates a repository with, say, 50,000 tags.
>

I actually started to do that - I took the main mercurial repository
and began duplicating the existing tags until there were a few
thousand. But I wasn't sure what I should actually try afterwards. "hg
log" was slower, but that could have been due to the sheer number of
commits. I need to run the experiment again and be more rigorous with
the testing.
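To repeat that experiment more rigorously, one option is to generate the synthetic tags from a script so that the tag count is the only variable. The sketch below appends `count` tags to an existing .hgtags file. It reuses changeset IDs that are already tagged, so no new commits are needed and "hg log" is not slowed by extra history. The function name and the "synthetic-" prefix are hypothetical.

```python
import itertools

NULLID = "0" * 40


def synthetic_hgtags(hgtags_text, count, prefix="synthetic-"):
    """Return .hgtags text with `count` extra tags appended, reusing the
    changeset IDs of existing entries in round-robin order."""
    nodes = []
    for line in hgtags_text.splitlines():
        node = line.strip().partition(" ")[0]
        # Skip blank lines and removal entries (null ID).
        if node and node != NULLID and node not in nodes:
            nodes.append(node)
    if not nodes:
        raise ValueError("need at least one existing tag to copy")
    extra = [
        "%s %s%d\n" % (node, prefix, i)
        for i, node in zip(range(count), itertools.cycle(nodes))
    ]
    if hgtags_text and not hgtags_text.endswith("\n"):
        hgtags_text += "\n"
    return hgtags_text + "".join(extra)


base = "%s 1.0\n%s 1.1\n" % ("a" * 40, "b" * 40)
print(synthetic_hgtags(base, 3))
```

Writing the result over .hgtags in a clone, committing it, and timing the same commands before and after (for example `hg tags`, as in Greg's report, or `hg parents`, as in Issue548) keeps the history identical between the two runs.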

> --
> Martin Geisler
>
> aragost Trifork
> Professional Mercurial support
> http://mercurial.aragost.com/kick-start/

Thanks again for all your suggestions,

Simon


