Transitioning from Clearcase to Mercurial (A developer's account)

Chris Scott Chris.Scott at mmodal.com
Wed Oct 5 17:45:30 UTC 2011


Some time ago I posted to the list about several small parts of our experience transitioning from Clearcase to Mercurial.  I promised a longer follow-up and unfortunately it's taken me too long to get back around to it.  Here I'd like to document our experience and try to outline exactly why we think our development process is so much better than what it used to be.  This was written using reStructuredText because I'll probably publish it somewhere eventually (right now I don't have a good spot to do so).  I considered posting the html version in the body, but it seems that most people here prefer monospaced text, so I've attached the html version instead.

I apologize in advance for the wall of text.

Regards,

~Chris

=========================================
Transitioning from Clearcase to Mercurial
=========================================

---------------------
A Developer's Account
---------------------

Abstract
========

We moved away from Clearcase because it was broken and picked Mercurial over Subversion.  Not all of the improvements we made, however, were directly related to a VCS.  I don't pretend to think that this is "The right way" for everybody, but rather "Here's what worked for us."

Background
==========

Ancient History
---------------

I'll be the first to admit that I am not a Clearcase expert.  I regard people who say that they are with the same apprehension as C++ "Experts".  I have, however, worked at two very different organizations where it was being used.  The first was an old-school engineering firm.  Old-school as in my aisle mate still had his first PC that he was assigned around 1980 next to his newer workstation.  We were a very conservative organization that targeted a variety of Unix platforms, namely HP-UX and SunOS.  We had a large multi-site Clearcase installation, along with several (!) full-time Clearcase administrators.  We worked off of dynamic views and the user experience was abysmal.  Every other week we'd have minor network issues that would bring Clearcase to its knees.

For those who are not familiar, a dynamic view is a virtual directory that is kept in sync with the central repository via a file-system-level driver.  It worked well for well-oiled VOBs (versioned object bases, the Clearcase concept of a repository) on small low-traffic networks, but on our *very* large internal network, the performance suffered miserably.

When the network performance wasn't up to snuff, there wasn't anything that you could do in that mounted VOB.  You couldn't even read files.  Operations took forever to complete and I never once rolled back my dynamic view to a previous state because our config-spec was too complex.  The config-spec is a special file that describes what view you will be presented.  In convoluted syntax, it basically said "show me this version, from this base date, unless this version is present on this particular branch".  If you wanted to roll-back to a previous version of the VOB you would have to check-out (lock) files on a file-by-file basis to the particular version you wanted, or modify the config spec and wait hours (!) for it to apply the new rules to our huge, huge VOBs.

This was, however, only 2001 and Clearcase was still considered "best-of-breed" (which is marketing speak for "this isn't innovative but worked 5 years ago").  Subversion was new, envied by our developers and trialled on small projects, and Linus was proselytizing about the benefits of BitKeeper and would soon switch Linux development over to the first widely used DVCS.

It was, in short, the dark ages and it would be unfair to compare 2001 Clearcase with 2010 Mercurial.  But that was the Clearcase lineage and it didn't change much over the next several years.

The Middle Ages
---------------

After I left that engineering firm, I went to go work for a corporation where we used StarTeam.  For those familiar with StarTeam it may surprise you that it was actually much much better to actually work with than a huge Clearcase system.

We had no StarTeam admin, but only a small development staff (<20).  I still never rolled back to a previous version with StarTeam, performed no merges (!!), and learned the process for un-locking checked-out files for developers who never checked anything in and would go on vacation with half a project locked.  It was marginally better than Source Safe.  When I left, they were transitioning to TFS.  It wasn't any better.

The Renaissance
---------------

I left that corporation to work for a small technology firm in 2009, which is where I work today.  This firm had a history of association with large corporations and along the way we acquired a Clearcase installation.

Here, we worked mainly in static-views (think CVS/SVN check-outs) in our own personal "developer" branches.  We would merge back down to a shared "integration" branch in a dynamic view.  It was better than anything that I had used before.  I say that because Clearcase workflows varied from site to site to such a high degree that some installations felt like a completely different system.  Here, co-development actually worked.  The pace of development was feverish and thus there were many merges.  Merges and updates, however, were very painful.  Like, 30 minute updates and full-day merges.  Predictably, people would develop in their own branches most of the time and there would be coordinated "integration" merges -- where everybody would merge their changes in at the same time -- which were particularly horrific.  Just finding "merge candidates" could take 20 minutes, even if you had only changed one file!  The automatic merge algorithms were very good, but the merge process was a big-black box and it was hard to review, in whole, what you actually changed and if you performed the merge correctly.

We had a *de facto* release engineer who did most of the Clearcase administration, and two or three other gray-beards who could tell you how to un-screw-up VOBs that bluescreened in the middle of a merge. (You had to turn off virus-scanning.  Seriously.)  Our VOBs were, like those of every other Clearcase installation I've ever seen, absolutely huge.  One static check-out would take, on average, 5 GB and about an hour to update.  Thus, it took about a week for a new developer to bring their new machine up to a working state.  Nobody trusted dynamic views for actual development.  We only merged there.

That is, when you could get a license.  We had floating licenses which would lock a license for about 30 minutes while you were doing VCS operations.  Since we were growing fantastically, it was a common experience to have Clearcase proclaim that there weren't any licenses available while wanting to check-out something.  You'd then have to wade through the list that told you who hadn't done anything for a while and kick them off.  Fun.  I can't blame Rational for that however.  It was a commercial product.


I won't even go much into how long it took to actually cut a release.  Or a patch. Or simply applying a label (tag).

The process was still broken.  A revolution was coming.

Change(set)s, they are a-coming
===============================

After our CTO blue-screened during a mega-merge for the last time, it was decided that we would move to another VCS.  *Any* other VCS.  We considered Subversion first, as it had much more mind-share and tool support at the time and Mercurial had just hit 1.0.  We talked to consultants on how to make the transition.  We tried to figure out how the workflow would change.

I had already been using Mercurial over my static Clearcase instance because I could painlessly roll-back, review incoming changes and figure out exactly what I was working on.  Yes, I was using a VCS on top of an existing VCS.  It actually worked pretty well.  I quickly became an advocate for transitioning to Mercurial over SVN.

For me there were three big advantages over SVN:

1. A similar workflow.

   As I said before, we worked in personal branches in CC and would merge back to mainlines when changes were done.  Along the way we could check in several times as checkpoints and aggregate fixes.  SVN didn't have this model.  Checked-in changes would've just gone back into the mainline.  This would have led to developers working on massive changes without ever checking in anything.  HG allowed us to work more like the way that we already worked.  We could share work without having to possibly pollute the mainline.

2. Changesets.

   This is an undeniably huge paradigm shift.  The ability to review one's progress on a change across all files in a project rather than one file at a time, as well as to review other people's changes, cannot be overstated.  I'm not sure I can adequately describe here why this is so big.  Linus did it better at his Git Google talk.  It's a recommended watch (even though he disparages everything not Git.)

3. Painless merges and quick, local operations.

   This was almost as huge as changesets.  Things didn't take forever to complete anymore.  In fact they were relatively instant.  One could work on highly disruptive features in isolation, merging in incoming changes several times a day and have fine grained control over what and when everything went back into the mainline.  Releases took 30 minutes.  Patches took 30 minutes.

For me there were many other advantages but these were the big three.  We would've thrown Git into the ring for consideration, but we quickly threw it out because of the similarity to HG and the fact that Windows support for HG was so much better at the time.  I like Git and think we would've been as successful if we had moved there, but it didn't have the same fit that HG did.

The Subversion consultants that we talked to turned out to have little in the way of turn-key Clearcase-to-Svn solutions.  Even though SVN had better tool support, that was about the only thing that it had over HG.  Besides, there was TortoiseHg, which was good enough and under active development under sane leadership.  These days, I consider Thg 2.0 one of the best Windows VCS GUI solutions.  It just works.

One particularly notable success-story came from when it came time to set up replication of a project to a partner of ours overseas.  They had adopted CC to get source updates from us.  Multi-site replication was one of the big selling points of Clearcase.  Because of Hg's distributed nature, we set up a faster, better replacement in a week.  I haven't touched it since I finished the configuration.

Here is the benefit list, verbatim, that I documented internally during our transition:

* hg is free. No more problems with licenses
* hg is fast. The whole repository is distributed when a project is cloned. This means actions like reverting to previous versions and searching for text across all versions do not need network access and execute fast because all resources are local
* Branching is cheap and encourages experimentation. Local clones make branching fast and easy enough to do every day.
* Merging is still powerful. Except that finding "merge candidates" happens much quicker.
* Easy setup and maintenance. No more config specs.
* hg is repository centric, not file centric. Changesets are first class objects. No longer do you have to worry about pulling partial changes unless you really want to.

Practical migration considerations
==================================

Now that we had decided on Mercurial, we had to get down to how we would actually migrate projects.  The first thing we decided may shock some of those considering a similar move:

 *We didn't attempt to convert all history.*

It turns out having information about ancient changes (our code base goes back to 1998) wasn't that big of a deal in day-to-day work.  We decided that we would take snapshots of all the then-supported versions, and didn't worry about other branches.  We could always bring up the Clearcase version tree if we needed to do research.  Turns out nobody did.  When we virtualized the Clearcase server a year later nobody noticed.  In fact, along the way Clearcase access broke but nobody complained.

Secondly, we went through a massive re-organization effort.  We weren't limited to a set of hard-to-maintain mega VOBs anymore.  We could move projects into their own repos.  We could do it a project at a time.

We started with huge HG projects that were almost the same as the VOBs they came from, but we hg convert'ed out projects from them into top-level projects all their own.  It was easy.
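As a rough sketch of what splitting one project out can look like: ``hg convert`` accepts a ``--filemap`` file whose ``include`` and ``rename`` directives select a subdirectory and move it to the root of the new repository.  The helper names, repository names and paths below are hypothetical, not what we actually ran, and the sketch assumes paths without spaces.

```python
# Hypothetical helpers: build the filemap text and the command line for
# extracting one subdirectory of a large repository into its own repository.
# Nothing here runs hg; it only prepares the inputs.

def filemap_text(subdir):
    """Return an hg convert filemap that keeps only `subdir` and
    moves its contents to the root of the destination repository."""
    lines = [
        f"include {subdir}",   # keep only files under this directory
        f"rename {subdir} .",  # "." means the root of the new repository
    ]
    return "\n".join(lines) + "\n"


def convert_command(source_repo, dest_repo, filemap_path):
    """Return the argument list for running hg convert with a filemap.
    The convert extension ships with Mercurial but is disabled by default,
    so it is enabled for this one invocation via --config."""
    return [
        "hg", "--config", "extensions.convert=",
        "convert", "--filemap", str(filemap_path),
        str(source_repo), str(dest_repo),
    ]
```

Writing ``filemap_text("projects/foo")`` to a file and passing ``convert_command("big-vob-repo", "foo", "foo.map")`` to ``subprocess.run`` would then produce a new repository containing only that project and its history.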

The second shock decision we made was in regard to all of the large binary assets we had in the old VOBs:

 *We didn't attempt to control non-source objects.*

Turns out that almost all of the large binaries in the old VOBs were 3rd party dependencies.  The few that were left were pretty easy to take care of.  We didn't use any large-file plugins for HG.  We did, however, use another system entirely and its use was almost as big a change as moving to Mercurial itself.

A modern software engineering workflow
======================================

We adopted the mantra of "If it doesn't diff, it doesn't belong in HG".  There are some exceptions, of course, like icons and small images; nothing that HG had much trouble with.

For everything else, we implemented an Artifact management system:  Apache Ivy and Artifactory.

Basically, we moved all 3rd party dependencies into Artifactory, where Ivy could bring them down into a project on demand.  Resolving and retrieving took a non-trivial amount of time, yes, but our repos were small and fast to clone.  We had fine-grained control over which versions projects used.

This system led us to the last huge benefit as we moved from Clearcase:

 *Projects get shared in binary-form to downstream projects, not in source-form*

In short, all derived objects get published to Artifactory, where other projects can use them directly instead of having to compile them separately.  Yes, I'm familiar with the concept of winking in derived objects in Clearcase, but I've never seen a real-world system where it worked.

So now when it came time to make a patch, one would only have to make the change in the one place where it was needed and all other dependencies could be verifiably loaded (which only really works if you control your ivy files correctly).  It's not a perfect system as it allows some degree of human error in messing up ivy descriptors, but it's so much more time-efficient and verifiable (i.e. we can now answer questions like, "What are the licenses of all of the dependencies of project X?").  To illustrate, I work on a project "D" that has the following dependency chain:

::

        E-------->|
        A->B----->D
           |->C-->|

That is to say, my project D depends on E, B and C.  B depends on A, C depends on B.  In Clearcase, I would have to recompile A, B, C, and E to make a patch to D.  I would have to explicitly control the version of B that both D and C needed.  Now, those versions, in binary form, are controlled so I only have to update D to the tag I want to base from, make my patch and republish to Artifactory.  It "Just works".
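To make the difference concrete, here is a small sketch that models the diagram above as a graph and works out what has to be compiled to patch D.  The function names are mine for illustration only; this is not how Ivy resolves anything, just the underlying idea.  It assumes Python 3.9 or later for ``graphlib``.

```python
from graphlib import TopologicalSorter

# Direct dependencies taken from the diagram: project -> what it depends on.
DEPENDS_ON = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "D": {"B", "C", "E"},
    "E": set(),
}


def transitive_dependencies(project, graph=DEPENDS_ON):
    """Return every project that `project` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph[project])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


def source_build_order(project, graph=DEPENDS_ON):
    """Everything that must be compiled, dependencies first, when upstream
    projects are shared as source (the old Clearcase situation)."""
    needed = transitive_dependencies(project, graph) | {project}
    subgraph = {name: graph[name] & needed for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


def binary_build_order(project):
    """With published binaries, only the patched project itself is compiled."""
    return [project]
```

For D, ``source_build_order("D")`` lists all five projects with D last, while ``binary_build_order("D")`` is just D.  The same transitive walk is what lets us answer questions such as which licenses a project pulls in.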

During development we utilize the last main benefit that I'm going to go over:

 *All dependencies are built automatically on a build server using continuous integration*

Every push for every project kicks off a build in Jenkins.  Thus I can ask for a change in A, and have it be available -- automatically -- in about an hour for consumption by D.  I always have the ability to compile locally if I'm developing A and D at the same time (utilizing local publishing), but the base case is easy.

Lastly, this prevents the temptation of distributing builds built on a developer machine.  As a result, the "It works on my machine" excuse is almost completely invalidated.

A breath of fresh air
=====================

I can't tell you how absolutely grin-inducing the current system is when developers do things in a few minutes that used to take hours or days.  Merges are unexciting, people don't worry about breaking things and work continues unabated.  Neophytes, who were not around during "The Old Days", have a relatively short learning curve, can contribute to a project on a clean machine in a couple of hours and rarely give much thought to VCS issues.  Unexciting and boring was exactly what we were going for.  In fact that's the best compliment that I could ever give a VCS.  It works the way we want it to, without much fuss.

One last important note: we didn't expect that Mercurial would do everything for us.  To badly paraphrase a famous phrase:

 *Render unto VCS things that are source code, and unto others things that are not.*

For us, Mercurial takes care of sources, Ivy and Artifactory take care of binaries and dependencies.  They are loosely coupled, but work in conjunction.  For me the holy quartet of software engineering is:

* VCS
* Build system
* Dependency management
* Change management

All of these things should be loosely-coupled so you can upgrade any part when it makes sense.  They should be easy to integrate with one another.  Beware any system that attempts to do all of these things.

Conclusion
==========

I won't say that the transition didn't take a lot of time (~3 months) or a lot of research and effort, but it was a pretty easy change to make in the end, all things considered.  We now have a modern, efficient software engineering system.  It's not bleeding edge like super-agile places like github_.

.. _github: http://zachholman.com/talk/how-github-uses-github-to-build-github

It is, however, far better than everywhere I've ever worked, and better than most I've ever heard about.

Given our experience, I find that there's little excuse for organizations to *not* transition to an open-source DVCS.  Mercurial in particular works well for us.


The technology is there.  It's ready.  The benefits are tangible.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial/attachments/20111005/724e58ba/attachment-0002.html>

