Moving from ClearCase to Mercurial -- practical implications

Dean Bell ddeanbell at gmail.com
Fri Feb 11 07:34:04 UTC 2011


Hello,

In my company, we're working on a conversion from ClearCase to Mercurial.
We're seeing very good speed improvements (Mercurial commands versus
ClearCase commands, and also much faster builds, since these are done
locally with Mercurial), but we still have a number of issues that
currently prevent us from making the switch.

We think these are mostly related to creating a good repository setup.
Our current setup is as follows:
- One 'source repository'. This is very large (a first version was around 6
GB). We've removed as many binaries as possible, and have gotten it down to
around 2-3 GB, which still seems excessive. Additionally, we have
third-party sources in this repository. The problem with these is that some
of the headers of third-party sources are up to 15 MB in size. We've also
seen straight conversions from binaries to header files in some locations
(basically a header file consisting of a large array {123, 67, ... }).
- One 'tools repository'. This is also a very large repository (around 5
GB). It contains binaries for every tool we need (including compilers and
scripting languages such as Python and Perl), and additionally all the
sources for these tools. However, we don't expect people to update their tools
repository every day (or it could happen using a cronjob overnight), so
perhaps this is less of an issue.
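
To get a feel for which files dominate a working copy, something like the
following rough sketch could be used. It is only an illustration: the
function name find_large_files and the 10 MB default threshold are made up
for this mail, and it only looks at file sizes in the working copy, not at
how much space the history takes inside .hg.

```python
import os


def find_large_files(root, min_bytes=10 * 1024 * 1024):
    """Return (size, path) pairs for files of at least min_bytes, largest first.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Mercurial's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Broken symlink or unreadable file: ignore it.
                continue
            if size >= min_bytes:
                hits.append((size, path))
    hits.sort(reverse=True)
    return hits
```

Calling find_large_files(".") from the top of a working copy would list the
candidates, such as the 15 MB third-party headers, for a closer look.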

The issues we are experiencing:
- As mentioned, the source repository is very large. It takes quite a while to
make a clone. We're not entirely sure how to add 'semi-header-files' that
are very large. They're still source files, so it doesn't seem like a good
idea to handle them using one of the extensions used for binaries (like
BigFiles).
- Commands are not always 'snappy'. Running 'hg status' can take some time
(maybe 10-30 seconds on the large workstations, but this can go up to 2-3
minutes in VirtualBox on a local PC). Is this normal? Is there a decent way
to improve it?
I looked at the inotify extension, but it's still experimental. Also, it
appears we need about 18000 inotify watches per source repository, and there
will be multiple developers per machine, each with multiple repositories.
- Synchronization can take a while. For example, when doing an 'hg pull', the
network transmission is fast, but it can take some additional time (sometimes
even a few minutes) to actually update all the files on disk.
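
On the inotify point, here is a back-of-the-envelope sketch of the watch
budget. The 18000 watches per repository is the figure from above; the three
repositories per developer in the usage note is an assumed number, and the
function names are made up. As far as I understand, the Linux limit
(fs.inotify.max_user_watches) applies per user rather than per machine, so
the sketch compares one developer's needs against that value when it can be
read.

```python
def watches_needed(watches_per_repo, repos_per_developer):
    """Total inotify watches one developer would need (simple product)."""
    return watches_per_repo * repos_per_developer


def read_inotify_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user watch limit on Linux, or None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def exceeds_limit(needed, limit):
    """True/False if the limit is known, None if it could not be read."""
    if limit is None:
        return None
    return needed > limit
```

For example, exceeds_limit(watches_needed(18000, 3), read_inotify_limit())
would tell whether a developer with three clones fits within the configured
limit on that machine.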

Possible concerns are also:
- What if the repositories grow too fast? The tools repository seems like
something that will grow very fast, due to new binaries being checked in.
But what about the source repository? We have a few hundred developers
working on our sources, and a number of people are worried that its size
will go up very fast once they all start committing to it.
- What if incorrect commits are made? Perhaps source commits are not very
large, but what if a developer commits a 100 MB file?
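
One possible guard against the 100 MB case would be to check file sizes
before a commit is accepted, for example from a pretxncommit hook on
developer machines or a pretxnchangegroup hook on the central server. The
sketch below shows only the size check itself, applied to paths in a working
copy. The function name, the 10 MB limit, and the way it would be wired into
a hook are assumptions, not something we have running.

```python
import os


def oversized_files(paths, limit_bytes=10 * 1024 * 1024):
    """Return (path, size) pairs for files larger than limit_bytes.

    Hypothetical helper: the caller supplies the paths touched by a commit.
    """
    offenders = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            # Removed or unreadable files cannot be too large; skip them.
            continue
        if size > limit_bytes:
            offenders.append((path, size))
    return offenders
```

A hook built around this would refuse the commit whenever the returned list
is not empty, and print the offending paths so the developer can fix it.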

Greetings,
Dean