Mercurial in a corporate environment (migration from CVS)
Martin Geisler
mg at aragost.com
Wed Jan 11 07:42:50 UTC 2012
Stephen Morton <stephen.c.morton at gmail.com> writes:
> I have not found, in all my googling, something like "An
> administrator's guide to implementing Mercurial in a large centralized
> corporate environment." So I've got some questions about how to do it.
>
> Ok, don't flame me. I know I've said a bunch of inflammatory things
> just there. I know about DVCS. Hear me out.
Heh, you're doing just fine and have obviously read up on Mercurial. So
you won't be flamed :-)
I think the reason you haven't found such an all-in-one guide is that
each deployment is a little different from the others.
> I'm an engineer working in the small tools team of a large-ish group.
> We have some 200+ designers and 200+ testers and are using CVS and I'd
> like to move us to Mercurial, and I have the power and support to do
> this. We have an offshoot project that would be a perfect project to
> incrementally prototype our final solution on before migrating the
> rest of the group, and 10 years of CVS history, to Hg in the future.
>
> This is a big project. The CVS checkout of a branch is about 900 MB.
> The complete repo with all history in CVS is about 20 GB. We
> experimented with an import to Git a couple of years ago and the git
> repo was about 15 GB.
>
> I don't think I even know enough to ask the right questions
> without being flamed. So my first question is "Are there some HOWTOS
> or something that add up to an administrator's guide to implementing
> Mercurial in a large centralized corporate environment?" If not, I've
> got some questions, below.
>
> First, here is what our environment will look like. I know, you're
> going to tell me that once we switch to Hg, we will no longer want to
> be centralized etc etc.
Let me just confirm what the others have said: you'll still want a
centralized repository. All companies I've worked with have such a
repository and there's nothing wrong with that. Exactly who can push
into this repository is another question.
> And I'll just reply that the only possible way for this to get off
> the ground is for us to start with a workflow that, from some level at
> least, looks very much like CVS.
>
> - We would maintain a centralized repo model where all developers
> can push to. That is not negotiable. People could push and pull
> from each other for development, but it would not be official until
> pushed to the central repo.
That is a quite normal setup.
> - We have designers working on Linux and on Windows. This again is
> non-negotiable.
TortoiseHg works on both platforms.
> - I assume we'd move the entire CVS history to Hg. We could perhaps
> migrate it all but have a sub-repo without the extensive history
> that people normally pull from (to make designer pulls less
> bandwidth-heavy).
> - Designers will always need workspaces from at least 4 different
> branches on their machines. (Gotta support the customers and fix
> any bugs!)
Migration of history is a tricky point. Some companies spend a lot of
effort in converting their CVS history to Mercurial and then realize
they don't need it. It all depends on your particular situation and
workflows:
Do you have several releases that you need to support in parallel at
different customers? You'll at least want to have the version running at
each customer checked in as separate commits.
Have you been editing the CVS history in "weird" ways that the
conversion tools cannot handle? Or will cvs2hg
(http://vc.gerg.ca/hg/cvs2svn/) happily convert your repository into
something that you can recognize in Mercurial? If the tools work, then
conversion can be easy.
Otherwise, I've seen people be very happy with a point-wise migration
where you repeat
$ rm -r *
$ cvs export -r RELEASE_1_0
$ hg addremove
$ hg commit -m "Release 1.0"
$ hg tag 1.0
for each tag in CVS. That gives you a small set of changesets in
Mercurial, but changesets that reflect the important points in history
for your project.
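Scripted, the loop above might look like the following minimal shell sketch. It assumes that CVSROOT is already set, that MODULE (a hypothetical variable you set yourself) names the CVS module, that it runs from the root of an existing Mercurial repository, and that release tags follow the RELEASE_1_0 pattern from the example. Tags are passed on the command line, oldest first.

```shell
#!/bin/sh
# Point-wise CVS-to-Mercurial migration: one changeset and one tag per
# CVS release tag. Sketch only; CVSROOT and MODULE are assumed to be set.
set -eu

MODULE="${MODULE:-}"   # hypothetical: name of the CVS module to export

# Map a CVS tag such as RELEASE_1_0 to a Mercurial tag such as 1.0.
tag_to_version() {
    printf '%s\n' "${1#RELEASE_}" | tr '_' '.'
}

# Replace the working copy with the tree at one CVS tag and record it.
import_tag() {
    cvs_tag="$1"
    version="$(tag_to_version "$cvs_tag")"
    [ -n "$MODULE" ] || { echo "MODULE is not set" >&2; return 1; }

    # The glob skips dotfiles, so .hg and .hgtags are kept.
    rm -rf ./*

    # Export into a scratch directory, then copy the tree in place.
    tmp="$(mktemp -d)"
    (cd "$tmp" && cvs -Q export -r "$cvs_tag" -d snapshot "$MODULE")
    cp -R "$tmp/snapshot/." .
    rm -rf "$tmp"

    hg addremove -q                  # track added and deleted files
    hg commit -m "Release $version"
    hg tag "$version"
}

# Tags come from the command line, oldest release first.
for cvs_tag in "$@"; do
    import_tag "$cvs_tag"
done
```

For example, `MODULE=yourmodule sh cvs-snapshots.sh RELEASE_1_0 RELEASE_1_1` inside a fresh `hg init` directory (script and module names are placeholders). Exporting into a scratch directory avoids relying on `cvs export -d .`.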
> My uninformed questions are:
>
> - You cannot push if your *repository* is stale, which is different
> from CVS where you just can't commit a *file* which is stale. How
> do other large organizations find that this works for them where
> people are pushing literally every minute?
People work in a hierarchy and in different repos on the server. If my
team is implementing Feature-X, then we'll collaborate with each other
using a Feature-X repo on the server. That way, each team works on its
own team repository. Team repositories are then later merged into the
central repository by a few integrators.
> - How do you make hooks or something else to disallow certain
> advanced repo management commands (e.g. tagging, branching on the
> central repo, "hg push -f" and creating two heads, revert,
> rollback) actually on the server as opposed to just saying "you
> shouldn't do that"? (People can do what they want on their own
> repo, but once they push, the history is immutable. And advanced
> maneuvers on the server should be only for the "inner circle" of
> the most senior developers and the tools team.)
That's all configured on the server side. I wrote a hook for the
University of Zurich that controls tagging -- I can try digging it up if
you think it will be interesting.
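That hook is not reproduced here. As an illustration only, a server-side check could be a shell `pretxnchangegroup` hook like the sketch below, registered in the central repository's .hg/hgrc under `[hooks]` with a line such as `pretxnchangegroup.guard = /usr/local/bin/hg-guard.sh` (path hypothetical). The allowlist is hypothetical, and it assumes the pusher's identity is available in REMOTE_USER or USER, which holds for one SSH account per developer but depends on your server setup.

```shell
#!/bin/sh
# Server-side pretxnchangegroup hook (sketch). Pushes from users outside a
# hypothetical allowlist are rejected if they add tags or leave a branch
# with more than one open head. A non-zero exit aborts the whole push.
set -eu

ALLOWED_USERS="alice bob"   # hypothetical "inner circle"

# Succeed if $1 is on the allowlist.
is_allowed() {
    for allowed in $ALLOWED_USERS; do
        [ "$1" = "$allowed" ] && return 0
    done
    return 1
}

# Read one branch name per open head on stdin and print each branch that
# occurs more than once, i.e. has several heads.
multi_head_branches() {
    sort | uniq -d
}

check_push() {
    # Assumption: the pusher is identified by REMOTE_USER or USER.
    pusher="${REMOTE_USER:-${USER:-unknown}}"
    if is_allowed "$pusher"; then
        return 0
    fi

    # HG_NODE is the first incoming changeset; the hook sees the pending
    # changesets, so "$HG_NODE:tip" covers the whole push.
    tags="$(hg log -r "($HG_NODE:tip) and file('.hgtags')" --template '{node|short}\n')"
    if [ -n "$tags" ]; then
        echo "push rejected: only the tools team may push tags" >&2
        return 1
    fi

    heads="$(hg log -r 'head() and not closed()' --template '{branch}\n')"
    dupes="$(printf '%s\n' "$heads" | multi_head_branches)"
    if [ -n "$dupes" ]; then
        echo "push rejected: multiple heads on branch(es): $dupes" >&2
        return 1
    fi
}

# Do nothing unless Mercurial runs this script as a changegroup hook.
if [ -n "${HG_NODE:-}" ]; then
    check_push
fi
```

The head check looks at the resulting repository rather than at the `-f` flag, since the server cannot see which flags the client used.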
> - Major release branches: cloned repos or named branches? (We'd do
> a lot of grafting to backport fixes to customer release branches.
> Does that make a difference? Would it be any more difficult with
> one method or another?) I know the obvious answer is that
> "Mercurial uses clones for branches" but in the CVS environment
> where branches are so a part of life, it seems so crazy to do it
> another way, and it seems like such a step backwards too.
Well, Mercurial *used* to use clones for branching. This is what the
Definitive Guide recommends, but named branches have come a long way
since then and we now also have bookmarks.
An important point to understand is that there's little difference
between multiple clones with one branch each and a single repo with
multiple branches: the branches in multiple clones can be pulled into a
single clone (you now have multiple heads in that repo). You can also
separate things again with 'hg clone -r'.
So it comes down to bookkeeping: with a named branch per major release,
you'll always be able to see where each changeset was first introduced.
With a clone per major release, you'll often have a better overview in
tools such as RhodeCode and hgweb. This is simply because they expose
repositories as the first level of organization.
The two approaches are not exclusive: I recommend using a named branch
for each major release and also creating a repository on the server
that holds that release. It's easy to auto-sync that with a cron job:
hg -R foo-1.x pull -b 1.x foo
That will pull branch 1.x from the foo repo into the foo-1.x repo.
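With several release branches, the cron job could call a small wrapper. This is a sketch assuming a hypothetical layout where $REPO_ROOT/foo is the main repository and $REPO_ROOT/foo-<branch> mirrors each named branch.

```shell
#!/bin/sh
# Cron-driven sync of per-release mirror repositories (sketch). Assumed,
# hypothetical layout: $REPO_ROOT/foo is the main repository and
# $REPO_ROOT/foo-<branch> mirrors the named branch <branch>.
set -eu

REPO_ROOT="${REPO_ROOT:-/srv/hg}"
MAIN="foo"

# Path of the mirror repository for one release branch.
mirror_for() {
    printf '%s/%s-%s\n' "$REPO_ROOT" "$MAIN" "$1"
}

# Pull only the given named branch (and its ancestors) into its mirror.
sync_branch() {
    hg -R "$(mirror_for "$1")" pull -q -b "$1" "$REPO_ROOT/$MAIN"
}

for branch in "$@"; do
    sync_branch "$branch"
done
```

Each mirror is created once, e.g. with `hg clone -b 1.x /srv/hg/foo /srv/hg/foo-1.x`, and a crontab line such as `*/15 * * * * /usr/local/bin/sync-mirrors.sh 1.x 2.x` (script path hypothetical) keeps them current.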
> - Dealing with a big repo. Ours would be about 15G.
That is indeed quite big. Some questions:
* Is that the size in CVS? If not, are you absolutely sure that the CVS
branches were converted correctly into Mercurial branches?
* Is this for a single project, or do you have multiple projects in one
CVS repository? In DVCS, you only have *one* project per repository
(this also helps with the push contention you talked about earlier).
* Do you have binary assets in the repository? If so, consider
removing them or at least externalizing them with the largefiles
extension.
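To see whether binary assets explain the size, a sketch like this lists files above a threshold in a CVS checkout or export. The 10 MB default is an assumption chosen to match the default minimum size used by largefiles; CVS and .hg metadata directories are skipped.

```shell
#!/bin/sh
# List files larger than a threshold (in MB) below a directory, as
# candidates for the largefiles extension. Sketch only.
set -eu

# Usage: large_files <dir> [<megabytes>]
large_files() {
    dir="$1"
    limit_mb="${2:-10}"
    limit_bytes=$((limit_mb * 1024 * 1024))
    # Prune metadata directories; -size +Nc means "more than N bytes".
    find "$dir" \( -name .hg -o -name CVS \) -prune -o \
        -type f -size +"${limit_bytes}"c -print
}
```

Files it reports are candidates for `hg add --large`. After enabling the extension with `largefiles =` under `[extensions]`, an already converted repository can be rewritten with `hg lfconvert --size 10 foo foo-large`.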
> Since it's big, pulling multiple local workspaces would be too
> bandwidth-intensive. (I didn't mention we've got remote sites so BW
> is an issue.) So do people simply do one pull from the centralized
> repo to their local drive and then push/pull to that one and then
> further push to the upstream repo? Probably something in a .hgrc
> that would streamline it but it could still be cumbersome.
For Mercurial itself, I only use one repository. In the .hg/hgrc file I
have defined several paths:
[paths]
default = http://hg.intevation.org/mercurial/crew+main/
default-push = ssh://hg@hg.intevation.org/mercurial/crew/
mpm = http://selenic.com/repo/hg/
hg-i18n = ssh://bb/mg/hg-i18n
So I can 'hg pull' to get the latest combined changes from crew+main, I
can 'hg push' to send my changes over SSH to crew, I can 'hg out mpm' to
compare what I have with what Matt has, and I can 'hg push hg-i18n' to
push changes to our internationalization repository.
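For the multiple-workspace case in the question, one option is a single clone over the WAN and cheap local clones from it, each with its own [paths]. A minimal sketch; the central URL and local paths are hypothetical.

```shell
#!/bin/sh
# One clone over the WAN, many cheap local workspaces (sketch). The
# URL and paths are hypothetical.
set -e

CENTRAL="${CENTRAL:-ssh://hg.example.com/foo}"   # central repository
MASTER="${MASTER:-$HOME/hg/foo}"                 # local full clone

# Pull from the local master, push straight to the central repository.
write_paths() {
    cat > "$1/.hg/hgrc" <<EOF
[paths]
default = $MASTER
default-push = $CENTRAL
EOF
}

# Create a workspace checked out on one branch. Local clones hardlink
# the store when possible, so they cost little disk space and no WAN
# bandwidth.
make_workspace() {
    hg clone -q -u "$1" "$MASTER" "$2"
    write_paths "$2"
}
```

Create the master once with `hg clone "$CENTRAL" "$MASTER"`, then e.g. `make_workspace 1.x ~/work/foo-1.x`. Since workspaces pull from the master, run `hg -R "$MASTER" pull` before pulling in a workspace.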
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/