Mercurial in a corporate environment (migration from CVS)
Stephen Morton
stephen.c.morton at gmail.com
Wed Jan 11 21:35:09 UTC 2012
Thank you Kevin, Dave, Paul and Felix for your helpful replies.
I don't think it makes much sense to do a "see comments below" so I'll just
start fresh. I've taken a little while to read your replies, read up on the
various things you've suggested, write some questions, RTFM and eliminate
the questions that are answered therein, and here's what I've got left.
Some clarifications based on the useful advice:
- Our repository is really not splittable. Right now people check the
whole repository out; and the whole thing, as a consistent changeset, is
required to compile.
- Large files. There are a) some large (for code) code files, and b)
quite a few large-ish binary files of 1-7 MB without many revisions, and c)
a small number of binary files in the 1-7 MB range with lots of revisions.
On the CVS server, the revisions of each file in b) sum to about 10-50 MB,
with a small number up to 375 MB, and the two files I'm thinking of for c)
are 300 MB and 2 GB for all revisions. None of these are intermediate
objects per se. The binary files are either hardware image binaries, or
certain tools binaries that must be kept in lock-step with the code and are
therefore in the repository.
- Windows line endings aren't really a problem. Our Windows environment
works well with Unix line endings. (No really, it's mostly cygwin tools or
tools we've compiled ourselves.)
- At least in the conceivable future, everybody will be committing, not
a small inner circle of integrators.
New assumptions
- Branches. Branches will be handled by clones (not named branches) for
major releases. But temporary development branches can be named branches.
We rarely have many development branches --though with Mercurial we may have
more as they're easier to implement-- and they always merge back into the
main trunk, though I suppose their numbers would slowly add up over time.
My rationale is that a designer's "hg push" could push changes to all
branches which could be a mistake. And named branches would cause massive
history and repository bloat which seems to be a problem for Mercurial. And
having separate repositories will seriously reduce the contention on pushes.
- Diffing. With cloned branches, we will need the rdiff extension for
diffing *files* between releases.
- Designer repositories. In order to conserve network BW, and save time,
designers will likely make multiple repositories by cloning existing local
repositories. (Perhaps in the future, they will use MQ, but for now this
more closely mimics existing practice and avoids recompilation when
context-switching at work.) The trouble is that the cloned repos will by
default have their parent as the other local repo. People don't want to do
a double-push just to put their changes to the server. Or double-pull just
to get the latest changes. So I'll probably need to make an
extension/script+alias to do a clone followed by changing the paths.default
value in the hgrc to point back to the central server. I'm surprised there
isn't some existing option to do this.
- Workflow.
- Designers will have multiple local repositories. At least one for each
customer release branch (for back-porting bugs) each of which is a cloned
repository on the central server.
- If they have more than one repository per release, they'll be
encouraged to make local clones using the extension/script+alias as
mentioned above.
- Many designers currently do the equivalent of an "hg pull -u" and
rebuild by cron job nightly. I imagine some will continue to do this.
- Designers will make their changes, committing often as per DVCS
practice, and eventually, after extensive testing, be ready to push.
- When they push, they're almost guaranteed to get a "push creates
new remote heads" error. They'll do a "hg pull --rebase".
- (Optional, depending on designer risk tolerance: they'll have to
watch the "pull --rebase" output closely for "merging <filename>"
lines. If one of their changed files was merged, they'll diff the
automerged files just to be sure that they still look sane.)
- Then "hg push" again.
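
The push / "pull --rebase" / review / push loop above could be wrapped in a
small script. What follows is only a sketch, not a tested tool. It assumes the
rebase extension is enabled, that any non-zero exit from "hg push" means the
push was rejected, and that Mercurial reports automerged files on lines
starting with "merging ". The function names (merged_files, run_hg,
push_or_rebase) are made up for illustration.

```python
import subprocess


def merged_files(output):
    """Return the file names found on 'merging <filename>' lines."""
    prefix = "merging "
    return [line[len(prefix):].strip()
            for line in output.splitlines()
            if line.startswith(prefix)]


def run_hg(*args):
    """Run an hg command in the current directory.

    Returns (exit code, combined stdout and stderr as text).
    """
    proc = subprocess.run(["hg", *args], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def push_or_rebase(run=run_hg):
    """Try to push; if rejected, rebase and report automerged files.

    Returns (pushed, files_to_review). When files_to_review is not
    empty, nothing has been pushed yet, so the designer can diff
    those files before pushing by hand.
    """
    code, out = run("push")
    if code == 0:
        return True, []
    # Assumption: the push failed because the server has new changesets.
    code, out = run("pull", "--rebase")
    if code != 0:
        raise RuntimeError("hg pull --rebase failed:\n" + out)
    files = merged_files(out)
    if files:
        return False, files
    # Nothing was automerged, so push straight away.
    code, out = run("push")
    return code == 0, []
```

A designer would call push_or_rebase() from the repository root. If it hands
back a list of files, they diff those and then run "hg push" themselves.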
Questions
- Does the workflow seem reasonable?
- Is there no extension to do a "hg clone L1 L2; hg clone L2 L3" but
make L3 point to L1, not to L2?
- Is there a way to force the user specified in the log to be the
Apache/SSH authenticated username, not just whatever the user has put in
his/her ~/.hgrc file?
- CollapseExtension? Lots of small commits to the local repository may
pollute the main repository. (Remember, we're talking 200 designers here.)
Would it be crazy to provide a means for designers to collapse their
history before pushing? (A colleague of mine thinks it's crazy. I think it
makes some sense, though could potentially cause destruction of work.)
- Large files. Are my big files big enough to merit the large files
extension? It sounds like perhaps the large files extension is not ready
for my project anyway.
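
On the clone-and-repoint question, a wrapper script might look roughly like
this. It is a sketch under assumptions: the new clone's .hg/hgrc is a plain
INI-style file (no %include lines), losing its comments on rewrite is
acceptable, and the helper names (set_default_path, clone_pointing_upstream)
are hypothetical. It uses "hg paths default" to read where the source clone
points.

```python
import configparser
import subprocess
from pathlib import Path


def set_default_path(repo_dir, server_url):
    """Set [paths] default in <repo_dir>/.hg/hgrc to server_url."""
    hgrc = Path(repo_dir) / ".hg" / "hgrc"
    config = configparser.RawConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config.read(hgrc)
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", server_url)
    # Note: comments in the original hgrc are dropped on rewrite.
    with open(hgrc, "w") as handle:
        config.write(handle)


def clone_pointing_upstream(source, dest):
    """Clone the local repo `source` to `dest`, then make `dest`
    push and pull against whatever `source` uses as its default path."""
    upstream = subprocess.run(
        ["hg", "-R", source, "paths", "default"],
        check=True, stdout=subprocess.PIPE, text=True,
    ).stdout.strip()
    subprocess.run(["hg", "clone", source, dest], check=True)
    set_default_path(dest, upstream)
```

With this, clone_pointing_upstream("L2", "L3") would leave L3 pointing at
L2's own default path rather than at L2, assuming L2 has one configured.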
Thanks again,
Stephen
On Mon, Jan 9, 2012 at 4:52 PM, Stephen Morton
<stephen.c.morton at gmail.com> wrote:
> I have not found, in all my googling, something like "An administrator's
> guide to implementing Mercurial in a large centralized corporate
> environment." So I've got some questions about how to do it.
>
> Ok, don't flame me. I know I've said a bunch of inflammatory things just
> there. I know about DVCS. Hear me out.
>
> I'm an engineer working in the small tools team of a large-ish group. We
> have some 200+ designers and 200+ testers and are using CVS and I'd like to
> move us to Mercurial, and I have the power and support to do this. We have
> an offshoot project that would be a perfect project to incrementally
> prototype our final solution on before migrating the rest of the group, and
> 10 years of CVS history, to Hg in the future.
>
> This is a big project. The CVS checkout of a branch is about 900 MB. The
> complete repo with all history in CVS is about 20 GB. We experimented with
> an import to Git a couple of years ago and the git repo was about 15 GB.
>
> I don't think I even know enough to ask the right questions without
> being flamed. So my first question is "Are there some HOWTOS or something
> that add up to an administrator's guide to implementing Mercurial in a
> large centralized corporate environment?" If not, I've got some questions,
> below.
>
> First, here is what our environment will look like. I know, you're going
> to tell me that once we switch to Hg, we will no longer want to be
> centralized etc etc. And I'll just reply that the only possible way for
> this to get off the ground is for us to start with a workflow that, from
> some level at least, looks very much like CVS.
>
> - We would maintain a centralized repo that all developers can
> push to. That is not negotiable. People could push and pull from each other
> for development, but it would not be official until pushed to the central
> repo.
> - We have designers working on linux and on Windows. This again is
> non-negotiable.
> - I assume we'd move the entire CVS history to Hg. We could perhaps
> migrate it all but have a sub-repo without the extensive history that
> people normally pull from (to make designer pulls less bandwidth-heavy)
> - Designers will always need workspaces from at least 4 different
> branches on their machines. (Gotta support the customers and fix any bugs!)
>
>
> My uninformed questions are:
>
> - You cannot push if your *repository* is stale, which is different
> from CVS where you just can't commit a *file* which is stale. How do
> other large organizations find that this works for them where people are
> pushing literally every minute?
> - How do you make hooks or something else to disallow certain advanced
> repo management commands (e.g. tagging, branching on the central repo, "hg
> push -f" and creating two heads, revert, rollback) actually on the server
> as opposed to just saying "you shouldn't do that"? (People can do what they
> want on their own repo, but once they push, the history is immutable. And
> advanced maneuvers on the server should be only for the "inner circle" of
> the most senior developers and the tools team.)
> - Major release branches: cloned repos or named branches? (We'd do a
> lot of grafting to backport fixes to customer release branches. Does that
> make a difference? Would it be any more difficult with one method or
> another?) I know the obvious answer is that "Mercurial uses clones for
> branches" but in the CVS environment where branches are so a part of life,
> it seems so crazy to do it another way, and it seems like such a step
> backwards too.
> - Dealing with a big repo. Ours would be about 15G. Since it's big,
> pulling multiple local workspaces would be too bandwidth-intensive. (I
> didn't mention we've got remote sites so BW is an issue.) So do people
> simply do one pull from the centralized repo to their local drive and then
> push/pull to that one and then further push to the upstream repo? Probably
> something in a .hgrc that would streamline it but it could still be
> cumbersome.
>
>
> Regards,
> Stephen
>
>
>