Mercurial in a corporate environment (migration from CVS)
Martin Geisler
mg at aragost.com
Thu Jan 12 14:32:29 UTC 2012
Stephen Morton <stephen.c.morton at gmail.com> writes:
> Thank you Kevin, Dave, Paul and Felix for your helpful replies.
>
> I don't think it makes much sense to do a "see comments below" so I'll
> just start fresh. I've taken a little while to read your replies, read
> up on the various things you've suggested, write some questions, RTFM
> and eliminate the questions that are answered therein, and here's what
> I've got left.
>
> Some clarifications based on the useful advice:
>
> - Our repository is really not splittable. Right now people check
> the whole repository out; and the whole thing, as a consistent
> changeset, is required to compile.
Sounds fair enough then.
> - Large files. There are a) some large (for code) code files, and b)
> quite a few large-ish binary files of 1-7 MB without many revisions,
These won't be a problem: they will be downloaded once and just sit there
on the developers' machines.
> and c) a small number of binary files in the 1-7 MB range with lots
> of revisions.
The "problem" with those is the bandwidth used to push and pull them.
Compared to Subversion, say, where users only download the latest
version, Mercurial requires you to download all versions of all files.
When a 7 MB image is changed, it might generate a 5 MB delta against the
last version. So if you have 100 revisions of the file, you suddenly
have 500 MB of data that everybody needs to download -- even if they're
only ever interested in the latest version.
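As a back-of-the-envelope sketch of that estimate (the numbers are the
illustrative ones from above, not measurements of your repository):

```shell
# Rough clone-size estimate for one frequently changed binary file.
# Both numbers are the illustrative figures from the paragraph above.
revisions=100   # number of revisions of the file
delta_mb=5      # assumed size of the delta stored per revision, in MB
total_mb=$((revisions * delta_mb))
echo "every clone carries about ${total_mb} MB of history for this file"
```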
> On the CVS server, the sum of all revisions of b) are about 10-50
> MB each, with a small number up to 375 MB, and the two files I'm
> thinking of for c) are 300 MB and 2 GB for all revisions. None of
> these are intermediate objects per se. The binary files are either
> hardware image binaries, or certain tools binaries that must be
> kept in lock-step with the code and are therefore in the
> repository.
These files sound like prime candidates for being externalized with the
largefiles extension. The essence of the extension is that you can
version small "standin" files in Mercurial and only download the bigger
files as needed. So 'hg update' may then need a network connection in
order to download any large files referenced by the revision you update
to.
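A minimal sketch of how that could look, assuming Mercurial 2.0 or later
where largefiles ships with Mercurial. The helper name and the file name
hw/image.bin are made up for illustration.

```shell
# Hypothetical helper: enable the bundled largefiles extension by appending
# to an hgrc-style file (defaults to ~/.hgrc). Mercurial merges repeated
# [extensions] sections, so appending is safe even if one already exists.
enable_largefiles() {
    hgrc=${1:-"$HOME/.hgrc"}
    printf '[extensions]\nlargefiles =\n' >> "$hgrc"
}

# Usage, inside a repository (hw/image.bin is a made-up path):
#   enable_largefiles
#   hg add --large hw/image.bin   # only a small standin is versioned
#   hg commit -m "Add hardware image as a largefile"
```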
> - Windows line endings aren't really a problem. Our Windows
> environment works well with Unix line endings. (No really, it's
> mostly cygwin tools or tools we've compiled ourselves.)
Good, then don't worry about the eol extension.
> - At least in the conceivable future, everybody will be committing,
> not a small inner circle of integrators.
>
> New assumptions
>
> - Branches. Branches will be handled by clones (not named branches)
> for major releases. But temporary development branches can be named
> branches. We rarely have many development branches --though with
> Mercurial we may have more as they're easier to implement-- and
> they always merge back into the main trunk, though I suppose their
> numbers would slowly add up over time. My rationale is that a
> designer's "hg push" could push changes to all branches which could
> be a mistake. And named branches would cause massive history and
> repository bloat which seems to be a problem for Mercurial. And
> having separate repositories will seriously reduce the contention
> on pushes.
> - Diffing. With cloned branches, will need the rdiff extension for
> diffing *files* between releases.
I don't think you need any special extension. The normal way is to diff
revisions inside the same repository. Remember that the repository for
version 2.0 will also contain the changeset for 1.0. So
  hg diff -r 1.0:2.0
will work just fine.
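To look at individual files only, you can add paths to the same command.
A small sketch, assuming the releases are tagged 1.0 and 2.0; the function
name and src/driver.c are made up:

```shell
# Hypothetical wrapper: diff two releases, optionally limited to the given
# files, directories, or extra 'hg diff' options.
release_diff() {
    from=$1
    to=$2
    shift 2
    hg diff -r "$from:$to" "$@"
}

# Usage:
#   release_diff 1.0 2.0 src/driver.c   # one file between the releases
#   release_diff 1.0 2.0 --stat         # summary of changed files only
```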
> - Designer repositories. In order to conserve network BW, and save
> time, designers will likely make multiple repositories by cloning
> existing local repositories. (Perhaps in the future, they will use
> MQ, but for now this more closely mimics existing practice and
> avoids recompilation when context-switching at work.)
Ehm, please don't make designers use mq -- it's an advanced extension
and unless your designers are very interested in version control, I
don't think they'll like learning it.
> The trouble is that the cloned repos will by default have their
> parent as the other local repo. People don't want to do a
> double-push just to put their changes to the server. Or double-pull
> just to get the latest changes. So I'll probably need to make an
> extension/script+alias to do a clone followed by changing the
> paths.default value in the hgrc to point back to the central
> server. I'm surprised there isn't some existing option to do this.
It's not a real problem. Plus we assume that developers are comfortable
with opening .hg/hgrc and editing it as needed :) I find that users tend to
have a few long-lived repositories on their machine and so it's easy to
configure the .hg/hgrc files when they're initially set up.
Also, you can have
  [paths]
  main = http://hg.company/main
in your ~/.hgrc file and then do 'hg push main' in all repositories to
send the changesets to that path.
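If you do want to script the clone-and-repoint step, a rough sketch could
look like this. The function name is made up, and it assumes a fresh clone
whose .hg/hgrc contains nothing but the [paths] section that 'hg clone'
wrote, since the file is simply overwritten.

```shell
# Hypothetical helper: clone a local repository, then point the new clone's
# default path at the central server instead of at the local source.
# Assumes a fresh clone, so overwriting .hg/hgrc loses no other settings.
clone_for_server() {
    src=$1      # existing local repository
    dest=$2     # new local clone
    server=$3   # central repository URL
    hg clone "$src" "$dest" || return 1
    printf '[paths]\ndefault = %s\n' "$server" > "$dest/.hg/hgrc"
}

# Usage: clone_for_server main-2.0 main-2.0-bugfix http://hg.company/main
```

After that, plain 'hg pull' and 'hg push' in the new clone talk to the
server directly.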
> - Workflow.
> - Designers will have multiple local repositories. At least one for each
> customer release branch (for back-porting bugs) each of which is a cloned
> repository on the central server.
> - If they have more than one repository per release, they'll be
> encouraged to make local clones using the extension/script+alias as
> mentioned above.
> - Many designers currently do the equivalent of an "hg pull -u" and
> rebuild by cron job nightly. I imagine some will continue to do this.
> - Designers will make their changes, committing often as per DVCS
> practice, and eventually, after extensive testing, be ready to push.
> - When they push, they're almost guaranteed to get a "push creates
> new remote heads" error. They'll do a "hg pull --rebase".
Why use rebase instead of merging like "normal"? I'm a rebase fan
myself, but I feel obliged to tell you that it's an extension because it
offers users a bigger risk of messing things up than 'hg merge' does.
> - (Optional, depending on designer risk tolerance... They'll have to
> keep eagle eyes looking at all the "pull --rebase" output looking for
> "merging <filename>". If one of their changed files was merged, they'll
> diff the automerged files just to be sure that they still look sane.)
They'll have to recompile their changes to be sure they still work. The
same applies for 'hg merge', btw, but there 'hg diff' is actually better
at telling you what happened.
> - Then "hg push" again.
For robustness, you should consider letting users push to their own
repository on the server. Since they're the only pusher, there can be no
problems with multiple heads: they just pull from the main repo, merge
locally, and push to their own repo on the server.
An integrator will then pull from the user's repo on the server, merge
as necessary, test, and push to the main repo.
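On the developer's side this could look roughly as follows. The path names
'main' and 'mine' are assumptions: both would have to be defined in the
[paths] section, with 'mine' pointing at the developer's own repository on
the server.

```shell
# Sketch of the developer side of this workflow. 'main' and 'mine' are
# assumed [paths] entries: the shared main repository and the developer's
# own repository on the server.
# The chain stops at the first failing step, e.g. when 'hg merge' reports
# conflicts or finds nothing to merge (then just run 'hg push mine').
sync_and_publish() {
    hg pull main &&
    hg merge &&
    hg commit -m "Merge with main" &&
    hg push mine
}
```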
> Questions
>
> - Does the workflow seem reasonable?
> - Is there no extension to do a "hg clone L1 L2; hg clone L2 L3" but
> make L3 point to L1, not to L2?
No such extension -- while some tutorials make it sound like you'll be
doing 'hg clone' all day long, I don't think that's the case.
> - Is there a way to force the user specified in the log to be the
> Apache/SSH authenticated username, not just whatever the user has
> put in his/her ~/.hgrc file?
> - CollapseExtension? Lots of small commits to the local repository
> may pollute the main repository. (Remember, we're talking 200
> designers here.) Would it be crazy to provide a means for designers
> to collapse their history before pushing? (A colleague of mine
> thinks it's crazy. I think it makes some sense, though could
> potentially cause destruction of work.)
Collapsing history means editing history. This in turn means enabling
more or less advanced extensions and it means that the users need to
really understand how the graph works in Mercurial. Collapsing a linear
run of changesets is very safe, though, and unlike most other history
rewriting tricks, there is no risk of merge conflicts.
However, there is still the risk that the user has forgotten that he has
pushed the changesets to another local clone. Now he suddenly has
A--B--C--X--Y in one clone and a big ABC changeset in the other. How to
get X--Y moved over to ABC? It's not difficult, but it requires more
knowledge and more extensions.
> - Large files. Are my big files big enough to merit the large files
> extension? It sounds like perhaps the large files extension is not
> ready for my project anyway.
There have certainly been quite a number of bugs in the largefiles
extension. There are still annoying corner cases and weirdness here and
there. However, it's getting better and it's the best we have to offer
right now.
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/