When does it make sense to use subrepositories?

Angel Ezquerra angel.ezquerra at gmail.com
Thu Apr 18 21:04:22 UTC 2013


On Thu, Apr 18, 2013 at 8:52 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, 2013-04-18 at 09:36 -0700, v wrote:
>> I've just backed out of several hours' work transitioning a set of related
>> projects from one repository to a group of subrepositories. Having been
>> through this a few times before, I always seem to end up with one of two
>> scenarios:
>>
>> - If changes to one project are usually associated with changes to other
>> related projects, put them in the same repository.
>> - Otherwise, use separate repositories, and handle the relationship with your
>> favourite dependency management solution. If you really want to do this with
>> your SCM, then use guestrepos.
>>
>> I do acknowledge that subrepos are now declared "a feature of last resort",
>> but I'm wondering what this situation is. Someone paid for them to be
>> developed, so they must be useful somewhere!
>
> Most people are unclear on the fundamental purpose of version control.
> They think of VCSes as first and foremost a tool to synchronize code
> between developers. But this is wrong; it is a description of rsync. The
> primary purpose of a VCS is to precisely track old states of the
> project for future reference.
>
> With that in mind, ask yourself the question:
>
> "Do I need to be able to reproduce an exact combination of independent
> projects from the past in the future?"
>
> In other words, "do I need _version control_ on my combination of
> projects?"
>
> If the answer is yes, your options are:
>
> - check everything into one repo and hope you don't have to merge in
> "upstream" changes often
> - use subrepos

I could not have said it better! I think subrepos cover the
requirement that Matt mentioned pretty well, and that is a very
important and common requirement.

I'm very happy to see that Matt did not immediately refer to them as a
feature of last resort. Subrepos have their set of rough edges (which
I hope we can polish) but they solve a real problem and they solve it
pretty well. What other, better way is there to link specific
revisions of your own code with specific revisions of other people's
code?

In fact this is why I do not understand why subrepos get such a bad
rap, and from core developers no less!

At work we have been using subrepos for years now. We often need to
use subrepos when a project has a dependency on some module which is
usually developed by another team. In those cases the other team
usually creates an "SDK" repository with a set of libraries that we
have to use. We include their "SDK" in our project and then we can
keep track of which SDK version we use on each of our project
revisions. In this way we can always go back and reproduce any
previous build.
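
To make the SDK example concrete, here is a minimal sketch of how that
pinning is recorded. It assumes the usual layout: .hgsub maps each
subrepo path to its source ("path = source"), and .hgsubstate records
one "revision path" pair per line for the parent revision. The "sdk"
path, the example.com URL and the hash are made up for illustration,
and the parser only handles plain "path = source" lines (it skips
section headers and comments).

```python
# Hypothetical file contents; the path, URL and hash are illustrative only.
HGSUB = "sdk = https://example.com/hg/sdk\n"
HGSUBSTATE = "0123456789abcdef0123456789abcdef01234567 sdk\n"


def parse_hgsub(text):
    """Map each subrepo path to its source, one 'path = source' per line."""
    sources = {}
    for line in text.splitlines():
        line = line.strip()
        # Simplification: ignore blanks, comments and section headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        path, sep, source = line.partition("=")
        if sep:
            sources[path.strip()] = source.strip()
    return sources


def parse_hgsubstate(text):
    """Map each subrepo path to the revision pinned by the parent commit."""
    pins = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        revision, path = line.split(" ", 1)
        pins[path.strip()] = revision
    return pins


def pinned_sources(hgsub_text, hgsubstate_text):
    """Combine both files into {path: (source, pinned revision)}."""
    sources = parse_hgsub(hgsub_text)
    pins = parse_hgsubstate(hgsubstate_text)
    return {path: (sources[path], pins.get(path)) for path in sources}


if __name__ == "__main__":
    for path, (source, revision) in pinned_sources(HGSUB, HGSUBSTATE).items():
        print(path, source, revision)
```

Because .hgsubstate is itself a tracked file, every revision of our
project carries the exact SDK revision it was built against.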

That is why we started using subrepos even before TortoiseHg was
subrepo aware (that is one of the main things that led me to start
contributing to TortoiseHg in the first place). Even then they worked
fine and now they work much better. We have people who are really not
very interested in version control and who use them every day without
much of a problem.

That being said, I suspect that the subrepo experience is much better
if you use TortoiseHg. The Mercurial command line is not very subrepo
aware (the biggest example is the fact that hg status almost ignores
them). Hopefully we can move things forward and improve things in this
regard soon.

As for the rough edges, there are a few, but I think they can be fixed:

1. poor command line support
2. subrepos are eternal (they cannot be removed from the workspace,
and they cannot be renamed)
3. merging subrepos is weird (it would often be desirable to be able
to choose a given version of a subrepo when there is a merge conflict,
instead of merging the subrepo contents)
4. it is not possible to declare "read only" subrepos.
5. subrepos are only pulled on update.
6. it is tedious to update subrepos to another revision.

None of these issues is a huge deal (although #2 is quite annoying).
The last issue in particular seems to bother people who want to use
subrepos as a way to keep track of the head of the repository of
another module (which I think is the key feature of the guestrepos
extension). I think we could improve this by bundling the onsub
extension with Mercurial, and by adding a way to define a default
"update" target for subrepos (in the .hgsub file), which you could
then trigger by running "hg update --subrepos" or something like that.
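
As a rough sketch of what such a default "update" target would have to
do, the helper below builds the hg commands needed to move each subrepo
to a chosen revision. The per-subrepo target is the proposal above, not
something .hgsub supports today, so the targets dictionary and the "sdk"
path are hypothetical stand-ins. The commands are only built, not run.

```python
def subrepo_update_commands(targets):
    """Build the hg command lines that move each subrepo to its target.

    targets is a hypothetical {subrepo path: revision or branch} mapping.
    """
    commands = []
    for path, revision in sorted(targets.items()):
        # Fetch new changesets into the subrepo, then update it.
        commands.append(["hg", "-R", path, "pull"])
        commands.append(["hg", "-R", path, "update", "--rev", revision])
    return commands


if __name__ == "__main__":
    for command in subrepo_update_commands({"sdk": "default"}):
        print(" ".join(command))
```

After running these by hand, a commit in the parent repository records
the new subrepo state in .hgsubstate.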

Anyway, I just wanted to make it clear that subrepos are not
universally disliked and that they are a useful feature that solves a
real problem.

Cheers,

Angel


