A case for subrepos with absolute URLs

Arne Babenhauserheide arne_bab at web.de
Mon Dec 12 06:12:09 UTC 2011


On Monday, 12 December 2011, 03:15:41, Mads Kiilerich wrote:
> Arne Babenhauserheide wrote, On 12/11/2011 10:29 PM:
> > So the problem does not originate in absolute URLs, these just show the
> > problem. It originates in the strong coupling.
> > 
> > Because of that I want to argue, that Mercurial should not discourage
> > the use of absolute URLs in subrepos, but rather reduce the consistency
> > requirement over subrepo boundaries. A few ideas:

> I think you have some valid points, but I also think you connect the
> dots incorrectly and draw an incorrect picture.
> 
> As you point out the strong coupling is an issue not related to absolute
> paths. Issues with strong coupling is for example tracked (or at least
> reported) on http://mercurial.selenic.com/bts/issue2520 "Impossible to
> transition from a bad .hgsubstate", but there are also other similar
> open issues. It seems like this is the main topic of the mail. I don't
> have many comments to that before we have more specific proposals or
> patches.

The problem described in the bug is one I ran into, too: commit in the subrepo, 
commit in the parent repo, realize that the commit in the subrepo was 
incorrect, roll back, commit again. Voilà: a broken revision. Now pull in the 
parent repo and you need strip.

Non-recursing commit mitigated that problem a bit.

> I'm a bit puzzled why absolute URLs also are mentioned in the subject
> and throughout, but I will take the bait and comment a bit on that:
> 
> First of all: A consequence of using absolute subrepo urls as it is now
> is that it essentially makes Mercurial a centralized VCS. I agree that
> there are some valid use cases for centralized VCS, and absolute urls
> for subrepos might be a good solution in these cases. But the primary
> use case for Mercurial is as a distributed VCS, so in general it is a
> bad advice to use subrepos with absolute urls.

What exactly does the difference mean here? A path in .hgsub carries two pieces 
of information: “Here it is” and “here you can get it”. With relative URLs, the 
second part is mostly redundant for someone who clones the repo, because they 
could just as well clone the subrepo from the copy inside the repo they clone.

If you clone from the repo, the subrepo clone records its source in its own 
.hg/hgrc, so it does not need the subrepo URL either. For these cases, an 
.hgsub with only subpath= would suffice.
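
To make the two pieces of information concrete, here is a minimal sketch in 
Python. It only handles plain "path = source" lines, and the helper names 
(parse_hgsub, is_absolute, is_trivial) are made up for this mail; none of 
this is Mercurial's own code.

```python
from urllib.parse import urlparse


def parse_hgsub(text):
    """Map each subrepo path ("here it is") to its source ("here you can get it").

    Simplified: only plain 'path = source' lines are understood.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        path, sep, source = line.partition("=")
        if not sep:
            continue  # not a 'path = source' line
        entries[path.strip()] = source.strip()
    return entries


def is_absolute(source):
    """Assumption: a URL scheme or a leading slash marks an absolute source."""
    return bool(urlparse(source).scheme) or source.startswith("/")


def is_trivial(path, source):
    """A trivial entry repeats the path as the source, so the source adds nothing."""
    return path == source
```

For a trivial entry such as "libfoo = libfoo", is_trivial() is true and the 
right-hand side tells a cloner nothing new, which is the redundancy I mean.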

We only need it if we update to a subrepo and cannot connect to the repo we 
cloned from to get its subrepo. And in that case, we need a centralized source 
anyway - or all subrepos for all revisions stored inside a cloned repo.

Is there anything for which we really need the second part of the subrepo 
definition from a truly decentralized viewpoint, ignoring the current 
implementation and only looking at the information?

> Ok, you propose to redefine what a subrepo source is (or repeat some
> previous proposals made in a different age). That might mitigate the bad
> advice but it also leaves us with a moving target as topic for the
> discussion - that is hard to reason about.
> 
> Yes, external repos used as subrepos will have an upstream with an
> absolute url. I agree that it might be convenient to have that url
> tracked in the repo in some way, but that doesn't mean that the absolute
> url should be used as subrepo source. (I also think it is hard to
> imagine a well organized work flow where it is relevant for more than 1
> or 2 developers to introduce new upstream revisions. Having the upstream
> urls in a README might not be the worst solution.)
> 
> I think subpaths (as described on
> http://mercurial.selenic.com/wiki/Subrepository#Use_.27trivial.27_subrepo_paths_where_possible
> ) provides a reasonably elegant solution to many problems
> in this area, not only a workaround.

That page recommends trivial paths, where the second part of the subrepo 
definition is completely redundant. And any case where it is not redundant is 
no longer really decentralized or no longer safe (as it relies on sources 
which might go away). 

Also, it does not solve the issue that the subrepo might be replaced by a 
different one, which in the trivial case renders all previous revisions 
inaccessible.

So maybe a subrepo should always be treated as a trivial path, unless that 
path does not provide the changesets required for the substate. 

And in that case, we need some fallback mechanism anyway. That could be the 
second part of the subrepo definition. It would change the .hgsub definition 
to: 

subrepo = <fallback>
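
As a rough sketch of the lookup order I have in mind (Python, purely 
illustrative): resolve_subrepo_source and the has_changeset callback are 
hypothetical names, and Mercurial does not behave this way today.

```python
def resolve_subrepo_source(parent_source, subpath, fallback, node, has_changeset):
    """Pick where to fetch subrepo changeset `node` from.

    parent_source: where the parent repo was cloned from
    subpath:       the subrepo path from .hgsub (left-hand side)
    fallback:      the proposed right-hand side of .hgsub, may be None
    has_changeset: hypothetical callable (source, node) -> bool
    """
    # Trivial location: the subrepo sits next to the parent, under the same path.
    trivial = parent_source.rstrip("/") + "/" + subpath
    for candidate in (trivial, fallback):
        if candidate and has_changeset(candidate, node):
            return candidate
    return None  # neither location provides the required changeset
```

The fallback is only consulted when the trivial path cannot deliver the 
changeset recorded in the substate, so a clone that already has everything 
never depends on the absolute URL.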

> Anyway: This is mainly a matter of making it possible to control what
> path (default or something else) is put in .hg/hgrc of subrepo clones.
> This path is rarely used anyway, so some .hgsub syntax for controlling
> that wouldn't hurt ... but I think it will add complexity for no benefit.

I don’t think so. I rather think that requiring another repository to be 
accessible in order to update to a given revision is a general flaw. That 
flaw hurts quite a few people in real-life Mercurial usage, and it is 
unexpected, given that no other part of core Mercurial exhibits this behaviour 
(largefiles does the same, which bothers me, too). 

I call this a case for subrepos with absolute URLs, because they are blamed 
for a problem they aren’t responsible for, and they are discouraged because 
of it. 

Since subrepos with relative URLs can have the same problem, Mercurial’s 
subrepo support should be improved to be able to work well with absolute URLs 
- which will also fix the problems for relative subrepo paths.

Best wishes, 
Arne