Alternatives to subrepo for subprojects

Haase, Peter Peter.Haase at draeger.com
Wed Jul 11 09:08:01 UTC 2012


Hi Paul,

Thanks for the info and sorry for the delayed answer!

It seems to me that your environment and requirements regarding sub- or 
superrepositories are quite similar to ours.

I like your idea (if I understood you correctly) of explicitly defining the
origins of the subrepositories (which are independent, self-contained
repositories in themselves) by using an absolute path/URI.

From my point of view, one misconception of the existing subrepository
extension (for our use cases) is that absolute paths are badly supported. Two
issues:

a) the absolute paths are stored in the history (which makes no sense)
b) if a developer pushes changes to another developer (peer-to-peer), his
changes to subrepositories are silently pushed to the central server

I had in mind to store the absolute paths to subrepos in the local hgrc
configuration file and to distribute this information with the projectrc
extension. However, I have not yet understood how you want to fulfill your
requirement 'Update to a previous top-level version where the *absolute* paths
to the subrepo have changed, and *not* have the world break'. Are the mapping
and configuration files in your example tracked by the super repo? If so, what
is the advantage of tracking the URI in the history of the super repo? How do
you make it possible to update to a previous version without breaking the
project when a URI has changed in between (e.g. the server address of the
central Mercurial server has changed)?
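
To make the question concrete, here is a minimal Python sketch of how I
picture the lookup if the mapping file is kept outside the history. It is only
an assumption about the behaviour, not your extension's code: the function
names, the one-line "local_repository = symbolic_name cs_id" layout and the
"newserver" host are all made up for illustration.

```python
# Hypothetical sketch: none of these names come from Paul's extension.

def parse_simple(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so the value keeps any later '='.
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def resolve(definition_text, mapping_text):
    """Return a list of (uri, local_repository, cs_id) triples.

    uri is None when the symbolic name has no mapping entry, so a stale
    or missing entry does not abort the whole operation.
    """
    mapping = parse_simple(mapping_text)
    triples = []
    for local_repo, value in parse_simple(definition_text).items():
        # Assumed layout of the value: "<symbolic_name> <cs_id>".
        symbolic_name, cs_id = value.split(None, 1)
        triples.append((mapping.get(symbolic_name), local_repo, cs_id))
    return triples


# Definition as it might look in an old revision of the super repo.
old_definition = "dateutil = dateutil vendor\n"
# Mapping as configured locally today, after a (made-up) server move.
current_mapping = "dateutil = ssh://newserver//path/to/dateutil\n"

print(resolve(old_definition, current_mapping))
```

In this picture an old definition is always resolved against the current
mapping, so a changed server address does not break an update to an old
revision. If the mapping is tracked in the history as well, I do not see how
the same result is achieved.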

> What can we do with our extension?
>
> "pull" - each remote has hg pull run on it.
> "push" - each remote is pushed to
> "sync" - each repo has an hg up cs_id executed on it.
> "freeze" - the nodeid of the current state is saved into a
> normal definition file (useful for release processes).
> "summary" - A human-readable summary of the state is printed
> onto stdout.

What do you mean by remote? The original repository a subrepository is 
referring to?

We are really curious about your extension. Do you have plans for publishing 
it?

Regards,
Peter


> -----Original Message-----
> From: paul_nathan at selinc.com [mailto:paul_nathan at selinc.com]
> Sent: Wednesday, June 20, 2012 20:39
>
> > From: "Haase, Peter" <Peter.Haase at draeger.com>
>
> [snip]
>
> >  - The repository collection (that represents the code base of a
> > module based application) can be provided by the Mercurial web server
> > that also provides the real repositories
> >  - We need a precise and auditable reproducibility of all
> > module/repository revisions that go into the application. By its
> > SHA1 based changeset identifier Mercurial brings a lot to provide
> > such a feature.
> >
> > So what I have in mind should probably be called a parent repository
> > extension. However, I haven't thought it through well yet - so I'm
> > open for other ideas.
> >
> > Peter
> >
>
> Hi Peter ( & other interested parties),
>
> I've been OK'd to release some information about what we're
> doing here at SEL.
>
> Quick background, first. SEL makes components for electrical
> systems; a big focus is the power grid.  We *really* care
> about quality. Part of how we achieve quality is a 10-year
> warranty. This means our firmware has to be *reproduced* from
> 10 years ago should a defect be found. Another part of
> quality is minimizing misuse of tools when feasible.
>
> This feeds into two different areas related to subrepos.
> (1) They are easy to misuse and trivial errors will propagate
> with ill effect [3].
> (2) They assist in reproducing state, but only when you commit.
>
> We've been developing a large product with 100+ subrepos
> using named branches and upwards of 15 users (the numbers
> have shifted here and there).  These subrepos are likely to
> be used in multiple projects/products, some are third party,
> some are frozen, some under active development.
>
> So we have to account for multiple users, multiple
> components, multiple projects, and multiple products.
> This is actually a fairly complex scenario.  However, we
> believe that this is common in enterprise software and
> similar situations.
>
> We have, unfortunately, found subrepos to be easy to misuse
> and started thinking about what to do.
>
> After careful review of mpm's comments on subrepos, we
> decided that what we wanted from our component repos was not
> what the hg project direction was. We understand the subrepo
> direction as to view a superrepo to be a singular project,
> and subrepos are part and parcel of that one singular project
> [1] [2]. Therefore, we believe that our direction for
> components is  sufficiently divergent that it merits an
> extension, rather than subverting existing behavior via options.
>
>
> In particular, we want to be able to view subrepos as
> collections of repositories.
>
> This means that:
>
> * We do not want normal actions on the top repo (superrepo)
> to recurse in *any* sense onto the subrepos -
>
> * - with special emphasis on ensuring that merges at the
> superrepo level do _*NOT*_ merge into subrepos.
>
> * Very loose coupling between one subrepo and the next -
> linkages between superrepo and changeset hashes are really
> only useful around release time - the rest of the time, "tip"
> or "branchname" would make us happy.
>
> Improvements that are very handy are to be able to:
>
> * Be able to specify branches or other identifiers to "lock"
> to as well as raw changeset IDs.
> Users of base ClearCase will notice the similarity to the
> config spec files. For larger-scale projects with many
> subrepos under continuous development, committing changeset
> IDs is simply imposing a burden of updating on users who only
> want the latest on a given named branch.
>
> * Update to a previous top-level version where the *absolute*
> paths to the subrepo have changed, and *not* have the world
> break. Having hgsub and hgsubstate essentially 'break' hg for
> non-guru users when an invalid reference is formed is
> basically not ok. Users will make mistakes, and hg is not
> tolerant of that in this area. Therefore, we are attempting
> to make a system that broadens the legal actions users can take.
>
>
> We are finalizing the internal development of an extension
> that does all this.
>
> The extension massages the contents of .hgsub & .hgsubstate
> into a user-specified triple that looks like this:
>
> (URI, local_repository, cs_id) where cs_id denotes some valid
> changeset ID that local_repository must be updated to.
> We have a mapping file symbolic_name->URI, and a definition
> file local_repository -> symbolic_name cs_id.
> Our implementation is careful to decouple these so that if
> the remote URI becomes invalid or a component loses an
> identifier, hg keeps working (unlike subrepos).
>
> What can we do with our extension?
>
> "pull" - each remote has hg pull run on it.
> "push" - each remote is pushed to
> "sync" - each repo has an hg up cs_id executed on it.
> "freeze" - the nodeid of the current state is saved into a
> normal definition file (useful for release processes).
> "summary" - A human-readable summary of the state is printed
> onto stdout.
>
>
> Right now, we're recursing on some of these commands, but
> that may change pending internal use. However, note that
> "normal" hg commands will not recurse; indeed, hg will treat
> these files just like any other. No magic.
>
> Generally, when an error occurs, you will  still be able to
> operate on the components and unwind yourself from
> the error without having to hg up -C -r 'null' at the
> superrepo. In order to do this, we have included flags
> that say, effectively,  "don't worry about the stuff that's
> committed, just look at the current info and run
> with it, ok?".
>
> As an example of a repo that got converted from subrepos to
> this extension:
>
>
> --- 
>
> mapping file.
> # mapping - note that dateutil is a symbolic name here
> dateutil                       = ssh://servername//path/to/dateutil
>
> configuration file.
> # configuration. dateutil is both the symbolic name and the
> # directory name. vendor is the branch
> dateutil                       = dateutil vendor
>
> --- 
>
> As a design comment - this approach is designed for
> enterprises with centralized servers where work is
> continually being pushed to an integration branch, with
> *many* repositories having continual work.
> Individuals might have a particular branch they are working
> on, but are needing the latest changes
> from many other repositories... each with various tags/named
> branches/bookmarks/whatevers. This is
> not really intended to be used by individuals or small teams,
> and I personally think it probably
> won't be useful to them. However, I *think* that this will be
> really useful for its designed purpose.
>
> Anyway. We would *like* to release this extension, but at
> present are not ready to. I would be happy to
> discuss this extension and its capabilities further. Any
> feedback would be welcomed.
>
>
> ------------ 
> P.S.
> References that assisted us in understanding the subrepo
> design and plans are, in particular, these:
>
> http://selenic.com/pipermail/mercurial-devel/2012-January/037675.html
> http://selenic.com/pipermail/mercurial-devel/2011-April/030141.html
> http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/44331
> http://markmail.org/thread/ca4m34uvj7wb65gl
>
>
> ------------ 
> Annotations
>
> [1] An example of this is hg commit. Contrast hg commit with
> hg diff. This particular divergence
> of behavior has caused bugs and 'I just did WHAT to my repo,
> gotta call Paul' events. I notice in my
> 2.2.1 help subrepos that commit no longer defaults to
> committing recursively.  This is good! However,
> merge still does.
>
> [2] We have smaller projects which do use subrepos as
> partitions of a singular project, and it works excellently.
>
> [3] Hg merge recurses. Suppose you have two very different
> versions of a third party component on two different
> branches of your superrepo, and you want to merge those
> branches. hg will attempt to merge the two versions of
> the component. This is a catastrophic error. Unfortunately,
> it's occurred for us a few times. Fortunately, we
> were able to sort things out. We don't want this behavior to
> actually induce a bug in-product for us....
>
>
> 