Counterintuitive tag behaviour (broken design?)

Johan Herland johherla at online.no
Wed Mar 14 17:45:14 UTC 2007


On Tuesday 13 March 2007, Matt Mackall wrote:
> [...]

Thanks a lot for a good and informative answer to my first post. Below, 
you'll find some more of my thoughts in response to the list of 
requirements given.

> So let's look a bit at the requirements:
>
> a) tags need to be distributed in parallel with the rest of the
>    history
> b) conflicts between local and remote tags should get resolved in
>    merges
> c) it must be possible to determine who created tags
> d) it must be possible to move tags
> e) it must be possible to remove tags
> f) because tags can change, it must be possible to determine what the
>    tags were at a specific time in the project history
>
> And now we add:
>
> g) tags should not change without a tagging event (eg. a commit on a
>    branch shouldn't resurrect old tags)
> h) any change we make should not break the existing system too
>    horribly

I'd revise (g) into an even stronger statement:

g) The only way a tag may change is through an explicit operation
   directly on that tag. For example, only the following operations may
   change the tag "foo":
   g.1.1) Explicitly moving or removing tag "foo".
   g.1.2) Merging tag definitions from different repositories where
          tag "foo" is different.

   Conversely, any other operation may not change a tag. For example,
   the following operations may never change a tag "foo":
   g.2.1) Any activity not touching tag definitions
          (e.g. commits/updates/merges on files (except .hgtags)).
   g.2.2) Any activity on tags unrelated to "foo".
          (e.g. creating/moving/removing tag "bar", or even
          editing .hgtags without changing or reordering any of
          the "foo" entries)
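
To make the revised (g) checkable, here is a minimal Python sketch. It
models a tag set as a plain dict from tag name to changeset id, which is
an assumption made for illustration and not how Mercurial stores tags.
The function name and the "touched" parameter are hypothetical.

```python
def unexpected_tag_changes(before, after, touched):
    """Return tags that changed although no operation touched them.

    before, after: dicts mapping tag name -> changeset id (illustrative model)
    touched: set of tag names the operation explicitly created, moved,
             removed, or merged
    """
    names = set(before) | set(after)
    return sorted(
        name for name in names
        if before.get(name) != after.get(name) and name not in touched
    )


before = {"foo": "a1b2", "bar": "c3d4"}
# Moving "bar" may change "bar" and nothing else.
ok = {"foo": "a1b2", "bar": "e5f6"}
# Here "foo" changed as a side effect, which the revised (g) forbids.
bad = {"foo": "9999", "bar": "e5f6"}

print(unexpected_tag_changes(before, ok, {"bar"}))   # []
print(unexpected_tag_changes(before, bad, {"bar"}))  # ['foo']
```

An empty result means the operation respected (g) in this model.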


Furthermore, I'll throw the following requirements into the fray:

i) Tags are global (to the repository). I.e. the result of "hg tags" and
   "hg up -C <tag>", etc. must be independent of the current working
   copy. This concept has been referred to as the "global tag context"
   in this discussion.

j) If two users independently create a tag "foo" pointing to different
   changesets in their respective repositories, merging the two
   repositories MUST result in a conflict that cannot be _automatically_
   resolved. This is a feature.
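
As a rough illustration of (j), the following Python sketch merges two
tag sets and refuses to pick a winner when the same name points at
different changesets. The dict model, the TagConflict class and the
merge_tags function are all hypothetical names made up for this example.

```python
class TagConflict(Exception):
    """Raised when a tag name points at different changesets on each side."""

    def __init__(self, conflicts):
        super().__init__("conflicting tags: " + ", ".join(sorted(conflicts)))
        self.conflicts = conflicts


def merge_tags(local, remote):
    """Merge two tag dicts (name -> changeset id); never guess on conflict."""
    merged = dict(local)
    conflicts = {}
    for name, node in remote.items():
        if name not in merged:
            merged[name] = node                      # only remote has it
        elif merged[name] != node:
            conflicts[name] = (merged[name], node)   # requirement (j)
    if conflicts:
        raise TagConflict(conflicts)
    return merged


print(merge_tags({"v1": "aaa"}, {"v2": "bbb"}))  # {'v1': 'aaa', 'v2': 'bbb'}

try:
    merge_tags({"foo": "111"}, {"foo": "222"})
except TagConflict as err:
    print(err)  # conflicting tags: foo
```

The point of raising instead of choosing is that the user, not the
algorithm, decides which "foo" survives.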


Now, the current solution is to allow full revision control on the tag 
definitions (by putting .hgtags in the repository as a "regular" file). 
This elegantly allows us to track tag definitions through time, which 
is clearly what we want (i.e. we can easily ask questions like: when 
did a tag change? who did it? for what reason?). However, 
putting .hgtags under full/regular revision control also provides the 
possibility of branching .hgtags. Unfortunately, the concept of 
branches on .hgtags is easily confused with the concept of branched tag 
definitions. I, for one, got these two confused at the start, and I 
believed for a while that a tag could refer to different changesets 
depending on which branch I was on. Of course, I now clearly see that 
branched tag definitions are irreconcilable with requirement (i) (global 
tag context), and that (i) is clearly what we want.

Now, since we (because of (i)) clearly do not want branching tag 
definitions, we must therefore find a way to extract global tag context 
from the existing branches of .hgtags. Currently, we try to resolve 
this problem by defining a robust algorithm for doing the merge of tag 
definitions. This can - by some stretch of the imagination - be seen as 
an attempt to fix a bug in the design itself, the original bug being 
allowing branching of .hgtags in the first place.

The above analysis makes me want to look closer at how the system would 
work if we disallowed branches on .hgtags. First, I must say that 
disallowing branches on a file in the repository that is otherwise 
treated much like a regular file sounds like an ugly hack, and I do 
not pretend to know the technical difficulties involved in making this 
work. However, let's for a second assume that we could fix these 
problems elegantly. We should then ask how disallowing branches 
on .hgtags affects the above requirements. As far as I can see, (a) 
does not need to be affected. Neither do (c), (d), (e), and (f). (g) 
should be elegantly fulfilled since AFAICS the problems with (g) today 
are caused by having different branches on .hgtags. Also, (i) should be 
automatically resolved by using the most recent revision of .hgtags as 
the only source of global tag context. AFAICS, (j) is a subpoint of (b) 
in this context. We're then left with (b) and (h). To me, it seems that 
(b) must be resolved by merging the tag definitions with a different 
algorithm than the one used to merge "regular" files. The algorithm 
must include ways to allow the user to manually resolve tag conflicts 
(to satisfy (j)). However, AFAICS from the solution currently being 
discussed, something similar is needed in any case, since the "regular" 
file merge algorithm will fail on the current .hgtags format. We're 
then left with (h), which will probably be the real challenge here: how 
do we disallow .hgtags branches without modifying the existing system 
too much? At this point, I'm not sure, but I think it is worth 
researching some more.
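
To show what "the most recent revision of .hgtags as the only source of
global tag context" could look like, here is a small Python sketch. It
assumes one "<node> <tagname>" entry per line, later entries overriding
earlier ones, and an all-zero node marking a removed tag. Those format
details and the read_tags name are assumptions of this example, so
check them against the real .hgtags handling before relying on them.

```python
NULL_NODE = "0" * 40  # assumed marker for "tag removed"


def read_tags(hgtags_text):
    """Build the tag map from a single (unbranched) .hgtags revision."""
    tags = {}
    for line in hgtags_text.splitlines():
        line = line.strip()
        if not line:
            continue
        node, name = line.split(" ", 1)  # tag names may contain spaces
        if node == NULL_NODE:
            tags.pop(name, None)         # a later entry removes the tag
        else:
            tags[name] = node            # a later entry moves the tag
    return tags


text = "\n".join([
    "a" * 40 + " v1.0",
    "b" * 40 + " foo",
    "c" * 40 + " foo",      # "foo" is moved
    NULL_NODE + " v1.0",    # "v1.0" is removed
])
tags = read_tags(text)
print(sorted(tags))              # ['foo']
print(tags["foo"] == "c" * 40)   # True
```

With only one line of history for .hgtags there is nothing to
reconcile: the answer is the same regardless of the working copy.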

Finally, there are surely more requirements for desirable tag behaviour 
that we should try to formulate in addition to the 10 above. Let's use 
this discussion to try to enumerate as many as possible. This would 
help us all to get a clearer picture of what we're actually up against.


> Ok, what does this tell us about the design? First, points (f) and
> (c) basically says tags must be version controlled. And this
> basically means it must happen exactly in parallel with the project's
> DAG. Keeping the tag data in .hgtags meets those requirements with
> the added benefit of not adding a second namespace. Also, (b) falls
> nicely out of this, though the actually merging could be friendlier.

I agree that tags MUST be version controlled, but I don't think they 
should be branchable. Branchable tags directly violate (i).

> [...]
>
> This logic is still hard to get right and there will still need to be
> tie-breakers based on what's 'tip-most' to remove ambiguity. So if we
> decide that this is the right approach, we need to a) precisely
> document it for users with examples and b) make sure the
> implementation matches the documentation (aka test cases).

I do not agree that "tip-most"-ness can be used as a tie-breaker. The 
concept of "tip-most" is arbitrary, and easily breaks requirement (g) 
(both original and revised version). If "tip-most"-ness or some other 
arbitrary measure is needed to remove ambiguity, there's too much 
ambiguity in the first place. If a tie-breaker is needed in ambiguous 
cases, I'd much prefer one of Alexis' proposed tie-breaker 
algorithms.

However, what I'd REALLY like is to get rid of ALL the ambiguity by only 
allowing one .hgtags per repository (i.e. no branching of .hgtags). 
Sure, we will still need to merge tags between repositories when 
pushing/pulling, but the ambiguous cases that pop up here (at least 
some of them) require human intervention anyway (according to 
requirement (j)). Thus we don't need a merge algorithm that resolves 
all ambiguity; it only needs to merge the obvious cases and leave the 
rest up to humans.
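
One possible reading of "merge the obvious cases" is sketched below in
Python. It assumes the tag set from the last point where the two
repositories agreed is available as a base, which is an assumption of
this example and not something established in this thread. The function
name is hypothetical, and None stands for "tag absent".

```python
def merge_tags_with_base(base, local, remote):
    """Three-way merge of tag dicts (name -> changeset id).

    Returns (merged, conflicts). Only unambiguous cases are merged;
    everything else is reported for a human to resolve.
    """
    merged, conflicts = {}, {}
    for name in set(base) | set(local) | set(remote):
        b, l, r = base.get(name), local.get(name), remote.get(name)
        if l == r:
            result = l        # both sides agree (or both removed it)
        elif l == b:
            result = r        # only the remote side changed the tag
        elif r == b:
            result = l        # only the local side changed the tag
        else:
            conflicts[name] = (l, r)  # both changed it differently
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"v1": "aaa", "v2": "bbb"}
local = {"v1": "aaa", "v2": "ccc", "foo": "111"}   # moved v2, added foo
remote = {"v1": "aaa", "v2": "bbb", "foo": "222"}  # added a different foo

merged, conflicts = merge_tags_with_base(base, local, remote)
print(merged == {"v1": "aaa", "v2": "ccc"})   # True
print(conflicts)                              # {'foo': ('111', '222')}
```

No tie-breaker such as "tip-most" appears anywhere: a tag either has
one obvious value or it lands in the conflict list.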


Have fun!

...Johan

-- 
Johan Herland, <johherla at online.no>
www.herland.net


