D8244: copies: fix the changeset based algorithm regarding merge

marmoute (Pierre-Yves David) phabricator at mercurial-scm.org
Thu Mar 5 18:15:09 UTC 2020


marmoute created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  In 99ebde4fec99 <https://phab.mercurial-scm.org/rHG99ebde4fec9967c2e80f85702129abf874eefe74>, we changed the list of files stored into the `files` field.
  This lead to the changeset centric copy algorithm to break in various merge
  situation involving merge.
  
  We update the situation with more details about which revision introduces rename
  information. This help use making the right decision in case of merge.
  
  We are now running a more comprehensive suite of test with include this kind of
  situation. The behavior differ slightly from the filelog based in a couple of
  instance. There is mostly two distinct cases:
  
  1. there are conflicting rename information in a merge (different rename history
  
  on each side). In this case the filelog based implementation arbitrarily pick a
  side based on the file-revision-number. So it depends on a local factor. The
  changeset centric algorithm will use a deterministic approach, by picking the
  information coming from the first parent of the merge. This is stable across
  different clone.
  
  2. rename information related to file that exist in both source and destination.
  
  The filelog based implementation do not even try to detect these, however the
  changeset centric one get them for "free" (it is simpler to detect them than
  not).
  
  The new implementation focus on correctness. Performance improvement will come
  later.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8244

AFFECTED FILES
  mercurial/copies.py
  tests/test-copies-chain-merge.t

CHANGE DETAILS

diff --git a/tests/test-copies-chain-merge.t b/tests/test-copies-chain-merge.t
--- a/tests/test-copies-chain-merge.t
+++ b/tests/test-copies-chain-merge.t
@@ -1,3 +1,5 @@
+#testcases filelog compatibility sidedata
+
 =====================================================
 Test Copy tracing for chain of copies involving merge
 =====================================================
@@ -6,6 +8,7 @@
 are involved. It cheks we do not have unwanted update of behavior and that the
 different options to retrieve copies behave correctly.
 
+
 Setup
 =====
 
@@ -18,6 +21,22 @@
   > logtemplate={rev} {desc}]\n
   > EOF
 
+#if compatibility
+  $ cat >> $HGRCPATH << EOF
+  > [experimental]
+  > copies.read-from = compatibility
+  > EOF
+#endif
+
+#if sidedata
+  $ cat >> $HGRCPATH << EOF
+  > [format]
+  > exp-use-side-data = yes
+  > exp-use-copies-side-data-changeset = yes
+  > EOF
+#endif
+
+
   $ hg init repo-chain
   $ cd repo-chain
 
@@ -672,17 +691,26 @@
        0       4 0dd616bc7ab1 000000000000 000000000000
        1      18 6da5a2eecb9c 000000000000 000000000000
        2      19 eb806e34ef6b 0dd616bc7ab1 6da5a2eecb9c
+
+# Here the filelog based implementation is not looking at the rename
+# information (because the file exist on both side). However the changelog
+# based on works fine. We have different output.
+
   $ hg status --copies --rev 'desc("a-2")' --rev 'desc("mAEm-0")'
   M f
+    b (no-filelog !)
   R b
   $ hg status --copies --rev 'desc("a-2")' --rev 'desc("mEAm-0")'
   M f
+    b (no-filelog !)
   R b
   $ hg status --copies --rev 'desc("e-2")' --rev 'desc("mAEm-0")'
   M f
+    d (no-filelog !)
   R d
   $ hg status --copies --rev 'desc("e-2")' --rev 'desc("mEAm-0")'
   M f
+    d (no-filelog !)
   R d
   $ hg status --copies --rev 'desc("i-2")' --rev 'desc("a-2")'
   A f
@@ -692,6 +720,18 @@
   A f
     b
   R b
+
+# From here, we run status against revision where both source file exists.
+#
+# The filelog based implementation picks an arbitrary side based on revision
+# numbers. So the same side "wins" whatever the parents order is. This is
+# sub-optimal because depending on revision numbers means the result can be
+# different from one repository to the next.
+#
+# The changeset based algorithm use the parent order to break tie on conflicting
+# information and will have a different order depending on who is p1 and p2.
+# That order is stable accross repositories. (data from p1 prevails)
+
   $ hg status --copies --rev 'desc("i-2")' --rev 'desc("mAEm-0")'
   A f
     d
@@ -699,7 +739,9 @@
   R d
   $ hg status --copies --rev 'desc("i-2")' --rev 'desc("mEAm-0")'
   A f
-    d
+    d (filelog !)
+    b (compatibility !)
+    b (sidedata !)
   R b
   R d
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mAEm-0")'
@@ -709,7 +751,8 @@
   R b
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mEAm-0")'
   A f
-    a
+    a (filelog !)
+    b (no-filelog !)
   R a
   R b
 
@@ -733,21 +776,25 @@
   R h
   $ hg status --copies --rev 'desc("b-1")' --rev 'desc("mBFm-0")'
   M d
+    h (no-filelog !)
   R h
   $ hg status --copies --rev 'desc("f-2")' --rev 'desc("mBFm-0")'
   M b
   $ hg status --copies --rev 'desc("f-1")' --rev 'desc("mBFm-0")'
   M b
   M d
+    i (no-filelog !)
   R i
   $ hg status --copies --rev 'desc("b-1")' --rev 'desc("mFBm-0")'
   M d
+    h (no-filelog !)
   R h
   $ hg status --copies --rev 'desc("f-2")' --rev 'desc("mFBm-0")'
   M b
   $ hg status --copies --rev 'desc("f-1")' --rev 'desc("mFBm-0")'
   M b
   M d
+    i (no-filelog !)
   R i
 
 The following graphlog is wrong, the "a -> c -> d" chain was overwritten and should not appear.
@@ -777,9 +824,15 @@
 Unlike in the 'BD/DB' cases, an actuall merge happened here. So we should
 consider history and rename on both branch of the merge.
 
+One side of the merge have a long history with rename. The other side of the
+merge point to a new file with a smaller history. Each side is "valid".
+
+(and again the filelog based algorithm only explore one, with a pick based on
+revision numbers)
+
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mDGm-0")'
   A d
-    a
+    a (filelog !)
   R a
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mGDm-0")'
   A d
@@ -840,7 +893,8 @@
 
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mFGm-0")'
   A d
-    a
+    h (no-filelog !)
+    a (filelog !)
   R a
   R h
   $ hg status --copies --rev 'desc("i-0")' --rev 'desc("mGFm-0")'
@@ -854,15 +908,19 @@
   M d
   $ hg status --copies --rev 'desc("f-1")' --rev 'desc("mFGm-0")'
   M d
+    i (no-filelog !)
   R i
   $ hg status --copies --rev 'desc("f-1")' --rev 'desc("mGFm-0")'
   M d
+    i (no-filelog !)
   R i
   $ hg status --copies --rev 'desc("g-1")' --rev 'desc("mFGm-0")'
   M d
+    h (no-filelog !)
   R h
   $ hg status --copies --rev 'desc("g-1")' --rev 'desc("mGFm-0")'
   M d
+    h (no-filelog !)
   R h
 
   $ hg log -Gfr 'desc("mFGm-0")' d
diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -183,10 +183,27 @@
     * p1copies: mapping of copies from p1
     * p2copies: mapping of copies from p2
     * removed: a list of removed files
+    * ismerged: a callback to know if file was merged in that revision
     """
     cl = repo.changelog
     parents = cl.parentrevs
 
+    def get_ismerged(rev):
+        ctx = repo[rev]
+
+        def ismerged(path):
+            if path not in ctx.files():
+                return False
+            fctx = ctx[path]
+            parents = fctx._filelog.parents(fctx._filenode)
+            nb_parents = 0
+            for n in parents:
+                if n != node.nullid:
+                    nb_parents += 1
+            return nb_parents >= 2
+
+        return ismerged
+
     if repo.filecopiesmode == b'changeset-sidedata':
         changelogrevision = cl.changelogrevision
         flags = cl.flags
@@ -233,7 +250,8 @@
                 p1copies = {}
                 p2copies = {}
                 removed = []
-            return p1, p2, p1copies, p2copies, removed
+
+            return p1, p2, p1copies, p2copies, removed, get_ismerged(rev)
 
     else:
 
@@ -242,7 +260,7 @@
             ctx = repo[rev]
             p1copies, p2copies = ctx._copies
             removed = ctx.filesremoved()
-            return p1, p2, p1copies, p2copies, removed
+            return p1, p2, p1copies, p2copies, removed, get_ismerged(rev)
 
     return revinfo
 
@@ -256,6 +274,7 @@
     revinfo = _revinfogetter(repo)
 
     cl = repo.changelog
+    isancestor = cl.isancestorrev  # XXX we should had chaching to this.
     missingrevs = cl.findmissingrevs(common=[a.rev()], heads=[b.rev()])
     mrset = set(missingrevs)
     roots = set()
@@ -283,10 +302,14 @@
     iterrevs.update(roots)
     iterrevs.remove(b.rev())
     revs = sorted(iterrevs)
-    return _combinechangesetcopies(revs, children, b.rev(), revinfo, match)
+    return _combinechangesetcopies(
+        revs, children, b.rev(), revinfo, match, isancestor
+    )
 
 
-def _combinechangesetcopies(revs, children, targetrev, revinfo, match):
+def _combinechangesetcopies(
+    revs, children, targetrev, revinfo, match, isancestor
+):
     """combine the copies information for each item of iterrevs
 
     revs: sorted iterable of revision to visit
@@ -305,7 +328,7 @@
             # this is a root
             copies = {}
         for i, c in enumerate(children[r]):
-            p1, p2, p1copies, p2copies, removed = revinfo(c)
+            p1, p2, p1copies, p2copies, removed, ismerged = revinfo(c)
             if r == p1:
                 parent = 1
                 childcopies = p1copies
@@ -319,9 +342,12 @@
                 }
             newcopies = copies
             if childcopies:
-                newcopies = _chain(newcopies, childcopies)
-                # _chain makes a copies, we can avoid doing so in some
-                # simple/linear cases.
+                newcopies = copies.copy()
+                for dest, source in pycompat.iteritems(childcopies):
+                    prev = copies.get(source)
+                    if prev is not None and prev[1] is not None:
+                        source = prev[1]
+                    newcopies[dest] = (c, source)
                 assert newcopies is not copies
             for f in removed:
                 if f in newcopies:
@@ -330,7 +356,7 @@
                         # branches.  when there are no other branches, this
                         # could be avoided.
                         newcopies = copies.copy()
-                    del newcopies[f]
+                    newcopies[f] = (c, None)
             othercopies = all_copies.get(c)
             if othercopies is None:
                 all_copies[c] = newcopies
@@ -338,21 +364,55 @@
                 # we are the second parent to work on c, we need to merge our
                 # work with the other.
                 #
-                # Unlike when copies are stored in the filelog, we consider
-                # it a copy even if the destination already existed on the
-                # other branch. It's simply too expensive to check if the
-                # file existed in the manifest.
-                #
                 # In case of conflict, parent 1 take precedence over parent 2.
                 # This is an arbitrary choice made anew when implementing
                 # changeset based copies. It was made without regards with
                 # potential filelog related behavior.
                 if parent == 1:
-                    othercopies.update(newcopies)
+                    _merge_copies_dict(
+                        othercopies, newcopies, isancestor, ismerged
+                    )
                 else:
-                    newcopies.update(othercopies)
+                    _merge_copies_dict(
+                        newcopies, othercopies, isancestor, ismerged
+                    )
                     all_copies[c] = newcopies
-    return all_copies[targetrev]
+
+    final_copies = {}
+    for dest, (tt, source) in all_copies[targetrev].items():
+        if source is not None:
+            final_copies[dest] = source
+    return final_copies
+
+
+def _merge_copies_dict(minor, major, isancestor, ismerged):
+    """merge two copies-mapping together, minor and major
+
+    In case of conflict, value from "major" will be picked.
+
+    - `isancestors(low_rev, high_rev)`: callable return True if `low_rev` is an
+                                        ancestors of `high_rev`,
+
+    - `ismerged(path)`: callable return True if `path` have been merged in the
+                        current revision,
+    """
+    for dest, value in major.items():
+        other = minor.get(dest)
+        if other is None:
+            minor[dest] = value
+        else:
+            new_tt = value[0]
+            other_tt = other[0]
+            if value[1] == other[1]:
+                continue
+            # content from "major" wins, unless it is older
+            # than the branch point or there is a merge
+            if (
+                new_tt == other_tt
+                or not isancestor(new_tt, other_tt)
+                or ismerged(dest)
+            ):
+                minor[dest] = value
 
 
 def _forwardcopies(a, b, base=None, match=None):



To: marmoute, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list