[PATCH v2] graft: support grafting across move/copy (issue4028)
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Wed Aug 3 19:28:56 UTC 2016
On 08/02/2016 04:02 PM, Gábor STEFANIK wrote:
> Hi,
>
> -----Original Message-----
>> From: Pierre-Yves David [mailto:pierre-yves.david at ens-lyon.org]
>> Sent: Tuesday, August 02, 2016 1:21 PM
>> To: Gábor STEFANIK <Gabor.STEFANIK at nng.com>; mercurial-
>> devel at mercurial-scm.org
>> Subject: Re: [PATCH v2] graft: support grafting across move/copy (issue4028)
>>
>>
>>
>> On 08/02/2016 11:27 AM, Gábor Stefanik wrote:
>>> # HG changeset patch
>>> # User Gábor Stefanik <gabor.stefanik at nng.com> # Date 1470047695 -7200
>>> # Mon Aug 01 12:34:55 2016 +0200
>>> # Node ID f5220d4f9655dade72857fe310410fb73062bba9
>>> # Parent 73ff159923c1f05899c27238409ca398342d9ae0
>>> graft: support grafting across move/copy (issue4028)
>>
>> There is a lot of things happening in this one patch. You change fairly low
>> level functions having impact in many places. Making this a series of smaller
>> patches each doing a small independent step toward your end goal would
>> make this easier to review and help refine it.
>>
>> For example, your changes to the updates APIs very often reuse an
>> argument we already pass to it. (for 'manifestmerge', 'pta' is often 'pa'; for
>> 'calculateupdates', topo_ancestors is often ancestors[0]). So we could
>> probably leave the main signature untouched in the common case.
>> Only adding a named argument to be used by the special case.
>>
>> Can you send a V3 series with the changes to the low level API in small
>> independent patches ? Each patch should be functional itself.
>
> I will try to sort out the API breakages.
>
> I'm afraid I will have to send a single patch, because I'm using pushgate which AFAIK doesn't support multipart patch series.
> (I'm developing on Windows, so no mutt or pine to send whitespace-undamaged patches in e-mail bodies, and no smtp server for patchbomb.)
> Also, I'm not using mq, and with the talk about it being obsolete and last-resort, I would rather not start using it now.
As people pointed out, pushgate support multiple patch. Thanks for
pointing out that the documentation was confusing.
You do not need MQ to work on multiple changesets, there is other more
modern alternative. For example, you can have a look at the histedit
extension shipped with Mercurial.
I saw you sent a V3 as a single patch, could you slice it in multiple
patch and send the result as a V4? Ping me on IRC if you need help with
the splitting.
[…]
>>> diff --git a/hgext/largefiles/overrides.py
>>> b/hgext/largefiles/overrides.py
>>> --- a/hgext/largefiles/overrides.py
>>> +++ b/hgext/largefiles/overrides.py
>>> @@ -465,11 +465,11 @@
>>> # Finally, the merge.applyupdates function will then take care of #
>>> writing the files into the working copy and lfcommands.updatelfiles #
>>> will update the largefiles.
>>> -def overridecalculateupdates(origfn, repo, p1, p2, pas, branchmerge,
>> force,
>>> - acceptremote, *args, **kwargs):
>>> +def overridecalculateupdates(origfn, repo, p1, p2, pas, pta,
>>> +branchmerge,
>>
>> Note that your naming of the argument here 'pta' is not consistent with the
>> name change in calculteupdates 'topo_ancestor'
>
> Calculateupdates also uses "ancestors" instead of "pas", which is why I exceptionally used "topo_ancestor" instead of "pta" or "cta" there.
That make sense.
> […]
>>> -def _computenonoverlap(repo, c1, c2, addedinm1, addedinm2):
>>> +def _computenonoverlap(repo, c1, c2, addedinm1, addedinm2,
>> silent=False):
>>> """Computes, based on addedinm1 and addedinm2, the files exclusive
>> to c1
>>> and c2. This is its own function so extensions can easily wrap this call
>>> to see what files mergecopies is about to process.
>>> @@ -242,10 +243,10 @@
>>> u1 = sorted(addedinm1 - addedinm2)
>>> u2 = sorted(addedinm2 - addedinm1)
>>>
>>> - if u1:
>>> + if u1 and not silent:
>>> repo.ui.debug(" unmatched files in local:\n %s\n"
>>> % "\n ".join(u1))
>>> - if u2:
>>> + if u2 and not silent:
>>> repo.ui.debug(" unmatched files in other:\n %s\n"
>>> % "\n ".join(u2))
>>> return u1, u2
>>
>> This adds of a 'silent' argument is a good example of something that should
>> go in an independent patch. What is it for? Why do we need it?
>
> We are now calling this function twice when grafting, once for the true common ancestor, and again for the fake one.
> It would be confusing to print the unmatched file list on both passes.
> Since previously we only called it for the fake CA, leave that pass as the one that prints.
>
> Alternative would be to move the debug printouts out of _computenonoverlap into mergecopies, but that would
> make the if/else branching in copies.py much more complicated.
Thanks for the explanation. Extract this in its own changeset with this
explanation as the commit message and that will be perfect.
> […]
>>> @@ -321,6 +322,10 @@
>>> if repo.ui.configbool('experimental', 'disablecopytrace'):
>>> return {}, {}, {}, {}
>>>
>>> + # cta will only differ from ca when grafting or during non-linear
>>> + updates
>>
>> We do not have non-linear update (yet?), you should avoid mentioning it the
>> comments.
>
> While we don't currently have non-linear updates exposed on the UI, it's present in the code (although marred by this bug), and extensions are free to call it - in fact, graft is already doing so.
okay ☺
> […]
>>> + else: # need to recompute this for directory move handling when
>> grafting
>>> + unmatched = operator.add(*_computenonoverlap(repo, c1, c2,
>>> + m1.filesnotin(ma),
>>> + m2.filesnotin(ma), False))
>>
>> Using operator here seems a bit overkill. What about just adding them in an
>> extra line?
>
> That would require defining two large memory-wasting temporary arrays.
I'm a bit confused, Python is going to allocate them anyway, the use of
the add operator is not saving any memory here. Did I missed something?
>
>>
>>> bothnew = sorted(addedinm1 & addedinm2)
>>>
>>> for f in u1:
>>> - checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1)
>>> + checkcopies(c1, f, m1, m2, ca, cta, limit, diverge, copy1,
>>> + fullcopy1, copyfrom, copyto)
>>>
>>> for f in u2:
>>> - checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2)
>>> + checkcopies(c2, f, m2, m1, ca, cta, limit, diverge, copy2,
>>> + fullcopy2, copyfrom, copyto)
>>>
>>> copy = dict(copy1.items() + copy2.items())
>>> movewithdir = dict(movewithdir1.items() + movewithdir2.items())
>>> fullcopy = dict(fullcopy1.items() + fullcopy2.items())
>>>
>>> + # combine partial copy paths discovered in the previous step
>>> + for f in copyfrom:
>>> + if f in copyto:
>>> + copy[copyto[f]] = copyfrom[f]
>>> +
>>> renamedelete = {}
>>> renamedeleteset = set()
>>> divergeset = set()
>>> @@ -369,10 +389,12 @@
>>> if bothnew:
>>> repo.ui.debug(" unmatched files new in both:\n %s\n"
>>> % "\n ".join(bothnew))
>>> - bothdiverge, _copy, _fullcopy = {}, {}, {}
>>> + bothdiverge, _copy, _fullcopy, _copyfrom, _copyto = {}, {}, {},
>>> + {}, {}
>>
>> At that point we start having enough related variable that it gets hard to
>> follow, I was about to request documentation, but it actually already exists in
>> checkcopies(). We should probably add a small comment pointing to it here.
>
> Will do.
>
> Please note that these are dummy variables for the 2nd set of checkcopies calls, the real ones are a few lines above.
>
>>
>>> for f in bothnew:
>>> - checkcopies(c1, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
>>> - checkcopies(c2, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
>>> + checkcopies(c1, f, m1, m2, ca, cta, limit, bothdiverge, _copy,
>>> + _fullcopy, _copyfrom, _copyto)
>>> + checkcopies(c2, f, m2, m1, ca, cta, limit, bothdiverge, _copy,
>>> + _fullcopy, _copyfrom, _copyto)
>>> for of, fl in bothdiverge.items():
>>> if len(fl) == 2 and fl[0] == fl[1]:
>>> copy[fl[0]] = of # not actually divergent, just matching
>>> renames @@ -438,7 +460,7 @@
>>> (d, dirmove[d]))
>>>
>>> # check unaccounted nonoverlapping files against directory moves
>>> - for f in u1 + u2:
>>> + for f in unmatched:
>>> if f not in fullcopy:
>>> for d in dirmove:
>>> if f.startswith(d):
>>> @@ -452,7 +474,8 @@
>>>
>>> return copy, movewithdir, diverge, renamedelete
>>>
>>> -def checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy):
>>> +def checkcopies(ctx, f, m1, m2, ca, cta, limit, diverge, copy, fullcopy,
>>> + copyfrom, copyto):
>>> """
>>> check possible copies of f from m1 to m2
>>>
>>> @@ -460,14 +483,19 @@
>>> f = the filename to check
>>> m1 = the source manifest
>>> m2 = the destination manifest
>>> - ca = the changectx of the common ancestor
>>> + ca = the changectx of the common ancestor, overridden on graft
>>> + cta = topological common ancestor for graft-like scenarios
>>> limit = the rev number to not search beyond
>>> diverge = record all diverges in this dict
>>> copy = record all non-divergent copies in this dict
>>> fullcopy = record all copies in this dict
>>> + copyfrom = source sides of partially known copy tracks
>>> + copyto = destination sides of partially known copytracks
>>> """
>>>
>>> ma = ca.manifest()
>>> + mta = cta.manifest()
>>> + backwards = f in ma # graft common ancestor already contains the
>>> + rename
>>> getfctx = _makegetfctx(ctx)
>>>
>>> def _related(f1, f2, limit):
>>> @@ -513,20 +541,32 @@
>>> continue
>>> seen.add(of)
>>>
>>> - fullcopy[f] = of # remember for dir rename detection
>>> + # remember for dir rename detection
>>> + if backwards:
>>> + fullcopy[of] = f # grafting backwards through renames
>>> + else:
>>> + fullcopy[f] = of
>>> if of not in m2:
>>> continue # no match, keep looking
>>> if m2[of] == ma.get(of):
>>> break # no merge needed, quit early
>>> c2 = getfctx(of, m2[of])
>>> - cr = _related(oc, c2, ca.rev())
>>> + cr = _related(oc, c2, cta.rev())
>>> if cr and (of == f or of == c2.path()): # non-divergent
>>> - copy[f] = of
>>> + if backwards:
>>> + copy[of] = f
>>> + else:
>>> + copy[f] = of
>>> of = None
>>> break
>>>
>>> if of in ma:
>>> diverge.setdefault(of, []).append(f)
>>> + elif of in mta:
>>> + if backwards:
>>> + copyfrom[of] = f
>>> + else:
>>> + copyto[of] = f
>>>
>>> def duplicatecopies(repo, rev, fromrev, skiprev=None):
>>> '''reproduce copies from fromrev to rev in the dirstate diff
>>> --git a/mercurial/merge.py b/mercurial/merge.py
>>> --- a/mercurial/merge.py
>>> +++ b/mercurial/merge.py
>>> @@ -778,11 +778,12 @@
>>> This is currently not implemented -- it's an extension point."""
>>> return True
>>>
>>> -def manifestmerge(repo, wctx, p2, pa, branchmerge, force, matcher,
>>> +def manifestmerge(repo, wctx, p2, pa, pta, branchmerge, force,
>>> +matcher,
>>> acceptremote, followcopies):
>>
>> cf comment at the start of the file about keeping the signature untouched in
>> the common case.
>
> Will do. However, it will require pa and pta to be non-adjacent, which other reviewers may find unacceptable.
In my opinion, not making the API too complicated in the simple case
(and not breaking all existing code as a bonus) is more valuable here.
Other are free to jump in if they disagree.
--
Pierre-Yves David
More information about the Mercurial-devel
mailing list