[PATCH 5 of 5] exchange: refactor APIs to obtain bundle data (API)
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Wed Sep 28 12:11:38 UTC 2016
On 09/27/2016 06:12 PM, Gregory Szorc wrote:
> On Tue, Sep 27, 2016 at 6:45 AM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
>
>
>
> On 09/25/2016 10:42 PM, Gregory Szorc wrote:
>
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1474830761 25200
> # Sun Sep 25 12:12:41 2016 -0700
> # Node ID f6bb4ff0a8c47b099157e615a2874c193ac77512
> # Parent 2dc93677a8836d2365b561d1ba79f62ff68a4f23
> exchange: refactor APIs to obtain bundle data (API)
>
> Currently, exchange.getbundle() returns either a cg1unpacker or a
> util.chunkbuffer (in the case of bundle2). This is kinda OK, as
> both expose a .read() to consumers. However, localpeer.getbundle()
> has code inferring what the response type is based on arguments and
> converts the util.chunkbuffer returned in the bundle2 case to a
> bundle2.unbundle20 instance. This is a sign that the API for
> exchange.getbundle() is not ideal because it doesn't consistently
> return an "unbundler" instance.
>
>
> I wonder how much this is used in extensions. Should we keep the old
> API with a deprecation warning for one version?
>
> In addition, unbundlers mask the fact that there is an underlying
> generator of changegroup data. In both cg1 and bundle2, this
> generator is being fed into a util.chunkbuffer so it can be
> re-exposed as a file object.
>
> util.chunkbuffer is a nice abstraction. However, it should only be
> used "at the edges." This is because keeping data as a generator is
> more efficient than converting it to a chunkbuffer, especially if we
> convert that chunkbuffer back to a generator (as is the case in some
> code paths currently).
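
To make the cost described above concrete, here is a minimal sketch of the round trip. It assumes a toy ChunkBuffer class and a rechunk() helper, both hypothetical stand-ins written for illustration; neither is the real util.chunkbuffer.

```python
class ChunkBuffer(object):
    """Toy file-like wrapper over an iterator of bytes chunks.

    Hypothetical illustration only, not Mercurial's util.chunkbuffer.
    """

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._pending = b''

    def read(self, n):
        # Pull chunks until n bytes are buffered or the source is exhausted.
        while len(self._pending) < n:
            try:
                self._pending += next(self._iter)
            except StopIteration:
                break
        data, self._pending = self._pending[:n], self._pending[n:]
        return data


def rechunk(fileobj, size):
    """Turn a file-like object back into a generator of chunks."""
    while True:
        data = fileobj.read(size)
        if not data:
            return
        yield data


def produce():
    """Stand-in for the underlying changegroup generator."""
    yield b'abc'
    yield b'defg'
    yield b'h'


# Direct path: consumers iterate over the generator as-is.
direct = list(produce())

# Round trip: generator -> buffer -> generator. Every byte is copied
# into the buffer and sliced out again before it reaches the consumer.
roundtrip = list(rechunk(ChunkBuffer(produce()), 3))
```

Both paths carry the same bytes, but the round trip re-slices them at new chunk boundaries, which is the extra work a raw-chunks API lets callers skip.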
>
>
> I know that the chunkbuffer+groupchunk is not removed yet. But can
> you remind us of the expected performance gain once the refactoring
> is done?
>
>
> I think we could see a 5-10% CPU reduction on the server when everything
> is done.
That's great. Can you include this projection in the changeset
description?
> This patch splits exchange.getbundle() into 2 functions. 1 returns
> an iterator over raw chunks (along with a flag saying whether it is
> bundle2). The other returns an "unbundler" instance.
>
>
> Given how few call sites we have, I wonder if we really need 2
> functions. Could we simply move to getbundlechunks? And have the
> unbundler logic in the one place where we need it.
>
> There seems to already be "I'm requesting a bundle2" logic here, so
> inlining this would be fine. We need to perform such logic in the
> client in all cases because of the remote (ssh/http) anyway.
>
> This would allow us to remove the 'isbundle2' flag from
> getbundlechunks too. That flag seems a bit awkward to me.
>
>
> The APIs around cg1 vs bundle2 are wonky and often require the client to
> know what it is requesting before the fact. In the ideal world callers
> would pass "bundlecaps" or some such and get a generic type/interface
> back. There is a lot of follow-up improvement that could be made.
From this patch, there is only one spot that actually uses this
information: the 'localpeer.getbundle' method. This method is the
equivalent of the 'wirepeer.getbundle' method. The 'wirepeer' version
will never be able to access this 'isbundle2' boolean, because it
communicates over the wire through a frozen API. As a result, the
'wirepeer' version already has logic to handle 'cg1' vs 'bundle2' and
will have to keep it forever. With this in mind, I think it makes sense
to simplify the main API and keep the "complexity" in the one caller
that needs it. We could even factor out the detection of bundle2 between
'localpeer' and 'wirepeer' and get a similar benefit.
All the other callers just care about getting chunks and forwarding
them; they do not use this boolean. I think having a simpler return type
(chunks instead of a tuple) would be better.
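
A rough sketch of the shape I have in mind follows. Everything here is a hypothetical stand-in for illustration: the toy getbundlechunks(), the wants_bundle2() helper, the two unbundler classes, and the 'HG2' prefix check are assumptions, not the actual Mercurial code.

```python
def getbundlechunks(parts):
    """Toy stand-in: yield raw chunks only, with no 'isbundle2' flag."""
    for part in parts:
        yield part


def wants_bundle2(bundlecaps):
    """Hypothetical detection helper that 'localpeer' and 'wirepeer'
    could share. The 'HG2' prefix check is an assumption made for
    this sketch."""
    if bundlecaps is None:
        return False
    return any(cap.startswith('HG2') for cap in bundlecaps)


class Cg1Unbundler(object):
    """Placeholder for a cg1 unpacker."""

    def __init__(self, chunks):
        self.chunks = chunks


class Bundle2Unbundler(object):
    """Placeholder for a bundle2 unbundler."""

    def __init__(self, chunks):
        self.chunks = chunks


def local_getbundle(parts, bundlecaps=None):
    """The one caller that needs an unbundler decides which kind to build,
    based on what it requested rather than on a flag in the return value."""
    chunks = getbundlechunks(parts)
    if wants_bundle2(bundlecaps):
        return Bundle2Unbundler(chunks)
    return Cg1Unbundler(chunks)


def forward(parts):
    """Every other caller just forwards chunks and never looks at a flag."""
    return b''.join(getbundlechunks(parts))
```

With this layout the return type of the chunk API is a plain iterator, and the cg1 vs bundle2 decision lives next to the code that made the request.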
> I didn't want to scope bloat to fix all the APIs. I think this refactor is
> a strict improvement because it provides a facility for accessing the
> low-level generator, which we can now use in performance sensitive
> applications.
This refactor is a good small step. Dropping that flag would make it
even smaller as you are removing existing logic about bundle2 detection.
> Callers of exchange.getbundle() have been updated to use the
> appropriate new API.
>
> There is a minor change of behavior in test-getbundle.t. This is
> because `hg debuggetbundle` isn't defining bundlecaps. As a result,
> a cg1 data stream and unpacker is being produced. This is getting
> fed into a new bundle20 instance via bundle2.writebundle(), which uses
> a backchannel mechanism between changegroup generation to add the
> "nbchanges" part parameter. I never liked this backchannel mechanism
> and I plan to remove it someday. `hg bundle` still produces the
> "nbchanges" part parameter, so there should be no user-visible
> change of behavior. I consider this "regression" a bug in
> `hg debuggetbundle`. And that bug is captured by an existing
> "TODO" in the code to use bundle2 capabilities.
>
>
> How hard would it be to fix this 'debuggetbundle' thingy beforehand?
>
>
> debuggetbundle is already incomplete. I didn't want to scope bloat this
> work to fix an existing, unrelated deficiency. I'm not sure how much
> work fixing debuggetbundle would be.
Okay.
At some point someone should probably look into debuggetbundle and either
make it closer to reality or get rid of it. But I agree this is
outside the scope of this series.
Cheers,
--
Pierre-Yves David