[PATCH 01 of 14 V3] util: move 'readexactly' in the util module
Boris Feld
boris.feld at octobus.net
Tue Jan 30 22:00:45 UTC 2018
After chatting on IRC with Gregory about this, here is a detailed list
of proposals:
- The `X-v2` bundlespec should be stable in time. An old Mercurial
client shouldn't choke on a bundle 2 if there is a new unknown part.
Adding the phases and obsolescence parts using experimental config
knobs violate this guarantee.=> Proposal: Bundle announced with `X-v2`
bundlespec will always be bundle 2 without phases, bookmarks or
obsolescence parts. `hg bundle --type=v2` would generate such bundles.
- With the recent addition of new bundle 2 parts, we should introduce a
new bundle-spec which will imply that the bundle contains these recent
parts.=> Proposal: Introduce a new bundlespec format `X-v3`. It will be
for bundle 2 containing a changegroup part in addition to a bookmarks,
phase, and obsolescence parts. It will be generated with `hg bundle --
type=v3`.=> Proposal: As an edge-case, people may want to generate a v3
bundle without a specific part. A variant could be introduced, like
`zstd-v3;no-bookmarks`.
- The old stream clone bundle use the `none-packed1` bundlespec.
Instead of using a dedicated version, we should use the same bundle2
versions; v2 and v3. Also, it uses a dedicated debug command, we should
instead use the bundle command which is the best-suited to handle that
task.=> Proposal: Introduce a new bundlespec format `none-
v2;stream=v2;requirements..` for the stream clone v2. This will
guarantee a stream part but the bundle will not contain the bookmarks,
phase and obsolescence parts. It will be generated with `hg bundle --
type=none-v2;stream=v2`.=> Proposal: Introduce a new bundlespec format
`none-v3;stream=v2;requirements..` for the stream clone v2. This will
guarantee a stream part in addition to a bookmarks, phase, and
obsolescence parts. It will be generated with `hg bundle --type=none-
v3;stream=v2`.
Carrying these guarantees in the bundlespec will likely require some
changes in the `parsebundlespec` API as it would likely need to return
the `contentops` based on the bundlespec instead of computing the
contentops in the bundle command.
Refactoring the bundlespec parsing for the `none-v2;stream=v2;` could
be done before agreeing on the level of guarantees the `X-v3`
bundlespec gives, so a possible two-steps plan is:
- Refactor and add the possibility to generate stream v2 clone bundle
with the `none-v2;stream=v2;` bundlespec and the bundle command.-
Introduce the `X-v3` bundlespecs.
I think we should be able to target the 4.5 release for the first step,
I can have a series ready for tomorrow.
What do you think?
On Tue, 2018-01-30 at 08:35 -0800, Gregory Szorc wrote:
> On Tue, Jan 30, 2018 at 3:41 AM, Boris Feld <boris.feld at octobus.net>
> wrote:
> > On Mon, 2018-01-29 at 11:37 -0800, Gregory Szorc wrote:
> > > On Fri, Jan 26, 2018 at 8:39 AM, Boris Feld <boris.feld at octobus.n
> > > et> wrote:
> > > > On Sat, 2018-01-20 at 12:22 -0800, Gregory Szorc wrote:
> > > > > On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld at octob
> > > > > us.net> wrote:
> > > > > > # HG changeset patch
> > > > > >
> > > > > > # User Boris Feld <boris.feld at octobus.net>
> > > > > >
> > > > > > # Date 1516391495 -3600
> > > > > >
> > > > > > # Fri Jan 19 20:51:35 2018 +0100
> > > > > >
> > > > > > # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> > > > > >
> > > > > > # Parent de32acb24949c0e3633de373d1c6c8c814faa804
> > > > > >
> > > > > > # EXP-Topic b2-stream
> > > > > >
> > > > > > # Available At https://bitbucket.org/octobus/mercurial-deve
> > > > > > l/
> > > > > >
> > > > > > # hg pull https://bitbucket.org/octobus/mercur
> > > > > > ial-devel/ -r 15f7795f96a5
> > > > > >
> > > > > > util: move 'readexactly' in the util module
> > > > > >
> > > > >
> > > > > I was swamped the last few days and didn't have time to look
> > > > > at this series even though I really wanted to. I reviewed it
> > > > > this morning and am generally happy with it. However, I'd
> > > > > like to make some BC changes to the wire protocol so we don't
> > > > > ship things in the wire protocol that we'll need to support
> > > > > indefinitely. A number of other changes to make this a more
> > > > > useful feature can be done post RC, during freeze.
> > > > >
> > > > > *High priority issues* (should be addressed before RC)
> > > > >
> > > > > HTTP server is sending bundle2 response with compression.
> > > > > That's a significant reason why this format is slower than
> > > > > the old format. I also think bundle2 overhead is coming into
> > > > > play on the server. I need to profile once compression is
> > > > > removed from the equation. I'm pretty sure this change can be
> > > > > done during freeze since it requires the server to drop
> > > > > compression for this bundle2 response. But I'm not 100% about
> > > > > that given how compression is negotiated between client and
> > > > > server. Would prefer this change make it in before RC.
> > > > >
> > > > > The new "stream" bundle2 part has a "version" parameter. I
> > > > > don't like versioning bundle2 parts. I think the behavior of
> > > > > bundle2 parts should be consistent over time. If we need to
> > > > > introduce a new version of something or change processing
> > > > > semantics, we should be introducing a new bundle2 part and
> > > > > associate new/different behavior with the part. Attempting
> > > > > content negotiation within a named bundle2 part is extra
> > > > > complication and will lead to increased protocol chaos down
> > > > > the road. This should be changed before RC since it impacts
> > > > > the wire protocol.
> > > > >
> > > > > Requirements string in bundle2 part should ideally use the
> > > > > same pre-encoded normalization as bundlespecs for consistency
> > > > > with the rest of Mercurial. Since it touches the wire
> > > > > protocol, this needs to change before RC.
> > > > >
> > > > > *Medium priority issues* (ideally addressed before RC)
> > > > >
> > > > > I don't think there's an easy way to generate bundle2 bundles
> > > > > with
> > > > > stream parts. We need this to support clonebundles. This will
> > > > > likely
> > > > > require bundlespec changes. I would highly prefer to get this
> > > > > in before RC so we can do end-to-end integration testing with
> > > > > clonebundles.
> > > >
> > > > What about starting by adding a --v2 flag to the debug command
> > > > generating stream bundle?
> > >
> > > That /could/ work.
> > >
> > > The better path forward is to shoehorn all of this into "bundle
> > > specifications" (bundlespecs). The user-facing bundlespec string
> > > defines a set of capabilities and contents of a bundle that any
> > > two supporting implementations will be able to read and write.
> > > Right now, v1-* means changegroup 1 (the original bundle format)
> > > and v2-* means a subset of bundle2. With the addition of phases,
> > > bookmarks, etc, it is time to create a v3-* that allows the
> > > creation of these bundle2 parts. A "streaming clone bundle" could
> > > be a variation of v3-*. e.g. v3-none?stream=1.
> >
> > I've started working on the stream clone bundle v2, you can find
> > the first iterations of the changesets here: https://bitbucket.org/
> > octobus/mercurial-devel/commits/all?search=48e60e649700%3A%3A
> >
> > I start adding a v2 flag to generate a new stream bundle in: https:
> > //bitbucket.org/octobus/mercurial-
> > devel/commits/48e60e649700d51a7ae3b221c689ebbd2ec84371
> >
> > I later introduced a new bundle spec for the v2 stream clone bundle
> > `none-packed2` to mimic the stream clone bundle v1 bundle spec in h
> > ttps://bitbucket.org/octobus/mercurial-
> > devel/commits/4414466e4393e496c1cb4f066322b185a9da7bbf. One thing
> > that confused me is that v1 and v2 bundle spec seemd to imply that
> > there was a changegroup part in the bundle, but we don't need to
> > have it in the bundle, right?
> >
> > I'm not sure to see the implications of introducing a v3 bundle
> > spec, what pieces of code needs to change to handle it?
>
> A v3 bundlespec will require various changes in exchange.py.
> commands.py will need updates to `hg bundle`. Clonebundles code needs
> to be taught about it.
> But what is probably needed first is an overhaul of the code for
> conveying bundle "capabilities." IMO we want a mechanism to convert a
> bundlespec to set of bundle2 "client" capabilities. Then we can feed
> those into bundle generation functions. Local bundle2 generation
> would then look very similar to how bundles are generated for the
> wire protocol. It would be a good thing for those to share code
> paths.
>
> > >
> > > > > *Lesser priority issues* (can be fixed post RC)
> > > > >
> > > > > streamclone._emit() should be renamed to _emitv2() for
> > > > > consistency with other functions in file.
> > > >
> > > > (done)
> > > >
> > > >
> > > > > I think streamclone.applybundlev2() should handle writing
> > > > > requirements, not the bundle2 part handler. Similar logic is
> > > > > already in streamclone.maybeperformlegacystreamclone().
> > > >
> > > > Good point, done.
> > > >
> > > >
> > > > > I need to look at this in more detail, but 133a678673cb
> > > > > ("allow bundle'2 stream clone") is a bit suspect. It's
> > > > > correctness may depend on your interpretation of whether
> > > > > server.disablefullbundle applies only to changegroup bundles.
> > > > > I can't recall what existing behavior is with regards to
> > > > > stream clone. Need to double check.
> > > >
> > > > 133a678673cb exist because a test failed otherwise. Since there
> > > > is a test for it we assumed it should work. So if we revisit
> > > > this logic, we should do it for both stream v1 and stream v2.
> > > >
> > > > > test-clone-uncompressed.t wants a comment about legacy stream
> > > > > clone phases being buggy. We should also decide what we want
> > > > > to do about issue #5648 since bundle2 stream clone solves the
> > > > > problem. I'm tempted to mark #5648 as resolved once bundle2
> > > > > stream clone is enabled by default.
> > > >
> > > > +1 for marking it as solved once the protocol is no longer
> > > > experimental.
> > > >
> > > >
> > > > > In 56c30b31afbe:
> > > > > @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
> > > > > kwargs['cg'] = False
> > > > > kwargs['stream'] = True
> > > > > pullop.stepsdone.add('changegroup')
> > > > > + pullop.stepsdone.add('phases')
> > > > >
> > > > > Do we know for sure that phases were processed? It depends on
> > > > > what the server sent, no? I think this code may assume the
> > > > > behavior of the official Mercurial server and may not be
> > > > > compatible with other implementation.
> > > >
> > > > We are not sure what you mean here. If the repository is
> > > > publishing → we don't send any phases, making all transferred
> > > > changeset public. If server is non-publishing, we transfer the
> > > > same phase root as the original. This bypasses the usual
> > > > transfer for phases the same way we bypass the usual transfer
> > > > for changeset.
> > >
> > > What I mean is that this code is assuming that the exchanged
> > > bundle2 payload either communicated phases or didn't need to
> > > communicate phases. In other words, this assumes behavior of the
> > > canonical Mercurial client and server. A separate server
> > > implementation may not behave similarly. e.g. it may not
> > > implement exchange of phases via bundle2.
> > >
> > > > > The use of temporary files for non-append-only files and the
> > > > > context manager and copy primitive associated with that code
> > > > > feels overly complex. Can we just slurp file content into
> > > > > memory? We already store the filenames and offsets in memory.
> > > > > We already scale memory use linearly with number of files in
> > > > > the repo. I don't see a huge problem doing the same for files
> > > > > that are proportional to number of changesets, branches,
> > > > > tags, and heads. That being said, the code is already landed.
> > > > > Some maybe we just keep it around. If we do keep it around, I
> > > > > think it could use some refactoring to improve readability.
> > > > > The amount of functions being called is a bit much for what
> > > > > the code is doing, IMO.
> > > >
> > > > We cannot keep cache file in memory, we've seen extensions
> > > > where cache file can become huge.
> > >
> > > Bleh. It would be nice to avoid the overhead in the common case
> > > where the cache files are small. Or, we can put the temp files
> > > next to the final cache files so we can perform an atomic rename
> > > within the same directory.
> > >
> > > > > The code around managing multiple vfs instances feels a bit
> > > > > complex. I don't think we need a dict of vfs instances. We
> > > > > only have 2. Let's inline their usage and do away with all
> > > > > the context manager and dictionary lookup complications.
> > > > >
> > > > > Related to vfs, we're not using the "expectedcount" argument
> > > > > with vfs.backgroundclose(). Ideally, the part header defines
> > > > > the number of files in each vfs so we can pass the proper
> > > > > value. But, I think it's fine to use the total expected file
> > > > > count as the hint to the store vfs and don't use background
> > > > > close for the cache vfs (since it will almost certainly never
> > > > > have enough entries to warrant background closing).
> > > >
> > > > The number of filecount is used in debug message and at the
> > > > protocol level. So in all cases, we'll have to introduce a new
> > > > parameter to carry the data per vfs. We'll do that next cycle.
> > > >
> > > > > I'm pretty happy with the feature. It is currently behind an
> > > > > experimental config flag. I think we can make it enabled by
> > > > > default in 4.5 if we fix the more important issues.
> > > >
> > > > You mean turning it on for 4.5? We find it too aggressive,
> > > > let's ship it experimental in 4.5 and turn that on early in the
> > > > 4.6 cycle.
> > >
> > > At this point, it is definitely too late to be enabled for 4.5.
> > >
> > > > > *Other observations *
> > > > >
> > > > > The new format seems to use less CPU on the client than the
> > > > > old format when doing HTTP exchange. 30-31s versus 25-26s for
> > > > > mozilla-unified. And that's with zstd in play. I haven't
> > > > > profiled, but this is likely due to not using readline() and
> > > > > transferring caches and phases data (I believe we avoid a
> > > > > linear changelog scan post transfer now since I believe
> > > > > something before needed to reconstruct the branch cache).
> > > > >
> > > > > Here are some example numbers on my i7-6700K.
> > > > >
> > > > > legacy http
> > > > > server:
> > > > > 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
> > > > > 171184maxresident)k
> > > > > 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
> > > > >
> > > > > client:
> > > > > 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
> > > > > 197244maxresident)k
> > > > > 19288inputs+7039984outputs (2major+94838minor)pagefaults
> > > > > 0swaps
> > > > >
> > > > > v2 http
> > > > > server:
> > > > > 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
> > > > > 236808maxresident)k
> > > > > 6836840inputs+21288outputs (2major+82612minor)pagefaults
> > > > > 0swaps
> > > > >
> > > > > client:
> > > > > 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
> > > > > 209164maxresident)k
> > > > > 60848inputs+7061216outputs (2major+78863minor)pagefaults
> > > > > 0swaps
> > > > >
> > > > > >
> > > > > > This function is used in multiple place, having it in util
> > > > > > would be better.
> > > > > >
> > > > > > (existing caller will be migrated in another series)
> > > > > >
> > > > > >
> > > > > >
> > > > > > diff --git a/mercurial/changegroup.py
> > > > > > b/mercurial/changegroup.py
> > > > > >
> > > > > > --- a/mercurial/changegroup.py
> > > > > >
> > > > > > +++ b/mercurial/changegroup.py
> > > > > >
> > > > > > @@ -32,14 +32,7 @@ from . import (
> > > > > >
> > > > > > _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
> > > > > >
> > > > > > _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
> > > > > >
> > > > > >
> > > > > >
> > > > > > -def readexactly(stream, n):
> > > > > >
> > > > > > - '''read n bytes from stream.read and abort if less was
> > > > > > available'''
> > > > > >
> > > > > > - s = stream.read(n)
> > > > > >
> > > > > > - if len(s) < n:
> > > > > >
> > > > > > - raise error.Abort(_("stream ended unexpectedly"
> > > > > >
> > > > > > - " (got %d bytes, expected %d)")
> > > > > >
> > > > > > - % (len(s), n))
> > > > > >
> > > > > > - return s
> > > > > >
> > > > > > +readexactly = util.readexactly
> > > > > >
> > > > > >
> > > > > >
> > > > > > def getchunk(stream):
> > > > > >
> > > > > > """return the next chunk from stream as a string"""
> > > > > >
> > > > > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > > > >
> > > > > > --- a/mercurial/util.py
> > > > > >
> > > > > > +++ b/mercurial/util.py
> > > > > >
> > > > > > @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx,
> > > > > > others=None):
> > > > > >
> > > > > > fn = '%s~%s~%s' % (f, tag, n)
> > > > > >
> > > > > > if fn not in ctx and fn not in others:
> > > > > >
> > > > > > return fn
> > > > > >
> > > > > > +
> > > > > >
> > > > > > +def readexactly(stream, n):
> > > > > >
> > > > > > + '''read n bytes from stream.read and abort if less was
> > > > > > available'''
> > > > > >
> > > > > > + s = stream.read(n)
> > > > > >
> > > > > > + if len(s) < n:
> > > > > >
> > > > > > + raise error.Abort(_("stream ended unexpectedly"
> > > > > >
> > > > > > + " (got %d bytes, expected %d)")
> > > > > >
> > > > > > + % (len(s), n))
> > > > > >
> > > > > > + return s
> > > > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180130/94a8fedd/attachment-0002.html>
More information about the Mercurial-devel
mailing list