Network performance problems when pulling and cloning from HTTP server

Matt Mackall mpm at selenic.com
Mon Nov 26 23:24:54 UTC 2012


On Thu, 2012-11-22 at 14:03 +0100, Angel Ezquerra wrote:
> On Wed, Nov 21, 2012 at 10:24 PM, Matt Mackall <mpm at selenic.com> wrote:
> > On Wed, 2012-11-21 at 08:02 +0100, Angel Ezquerra wrote:
> >> On Nov 20, 2012 6:55 PM, "Bryan O'Sullivan" <bos at serpentine.com> wrote:
> >> >
> >> > On Tue, Nov 20, 2012 at 8:30 AM, Angel Ezquerra <ezquerra at gmail.com> wrote:
> >> >>
> >> >> One of my users has a repository with plenty of "big files" (on the
> >> >> order of 50 to 100 MB). Our server is _not_ using the largefiles
> >> >> extension (at least not yet); that is, the files are "big" but are not
> >> >> "largefiles".
> >> >>
> >> >> The total repository working directory size is about 1.5 GB. The big
> >> >> files do not change often, if ever.
> >>
> >> Bryan, thanks a lot for your comments.
> >>
> >> > I'm not surprised that a normal clone would be slow in this
> >> > situation. Assuming your large files never change, I *am* surprised
> >> > that subsequent pulls would be slow.
> >> >
> >>
> >> Sorry, that is not what I meant. The revision that adds these big
> >> files is the last one. What I meant is that a pull that gets that last
> >> revision will be slow. I expect future pulls of revisions that do not
> >> add more big files to be fast again.
> >>
> >> What was initially surprising is that incoming is also very slow. I
> >> always thought of incoming as a way to exchange hashes,
> >
> > ..and user names and dates and descriptions and file lists and sometimes
> > diffs. In other words, everything.
> >
> > 'hg incoming' pulls a bundle, just like pull. It is in fact the one and
> > only way to exchange this data supported by the wire protocol. The
> > bundle starts with a changelog. If we only need the changelog (i.e. no
> > diffs), then we abort the transfer in the middle.
> 
> Matt, thanks for pitching in.
> 
> I understand now the need to get the hashes, usernames, dates and
> descriptions. As for also getting the diffs, that would only happen
> when using the --bundle option, right? That is, calling a plain "hg
> incoming" (without --bundle) should be pretty fast regardless of the
> contents of the changesets themselves. Is that correct?

Consider 'hg incoming -p', which shows patches and therefore needs the
diffs even without --bundle.

In theory, Mercurial could read the incoming bundle lazily and read past
the changelog portion only if that data were actually needed, but it
turns out we don't do this yet. So incoming is currently exactly as
expensive as pull.
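
To make the lazy-read idea concrete, here is a minimal sketch in Python.
It is a toy model, not Mercurial's bundle format or wire protocol: the
length-prefixed layout and the names make_bundle, CountingStream and
incoming are all invented for illustration. It only shows that a reader
which stops after the changelog touches far fewer bytes than one which
drains the whole stream.

```python
import io


def make_bundle(changelog: bytes, payload: bytes) -> bytes:
    """Build a toy bundle: 4-byte length, changelog, then file data.

    This layout is an assumption for illustration, not the real format.
    """
    return len(changelog).to_bytes(4, "big") + changelog + payload


class CountingStream(io.BytesIO):
    """In-memory stream that counts bytes read (stands in for the network)."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def incoming(stream, need_diffs=False):
    """Read the changelog; read the remaining data only when asked to."""
    size = int.from_bytes(stream.read(4), "big")
    changelog = stream.read(size)
    # Lazy part: without need_diffs we never touch the rest of the stream.
    payload = stream.read() if need_diffs else None
    return changelog, payload


changelog = b"rev 1: add big files\n"
bundle = make_bundle(changelog, b"x" * 1_000_000)

lazy = CountingStream(bundle)
incoming(lazy)                      # plain incoming: changelog only
eager = CountingStream(bundle)
incoming(eager, need_diffs=True)    # like 'incoming -p': needs everything

print(lazy.bytes_read, "bytes read lazily")
print(eager.bytes_read, "bytes read eagerly")
```

In this model the lazy case reads only the length prefix plus the
changelog, while the eager case reads the full bundle, which is the
behaviour described above as what incoming currently costs.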

-- 
Mathematics is the supreme nostalgia of our time.