Network performance problems when pulling and cloning from HTTP server

Angel Ezquerra ezquerra at gmail.com
Tue Nov 27 11:54:51 UTC 2012


On Tue, Nov 27, 2012 at 12:24 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, 2012-11-22 at 14:03 +0100, Angel Ezquerra wrote:
>> On Wed, Nov 21, 2012 at 10:24 PM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Wed, 2012-11-21 at 08:02 +0100, Angel Ezquerra wrote:
>> >> On Nov 20, 2012 6:55 PM, "Bryan O'Sullivan" <bos at serpentine.com> wrote:
>> >> >
>> >> > On Tue, Nov 20, 2012 at 8:30 AM, Angel Ezquerra <ezquerra at gmail.com> wrote:
>> >> >>
>> >> >> One of my users has a repository with plenty of "big files" (in the
>> >> >> order of 50 to 100 MB). Our server is _not_ using the largefiles
>> >> >> extension (at least not yet), that is the files are "big" but are not
>> >> >> "largefiles".
>> >> >>
>> >> >> The total repository working directory size is about 1.5 GB. The big
>> >> >> files do not change often, if ever.
>> >>
>> >> Bryan, thanks a lot for your comments.
>> >>
>> >> > I'm not surprised that a normal clone would be slow in this situation. Assuming your large files never change, I *am* surprised that subsequent pulls would be slow.
>> >> >
>> >>
>> >> Sorry, that is not what I meant. The revision that adds these big
>> >> files is the last one. What I meant is that a pull that gets that last
>> >> revision will be slow. I expect future pulls of revisions that do not
>> >> add more big files to be fast again.
>> >>
>> >> What was initially surprising is that incoming is also very slow. I
>> >> always thought of incoming as a way to exchange hashes,
>> >
>> > ...and user names and dates and descriptions and file lists and sometimes
>> > diffs. In other words, everything.
>> >
>> > 'hg incoming' pulls a bundle, just like pull. It is in fact the one and
>> > only way to exchange this data supported by the wire protocol. The
>> > bundle starts with a changelog. If we only need the changelog (i.e. no
>> > diffs), then we abort the transfer in the middle.
>>
>> Matt, thanks for pitching in.
>>
>> I now understand the need to get the hashes, usernames, dates and
>> descriptions. As for also getting the diffs, that would only happen
>> when using the --bundle option, right? That is, calling a plain "hg
>> incoming" (without --bundle) should be pretty fast regardless of the
>> contents of the changesets themselves. Is that correct?
>
> Consider 'hg incoming -p'.
>
> In theory, Mercurial could lazily read the incoming bundle and only read
> past the changelog portion if we actually needed that data, but it turns
> out we don't actually do this yet. So incoming is currently exactly as
> expensive as pull.

I see. Thanks for the explanation.

TortoiseHg always saves the incoming bundle and then reuses it if you
pull after running incoming, so in that case the cost is not very high.
However, what do users of plain Mercurial do?

I guess one solution would be to create an alias that saves the
incoming bundle into some "last.bundle" file and another alias that
pulls from "last.bundle"... Is there any other good
solution?
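For reference, the alias idea could be sketched as an hgrc fragment like the one below. It relies only on two standard Mercurial features: the --bundle option of "hg incoming", which stores the incoming changesets in a file, and the ability of "hg pull" to use a bundle file as its source. The alias names "inb" and "pullb" and the location ".hg/last.bundle" are hypothetical choices, not anything Mercurial defines.

```ini
[alias]
# Hypothetical alias names. The bundle path is relative to the current
# directory, so run both aliases from the repository root.
# Run incoming and keep the downloaded changesets in a local bundle file.
inb = incoming --bundle .hg/last.bundle
# Pull from the saved bundle instead of the server, so the changesets are
# not downloaded a second time.
pullb = pull .hg/last.bundle
```

Note that "hg pullb" never contacts the server, so anything pushed after "hg inb" still needs a normal pull. If incoming never found changes, no bundle is written and "hg pullb" will abort because the file does not exist.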

Cheers,

Angel



More information about the Mercurial mailing list