Network performance problems when pulling and cloning from HTTP server

Angel Ezquerra ezquerra at gmail.com
Thu Nov 22 13:03:22 UTC 2012


On Wed, Nov 21, 2012 at 10:24 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Wed, 2012-11-21 at 08:02 +0100, Angel Ezquerra wrote:
>> On Nov 20, 2012 6:55 PM, "Bryan O'Sullivan" <bos at serpentine.com> wrote:
>> >
>> > On Tue, Nov 20, 2012 at 8:30 AM, Angel Ezquerra <ezquerra at gmail.com> wrote:
>> >>
>> >> one of my users has a repository with plenty of "big files" (in the
>> >> order of 50 to 100 MB). Our server is _not_ using the largefiles
>> >> extension (at least not yet), that is the files are "big" but are not
>> >> "largefiles".
>> >>
>> >> The total repository working directory size is about 1.5 GB. The big
>> >> files do not change often, if ever.
>>
>> Bryan, thanks a lot for your comments.
>>
>> > I'm not surprised that a normal clone would be slow in this situation. Assuming your large files never change, I *am* surprised that subsequent pulls would be slow.
>> >
>>
>> Sorry, that is not what I meant. The revision that adds these big
>> files is the last one. What I meant is that a pull that gets that last
>> revision will be slow. I expect future pulls of revisions that do not
>> add more big files to be fast again.
>>
>> What was initially surprising is that incoming is also very slow. I
>> always thought of incoming as a way to exchange hashes,
>
> ..and user names and dates and descriptions and file lists and sometimes
> diffs. In other words, everything.
>
> 'hg incoming' pulls a bundle, just like pull. It is in fact the one and
> only way to exchange this data supported by the wire protocol. The
> bundle starts with a changelog. If we only need the changelog (ie no
> diffs), then we abort the transfer in the middle.

Matt, thanks for pitching in.

I understand now the need to get the hashes, usernames, dates and
descriptions. As for also getting the diffs, that would only happen
when using the --bundle option, right? That is, calling a plain "hg
incoming" (without --bundle) should be pretty fast regardless of the
contents of the changesets themselves. Is that correct?

>> In any case, may I ask why you are not surprised by this?
>
> You've hit the trifecta of ways to have suboptimal performance:
>
> a) Windows
> b) large files
> c) using a protocol designed for broadband and slower on a LAN

By protocol, do you mean the underlying HTTP or Mercurial's own wire protocol?

If you refer to http, I really don't know much about it but I did the
following test:

I set the "web.allow_archive" setting to "zip" and then I downloaded
the tip of my test repository, which contains a couple of big files
(with sizes of 46 MB and 92 MB) as a zip file. This was not nearly as
fast as a "clone uncompressed" but it went way faster than a regular
pull, with an average download rate of 25 Mbps (i.e. 25% of my 100
Mbps client bandwidth) with peaks of up to 60%. It was a bit
inconsistent, with several peaks and small stalls, but the download
was reasonably fast.
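
To make this kind of measurement repeatable, here is a minimal sketch of
how the download rate could be timed. It is an illustration only:
measure_throughput is a hypothetical helper, not part of Mercurial, and
the in-memory stream stands in for the real HTTP response. For a real
test, the stream would presumably be the object returned by
urllib.request.urlopen() for the archive URL of the repository.

```python
import io
import time


def measure_throughput(stream, chunk_size=64 * 1024):
    """Read a file-like object to exhaustion and time it.

    Returns (total_bytes, elapsed_seconds, megabits_per_second).
    Hypothetical helper for illustration; not part of Mercurial.
    """
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Decimal megabits (1 Mb = 1e6 bits), the unit used for link speeds.
    mbps = (total * 8 / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mbps


if __name__ == "__main__":
    # Stand-in for a network response: 5 MiB of zero bytes held in memory.
    fake_response = io.BytesIO(b"\0" * (5 * 1024 * 1024))
    total, elapsed, mbps = measure_throughput(fake_response)
    print(f"read {total} bytes in {elapsed:.4f} s ({mbps:.1f} Mbps)")
```

Reading in fixed-size chunks keeps memory use flat even for archives of
100 MB or more, and logging the rate per chunk instead of once at the
end would also show the stalls and peaks mentioned above.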

The size of the downloaded zip file was 136 MB. I cannot say whether it
makes sense to compare this result to the bandwidth used during pull.
Yet my naive interpretation of this result is that the mercurial
windows web server is able to transfer a large file at a reasonable
speed, when the mercurial wire protocol is not involved (10x the speed
of a pull!).
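
As a rough worked example of what these numbers mean in wall-clock time,
the sketch below converts the figures above into transfer durations. The
pull rate is an assumption derived only from the "10x" estimate (one
tenth of the archive rate), and MB is taken as decimal megabytes.

```python
def transfer_seconds(size_mb, rate_mbps):
    """Seconds needed to move size_mb megabytes at rate_mbps megabits/s."""
    return size_mb * 8 / rate_mbps


ZIP_SIZE_MB = 136          # size of the downloaded zip file
ARCHIVE_RATE_MBPS = 25     # average rate seen for the zip download
# Assumption: the pull runs at one tenth of the archive rate ("10x").
PULL_RATE_MBPS = ARCHIVE_RATE_MBPS / 10

archive_time = transfer_seconds(ZIP_SIZE_MB, ARCHIVE_RATE_MBPS)
pull_time = transfer_seconds(ZIP_SIZE_MB, PULL_RATE_MBPS)

print(f"zip download: {archive_time:.1f} s")
print(f"same payload at the assumed pull rate: {pull_time:.1f} s")
```

Under those assumptions the zip download takes roughly 44 seconds,
whereas the same amount of data at the assumed pull rate would take a
bit over 7 minutes.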

>> > Do you have the ability to serve the repo that shows poor performance using "hg serve"? If so, it would be helpful to "hg serve --profile", do a pull from a client, then stop the server and share the profile dump.
>> C:\mercurial_tests\tmp_bigfiles_not_large>hg serve --port 7000
>> --profile --verbose
>
> ..no profile. A profile looks like this:
>
> $ hg serve --profile
> listening at http://calx:8000/ (bound to *:8000)
> [do something]
> [hit ctrl-c]
>    CallCount    Recursive     Total(s)    Inline(s) module:lineno(function)
>            4            0      1.5059      1.5058   <select.select>
>           57           46      0.0052      0.0036   <__import__>
>            1            0      0.0025      0.0013   mimetypes:205(readfp)
>          684            0      0.0006      0.0005   mercurial.config:20(__setitem__)
>          665            0      0.0007      0.0005   mimetypes:78(add_type)
>            2            0      0.0004      0.0004   <_socket.gethostbyaddr>
>           80            0      0.0008      0.0003   mercurial.config:27(update)
>          971            0      0.0003      0.0003   <method 'split' of 'str' objects>
>

Sorry, I did not know that.

Unfortunately, I cannot get --profile to work. Whenever I use Ctrl+C
to stop the server it just exits without printing any profile
information.

I am running this test server on our production Mercurial (Windows
2003) server, which is already using the default port. If I do:

hg serve --profile

without specifying a port, the server exits immediately, saying
"abort: cannot start server at ':8000'", and in that case it _does_
print some profile information. However, it does not print anything if
I set a port to something other than 8000 and then I use Ctrl+C.

I thought that perhaps --profile did not work when the --port is set.
To verify this I stopped our production server tonight and tried again
to run "hg serve --profile" (without setting the port). The result is
the same (i.e. no profile information is printed when I do Ctrl+C). I
tried killing the hg process using the Windows Task Manager but I get
the same result (no profile printed).

What else could I do?

Thank you for your help. I really appreciate it.

Angel


