Re: Slow push of large file over HTTP

Michael Tjørnemark mtj at pfa.dk
Thu Apr 26 11:51:17 UTC 2012


>Michael Tjørnemark <mtj at pfa.dk> writes:
>
>Hi there :)
>
>> I have a repository with a single changeset which adds a single 60 MB 
>> file (zip-file). Pushing this repo over HTTP is much, much slower than 
>> other commands on the repository, including a similar pull - is this 
>> to be expected? I have recreated the problem on other machines and 
>> files as well, so it seems to be a general problem with pushing a
>> large(ish) file.
>>
>> Times (all on my local machine):
>> Commit file - 4 secs
>> Push to empty repo using filesystem - 7 secs
>> Clone from repo over HTTP - 23 secs
>> Pull to empty repo over HTTP - 23 secs
>> Push to empty repo over HTTP - 4 mins <-- SLOW
>>
>> Command to serve empty repo:
>> hg serve --config web.allow_push=* --config web.push_ssl=false
>>
>> Command to push to empty repo:
>> hg push http://localhost:8000/ --debug --time
>>
>> In the debug output (see below), bundling and sending takes around 10
>> secs. Then there is an almost 4 min pause between "sending:
>> 60342/120684 kb (50.00%)" and "remote: adding changesets". (Also it
>> seems wrong that sending only goes to 50%, but that is another
>> problem.)
>
>You've stumbled upon a weird corner case in Python's HTTP library. If the server asks for authentication, then we'll only see this *after* pushing the entire changegroup to the server! We'll then have to start over. The progress code anticipates this and claims that you need to send 120 MB for the 60 MB push so that the progress bar will go smoothly from 0% to 100%. Here the push is really finished when it reaches 50%.
>

Yeah, I'm not too worried about that, but thanks for the explanation.
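
For anyone curious, here is a rough sketch of that double-send behaviour using Python's http.client (httplib in Python 2). This is not Mercurial's actual wire code, and the port, URL and credentials are made up, but it shows why the whole body crosses the wire before the client learns it was rejected:

import http.client

body = b"x" * (60 * 1024 * 1024)  # stand-in for a ~60 MB changegroup

conn = http.client.HTTPConnection("localhost", 8000)
# No Expect: 100-continue handshake here, so the entire body is
# transmitted before the response (possibly a 401) can be read.
conn.request("POST", "/", body=body)
resp = conn.getresponse()
if resp.status == 401:
    resp.read()
    # The request cannot be resumed after the auth challenge; the full
    # body has to go over the wire a second time, now with credentials.
    conn = http.client.HTTPConnection("localhost", 8000)
    conn.request("POST", "/", body=body,
                 headers={"Authorization": "Basic <credentials>"})
    resp = conn.getresponse()
print(resp.status)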

>I'm unsure why it then stops for 4 minutes -- I would not expect that.
>
>> I understand that the largefiles extension might help, but this
>> requires that everybody who uses the repo enables the extension, so I
>> would rather avoid that. Also, everything else is fast (commit,
>> clone, pull), so it seems as if something is wrong with push over
>> HTTP.
>
>Versioning zip files is... unusual :) Every revision of the zip file will take up a lot of new space since it can't be delta compressed much against the previous version. So after 10 edits to the file, you could end up with a repo with maybe 400 MB of history for that single file.
>
>The largefiles extension sounds like just what you need -- Unity is actually using it for versioning a lot of zip files.
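
As a toy illustration of that delta point (plain Python, nothing hg-specific; the payload and the one-byte edit are made up):

import zlib
from difflib import SequenceMatcher

def naive_delta_size(old, new):
    # bytes a delta must carry to rebuild 'new' from 'old' (copied spans are free)
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return sum(j2 - j1 for tag, _, _, j1, j2 in ops if tag != "equal")

base = b"hello world, this line repeats\n" * 200
edit = base.replace(b"hello", b"jello", 1)  # a one-byte edit near the start

print(naive_delta_size(base, edit))  # a handful of bytes
za, zb = zlib.compress(base), zlib.compress(edit)
print(naive_delta_size(za, zb), "of", len(zb))  # usually a large fraction of the stream

The raw versions delta down to almost nothing, while the compressed versions typically share so little byte-level structure that the delta is barely smaller than a full copy.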

Yes, this is not really what I am trying to do - it was just a simple way to recreate the core problem with pushing large files over HTTP. The real use case that started my investigation was initializing a new repository with around 6000 files, 150 MB total and of varying size - the largest 33 MB, and about 15 of them > 1 MB - which took more than 15 minutes to push (on a slow machine).

I have now recreated the problem with 5 xml files of 16 MB each (a more reasonable example than a single zip), and pulling over HTTP takes 8 secs while pushing takes around 1 minute - roughly the same 1:10 pull-to-push ratio as in the original example. A test of a repository with 4500 small files (7.5 MB total) takes around 9 secs for both pull and push, so the problem seems specific to large files.
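
For reference, a script along these lines reproduces my test end-to-end (the file sizes and names match my test, but the random payload and the crude two-second startup waits are just placeholders; it assumes hg is on PATH and port 8000 is free):

import os, shutil, subprocess, time

def hg(*args, cwd=None):
    subprocess.check_call(("hg",) + args, cwd=cwd)

# a repo with 5 files of roughly 16 MB each (random bytes as a stand-in
# for hard-to-compress content; my real test used actual xml files)
shutil.rmtree("source", ignore_errors=True)
hg("init", "source")
for i in range(5):
    with open(os.path.join("source", "big%d.xml" % i), "wb") as f:
        f.write(os.urandom(16 * 1024 * 1024))
hg("commit", "-A", "-m", "add large xml files", cwd="source")

# time a pull over HTTP
server = subprocess.Popen(["hg", "serve", "-R", "source", "-p", "8000"])
time.sleep(2)  # crude wait for the server to come up
shutil.rmtree("dest", ignore_errors=True)
hg("init", "dest")
t = time.time()
hg("pull", "http://localhost:8000/", cwd="dest")
print("pull: %.1f secs" % (time.time() - t))
server.terminate()

# time a push of the same changeset into an empty served repo
shutil.rmtree("empty", ignore_errors=True)
hg("init", "empty")
server = subprocess.Popen(["hg", "serve", "-R", "empty", "-p", "8000",
                           "--config", "web.allow_push=*",
                           "--config", "web.push_ssl=false"])
time.sleep(2)
t = time.time()
hg("push", "http://localhost:8000/", cwd="source")
print("push: %.1f secs" % (time.time() - t))
server.terminate()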

We can live with the current performance since the push only has to be done once, and we are used to slow ClearCase performance, so I don't need a definitive answer. I just think that something is wrong when push is that much slower than pull - I would expect the times to be comparable, so 10 times slower seems weird. It should be easy to recreate the problem anywhere (I have only tried it on Windows) by serving an empty repository, pushing a single changeset with one or more large files added, and comparing that time with a pull from the same repository.


