Solaris 11.4 hosted repository, TortoiseHG clone attempt consumes all resources
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Mon Jul 20 22:28:29 UTC 2020
On 6/23/20 11:15 PM, Scott Newman - NOAA Affiliate wrote:
>>>>>>>>>>> Good morning everyone!
>>>>>>>>>>>
>>>>>>>>>>> We are currently using Mercurial 5.2.2 hosted on Solaris 11.3 and
>>>>>>>>>>> accessed by contributors via TortoiseHG 5.0.2 from their Windows
>>>>>>>>>>> desktops. We are in the process of migrating applications to new
>>>>>>>>>>> hosts running Solaris 11.4.
>>>>>>>>>>
>>>>>>>>>> As far as I understand, you use the same versions (Mercurial 5.2.2
>>>>>>>>>> on the server, TortoiseHG 5.0.2 on the client) and the same Python
>>>>>>>>>> (probably 2.7.something?). The only software version difference is
>>>>>>>>>> Solaris 11.3 vs Solaris 11.4, right?
>>>>>>>>>
>>>>>>>>> Pierre-Yves, so nice to hear from you! Correct. Python 2.7.18 (we
>>>>>>>>> tried some others with the same result). I have an update: when we
>>>>>>>>> tried going back to THG 3.4, the clone worked as expected, but that
>>>>>>>>> doesn't seem like a good long-term solution, particularly since we
>>>>>>>>> would lose the export-archive ability that was introduced somewhere
>>>>>>>>> around version 4.5, if you recall.
>>>>>>>>
>>>>>>>> That is very interesting. We are talking about using THG 3.4 on the
>>>>>>>> client, right? While still using Mercurial 5.2.2 on the server?
>>>>>>>
>>>>>>> Correct. It is so interesting that the client can have such an impact
>>>>>>> on the server!
>>>>>>>
>>>>>>>>
>>>>>>>> If so, this means that a new protocol feature introduced between 3.4
>>>>>>>> and 5.2 reveals the issue.
>>>>>>>>
>>>>>>>> Can you confirm this? And if so, can you try to find the exact
>>>>>>>> client-side Mercurial version that triggers this issue?
>>>>>>>
>>>>>>> I am scheduled to work on this with another resource tomorrow at 15:00
>>>>>>> EST and will update this thread. We have confirmed that the problem
>>>>>>> exists in THG 4.5.0, so it was introduced somewhere between 3.4 and
>>>>>>> 4.5.0.
>>>>>>>
>>>>>>>>
>>>>>>>> However, the export-archive thingy is something you run server side,
>>>>>>>> don't you?
>>>>>>>>
>>>>>>>
>>>>>>> We now perform this task on the client side with the archive function
>>>>>>> and have abandoned the customization in favor of the built-in archive
>>>>>>> functionality that was added.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> When trying to clone a copy of the repository hosted on Solaris
>>>>>>>>>>> 11.4, the clone runs very slowly and the process consumes most of
>>>>>>>>>>> the memory (64GB) on the host, starts generating "-bash: fork:
>>>>>>>>>>> Resource temporarily unavailable" errors for users on the box after
>>>>>>>>>>> about 2 minutes, and the clone process fails with a "Server
>>>>>>>>>>> Unexpectedly closed connection" message.
>>>>>>>>>>
>>>>>>>>>> So, the server hosting the repository is crumbling while cloning,
>>>>>>>>>> right? How are you cloning? ssh or http?
>>>>>>>>>
>>>>>>>>> Cloning via ssh.
>>>>>>>>
>>>>>>>> Great, can you add:
>>>>>>>>
>>>>>>>> [ui]
>>>>>>>> debug=yes
>>>>>>>>
>>>>>>>> to the hgrc of the remote repository and run a clone? This should give
>>>>>>>> you a ton of remote output that might help us understand what is
>>>>>>>> going on when the memory explodes.
>>>>>>>
>>>>>>> Here is the result on the client BEFORE adding debug:
>>>>>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame> "C:\Repos\test"
>>>>>>> requesting all changes
>>>>>>> adding changesets
>>>>>>> adding manifests
>>>>>>> adding file changes ### Processes 123/5396 files, takes 10-15 minutes, fails here
>>>>>>> transaction abort!
>>>>>>> rollback completed
>>>>>>> abort: stream ended unexpectedly (got 20593 bytes, expected 32768)
>>>>>>> [command returned code 255 Mon Jun 22 15:31:39 2020]
>>>>>>>
>>>>>>> When I add the debug entry it stalls at:
>>>>>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame> "C:\Repos\test"
>>>>>>> requesting all changes ### stalls here
>>>>>>>
>>>>>>>>
>>>>>>>>>>> The same process on Solaris 11.3 has a negligible
>>>>>>>>>>> impact on resources and finishes in about 10 minutes.
>>>>>>>>>>>
>>>>>>>>>>> I have spent several days with the Network and Systems
>>>>>>>>>>> Administrators trying to resolve this issue without success. We
>>>>>>>>>>> tried many things, including adjusting resource configurations,
>>>>>>>>>>> rebuilding Mercurial and Python, using Mercurial and Python from the
>>>>>>>>>>> working server, using the pre-built package from Oracle (v4.9.1),
>>>>>>>>>>
>>>>>>>>>> How did you transfer the repository between the two servers?
>>>>>>>>>
>>>>>>>>> I used hg clone (via ssh) between the servers without issue.
>>>>>>>>
>>>>>>>> This clone might have upgraded the repository to a newer format and
>>>>>>>> hit an unknown issue affecting your repository. What does
>>>>>>>> `hg debugformat` say on the older server?
>>>>>>>
>>>>>>> On older server:
>>>>>>> format-variant repo
>>>>>>> fncache: yes
>>>>>>> dotencode: yes
>>>>>>> generaldelta: yes
>>>>>>> sparserevlog: no
>>>>>>> sidedata: no
>>>>>>> copies-sdc: no
>>>>>>> plain-cl-delta: no
>>>>>>> compression: zlib
>>>>>>> compression-level: default
>>>>>>
>>>>>> Okay, so the most notable difference is `sparserevlog`. You might be
>>>>>> running into some unknown pathological case. Can you try making a new
>>>>>> server clone using `--config format.sparse-revlog=no` during the clone?
>>>>>>
>>>>>
>>>>> I created a new server clone using:
>>>>> hg clone --config format.sparse-revlog=no --noupdate ssh://<username>@<hostname>/<SRCreponame> <TARGETreponame>
>>>>> When I tried to clone it with THG 5.0.2 via the UI, I saw the same
>>>>> behavior. When I performed the clone via the console using:
>>>>> hg clone --config format.sparse-revlog=no --verbose ssh://<username>@<hostname>/<SRCreponame> "<TARGETreponame>"
>>>>> I saw the same behavior.
>>>>
>>>> You are cloning from the Solaris 11.3 machine into the Solaris 11.4
>>>> machine, right? Can you double-check the `hg debugformat` output of the
>>>> resulting clone?
>>>>
>>>
>>> Correct, I cloned to the 11.4 machine from the 11.3 machine, then tried
>>> to clone to a Windows machine from the 11.4 machine using THG 5.0.2.
>>> Here are the `hg debugformat` results:
>>> format-variant repo
>>> fncache: yes
>>> dotencode: yes
>>> generaldelta: yes
>>> sparserevlog: no
>>> sidedata: no
>>> copies-sdc: no
>>> plain-cl-delta: yes
>>> compression: zlib
>>> compression-level: default
>>>
>>> Scott
>>
>> Okay, so this is not the source of the issue.
>> What happens if you copy the repository from one server to the other
>> (no clone, just `scp -r`)?
>>
>
> I created a copy of the repo from the 11.3 machine on the 11.4 machine,
> tried to clone it using THG 5.0.2, and saw the same bad behavior.
>
> Further to the request above to find the exact THG version where the
> issue started: we did not have the issue with THG 4.3.1, but did have
> it with THG 4.4.1.
Okay, so the problem appears with something that changed between 4.3.1
and 4.4.1 on the client side, yet it impacts the server side during pull.
Protocol-wise, very little changed between 4.3 and 4.4 (only the
head-based phase computation), so this is a bit puzzling. However, there
was a lot of churn around the changegroup (which serializes changes to
the repository and its files over the wire) in that period; maybe
something faulty slipped in there? Still, it is odd that it would affect
the server side.
Can you do the test of copying the directory manually between the two
machines (instead of cloning) to see if the problem still appears?
Once this test is done, I fear there will be little that can be done
without further inspection of the repository itself.
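When comparing repositories this way (for example the copied and the cloned repository on the 11.4 machine), a small script can list which `hg debugformat` variants differ. This is a minimal sketch, not part of Mercurial: the helper names `parse_debugformat` and `diff_formats` are hypothetical, and it assumes the plain `name: value` output quoted in this thread, with the repo value as the first token after the colon.

```python
from __future__ import annotations


def parse_debugformat(text: str) -> dict[str, str]:
    """Map each format variant (e.g. 'sparserevlog') to its repo value.

    Assumes `hg debugformat` text as quoted in this thread: a
    'format-variant repo' header followed by 'name: value' lines.
    """
    variants: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header line.
        if not line or line.startswith("format-variant"):
            continue
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        # Keep only the first column (the repo value), if present.
        variants[name.strip()] = tokens[0] if tokens else ""
    return variants


def diff_formats(old: str, new: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {variant: (old_value, new_value)} for variants that differ."""
    a, b = parse_debugformat(old), parse_debugformat(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Fed the older server's output and the 11.4 clone's output quoted above, `diff_formats` returns `{'plain-cl-delta': ('no', 'yes')}`.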
Regards,
--
Pierre-Yves David