Solaris 11.4 hosted repository, TortoiseHG clone attempt consumes all resources
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Tue Jun 23 20:14:33 UTC 2020
On 6/23/20 9:51 PM, Scott Newman - NOAA Affiliate wrote:
>>>>>>> Good morning everyone!
>>>>>>>
>>>>>>> We are currently using Mercurial 5.2.2 hosted on Solaris 11.3 and accessed
>>>>>>> by contributors via TortoiseHG 5.0.2 from their Windows Desktops. We are
>>>>>>> in the process of migrating applications to new hosts running Solaris
>>>>>>> 11.4.
>>>>>>
>>>>>> As far as I understand, you use the same versions (Mercurial 5.2.2 on
>>>>>> server, TortoiseHG 5.0.2 on client) and the same Python (probably 2.7
>>>>>> something?). The only software version difference is Solaris 11.3 vs
>>>>>> Solaris 11.4, right?
>>>>>
>>>>> Pierre-Yves, so nice to hear from you! Correct. Python 2.7.18 (tried
>>>>> some others with the same result). I have an update that when we
>>>>> tried going back to THG 3.4 the clone worked as expected, but that
>>>>> doesn't seem like a good long-term solution, particularly since we
>>>>> will lose the ability to export-archive that was introduced somewhere
>>>>> around version 4.5, if you recall.
>>>>
>>>> That is very interesting. We are talking about using THG 3.4 on the
>>>> client, while still using Mercurial 5.2.2 on the server, right?
>>>
>>> Correct. It is so interesting that the client can have such an impact
>>> on the server!
>>>
>>>>
>>>> If so, this means that a new protocol feature introduced between 3.4
>>>> and 5.2 reveals the issue.
>>>>
>>>> Can you confirm this? And if so, can you try to find the exact
>>>> client-side Mercurial version that triggers this issue?
>>>
>>> I am scheduled to work on this with another resource tomorrow at 15:00
>>> EST and will update this thread. We have confirmed that the problem
>>> exists in THG 4.5.0, so it will be somewhere in between 3.4 and 4.5.0.
>>>
>>>>
>>>> However, the export-archive thingy is something you run server side,
>>>> don't you?
>>>>
>>>
>>> We perform this task on the client side now with the archive function
>>> and have abandoned the customization in favor of the built-in archive
>>> functionality that was added.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> When trying to clone a copy of the repository hosted on Solaris
>>>>>>> 11.4 the clone runs very slowly and the process consumes most of the
>>>>>>> memory (64GB) on the host, starts generating "-bash: fork: Resource
>>>>>>> temporarily unavailable" errors for users on the box after about 2
>>>>>>> minutes, and the clone process fails with a "Server Unexpectedly closed
>>>>>>> connection" message.
>>>>>>
>>>>>> So, the server hosting the repository is crumbling while cloning, right?
>>>>>> How are you cloning? ssh or http?
>>>>>
>>>>> Cloning via ssh.
>>>>
>>>> Great, can you add:
>>>>
>>>> [ui]
>>>> debug=yes
>>>>
>>>> in the HGRC of the remote repository and run a clone? This will give you
>>>> a ton of remote output that might help to understand what is going on
>>>> when the memory explodes.
>>>
>>> Here is the result on the client BEFORE adding debug:
>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame>
>>> "C:\Repos\test"
>>> requesting all changes
>>> adding changesets
>>> adding manifests
>>> adding file changes ### Processes 123/5396 files, takes 10-15
>>> minutes, fails here
>>> transaction abort!
>>> rollback completed
>>> abort: stream ended unexpectedly (got 20593 bytes, expected 32768)
>>> [command returned code 255 Mon Jun 22 15:31:39 2020]
>>>
>>> When I add the debug entry it stalls at:
>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame>
>>> "C:\Repos\test"
>>> requesting all changes ### stalls here
>>>
>>>>
>>>>>>> The same process on Solaris 11.3 has a negligible
>>>>>>> impact on resources and finishes in about 10 minutes.
>>>>>>>
>>>>>>> I have spent several days with the Network and Systems Administrators
>>>>>>> trying to resolve this issue without success. We tried many things,
>>>>>>> including adjusting resource configurations, rebuilding Mercurial and
>>>>>>> Python, using Mercurial and Python from the working server, using the
>>>>>>> pre-built package from Oracle (v4.9.1),
>>>>>>
>>>>>> How did you transfer the repository between the two servers?
>>>>>
>>>>> I used hg clone (via ssh) between the servers without issue.
>>>>
>>>> This clone might have upgraded the repository to a newer format, and
>>>> run into an unknown issue affecting your repository. What does `hg
>>>> debugformat` say on the older server?
>>>
>>> On older server:
>>> format-variant repo
>>> fncache: yes
>>> dotencode: yes
>>> generaldelta: yes
>>> sparserevlog: no
>>> sidedata: no
>>> copies-sdc: no
>>> plain-cl-delta: no
>>> compression: zlib
>>> compression-level: default
>>
>> Okay, so the most notable difference is `sparserevlog`. You might be
>> hitting some unknown pathological case. Can you try making a new server
>> clone using `--config format.sparse-revlog=no` during the clone?
>>
>
> I created a new server clone using:
> hg clone --config format.sparse-revlog=no --noupdate
> ssh://<username>@<hostname>/<SRCreponame> <TARGETreponame>
> When I tried to clone with THG 5.0.2 via the UI I saw the same behavior.
> When I performed the clone via the console using:
> hg clone --config format.sparse-revlog=no --verbose
> ssh://<username>@<hostname>/<SRCreponame> "<TARGETreponame>"
> I saw the same behavior.
You are cloning from the Solaris 11.3 machine into the Solaris 11.4
machine, right? Can you double-check the `hg debugformat` of the
resulting clone?
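
As a rough sketch of that check (the helper name `sparserevlog_state` and the repository path are placeholders for illustration, not something from your setup), the line of interest can be pulled out of the `hg debugformat` output like this:

```shell
# Hypothetical helper: read `hg debugformat` output on stdin and print
# the value in the "repo" column of the sparserevlog line.
sparserevlog_state() {
  awk '$1 == "sparserevlog:" { print $2 }'
}

# Usage (the path below is a placeholder, not a real location):
#   hg -R /path/to/TARGETreponame debugformat | sparserevlog_state
```

If the `--config format.sparse-revlog=no` option was honored, this should print `no`, matching the older server.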
--
Pierre-Yves David