Solaris 11.4 hosted repository, TortoiseHG clone attempt consumes all resources

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Jun 23 20:23:32 UTC 2020



On 6/23/20 10:19 PM, Scott Newman - NOAA Affiliate wrote:
>>>>>>>>> Good morning everyone!
>>>>>>>>>
>>>>>>>>> We are currently using Mercurial 5.2.2 hosted on Solaris 11.3 and accessed
>>>>>>>>> by contributors via TortoiseHG 5.0.2 from their Windows Desktops.  We are
>>>>>>>>> in the process of migrating applications to new hosts running Solaris
>>>>>>>>> 11.4.
>>>>>>>>
>>>>>>>> As far as I understand, you use the same versions (Mercurial 5.2.2 on
>>>>>>>> server, TortoiseHG 5.0.2 on client) and the same Python (probably 2.7
>>>>>>>> something?). The only software version difference is Solaris 11.3 vs
>>>>>>>> Solaris 11.4, right?
>>>>>>>
>>>>>>> Pierre-Yves, so nice to hear from you!  Correct. Python 2.7.18 (tried
>>>>>>> some others with the same result).  I have an update that when we
>>>>>>> tried going back to THG 3.4 the clone worked as expected, but that
>>>>>>> doesn't seem like a good long-term solution, particularly since we
>>>>>>> will lose the ability to export-archive that was introduced somewhere
>>>>>>> around version 4.5, if you recall.
>>>>>>
>>>>>> That is very interesting. We are talking about using THG 3.4 on the
>>>>>> client, right? While still using Mercurial 5.2.2 on the server, right?
>>>>>
>>>>> Correct.  It is so interesting that the client can have such an impact
>>>>> on the server!
>>>>>
>>>>>>
>>>>>> If so, this means that using a new protocol feature introduced between
>>>>>> 3.4 and 5.2 reveals the issue.
>>>>>>
>>>>>> Can you confirm this? And if so, can you try to find the exact
>>>>>> client-side Mercurial version that triggers this issue?
>>>>>
>>>>> I am scheduled to work on this with another resource tomorrow at 15:00
>>>>> EST and will update this thread.  We have confirmed that the problem
>>>>> exists in THG 4.5.0, so it will be somewhere in between 3.4 and 4.5.0.
>>>>>
>>>>>>
>>>>>> However, the export-archive thingy is something you run server side,
>>>>>> don't you?
>>>>>>
>>>>>
>>>>> We perform this task on the client side now with the archive function
>>>>> and have abandoned the customization in favor of the built-in archive
>>>>> functionality added
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>      When trying to clone a copy of the repository hosted on Solaris
>>>>>>>>> 11.4 the clone runs very slowly and the process consumes most of the
>>>>>>>>> memory (64GB) on the host, starts generating "-bash: fork: Resource
>>>>>>>>> temporarily unavailable" errors for users on the box after about 2
>>>>>>>>> minutes, and the clone process fails with a " Server Unexpectedly closed
>>>>>>>>> connection" message.
>>>>>>>>
>>>>>>>> So, the server hosting the repository is crumbling while cloning, right?
>>>>>>>> How are you cloning? ssh or http?
>>>>>>>
>>>>>>> Cloning via ssh.
>>>>>>
>>>>>> Great, can you add:
>>>>>>
>>>>>>       [ui]
>>>>>>       debug=yes
>>>>>>
>>>>>> In the HGRC of the remote repository and run a clone? This will give
>>>>>> you a ton of remote output that might help us understand what is going
>>>>>> on when the memory explodes.
>>>>>
>>>>> Here is the result on the client BEFORE adding debug:
>>>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame>
>>>>> "C:\Repos\test"
>>>>> requesting all changes
>>>>> adding changesets
>>>>> adding manifests
>>>>> adding file changes  ### Processes 123/5396 files, takes 10-15
>>>>> minutes, fails here
>>>>> transaction abort!
>>>>> rollback completed
>>>>> abort: stream ended unexpectedly  (got 20593 bytes, expected 32768)
>>>>> [command returned code 255 Mon Jun 22 15:31:39 2020]
>>>>>
>>>>> When I add the debug entry it stalls at:
>>>>> % hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame>
>>>>> "C:\Repos\test"
>>>>> requesting all changes  ### stalls here
>>>>>
>>>>>>
>>>>>>>>>      The same process on Solaris 11.3 has a negligible
>>>>>>>>> impact on resources and finishes in about 10 minutes.
>>>>>>>>>
>>>>>>>>> I have spent several days with the Network and Systems Administrators
>>>>>>>>> trying to resolve this issue without success.  We tried many things,
>>>>>>>>> including adjusting resource configurations, rebuilding Mercurial and
>>>>>>>>> Python, using Mercurial and Python from the working server, using the
>>>>>>>>> pre-built package from Oracle (v4.9.1),
>>>>>>>>
>>>>>>>> How did you transfer the repository between the two servers?
>>>>>>>
>>>>>>> I used hg clone (via ssh) between the servers without issue.
>>>>>>
>>>>>> This clone might have upgraded the repository to a newer format and
>>>>>> run into an unknown issue affecting your repository. What does
>>>>>> `hg debugformat` say on the older server?
>>>>>
>>>>> On older server:
>>>>> format-variant    repo
>>>>> fncache:           yes
>>>>> dotencode:         yes
>>>>> generaldelta:      yes
>>>>> sparserevlog:       no
>>>>> sidedata:           no
>>>>> copies-sdc:         no
>>>>> plain-cl-delta:     no
>>>>> compression:       zlib
>>>>> compression-level: default
>>>>
>>>> Okay, so the most notable difference is `sparserevlog`. You might be
>>>> hitting some unknown pathological case. Can you try making a new server
>>>> clone using `--config format.sparse-revlog=no` during the clone?
>>>>
>>>
>>> I created a new server clone using:
>>> hg clone --config format.sparse-revlog=no --noupdate
>>> ssh://<username>@<hostname>/<SRCreponame> <TARGETreponame>
>>> When I tried to clone with THG 5.0.2 via the UI I saw the same behavior.
>>> When I performed the clone via the console using:
>>> hg clone --config format.sparse-revlog=no --verbose
>>> ssh://<username>@<hostname>/<SRCreponame> "<TARGETreponame>"
>>> I saw the same behavior.
>>
>> You are cloning from the Solaris 11.3 machine into the Solaris 11.4
>> machine, right? Can you double check the `hg debugformat` of the
>> resulting clone?
>>
> 
> Correct, I cloned to 11.4 machine from 11.3 machine, then tried to
> clone to Windows machine from 11.4 machine using THG 5.0.2.  Here are
> the hg debugformat results:
> format-variant    repo
> fncache:           yes
> dotencode:         yes
> generaldelta:      yes
> sparserevlog:       no
> sidedata:           no
> copies-sdc:         no
> plain-cl-delta:    yes
> compression:       zlib
> compression-level: default
> 
> Scott

Okay, so this is not the source of the issue.
What happens if you copy the repository from one server to the other (no
clone, just `scp -r`)?
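
A minimal sketch of that test, assuming it is run on the Solaris 11.4 server. The `SRC` and `DEST` values are hypothetical placeholders (they are not taken from this thread), and the `run` helper is only there so the commands can be previewed before anything is copied. A raw `scp -r` does not go through Mercurial, so the on-disk store format of the copy should stay identical to the source; `hg verify` and `hg debugformat` then check the result.

```shell
#!/bin/sh
# Sketch only: user, host and paths below are hypothetical placeholders.
SRC="${SRC:-user@old-host:/path/to/repo}"   # repository on the Solaris 11.3 server
DEST="${DEST:-/path/to/repo-copy}"          # target directory on the Solaris 11.4 server

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

copy_and_check() {
    # Raw file copy: Mercurial is not involved, so the store is not rewritten.
    run scp -r "$SRC" "$DEST" || return 1
    # Check the integrity of the copied repository.
    run hg -R "$DEST" verify || return 1
    # Print the format variants, to compare with the older server.
    run hg -R "$DEST" debugformat
}

copy_and_check
```

Without `RUN=1` the script only prints the three commands. It is probably safest to make the copy while nobody is pushing to the source repository, since a plain file copy takes no Mercurial lock.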

-- 
Pierre-Yves David


