Solaris 11.4 hosted repository, TortoiseHG clone attempt consumes all resources

Scott Newman - NOAA Affiliate scott.newman at noaa.gov
Mon Jun 22 19:49:27 UTC 2020


> >>> Good morning everyone!
> >>>
> >>> We are currently using Mercurial 5.2.2 hosted on Solaris 11.3 and accessed
> >>> by contributors via TortoiseHG 5.0.2 from their Windows Desktops.  We are
> >>> in the process of migrating applications to new hosts running Solaris
> >>> 11.4.
> >>
> >> As far as I understand, you use the same versions (Mercurial 5.2.2 on
> >> server, TortoiseHG 5.0.2 on client) and the same Python (probably 2.7
> >> something?). The only software version difference is Solaris 11.3 vs
> >> Solaris 11.4, right?
> >
> > Pierre-Yves, so nice to hear from you!  Correct. Python 2.7.18 (tried
> > some others with the same result).  I have an update that when we
> > tried going back to THG 3.4 the clone worked as expected, but that
> > doesn't seem like a good long-term solution, particularly since we
> > will lose the ability to export-archive that was introduced somewhere
> > around version 4.5, if you recall.
>
> That is very interesting. We are talking about using THG 3.4 on the
> client, right? While still using Mercurial 5.2.2 on the server?

Correct.  It is so interesting that the client can have such an impact
on the server!

>
> If so, this means that a protocol feature introduced between 3.4 and
> 5.2 reveals the issue.
>
> Can you confirm this? And if so, can you try to find the exact Mercurial
> version on the client side that triggers this issue?

I am scheduled to work on this with another resource tomorrow at 15:00
EST and will update this thread.  We have confirmed that the problem
exists in THG 4.5.0, so it was introduced somewhere between 3.4 and 4.5.0.
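
Since 3.4 is known to work and 4.5.0 is known to fail, the search for the
first failing client can be organized as a binary search, which needs only
a handful of clone attempts.  What follows is a minimal sketch in Python 3,
not an existing Mercurial or TortoiseHG tool.  The version list and the
is_bad callback are assumptions for illustration: the list should be
replaced with the client releases actually available to install, and the
callback stands in for installing that client by hand and attempting the
clone.

```python
# Assumed candidate client versions, oldest to newest. Replace with the
# releases you can actually install; the end points come from this thread.
CANDIDATE_VERSIONS = [
    "3.4", "3.5", "3.6", "3.7", "3.8", "3.9",
    "4.0", "4.1", "4.2", "4.3", "4.4", "4.5.0",
]


def first_bad(versions, is_bad):
    """Return the oldest version for which is_bad(version) is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that every version after the first bad one is bad as well.
    is_bad is a hypothetical callback (for example: install that client,
    try the clone, and report whether it failed).
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # failure is at mid or earlier
        else:
            lo = mid  # failure is after mid
    return versions[hi]
```

With twelve candidates this takes at most four clone attempts.  The result
is only meaningful if the failure is consistent, that is, if every client
newer than the first failing one also fails.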

>
> However, the export-archive thingy is something you run server side,
> don't you?
>

We perform this task on the client side now with the archive function
and have abandoned the customization in favor of the built-in archive
functionality.

>
> >
> >>
> >>>   When trying to clone a copy of the repository hosted on Solaris
> >>> 11.4 the clone runs very slowly and the process consumes most of the
> >>> memory (64GB) on the host, starts generating "-bash: fork: Resource
> >>> temporarily unavailable" errors for users on the box after about 2
> >>> minutes, and the clone process fails with a "Server Unexpectedly closed
> >>> connection" message.
> >>
> >> So, the server hosting the repository is crumbling while cloning, right?
> >> How are you cloning? ssh or http?
> >
> > Cloning via ssh.
>
> Great, can you add:
>
>    [ui]
>    debug=yes
>
> In the HGRC of the remote repository and run a clone? This will give you
> tons of remote output that might help us understand what is going on
> when the memory explodes.

Here is the result on the client BEFORE adding debug:
% hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame> "C:\Repos\test"
requesting all changes
adding changesets
adding manifests
adding file changes  ### Processes 123/5396 files, takes 10-15 minutes, fails here
transaction abort!
rollback completed
abort: stream ended unexpectedly  (got 20593 bytes, expected 32768)
[command returned code 255 Mon Jun 22 15:31:39 2020]

When I add the debug entry it stalls at:
% hg clone --verbose ssh://<username>@<hostname>//<dirname>/<reponame> "C:\Repos\test"
requesting all changes  ### stalls here

>
> >>>   The same process on Solaris 11.3 has a negligible
> >>> impact on resources and finishes in about 10 minutes.
> >>>
> >>> I have spent several days with the Network and Systems Administrators
> >>> trying to resolve this issue without success.  We tried many things,
> >>> including adjusting resource configurations, rebuilding Mercurial and
> >>> Python, using Mercurial and Python from the working server, using the
> >>> pre-built package from Oracle (v4.9.1),
> >>
> >> How did you transfer the repository between the two servers?
> >
> > I used hg clone (via ssh) between the servers without issue.
>
> This clone might have upgraded the repository to a newer format and
> run into an unknown issue affecting your repository. What does
> `hg debugformat` say on the older server?

On older server:
format-variant    repo
fncache:           yes
dotencode:         yes
generaldelta:      yes
sparserevlog:       no
sidedata:           no
copies-sdc:         no
plain-cl-delta:     no
compression:       zlib
compression-level: default
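
To make the comparison with the new server's output (quoted further down)
easier to read, here is a small sketch in Python 3 that parses two
`hg debugformat` listings and reports the variants that differ.  The two
strings are copied from this thread; the function names are my own and
purely illustrative.

```python
OLD_SERVER = """\
format-variant    repo
fncache:           yes
dotencode:         yes
generaldelta:      yes
sparserevlog:       no
sidedata:           no
copies-sdc:         no
plain-cl-delta:     no
compression:       zlib
compression-level: default
"""

NEW_SERVER = """\
format-variant    repo
fncache:           yes
dotencode:         yes
generaldelta:      yes
sparserevlog:      yes
sidedata:           no
copies-sdc:         no
plain-cl-delta:    yes
compression:       zlib
compression-level: default
"""


def parse_debugformat(text):
    """Map each 'name: value' line to a dict entry; skip the header line."""
    variants = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # the 'format-variant    repo' header has no colon
        name, value = line.split(":", 1)
        variants[name.strip()] = value.strip()
    return variants


def changed_variants(old_text, new_text):
    """Return (name, old_value, new_value) for every variant that differs."""
    old = parse_debugformat(old_text)
    new = parse_debugformat(new_text)
    return [(name, old[name], new.get(name))
            for name in old if old[name] != new.get(name)]


for name, before, after in changed_variants(OLD_SERVER, NEW_SERVER):
    print(f"{name}: {before} -> {after}")
```

For the two outputs in this thread, the only variants that differ are
sparserevlog and plain-cl-delta, both "no" on the older server and "yes"
on the new one.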


>
> >
> >> What does `hg debugformat` say on both ends?
> >
> > On Server:
> > format-variant    repo
> > fncache:           yes
> > dotencode:         yes
> > generaldelta:      yes
> > sparserevlog:      yes
> > sidedata:           no
> > copies-sdc:         no
> > plain-cl-delta:    yes
> > compression:       zlib
> > compression-level: default
> > On client:
> > Since I cannot get the clone to finish I do not have a local clone to
> > run this command against.  I may not understand specifically what you
> > want here.
> >
> >> How big is the `.hg/store/` directory on both sides?
> >
> > On the server it is 185MB.
> > On the client, I cannot get the clone to complete so I am left without
> > anything; it seems to clean itself up after the failure.
> >
> >> How many revisions do you have in your repository?
> >
> > 874 revisions
>
> That is very small, so you are definitely hitting some kind of bad bug.
> Let's find out which one now :-)
>

Thanks again!


