Converting Mercurial hgweb from CGI to WSGI
Timm Aaron M
TimmAaronM at phoeintl.com
Thu Aug 2 18:04:22 UTC 2012
-----Original Message-----
From: ezquerra at gmail.com [mailto:ezquerra at gmail.com] On Behalf Of Angel Ezquerra
Sent: Thursday, August 02, 2012 10:18 AM
To: Timm Aaron M
Cc: mercurial at selenic.com; Mads Kiilerich
Subject: Re: Converting Mercurial hgweb from CGI to WSGI
On Thu, Aug 2, 2012 at 4:47 PM, Timm Aaron M <TimmAaronM at phoeintl.com> wrote:
> Angel
>
> We too are serving several hundred repos and subrepos. The script idea you use sounds very useful. I was wondering if you could give any more detail on it. I don't completely understand the explanation you gave. Also, what time improvement did you see? Thanks.
>
> -----Original Message-----
> From: ezquerra at gmail.com [mailto:ezquerra at gmail.com] On Behalf Of
> Angel Ezquerra
> Sent: Thursday, August 02, 2012 5:39 AM
> To: Mads Kiilerich
> Cc: Timm Aaron M; mercurial at selenic.com
> Subject: Re: Converting Mercurial hgweb from CGI to WSGI
>
> On Thu, Aug 2, 2012 at 11:58 AM, Mads Kiilerich <mads at kiilerich.com> wrote:
>> On 02/08/12 09:54, Angel Ezquerra wrote:
>>>
>>> If you serve many repositories, particularly if you use subrepos,
>>> _do not_ use globs (particularly recursive globs) to specify them in
>>> your "paths" section. Instead list each repository on your paths
>>> section one by one.
>>>
>>> Mercurial currently looks for repos on every location that you
>>> specify with a glob on your paths section _for every request_.
>>
>>
>> hgweb should refresh regularly when using a long lived process
>> (refreshinterval = 20 seconds) but not on every request.
>>
>> Scanning through the repo tree should however not be expensive ...
>> unless you have huge working directories checked out. There is no
>> reason for that ... and as you have found there is a good reason to not do that.
>>
>> /Mads
>
> It is true that if you do not have huge working directories checked out this should not be such a problem. Unfortunately that is not something that I can enforce easily in our setup (for several, mostly political reasons).
>
> BTW, what do you mean by "long lived process"? I got the impression that when using CGI each request required a new execution of Mercurial on the server. Is that not the case? I do not know about "refreshinterval" either. When you guys helped me improve the performance of our server, Matt himself told me that Mercurial does a repository search on every request (even when accessing a particular repo). I may have misunderstood him, though.
>
> Cheers,
>
> Angel
Timm, please note that we usually bottom post in this list :-)
Here is an example:
Let's say that all your repos are in c:\repos, with the following repository structure:
c:\repos
    \project1
        \repo11
            \.hg
            \include
                \subrepo111
                    \.hg
                \subrepo112
                    \.hg
        \repo12
            \.hg
    \project2
        \repo21
            \.hg
        \repo22
            \.hg
    \repo3
Since all your repos are below c:\repos, you could set your paths section to:
[paths]
/ = c:/repos/**
which asks Mercurial to perform a recursive search in c:\repos.
Instead, I suggest that you set your paths section to:
[paths]
/project1/repo11 = c:/repos/project1/repo11
/project1/repo11/include/subrepo111 = c:/repos/project1/repo11/include/subrepo111
/project1/repo11/include/subrepo112 = c:/repos/project1/repo11/include/subrepo112
/project1/repo12 = c:/repos/project1/repo12
/project2/repo21 = c:/repos/project2/repo21
/project2/repo22 = c:/repos/project2/repo22
/repo3 = c:/repos/repo3
That is, you should explicitly add each repository to your paths section.
From the user's point of view the result should be the same: they see the same repos on the repository list. However, the performance can be much better with the second configuration. In our case, with around 600 repositories and subrepos (some of them updated to a revision other than 0), we saw an improvement of 15 to 20 seconds _per server request_. That is, the server went from barely usable to quite snappy.
Note that, as others said, this is probably only a big issue if some of the repos on the server are not on the 0000 revision, that is, if they have a working directory checked out.
The problem with this approach is that you must keep the path list up to date. It would be easy to add a new repository and forget to add it to the paths section. We solved this issue with a small Python script that runs automatically every 10 or 15 minutes (through the Windows Task Scheduler) and searches for repos below our root serving folder. Since we also have a script to create repositories through the web server interface, that script adds the repos it creates to the path list as well.
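A minimal sketch of what such a scanning script could look like is shown here. It is not the actual script described above: the function names find_repos and build_paths_section are hypothetical, and it assumes that a repository is any directory containing a .hg folder and that the generated text is then written into the [paths] section of the hgweb configuration file.

```python
import os


def find_repos(root):
    """Yield every directory under root that contains a .hg folder."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".hg" in dirnames:
            # Skip the repository metadata itself, but keep walking the
            # working directory so that nested subrepos are found too.
            dirnames.remove(".hg")
            yield dirpath


def build_paths_section(root):
    """Return an hgweb [paths] section listing each repository explicitly."""
    lines = ["[paths]"]
    for repo in sorted(find_repos(root)):
        rel = os.path.relpath(repo, root).replace(os.sep, "/")
        # A repository located at the root itself is published as "/".
        virtual = "/" if rel == "." else "/" + rel
        lines.append("%s = %s" % (virtual, repo.replace(os.sep, "/")))
    return "\n".join(lines) + "\n"


# Example (hypothetical root): print(build_paths_section("c:/repos"))
```

Note that the scan itself still walks every working directory, so it is as slow as the glob search. The gain comes from running it once every few minutes in the background instead of while a request is being served.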
The scripts are quite simple, but I don't have them at hand. I could probably send them to you if you want them, or I could put them somewhere on the wiki for future reference.
Cheers,
Angel
Angel,
I would definitely like to see the scripts that you have created, if you don't mind, to see what improvements I can make for our setup. We have over 1000 repos and subrepos, most at a revision other than 0. We will also have a short list of who is allowed to create new repos so updating the list shouldn't be a big problem.