Integration of Mercurial into LXR

Simon King simon at simonking.org.uk
Fri Dec 14 16:45:43 UTC 2012


[Back to the list again - etiquette on this list is to "Reply-all"]

On Fri, Dec 14, 2012 at 4:08 PM, andre-littoz <page74010-sf at yahoo.fr> wrote:
>>> I tested Simon's suggestion and adapted it a bit (see the attached file).
>>> However, this does not change performance: I'm still 10 times slower than
>>> Git or Subversion. From what I understand in Python code, the full tree
>>> is
>>> retrieved from ctx.manifest() and then exhaustively explored through an
>>> "iterator" mf.iteritems(). I think this is not fundamentally different
>>> from
>>> command 'hg locate ...' and explains why performance does not improve.
>>>
>
>>I think that is correct. "hg locate" does use mercurial's
>>pattern-matching machinery, but I doubt that adds much overhead:
>
>>How big is the repository that you are testing this on (number of
>>revisions and number of files)? In mercurial, the manifest is a single
>>revlog, so "exploring the whole repository" isn't as expensive as it
>>sounds. I don't know enough about Git or Subversion's storage models
>>to suggest why they are so much faster.
> +++++
> My test repo is converted from an original CVS repo. It contains more than
> 750 changesets for roughly 200 files, including deleted and renamed ones.
> To measure the impact of Python code, I forced the 'ls' extension to return
> a fixed set of strings (thus no computation time) and 'fsize' to return 0. I
> still get this annoying lag in seconds. What's stranger, is the time needed
> to display a 3-line file: the only command passed is 'hg locate' with the
> correct rev. No computation needed. I'm beginning to suspect something wrong
> with piping the commands (though this is the same technique as with Git and
> Subversion).
> Another explanation could be a systematic recompilation of Python code on
> every request: the 'hg xxx' commands are sent through a pipe. However, the
> library directory contains .py, .pyc and .pyo versions of the files.

750 changesets and 200 files is a small repository. For comparison,
the mercurial repository has 18000 changesets and nearly 1000 files.
You certainly shouldn't be having performance issues.

How long do various simple commands take to run in your repository?
You could try:

  time hg version
  time hg tip
  time hg manifest >/dev/null
  time hg manifest -r 700 >/dev/null

Do you get different timings when the commands are run by the web
server? I assume that the repository is on a local disk, and not
NFS-mounted or anything like that.
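
If it is easier to compare the two environments from a script than with
the shell's "time", something along these lines could be run once from a
login shell and once from the web server. This is only a rough sketch:
the function names are mine, and it measures wall-clock time for a single
run of each command, nothing more.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv once, discard its output, return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, proc.returncode


def time_commands(commands, cwd=None):
    """Time each command in turn; returns a list of (argv, seconds, exit_code)."""
    results = []
    for argv in commands:
        elapsed, code = time_command(argv, cwd=cwd)
        results.append((argv, elapsed, code))
    return results


# The four commands suggested above; revision 700 is just the example
# revision used there.
HG_COMMANDS = [
    ["hg", "version"],
    ["hg", "tip"],
    ["hg", "manifest"],
    ["hg", "manifest", "-r", "700"],
]
```

Calling time_commands(HG_COMMANDS, cwd=...) with the path to your
repository and printing the results in both environments should show
whether the lag is in hg itself or in the way it is being invoked.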

>
>
>>> Paul suggested to attack the problem the other way, through change sets.
> +++++
> I don't think I'll follow this lead. Probably too much hazardous coding and
> the risk to be at mercy of a change in Mercurial.

I don't think Paul's suggestion was necessarily to use the Mercurial
API - he was just suggesting that you update to each repository
revision in turn, and store the size of every file in your own data
structure. This can all be done via the mercurial command-line and in
whatever language you like. (However, I don't think it's actually
necessary to update to each revision; with appropriate use of "hg
manifest -r" and "hg cat -r" you should be able to extract everything
you need.)
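
As an illustration of the "hg manifest -r" plus "hg cat -r" approach, here
is a minimal sketch in Python. The helper names (run_hg, file_sizes) are
made up for this example, and it assumes file names contain no newlines.
It is not tuned for speed.

```python
import os
import subprocess


def run_hg(args, repo):
    """Run 'hg <args>' against the repository at repo; return stdout as bytes."""
    return subprocess.run(
        ["hg", "--cwd", repo] + list(args),
        stdout=subprocess.PIPE,
        check=True,
    ).stdout


def file_sizes(repo, rev, run=run_hg):
    """Map each file name in revision rev to its size in bytes.

    'run' is injectable so the logic can be exercised without hg installed.
    """
    manifest = run(["manifest", "-r", str(rev)], repo)
    sizes = {}
    for raw_name in manifest.splitlines():
        name = os.fsdecode(raw_name)
        # 'hg cat -r REV FILE' writes the file's content at that revision
        # to stdout, so the length of the output is the file size.
        data = run(["cat", "-r", str(rev), name], repo)
        sizes[name] = len(data)
    return sizes
```

Note that this starts one hg process per file, so the results are worth
storing in your own data structure rather than recomputing on every
request.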

>
>
>>Does this handful of seconds include calculating the file size? I can
>>imagine that being slow, but just listing the filenames should be
>>fairly quick.
> +++++
> As I wrote above, I had 'filesize' return 0 so that this calculation would
> be ruled out. Same response time.
>
>
>
>>>
>>> With a Python extension, I stumbled into the trustworthiness issue. LXR
>>> is a
>>> two-stage process:
>
>>You should be able to solve this by enabling the extension in each
>>user's ~/.hgrc file, rather than in the repository. If one of the
>>users doesn't have a home directory (eg. the "apache" user), you can
>>set the HGRCPATH environment variable to point to an alternative
>>location.
>
> +++++
> I found nearly the same solution, forcing in my pipe a HOME environment
> variable to point where I put a .hgrc file. I'll switch to HGRCPATH as it
> appears cleaner.
>
>
> Summary:
> 1- Trust issue solved
> 2- Performance pending
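
For the HGRCPATH approach, the idea is simply to pass a modified
environment to whatever spawns hg. A minimal sketch in Python follows; the
function names are hypothetical, and the location of the hgrc file is
whatever you choose.

```python
import os
import subprocess


def hg_env(hgrc_path, base_env=None):
    """Return a copy of the environment with HGRCPATH set to hgrc_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["HGRCPATH"] = hgrc_path
    return env


def run_hg_with_config(args, hgrc_path, repo):
    """Run 'hg <args>' in repo, reading user-level config from hgrc_path."""
    return subprocess.run(
        ["hg", "--cwd", repo] + list(args),
        env=hg_env(hgrc_path),
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
```

Setting HGRCPATH replaces the default search path for user and system
configuration files, so the extension only needs to be enabled in that one
file, regardless of which user the web server runs as.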

Just to give you some idea of what "normal" performance might look
like, here are some timings against a clone of the mercurial
repository:

# Get the manifest for the current working revision
$: time hg --cwd /tmp/mercurial manifest >/dev/null

real    0m0.256s
user    0m0.167s
sys     0m0.042s

# Get the manifest of an arbitrary old revision
$: time hg --cwd /tmp/mercurial manifest -r 12345 >/dev/null

real    0m0.252s
user    0m0.161s
sys     0m0.034s

Hope that helps,

Simon


