hg outgoing performance: bug or design limitation?
Andrei Loskutov
loskutov at gmx.de
Mon Dec 14 21:50:52 UTC 2009
On Mon, 14 Dec 2009 00:11:53 +0100, Matt Mackall <mpm at selenic.com> wrote:
[...]
>> matter how simple the formatting/how many arguments I use, the simple
>> presence of one of these "file" arguments causes hg to be exceptionally
>> slow.
>>
>> I haven't played with "hg incoming" yet; I guess it will behave
>> similarly.
Ok, I've played with incoming now. Here we have another, even bigger
performance issue: in order to provide meaningful diffs for the incoming
changesets, we call hg with these arguments:
D:\django1\hg -y incoming --debug --style style_file --bundle D:\bundleFile.tmp D:\django2
Please ignore the style file; I've already stripped all the "file_adds"
etc. parameters from it. Still, hg needs 5-7 minutes to complete the
task, with 100% CPU/disk load. I observe that the bundle file created by
the incoming operation (containing 1000 changesets) is 300 MB, and hg
seems to spend most of those 5-7 minutes just computing diffs and writing
them into this file.
The ONLY reason we use this bundle file is to be able to show the diffs
of the incoming changesets later. Is there a better way to compute these
diffs (for *incoming* changesets) "on the fly", just by using the file
path and the changeset info? I'm just looking for a practical, fast way
to retrieve these incoming diffs in such a big repo. In "usual" small
open source projects you do not see such huge performance issues, but
they become critical as soon as hg goes "enterprise".
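For reference, the bundle-based workflow described above can be sketched in Python as two separate hg invocations: one `incoming --bundle` call that stores the incoming changesets once, and a later, per-changeset call that opens the bundle as an overlay repository via `-R` and prints a single patch with `log -p`. This is a minimal sketch, not the plugin's actual code; the helper names (`incoming_cmd`, `diff_from_bundle_cmd`, `run_hg`) are hypothetical, and it assumes hg is on the PATH and is run from inside the local repository the bundle was created against.

```python
import subprocess
from typing import List


def incoming_cmd(remote: str, bundle: str, style: str) -> List[str]:
    # Step 1 (once): list the incoming changesets and save them into a local
    # bundle file, as in the command above (minus --debug).
    return ["hg", "-y", "incoming", "--style", style, "--bundle", bundle, remote]


def diff_from_bundle_cmd(bundle: str, rev: str) -> List[str]:
    # Step 2 (on demand): -R also accepts an overlay bundle file, so one
    # incoming changeset can be shown with its patch without pulling it.
    return ["hg", "-R", bundle, "log", "-p", "-r", rev]


def run_hg(args: List[str], repo_dir: str) -> str:
    # Run from the local repository; the bundle is layered on top of it.
    # No check=True: "hg incoming" exits with status 1 when nothing is incoming.
    proc = subprocess.run(args, cwd=repo_dir, capture_output=True, text=True)
    return proc.stdout
```

With this split, step 2 costs one hg call per changeset the user actually opens; step 1 still has to write the whole bundle, which is the slow part described above.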
>> It's simple: in the plugin, we show the outgoing/incoming changesets to
>> the user, *including* the changed files information. We MUST know which
>> files are affected by the changeset...
>
> I bet it takes the user longer than 3 minutes to read through those 1k
> csets. What? They don't actually read them all? Then the plug-in needn't
> generate them to start with. Instead it can call status -r a:b to
> generate them later.
I agree, this could be another way to improve the situation. I used
1000 changesets just to stress hg; I don't expect such a big delta
against the main repo in daily work.
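Matt's suggestion to defer the work until the user actually looks at a changeset could look roughly like this: ask hg for the file status only when a changeset is expanded, and cache the answer. `LazyStatus` is a hypothetical helper, not part of any existing API; the hg runner is injected as a function so the caching logic can be exercised without a repository, and the command uses the `status -r a:b` form quoted above. It assumes the default `status` output of one `<code> <path>` line per file.

```python
from typing import Callable, Dict, List, Tuple

# A runner takes an hg argument list and returns its stdout.
Runner = Callable[[List[str]], str]


class LazyStatus:
    """Hypothetical helper: compute changed files per revision pair on demand."""

    def __init__(self, run: Runner) -> None:
        self._run = run
        self._cache: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}

    def status(self, rev_a: str, rev_b: str) -> List[Tuple[str, str]]:
        key = (rev_a, rev_b)
        if key not in self._cache:
            # Only runs when the user actually asks for this changeset.
            out = self._run(["hg", "status", "-r", f"{rev_a}:{rev_b}"])
            # Each line looks like "M setup.py": status code, space, path.
            self._cache[key] = [(line[0], line[2:]) for line in out.splitlines() if line]
        return self._cache[key]
```

Unlike the 'files' keyword, `status` reports added (A), removed (R) and modified (M) separately, but it is paid for only for the changesets the user opens.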
>> 1) Am I doing something wrong? Is there a better way to get the affected
>> paths for outgoing/incoming commands than using templates?
>
> Use the 'files' parameter instead.
I will, thank you. The (small) drawback is that the user will not see
whether a file was added, removed, or just changed until requesting a
diff. But I can live with that if it saves me 5 minutes of my life each
time :-)
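Switching to the cheap 'files' keyword could look like the sketch below: one line per changeset with the node hash, a tab, and the space-separated file list, parsed on the plugin side. The template string and helper names are assumptions for illustration. Two caveats: `hg outgoing` also prints status lines such as "comparing with ..." and "searching for changes", which this parser skips because they contain no tab, and a plain space-separated list cannot represent file names that themselves contain spaces.

```python
from typing import Dict, List

# Hypothetical template; hg itself expands the \t and \n escapes.
FILES_TEMPLATE = r"{node}\t{files}\n"


def outgoing_files_cmd(remote: str) -> List[str]:
    return ["hg", "outgoing", "--template", FILES_TEMPLATE, remote]


def parse_files_output(output: str) -> Dict[str, List[str]]:
    """Map each changeset hash to the list of files it touches."""
    result: Dict[str, List[str]] = {}
    for line in output.splitlines():
        node, sep, files = line.partition("\t")
        if not sep:
            # Not a changeset line: "comparing with ...", "searching for changes", etc.
            continue
        # 'files' does not say whether each file was added, removed or modified.
        result[node] = files.split()
    return result
```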
>> 2) Is there any hidden parameter/command/setting which allows hg to be
>> faster with templates/outgoing/incoming commands?
>> 3) If 1 and 2 are not applicable: is this performance drop a "simple" hg
>> bug, or is it "by design"?
>
> Changelog entries include a list of files 'affected' ('files' in
> template-speak) by each changeset, but that information is not enough to
> figure out which of those files were added, deleted, or modified (or
> copied!). To do that, we must do a much more expensive operation:
> comparing two manifests. So yes, it is a design limitation.
I expected something like that. It would be nice if you would consider
some improvements to the changelog (as you mentioned). Many big
companies in particular would love to throw away ClearCase/CVS/SVN in
favor of hg. But then they expect hg (and Eclipse support, which is a
must) to *scale* on *really* big projects. We are talking here about
50000 files or more as the minimum repo size, plus multiple branches in
the repo...
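To illustrate why the 'files' list alone is cheap but the added/removed/modified distinction is not, here is a toy model of the manifest comparison Matt describes. Manifests are modeled as plain dictionaries from path to file revision hash; this is not Mercurial's real data structure or code, and it ignores copies, renames, file flags and merges. The point is that the classification needs both the parent's and the child's manifest, each of which lists every tracked file in the repository.

```python
from typing import Dict, Iterable, List, Tuple

# Toy manifest: path -> file revision hash.
Manifest = Dict[str, str]


def classify(
    files: Iterable[str], parent: Manifest, child: Manifest
) -> Tuple[List[str], List[str], List[str]]:
    """Split a changeset's 'files' list into added, removed and modified."""
    added: List[str] = []
    removed: List[str] = []
    modified: List[str] = []
    for path in files:
        in_parent = path in parent
        in_child = path in child
        if in_child and not in_parent:
            added.append(path)
        elif in_parent and not in_child:
            removed.append(path)
        elif in_parent and in_child and parent[path] != child[path]:
            modified.append(path)
    return added, removed, modified
```

The changelog entry alone only supplies the first argument; the other two are the expensive part.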
--
Kind regards,
Andrei Loskutov
@Home: http://andrei.gmxhome.de/