hg convert from SVN repo getting stuck

Malte Helmert malte.helmert at unibas.ch
Tue Apr 29 06:26:26 UTC 2014


On 26.04.2014 19:36, Malte Helmert wrote:
> Dear group,
> 
> I'm having some problems with hg convert:
[...]
> Any suggestions?

Hi again,

while the email above was stuck in moderation (or in gmane?), I ran
some tests to reproduce the same behaviour outside of hg convert. This
time, I used the most recent hg from the stable branch:

$ ./hg version
Mercurial Distributed SCM (version 3.0-rc+28-d36440d84328)
[...]

I produced smaller versions of the file that was modified in the
changeset that caused hg convert to get stuck. To do this, I truncated
both versions of the file to $SIZE bytes for various values of SIZE. Then
I created a pristine repository and committed first the truncated
before-changeset version of the file and then the truncated
after-changeset version:

$ hg init testrepo
$ cd testrepo
$ cp ../before file.txt
$ hg add file.txt
$ hg commit -m "first commit"
$ cp ../after file.txt
$ hg diff file.txt | wc -l
$ hg commit -m "second commit"
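The same procedure can be scripted so that every SIZE is measured the same way. This is only a sketch under stated assumptions: the full before/after versions are assumed to live in files with the hypothetical names before-full.txt and after-full.txt, the helper names truncate_copy and bench_one are made up for illustration, and "M" is taken to mean GNU head's binary suffix (1M = 1048576 bytes), which may differ from what was used for the numbers below.

```shell
#!/usr/bin/env bash
# Sketch: time "hg diff" and the second commit for one truncated SIZE.
# ASSUMPTIONS (hypothetical, not from the original post):
#   before-full.txt / after-full.txt hold the complete file versions;
#   SIZE uses GNU head suffixes (1M = 1048576 bytes).
set -euo pipefail

# Write the first $2 bytes of file $1 into file $3.
truncate_copy() {
    head -c "$2" "$1" > "$3"
}

# Build a fresh repository for one size, then time diff and commit.
bench_one() {
    local size=$1
    local dir="testrepo-$1"
    rm -rf "$dir"
    hg init "$dir"
    truncate_copy before-full.txt "$size" "$dir/file.txt"
    (
        cd "$dir"
        hg add -q file.txt
        hg commit -q -u bench -m "first commit"
        truncate_copy ../after-full.txt "$size" file.txt
        echo "SIZE=$size"
        # bash's "time" keyword reports wall-clock time on stderr
        time (hg diff file.txt | wc -l)
        time hg commit -q -u bench -m "second commit"
    )
}
```

Usage: `for s in 1M 2M 4M 8M; do bench_one "$s"; done`. Each size gets its own pristine repository, so one measurement cannot influence the next.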

Here is how long "hg diff" and the second commit took for each value of
SIZE, along with the size of the diff in lines:

SIZE=1M: diff 0.29s (22895 lines), commit 0.20s
SIZE=2M: diff 0.81s (47439 lines), commit 0.61s
SIZE=3M: diff 1.76s (72830 lines), commit 1.47s
SIZE=4M: diff 7.43s (97965 lines), commit 6.99s
SIZE=5M: diff 4.43s (122899 lines), commit 3.91s
SIZE=6M: diff 88.36s (147787 lines), commit 90.56s
SIZE=7M: diff 10.51s (172759 lines), commit 9.74s
SIZE=8M: diff 202.15s (198395 lines), commit 202.04s
SIZE=9M: diff 25.33s (223536 lines), commit 24.55s
SIZE=10M: diff 162.89s (248567 lines), commit 159.23s
SIZE=12M: diff 132.42s (299271 lines), commit 130.41s
SIZE=14M: diff 53.40s (350046 lines), commit 51.73s
SIZE=16M: diff 2822.02s (400085 lines), commit 2802.49s
SIZE=18M: diff 16687.10s (450685 lines), commit 16856.23s
SIZE=20M: diff 20673.23s (501958 lines), commit 20639.25s
SIZE=22M: diff 25507.32s (552989 lines), commit 25758.22s
SIZE=24M: diff 30875.81s (603709 lines), commit 31276.82s
...
(Runs with larger sizes have not finished yet.)

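So the running time scales very badly with file size, and not even
monotonically: 5M is faster than 4M, 9M is faster than 8M, and 14M is
faster than 10M. As a reference point, GNU "diff -u" takes 0.9 seconds
on the full file (35 MB).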
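Since the diff grows roughly linearly with SIZE while the time does not, one way to make the erratic behaviour visible is to normalize each timing by the number of diff lines. The following is a minimal sketch (the function name per_kline is hypothetical) that reads result lines in exactly the format posted above from standard input and prints the "hg diff" time per 1000 diff lines; lines that do not start with "SIZE=" are skipped.

```shell
# Sketch: seconds of "hg diff" time per 1000 diff lines, parsed from
# lines like "SIZE=1M: diff 0.29s (22895 lines), commit 0.20s".
per_kline() {
    awk '/^SIZE=/ {
        s = $1; sub(/:$/, "", s)      # "SIZE=1M:" -> "SIZE=1M"
        t = $3; gsub(/[^0-9.]/, "", t) # "0.29s"    -> "0.29"
        n = $4; gsub(/[^0-9]/, "", n)  # "(22895"   -> "22895"
        printf "%s %.4f\n", s, t / n * 1000
    }'
}
```

For example, feeding it the SIZE=4M and SIZE=5M lines prints 0.0758 and 0.0360 seconds per 1000 lines, so the smaller input costs about twice as much per diff line.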

So it seems I have hit a pathological case for Mercurial's diff
algorithm. Would there be interest in finding the cause, and possibly
in modifying the diff algorithm to address it?

Malte



