Call for testing: generaldelta
Daniel Carrera
dcarrera at gmail.com
Mon Jul 25 19:38:48 UTC 2011
Hello,
On 07/25/2011 08:40 PM, Matt Mackall wrote:
> Mercurial 1.9 includes an experimental feature called 'generaldelta'
> that should improve compression in repositories with lots of branching.
> Please help us test it so that we can work towards making it the
> default.
I have recently converted the past 3 years of GCC's history to
Mercurial. GCC is not a project with a lot of branching, so it may not be
a good candidate for generaldelta, but I thought I'd post my results anyway.
> == Evaluating compression: ==
>
> Do two clones:
>
> $ hg clone -U --pull proj proj-normal
> $ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta
>
> Then compare their sizes:
>
> (Unix) $ du -sh proj-normal proj-gdelta
> 31M proj-normal
> 26M proj-gdelta
~/Hg $ du -sh gcc-3-years
1.1G gcc-3-years
~/Hg $ hg clone -U --pull gcc-3-years gcc-normal
~/Hg $ hg clone -U --pull --config format.generaldelta=1 gcc-3-years gcc-gdelta
~/Hg $ du -sh gcc-normal gcc-gdelta
443M gcc-normal
443M gcc-gdelta
> And compare their manifest sizes:
~/Hg $ ls -l gcc-normal/.hg/store/00manifest.*
... 8386062 2011-07-25 20:45 gcc-normal/.hg/store/00manifest.d
... 1396736 2011-07-25 20:45 gcc-normal/.hg/store/00manifest.i
~/Hg $ ls -l gcc-gdelta/.hg/store/00manifest.*
... 8386062 2011-07-25 20:56 gcc-gdelta/.hg/store/00manifest.d
... 1396736 2011-07-25 20:56 gcc-gdelta/.hg/store/00manifest.i
> This data may also be valuable:
>
> $ hg debugrevlog -m
~/Hg/gcc-gdelta $ hg debugrevlog -m
format : 1
flags : generaldelta
revisions : 21824
merges : 0 ( 0.00%)
normal : 21824 (100.00%)
revisions : 21824
full : 1 ( 0.00%)
deltas : 21823 (100.00%)
revision size : 8386062
full : 1870613 (22.31%)
deltas : 6515449 (77.69%)
avg chain length : 10911
compression ratio : 15407
uncompressed data size (min/max/avg) : 5285360 / 6474921 / 5920430
full revision size (min/max/avg) : 1870613 / 1870613 / 1870613
delta size (min/max/avg) : 0 / 240954 / 298
deltas against prev : 21823 (100.00%)
where prev = p1 : 21823 (100.00%)
where prev = p2 : 0 ( 0.00%)
other : 0 ( 0.00%)
deltas against p1 : 0 ( 0.00%)
deltas against p2 : 0 ( 0.00%)
deltas against other : 0 ( 0.00%)
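As a quick sanity check on the figures above, here is a small Python sketch that recomputes the derived numbers from the raw ones. The values are copied from the debugrevlog output; the variable and function names are mine, and the chain-length calculation assumes that all deltas form one single chain starting at the lone full revision, which is my reading of the output rather than something Mercurial reports directly.

```python
# Figures copied from the `hg debugrevlog -m` output above.
revisions = 21824
full_count = 1
delta_count = 21823
revision_size = 8386062   # total bytes stored in the manifest revlog
full_size = 1870613       # bytes stored as full revisions
delta_size = 6515449      # bytes stored as deltas


def pct(part, whole):
    """Percentage of `whole` taken by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# The counts and the sizes should each add up to their totals.
counts_ok = full_count + delta_count == revisions
sizes_ok = full_size + delta_size == revision_size

full_pct = pct(full_size, revision_size)
delta_pct = pct(delta_size, revision_size)
avg_delta = delta_size // delta_count

# Assumption: one chain, so revision i sits at chain length i (0 .. revisions-1).
avg_chain = sum(range(revisions)) // revisions

print(counts_ok, sizes_ok, full_pct, delta_pct, avg_delta, avg_chain)
```

If the single-chain assumption holds, the average chain length comes out the same as the reported 10911, which would fit with every delta being stored against the previous revision.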
> == Evaluating performance: ==
>
> Servers serving general-delta repositories will reorder changesets on
> the fly to improve compression and streaming performance over the
> existing wire protocol. So we'd like to see three results:
>
> - cloning from old to old (baseline):
> $ hg clone --time -U --pull proj-normal proj-normal-normal
~/Hg $ hg clone --time -U --pull gcc-normal gcc-normal-normal
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 350.030 secs (user 181.990+0.000 sys 25.360+0.000)
> - cloning from new to old
> $ hg clone --time -U --pull proj-gdelta proj-gdelta-normal
~/Hg $ hg clone --time -U --pull gcc-gdelta gcc-gdelta-normal
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 397.720 secs (user 187.350+0.000 sys 26.090+0.000)
> - cloning from new to new
> $ hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta
~/Hg $ hg clone --time -U --pull --config format.generaldelta=1 gcc-gdelta gcc-gdelta-gdelta
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 290.530 secs (user 186.830+0.000 sys 22.180+0.000)
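To put the three wall-clock times side by side, here is a rough Python sketch that expresses each one relative to the normal-to-normal baseline. The numbers are the "real" times reported above; the labels and the helper name are my own. These are single runs, so I would not read too much into the differences.

```python
# "real" times in seconds, copied from the three clone runs above.
timings = {
    "normal -> normal": 350.030,   # baseline
    "gdelta -> normal": 397.720,
    "gdelta -> gdelta": 290.530,
}

baseline = timings["normal -> normal"]


def relative_change(seconds, base):
    """Percent change versus the baseline (negative means faster)."""
    return round(100.0 * (seconds - base) / base, 1)


changes = {name: relative_change(t, baseline) for name, t in timings.items()}

for name, change in changes.items():
    print(f"{name}: {timings[name]:.2f}s ({change:+.1f}% vs baseline)")
```

The user CPU times for the three runs (181.990, 187.350 and 186.830 seconds) are much closer to each other than the real times are, so part of the gap may be disk or cache effects rather than generaldelta itself.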
> And then, compare the sizes again:
>
> $ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
> 31M hgnn
> 27M hggn
> 26M hggg
~/Hg $ du -sh gcc-normal-normal gcc-gdelta-normal gcc-gdelta-gdelta
443M gcc-normal-normal
443M gcc-gdelta-normal
443M gcc-gdelta-gdelta
> == Evaluating window size ==
>
> Tweaking the compression window size can potentially have a large impact
> on resulting size, but right now tuning it requires hacking the source.
> Around line 1051 in mercurial/revlog.py is the following magic constant:
>
>     if d is None or dist > textlen * 2:
>         text = buildtext()
>         data = compress(text)
>
> Changing that "* 2" to values between 3 and 10 will change the
> compression/performance trade-off and may result in large improvements
> in generaldelta compression in repositories with lots of branching. If
> that's you, give it a try and tell us what you find. Again, the output
> of 'hg debugrevlog -m' may be valuable.
Since GCC doesn't have a lot of branching, I skipped this test.
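For anyone who does want to try it, here is a minimal Python sketch of the condition quoted above, with the "* 2" pulled out into a parameter. This is not the code in mercurial/revlog.py: the function name and the `factor` argument are hypothetical, and I am assuming that `dist` is the number of stored bytes that would have to be read to rebuild the revision from the start of its delta chain.

```python
def needs_full_snapshot(delta, dist, textlen, factor=2):
    """Mirror of the quoted condition: store a full text when there is no
    usable delta, or when the chain span exceeds `factor` times the
    uncompressed text length.

    `factor` stands in for the hard-coded "* 2" (hypothetical parameter).
    """
    return delta is None or dist > textlen * factor


# Figures from my debugrevlog output: total stored manifest size and the
# smallest uncompressed manifest text.
total_stored = 8386062
min_textlen = 5285360

# With the default factor the threshold is never reached for this manifest.
with_default = needs_full_snapshot(b"some delta", total_stored, min_textlen)

# A smaller factor would force a new full revision at this point.
with_factor_1 = needs_full_snapshot(b"some delta", total_stored, min_textlen, factor=1)

print(with_default, with_factor_1)
```

Under that assumption about `dist`, the whole manifest revlog is smaller than twice the smallest manifest text, which would explain why only one full revision was ever stored. Raising the factor to 3 or more would therefore change nothing for this repository.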
Cheers,
Daniel.
--
I'm not overweight, I'm undertall.