Call for testing: generaldelta

Daniel Carrera dcarrera at gmail.com
Mon Jul 25 19:38:48 UTC 2011


Hello,

On 07/25/2011 08:40 PM, Matt Mackall wrote:
> Mercurial 1.9 includes an experimental feature called 'generaldelta'
> that should improve compression in repositories with lots of branching.
> Please help us test it so that we can work towards making it the
> default.

I have recently converted the past 3 years of GCC's history to 
Mercurial. GCC is not a project with a lot of branching so it may not be 
a good candidate for generaldelta, but I thought I'd post anyway.


> == Evaluating compression: ==
>
> Do two clones:
>
> $ hg clone -U --pull proj proj-normal
> $ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta
>
> Then compare their sizes:
>
> (Unix) $ du -sh proj-normal proj-gdelta
> 31M	proj-normal
> 26M	proj-gdelta


~/Hg $ du -sh gcc-3-years
1.1G	gcc-3-years

~/Hg $ hg clone -U --pull gcc-3-years gcc-normal
~/Hg $ hg clone -U --pull --config format.generaldelta=1 gcc-3-years gcc-gdelta
~/Hg $ du -sh gcc-normal gcc-gdelta
443M	gcc-normal
443M	gcc-gdelta


> And compare their manifest sizes:

~/Hg $ ls -l gcc-normal/.hg/store/00manifest.*
... 8386062 2011-07-25 20:45 gcc-normal/.hg/store/00manifest.d
... 1396736 2011-07-25 20:45 gcc-normal/.hg/store/00manifest.i

~/Hg $ ls -l gcc-gdelta/.hg/store/00manifest.*
... 8386062 2011-07-25 20:56 gcc-gdelta/.hg/store/00manifest.d
... 1396736 2011-07-25 20:56 gcc-gdelta/.hg/store/00manifest.i



> This data may also be valuable:
>
> $ hg debugrevlog -m


~/Hg/gcc-gdelta $ hg debugrevlog -m
format : 1
flags  : generaldelta

revisions     :   21824
     merges    :       0 ( 0.00%)
     normal    :   21824 (100.00%)
revisions     :   21824
     full      :       1 ( 0.00%)
     deltas    :   21823 (100.00%)
revision size : 8386062
     full      : 1870613 (22.31%)
     deltas    : 6515449 (77.69%)

avg chain length  : 10911
compression ratio : 15407

uncompressed data size (min/max/avg) : 5285360 / 6474921 / 5920430
full revision size (min/max/avg)     : 1870613 / 1870613 / 1870613
delta size (min/max/avg)             : 0 / 240954 / 298

deltas against prev  : 21823 (100.00%)
     where prev = p1  : 21823     (100.00%)
     where prev = p2  :     0     ( 0.00%)
     other            :     0     ( 0.00%)
deltas against p1    :     0 ( 0.00%)
deltas against p2    :     0 ( 0.00%)
deltas against other :     0 ( 0.00%)
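
The figures in this output are consistent with each other, which a short Python sketch can confirm. The numbers are copied from the output above. The variable names are illustrative, and the formulas for the ratio and chain length are assumptions about how the summary fields relate, not Mercurial's actual code.

```python
# Values copied from the `hg debugrevlog -m` output above.
revisions = 21824
full_count, delta_count = 1, 21823
full_size, delta_size = 1870613, 6515449
total_size = 8386062        # "revision size"; also the size of 00manifest.d
avg_uncompressed = 5920430  # average uncompressed manifest size

# One full snapshot plus the deltas should account for every stored byte.
assert full_count + delta_count == revisions
assert full_size + delta_size == total_size

# Share of the stored bytes taken by the single full revision.
full_pct = round(100.0 * full_size / total_size, 2)

# Average stored size of one delta, in bytes.
avg_delta = delta_size // delta_count

# Assumption: ratio = total uncompressed bytes / total stored bytes.
approx_ratio = avg_uncompressed * revisions // total_size

# Assumption: with a single chain holding every revision, the average
# chain length is about half the number of deltas.
approx_chain = delta_count // 2

print(full_pct, avg_delta, approx_ratio, approx_chain)
```

Under these assumptions the recomputed values line up with the reported full-revision percentage, average delta size, compression ratio and average chain length.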


> == Evaluating performance: ==
>
> Servers serving general-delta repositories will reorder changesets on
> the fly to improve compression and streaming performance over the
> existing wire protocol. So we'd like to see three results:
>
> - cloning from old to old (baseline):
> $ hg clone --time -U --pull proj-normal proj-normal-normal

~/Hg $ hg clone --time -U --pull gcc-normal gcc-normal-normal
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 350.030 secs (user 181.990+0.000 sys 25.360+0.000)


> - cloning from new to old
> $  hg clone --time -U --pull proj-gdelta proj-gdelta-normal

~/Hg $ hg clone --time -U --pull gcc-gdelta gcc-gdelta-normal
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 397.720 secs (user 187.350+0.000 sys 26.090+0.000)


> - cloning from new to new
> $  hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta

~/Hg $ hg clone --time -U --pull --config format.generaldelta=1 gcc-gdelta gcc-gdelta-gdelta
requesting all changes
21825 changesets found
adding changesets
adding manifests
adding file changes
added 21825 changesets with 195646 changes to 74306 files
Time: real 290.530 secs (user 186.830+0.000 sys 22.180+0.000)
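
To put the three wall-clock times side by side, here is a small Python sketch that expresses each run relative to the old-to-old baseline. The timings are copied from the runs above; the labels and the helper name `relative_change` are illustrative.

```python
# Wall-clock ("real") seconds copied from the three clone runs above.
real_secs = {
    "normal -> normal": 350.030,  # baseline
    "gdelta -> normal": 397.720,
    "gdelta -> gdelta": 290.530,
}

baseline = real_secs["normal -> normal"]


def relative_change(seconds, reference):
    """Return the percentage change of `seconds` relative to `reference`."""
    return 100.0 * (seconds - reference) / reference


changes = {name: round(relative_change(secs, baseline), 1)
           for name, secs in real_secs.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

These are single runs, and the user CPU times are close to each other in all three (181.990, 187.350 and 186.830 secs), so some of the wall-clock difference may come from disk activity rather than from the format itself.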


> And then, compare the sizes again:
>
> $ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
> 31M	hgnn
> 27M	hggn
> 26M	hggg

~/Hg $ du -sh gcc-normal-normal gcc-gdelta-normal gcc-gdelta-gdelta
443M	gcc-normal-normal
443M	gcc-gdelta-normal
443M	gcc-gdelta-gdelta


> == Evaluating window size ==
>
> Tweaking the compression window size can potentially have a large impact
> on resulting size, but right now tuning it requires hacking the source.
> Around line 1051 in mercurial/revlog.py is the following magic constant:
>
>          if d is None or dist > textlen * 2:
>              text = buildtext()
>              data = compress(text)
>
> Changing that "* 2" to values between 3 and 10 will change the
> compression/performance trade-off and may result in large improvements
> in generaldelta compression in repositories with lots of branching. If
> that's you, give it a try and tell us what you find. Again, the output
> of 'hg debugrevlog -m' may be valuable.

Since GCC doesn't have a lot of branching, I skipped this test.
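
For anyone who does run it, the quoted condition can be read as a simple rule: store a full compressed text when no delta is available or when `dist` exceeds a multiple of the text length, and store a delta otherwise. Below is a minimal Python sketch of that rule with the multiplier pulled out as a parameter. The function name `should_store_full_text` and the `factor` parameter are hypothetical and not part of Mercurial; the meaning given for `dist` is an assumption, and the real code in mercurial/revlog.py has more context around it.

```python
def should_store_full_text(delta, dist, textlen, factor=2):
    """Mirror the quoted condition `d is None or dist > textlen * 2`.

    delta   -- candidate delta, or None if none is available
    dist    -- the `dist` value from the quoted snippet (assumed here to be
               the number of stored bytes spanned by the delta chain)
    textlen -- length of the full, uncompressed revision text
    factor  -- the "* 2" constant that the quoted message suggests raising
    """
    return delta is None or dist > textlen * factor


# Illustrative numbers only, not measurements from any repository.
decisions = {
    factor: should_store_full_text(b"patch", dist=2500, textlen=1000,
                                   factor=factor)
    for factor in (2, 3, 10)
}
print(decisions)
```

In this sketch a larger factor lets a chain span more bytes before a new full text is stored, which is the compression/performance trade-off the quoted message describes.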

Cheers,
Daniel.
-- 
I'm not overweight, I'm undertall.


