Call for testing: generaldelta

Augie Fackler durin42 at gmail.com
Tue Jul 26 17:38:39 UTC 2011


 
On Jul 25, 2011, at 11:40 AM, Matt Mackall wrote:

> Mercurial 1.9 includes an experimental feature called 'generaldelta'
> that should improve compression in repositories with lots of branching.
> Please help us test it so that we can work towards making it the
> default.
> 
> Things we'd like to evaluate:
> 
> - how significant the compression improvements are
> - how much overhead there is when communicating with older clients
> - (advanced) what the best trade-off compression window size is

I don't have the repo handy (cloning it now), but http://hg.adium.im/adium/ got bigger when I ran it through generaldelta.

> 
> == Evaluating compression: ==
> 
> Do two clones:
> 
> $ hg clone -U --pull proj proj-normal
> $ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta
> 
> Then compare their sizes:
> 
> (Unix) $ du -sh proj-normal proj-gdelta
> 31M	proj-normal
> 26M	proj-gdelta
> 
> And compare their manifest sizes:
> $ ls -l proj-normal/.hg/store/00manifest.*
> -rw-r--r-- 1 1000 1000 6043911 Jul 25 13:18 proj-normal/.hg/store/00manifest.d
> -rw-r--r-- 1 1000 1000  955648 Jul 25 13:18 proj-normal/.hg/store/00manifest.i
> $ ls -l proj-gdelta/.hg/store/00manifest.*
> -rw-r--r-- 1 1000 1000 3197528 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.d
> -rw-r--r-- 1 1000 1000  955648 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.i
> 
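
For anyone who would rather script the size comparison than eyeball du and ls output, here is a rough Python sketch. The helper names tree_size and percent_saved are mine, not part of Mercurial, and it sums file sizes rather than disk blocks, so the totals will differ slightly from what du reports.

```python
import os


def tree_size(root):
    """Sum the sizes, in bytes, of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link is not counted as its target.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def percent_saved(before, after):
    """Size reduction of `after` relative to `before`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # 00manifest.d sizes quoted above: normal vs. generaldelta.
    print("%.1f%% smaller" % percent_saved(6043911, 3197528))
```

To compare whole stores, something like percent_saved(tree_size("proj-normal/.hg/store"), tree_size("proj-gdelta/.hg/store")) should work, assuming both clones sit in the current directory.
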
> This data may also be valuable:
> 
> $ hg debugrevlog -m
> format : 1
> flags  : generaldelta
> 
> revisions     :   14932
>    merges    :    1763 (11.81%)
>    normal    :   13169 (88.19%)
> revisions     :   14932
>    full      :      61 ( 0.41%)
>    deltas    :   14871 (99.59%)
> revision size : 3197528
>    full      :  744577 (23.29%)
>    deltas    : 2452951 (76.71%)
> 
> avg chain length  : 172
> compression ratio : 229
> 
> uncompressed data size (min/max/avg) : 125 / 80917 / 49156
> full revision size (min/max/avg)     : 113 / 37284 / 12206
> delta size (min/max/avg)             : 0 / 27029 / 164
> 
> deltas against prev  : 13770 (92.60%)
>    where prev = p1  : 13707     (99.54%)
>    where prev = p2  :     8     ( 0.06%)
>    other            :    55     ( 0.40%)
> deltas against p1    :  1097 ( 7.38%)
> deltas against p2    :     4 ( 0.03%)
> deltas against other :     0 ( 0.00%)
> 
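
As a sanity check on how these figures relate to each other, the sketch below recomputes a few of them from the raw counts. The relations (integer averages, ratio as uncompressed over stored bytes) are my reading of the numbers above, not something taken from the debugrevlog source, and the function name derive is made up for this example.

```python
def derive(revs, full_count, full_bytes, delta_bytes, avg_uncompressed):
    """Recompute a few summary figures from the raw counts."""
    delta_count = revs - full_count
    stored = full_bytes + delta_bytes
    return {
        "delta_count": delta_count,
        "stored_bytes": stored,
        # Share of stored bytes taken up by full revisions, in percent.
        "full_share": round(100.0 * full_bytes / stored, 2),
        "avg_full": full_bytes // full_count,
        "avg_delta": delta_bytes // delta_count,
        # Approximate: the average uncompressed size is itself truncated.
        "ratio": (avg_uncompressed * revs) // stored,
    }


# Figures copied from the quoted 'hg debugrevlog -m' output.
stats = derive(revs=14932, full_count=61, full_bytes=744577,
               delta_bytes=2452951, avg_uncompressed=49156)
```

The takeaway is that the 61 full revisions account for roughly a quarter of the stored manifest bytes, so where full snapshots get inserted matters about as much as how small the deltas are.
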
> 
> 
> 
> == Evaluating performance: ==
> 
> Servers serving generaldelta repositories will reorder changesets on
> the fly to improve compression and streaming performance over the
> existing wire protocol. So we'd like to see three results:
> 
> - cloning from old to old (baseline):
> $ hg clone --time -U --pull proj-normal proj-normal-normal
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)
> 
> - cloning from new to old
> $ hg clone --time -U --pull proj-gdelta proj-gdelta-normal
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)
> 
> - cloning from new to new
> $ hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)
> 
> And then, compare the sizes again:
> 
> $ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
> 31M	proj-normal-normal
> 27M	proj-gdelta-normal
> 26M	proj-gdelta-gdelta
> 
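
To turn the three --time results into an overhead figure, a small sketch like the following may help. It only relies on the "Time: real N secs" line format shown above; the names real_seconds and slowdown_percent are invented for this example.

```python
import re

# Matches the wall-clock figure in a line such as
# "Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)".
TIME_RE = re.compile(r"Time: real ([0-9.]+) secs")


def real_seconds(line):
    """Extract the wall-clock seconds from a --time output line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("no 'Time: real' figure in %r" % line)
    return float(match.group(1))


def slowdown_percent(baseline, other):
    """How much slower `other` is than `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


baseline = real_seconds("Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)")
new_to_old = real_seconds("Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)")
new_to_new = real_seconds("Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)")
```

With the numbers quoted above, slowdown_percent(baseline, new_to_old) comes out at about 25 and slowdown_percent(baseline, new_to_new) at about 60. A single run is noisy, so repeating each clone a few times before comparing is probably worthwhile.
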
> == Evaluating window size ==
> 
> Tweaking the compression window size can potentially have a large impact
> on resulting size, but right now tuning it requires hacking the source.
> Around line 1051 in mercurial/revlog.py is the following magic constant:
> 
>        if d is None or dist > textlen * 2:
>            text = buildtext()
>            data = compress(text)
> 
> Changing that "* 2" to values between 3 and 10 will change the
> compression/performance trade-off and may result in large improvements
> in generaldelta compression in repositories with lots of branching. If
> that's you, give it a try and tell us what you find. Again, the output
> of 'hg debugrevlog -m' may be valuable.
> 
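
To make the effect of that constant concrete, here is a toy model of the quoted condition. It is not Mercurial's code: store_full_text and its factor parameter are hypothetical, and I am assuming that dist is roughly the amount of stored data that has to be read to rebuild the revision through its delta chain.

```python
def store_full_text(delta, dist, textlen, factor=2):
    """Toy version of the quoted check.

    Returns True when a full revision would be stored instead of a delta:
    either no delta is available, or the chain distance exceeds `factor`
    times the length of the uncompressed text.
    """
    return delta is None or dist > textlen * factor


def smallest_factor_allowing_delta(dist, textlen, candidates=range(2, 11)):
    """First factor in `candidates` under which a delta is still stored."""
    for factor in candidates:
        if not store_full_text(b"placeholder", dist, textlen, factor):
            return factor
    return None
```

In this model a revision with textlen 100 and dist 250 gets a full snapshot at the default factor of 2 but stays a delta at 3 or more, which is the trade-off being tuned: larger factors mean fewer full revisions and longer reads to reconstruct a text.
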
> -- 
> Mathematics is the supreme nostalgia of our time.
> 
> 



