Call for testing: generaldelta
Augie Fackler
durin42 at gmail.com
Tue Jul 26 17:38:39 UTC 2011
On Jul 25, 2011, at 11:40 AM, Matt Mackall wrote:
> Mercurial 1.9 includes an experimental feature called 'generaldelta'
> that should improve compression in repositories with lots of branching.
> Please help us test it so that we can work towards making it the
> default.
>
> Things we'd like to evaluate:
>
> - how significant the compression improvements are
> - how much overhead there is when communicating with older clients
> - (advanced) which compression window size gives the best trade-off
I don't have the repo handy (I'm cloning it now), but http://hg.adium.im/adium/ got bigger when I converted it to generaldelta.
>
> == Evaluating compression: ==
>
> Do two clones:
>
> $ hg clone -U --pull proj proj-normal
> $ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta
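Because the two clones differ only by a config flag, it may be worth confirming that the second one really picked up the new format before comparing sizes. A minimal Python sketch, assuming that a generaldelta repository lists `generaldelta` in its `.hg/requires` file; the helper names are hypothetical:

```python
import os

def parse_requires(text):
    """Split the contents of a .hg/requires file into a set of feature names."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def uses_generaldelta(repo_path):
    """Return True if <repo_path>/.hg/requires lists 'generaldelta' (assumed marker)."""
    path = os.path.join(repo_path, ".hg", "requires")
    if not os.path.exists(path):
        return False  # very old repositories may have no requires file at all
    with open(path) as f:
        return "generaldelta" in parse_requires(f.read())
```

If the clones worked, uses_generaldelta("proj-gdelta") should be True and uses_generaldelta("proj-normal") False.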
>
> Then compare their sizes:
>
> (Unix) $ du -sh proj-normal proj-gdelta
> 31M proj-normal
> 26M proj-gdelta
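For a figure that doesn't depend on du's rounding, the same comparison can be scripted. This sketch (hypothetical helper names) sums apparent file sizes, whereas du counts disk blocks, so totals will differ slightly; a negative result means the generaldelta clone grew, as happened with the adium repository.

```python
import os

def tree_size(path):
    """Sum of apparent file sizes in bytes under path (du counts blocks instead)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):  # don't count symlink targets
                total += os.path.getsize(fp)
    return total

def percent_smaller(baseline, candidate):
    """How much smaller candidate is than baseline, in percent (negative = it grew)."""
    return 100.0 * (baseline - candidate) / baseline
```

With the du figures above, percent_smaller(31, 26) is about 16%; for exact numbers, call it as percent_smaller(tree_size("proj-normal"), tree_size("proj-gdelta")).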
>
> And compare their manifest sizes:
> $ ls -l proj-normal/.hg/store/00manifest.*
> -rw-r--r-- 1 1000 1000 6043911 Jul 25 13:18 proj-normal/.hg/store/00manifest.d
> -rw-r--r-- 1 1000 1000 955648 Jul 25 13:18 proj-normal/.hg/store/00manifest.i
> $ ls -l proj-gdelta/.hg/store/00manifest.*
> -rw-r--r-- 1 1000 1000 3197528 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.d
> -rw-r--r-- 1 1000 1000 955648 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.i
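Both index files are 955648 bytes, which is 14932 entries of 64 bytes each, the same revision count hg debugrevlog -m reports; only the data file shrinks (6043911 to 3197528 bytes, roughly 47%). The sketch below reads the 4-byte version/flags word at the start of a .i file. The constants are meant to mirror the RevlogNG definitions in Mercurial's revlog.py as we understand them, the function names are hypothetical, and the entry count is only meaningful for non-inline revlogs (ones with a separate .d file).

```python
import os
import struct

# Assumed to match RevlogNG (format 1) in mercurial/revlog.py.
REVLOGNG = 1
REVLOGNGINLINEDATA = 1 << 16
REVLOGGENERALDELTA = 1 << 17
INDEX_ENTRY_SIZE = struct.calcsize(">Qiiiiii20s12x")  # 64 bytes per index entry

def parse_header(first4):
    """Decode the big-endian version/flags word at the start of a .i file."""
    if len(first4) != 4:
        raise ValueError("index too short to contain a header")
    (v,) = struct.unpack(">I", first4)
    fmt = v & 0xFFFF  # low 16 bits: format number
    flags = []
    if v & REVLOGNGINLINEDATA:
        flags.append("inline")
    if v & REVLOGGENERALDELTA:
        flags.append("generaldelta")
    return fmt, flags

def describe_index(index_path):
    """Return (format, flags, entry_count); entry_count is None for inline revlogs."""
    with open(index_path, "rb") as f:
        fmt, flags = parse_header(f.read(4))
    size = os.path.getsize(index_path)
    # Inline revlogs interleave data with the index, so size says nothing there.
    entries = None if "inline" in flags else size // INDEX_ENTRY_SIZE
    return fmt, flags, entries
```

For the repository above, describe_index("proj-gdelta/.hg/store/00manifest.i") should report format 1, the generaldelta flag and 14932 entries.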
>
> This data may also be valuable:
>
> $ hg debugrevlog -m
> format : 1
> flags  : generaldelta
>
> revisions     : 14932
>     merges    :  1763 (11.81%)
>     normal    : 13169 (88.19%)
> revisions     : 14932
>     full      :    61 ( 0.41%)
>     deltas    : 14871 (99.59%)
> revision size : 3197528
>     full      :  744577 (23.29%)
>     deltas    : 2452951 (76.71%)
>
> avg chain length  : 172
> compression ratio : 229
>
> uncompressed data size (min/max/avg) : 125 / 80917 / 49156
> full revision size (min/max/avg)     : 113 / 37284 / 12206
> delta size (min/max/avg)             : 0 / 27029 / 164
>
> deltas against prev  : 13770 (92.60%)
>     where prev = p1  : 13707 (99.54%)
>     where prev = p2  :     8 ( 0.06%)
>     other            :    55 ( 0.40%)
> deltas against p1    :  1097 ( 7.38%)
> deltas against p2    :     4 ( 0.03%)
> deltas against other :     0 ( 0.00%)
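When collecting these reports from several repositories, a small parser can help with cross-checking. This sketch assumes the compression ratio is total uncompressed bytes divided by stored bytes: multiplying the average uncompressed size (49156) by the revision count (14932) and dividing by the revision size (3197528) gives about 229.55, consistent with the reported 229. Labels such as "full" and "deltas" appear in more than one section, so the parser keeps only the first occurrence. The function names are hypothetical.

```python
import re

# "label : value" with arbitrary indentation and padding around the colon.
LINE_RE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")

def parse_debugrevlog(text):
    """Map each label to the value on its first occurrence (labels repeat)."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("label") not in stats:
            stats[m.group("label")] = m.group("value")
    return stats

def leading_int(value):
    """Extract the first integer from a value such as '1763 (11.81%)'."""
    return int(re.match(r"\d+", value).group())

def estimated_compression_ratio(stats):
    """Average uncompressed size times revision count, over stored bytes."""
    revs = leading_int(stats["revisions"])
    stored = leading_int(stats["revision size"])
    avg_raw = int(stats["uncompressed data size (min/max/avg)"].split("/")[-1])
    return avg_raw * revs / stored
```

Because the printed average is itself rounded down, the result is only an estimate of the ratio Mercurial computes.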
>
>
>
>
> == Evaluating performance: ==
>
> Servers hosting generaldelta repositories will reorder changesets on
> the fly to improve compression and streaming performance over the
> existing wire protocol. So we'd like to see timings for three cases:
>
> - cloning from old to old (baseline):
> $ hg clone --time -U --pull proj-normal proj-normal-normal
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)
>
> - cloning from new to old
> $ hg clone --time -U --pull proj-gdelta proj-gdelta-normal
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)
>
> - cloning from new to new
> $ hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta
> requesting all changes
> adding changesets
> adding manifests
> adding file changes
> added 14938 changesets with 29187 changes to 2054 files
> Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)
>
> And then, compare the sizes again:
>
> $ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
> 31M proj-normal-normal
> 27M proj-gdelta-normal
> 26M proj-gdelta-gdelta
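In the timings above, new-to-old took about 25% longer than the old-to-old baseline (13.03 s vs 10.42 s) and new-to-new about 60% longer (16.62 s). If you repeat the measurement several times, a small harness like this may help. It reuses the clone commands shown above but measures wall-clock time in Python rather than with hg's --time, so process start-up is included. The function names are hypothetical, and the destination directories must not already exist.

```python
import subprocess
import time

def clone_cmd(src, dst, generaldelta=False):
    """Build the 'hg clone -U --pull' command used in the timings above."""
    cmd = ["hg", "clone", "-U", "--pull"]
    if generaldelta:
        cmd += ["--config", "format.generaldelta=1"]
    return cmd + [src, dst]

def timed_clone(src, dst, generaldelta=False):
    """Run one clone and return elapsed wall-clock seconds (needs hg on PATH)."""
    start = time.perf_counter()
    subprocess.run(clone_cmd(src, dst, generaldelta), check=True)
    return time.perf_counter() - start

def overhead_pct(baseline, t):
    """Extra time relative to the baseline, in percent."""
    return 100.0 * (t - baseline) / baseline
```

For example, overhead_pct(timed_clone("proj-normal", "nn"), timed_clone("proj-gdelta", "gn")) reports the new-to-old overhead for a single run.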
>
> == Evaluating window size ==
>
> Tweaking the compression window size can potentially have a large impact
> on resulting size, but right now tuning it requires hacking the source.
> Around line 1051 in mercurial/revlog.py is the following magic constant:
>
>     if d is None or dist > textlen * 2:
>         text = buildtext()
>         data = compress(text)
>
> Changing that "* 2" to values between 3 and 10 will change the
> compression/performance trade-off and may result in large improvements
> in generaldelta compression in repositories with lots of branching. If
> that's you, give it a try and tell us what you find. Again, the output
> of 'hg debugrevlog -m' may be valuable.
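To see why a larger multiplier trades read cost for size, here is a toy Python model of that check. It is not Mercurial's code. It assumes every delta is stored right after the previous revision and treats dist as the snapshot plus all deltas since, roughly the bytes that must be read to rebuild a revision (our reading of revlog.py). In a generaldelta revlog the on-disk span can also cover revisions from other branches stored in between, which this model ignores. The input tuples and function name are hypothetical.

```python
def simulate_chains(revisions, multiplier=2):
    """Toy model of the 'dist > textlen * multiplier' snapshot rule.

    revisions: list of (textlen, full_size, delta_size) tuples in storage
    order (hypothetical inputs). Returns
    (total_bytes_stored, full_snapshots, worst_read_span).
    """
    total = fulls = worst = 0
    dist = None  # no chain yet, so the first revision is always a snapshot
    for textlen, full_size, delta_size in revisions:
        candidate = None if dist is None else dist + delta_size
        if candidate is None or candidate > textlen * multiplier:
            dist = full_size  # chain too long: start over with a full text
            total += full_size
            fulls += 1
        else:
            dist = candidate  # cheap enough: extend the chain with a delta
            total += delta_size
        worst = max(worst, dist)
    return total, fulls, worst
```

With thirty made-up revisions of (1000, 400, 100), a multiplier of 2 stores 3600 bytes in two chains with a worst read span of 2000 bytes, while a multiplier of 10 stores 3300 bytes in one chain but may have to read up to 3300 bytes.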
>
> --
> Mathematics is the supreme nostalgia of our time.