On compressing revlogs
Jesper Schmidt
schmiidt at gmail.com
Mon Jun 4 23:52:51 UTC 2012
On Mon, 04 Jun 2012 23:07:27 +0200, Bryan O'Sullivan <bos at serpentine.com>
wrote:
> Lately, I've come to suspect zlib is a performance bottleneck for reading
> data from revlogs. I threw together a quick hack this morning to use the
> Snappy compression algorithm instead. Here's what I've found.
>
> Snappy compression is up to 15x faster than zlib (I haven't seen it less
> than 8x faster), while decompression is up to 4x faster (I haven't seen it
> less than 2x faster). Of course there's a tradeoff: poorer compression
> ratios, about 1.5x larger than zlib in my tests.
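Just to make the comparison concrete, swapping the codec looks roughly like
this (a sketch, not Bryan's actual hack; the 's' marker for snappy chunks is
invented for illustration, while 'x' and 'u' are the markers revlogs already
use for zlib and uncompressed chunks, and python-snappy is assumed installed):

import zlib

import snappy  # python-snappy bindings, assumed to be installed


def compress(text):
    if not text:
        return text
    packed = snappy.compress(text)
    if len(packed) + 1 < len(text):
        return 's' + packed          # snappy chunk (illustrative marker)
    return 'u' + text                # didn't shrink: store uncompressed


def decompress(bin):
    if not bin:
        return bin
    t = bin[0]
    if t == 's':
        return snappy.decompress(bin[1:])
    if t == 'x':
        return zlib.decompress(bin)  # old chunks written by zlib
    if t == 'u':
        return bin[1:]
    raise ValueError("unknown compression marker %r" % t)

Existing revlogs would of course still carry zlib chunks, so the reader has to
keep handling both formats either way.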
I recently benchmarked several compression algorithms. I stumbled across one
called lz4 (http://code.google.com/p/lz4/), which consistently outperformed
Snappy in both compression ratio and compression/decompression speed. Below
are results from a test run I just did on the Mercurial source tree
(in-memory).
              compressed size (ratio)     comp speed   decomp speed
lz4       13.1 MB -> 5.0 MB (38.0%)     313.4 MB/s     912.8 MB/s
lz4hc     13.1 MB -> 3.6 MB (27.1%)      20.7 MB/s     975.3 MB/s
snappy    13.1 MB -> 5.2 MB (39.2%)     149.4 MB/s     579.5 MB/s
zlib<1>   13.1 MB -> 3.8 MB (28.5%)      47.5 MB/s     181.3 MB/s
zlib<2>   13.1 MB -> 3.6 MB (27.2%)      42.5 MB/s     190.6 MB/s
zlib<3>   13.1 MB -> 3.4 MB (26.2%)      35.4 MB/s     191.5 MB/s
zlib<4>   13.1 MB -> 3.2 MB (24.6%)      30.8 MB/s     192.2 MB/s
zlib<5>   13.1 MB -> 3.1 MB (23.7%)      22.2 MB/s     199.6 MB/s
zlib<6>   13.1 MB -> 3.1 MB (23.3%)      16.8 MB/s     195.9 MB/s
zlib<-1>  13.1 MB -> 3.1 MB (23.3%)      16.9 MB/s     205.2 MB/s
bzip2(1)  13.1 MB -> 2.8 MB (21.4%)       8.6 MB/s      36.2 MB/s
lz4hc is a high-compression variant of lz4, which might offer a better
tradeoff for your case (write once, read many). A sketch of the kind of
harness behind these numbers follows below.
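Something along these lines is enough to reproduce this kind of measurement
(a simplified sketch, not the exact script used here; only the zlib levels
are shown, and the input path is just an example). Snappy, lz4 and lz4hc plug
in the same way once their Python bindings are installed.

import time
import zlib


def bench(name, compress, decompress, data, rounds=10):
    # time repeated compression of the whole in-memory blob
    start = time.time()
    for _ in range(rounds):
        packed = compress(data)
    ctime = time.time() - start

    # time repeated decompression of the result
    start = time.time()
    for _ in range(rounds):
        decompress(packed)
    dtime = time.time() - start

    mb = len(data) * rounds / (1024.0 * 1024.0)
    print("%-9s %5.1f%% %8.1f MB/s %8.1f MB/s"
          % (name, 100.0 * len(packed) / len(data), mb / ctime, mb / dtime))


# any reasonably large in-memory blob will do; this path is only an example
data = open("mercurial-src.tar", "rb").read()

for level in (1, 2, 3, 4, 5, 6):
    bench("zlib<%d>" % level,
          lambda d, l=level: zlib.compress(d, l),
          zlib.decompress,
          data)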
--
Jesper