xrevlog: experimental reimplementation of revlog in C
Matt Mackall
mpm at selenic.com
Mon Nov 15 19:45:40 UTC 2010
On Sun, 2010-11-14 at 12:23 +0100, Adrian Buehlmann wrote:
> > On our main repo at
> > work, that's two 112,000-element containers every time you run (say)
> > "hg tip".
>
> That's 112,000 x 64 bytes = 6.7 MB index file which seems to be parsed
> quite quickly by the C code in parsers.c (I guess that's function
> parse_index there).
Not really. The index itself is 112k Python tuples. Each tuple contains
7 Python integers and one 20 byte Python string. Each Python integer is
8 bytes. And those 20 byte strings when wrapped in a Python object and
allocated individually through Python's allocator, may take 64 bytes or
more.
In fact, simply measuring with top and
b = [(0, 0, 0, 0, 0, 0, 0, "%020x" % i) for i in xrange(1000000)]
suggests that each index entry is taking about 195 bytes. That breaks
down to something like:
8 bytes per index entry pointer
64 bytes per tuple
7 * 8 bytes per integer
64 bytes per hash
---
192 bytes per entry + 9 memory allocations
There's a similar story with the hash side of things. If we read the
index into memory and only parse it on demand, we can be significantly
smaller and faster.
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel
mailing list