Mercurial problem with large binary changesets
Matt Mackall
mpm at selenic.com
Wed May 30 14:27:14 UTC 2007
On Wed, May 30, 2007 at 08:01:01AM -0400, Francois-Denis Gonthier wrote:
> > It's possible that if your template contains a huge number of LF
> > characters, memory usage could be especially ugly. The delta
> > algorithm is "line-based", even for "binary" files. Can you count
> > them?
>
> wc -l reports 173312 lines.
Ok, that's in the expected range.
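For a file too large to read comfortably in one go, the same count can be taken in fixed-size chunks. This is only a sketch, assuming Python 3; the name count_lf and the 1 MiB chunk size are illustrative and not part of Mercurial.

```python
import io


def count_lf(fileobj, chunk_size=1 << 20):
    """Count LF bytes in a binary file object, one chunk at a time."""
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # bytes.count works on raw bytes, so "binary" content is fine.
        total += chunk.count(b"\n")
    return total


# Usage with an in-memory stand-in for the template file:
sample = io.BytesIO(b"first\nsecond\nthird")
print(count_lf(sample))
```

For a real file, pass open(path, "rb") instead of the BytesIO object.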
> > Alternately, you can hack mercurial/bdiff.c to report how big its
> > various mallocs are. Line 78 is the most likely culprit.
>
> Allocations in bdiff.c:78 do not correlate well with the behavior I
> observe. There are a few medium-sized allocations there:
>
> 2007-05-30 11:41:58.195043 - bdiff: 107560
> 2007-05-30 11:42:00.009550 - bdiff: 141280
> 2007-05-30 11:42:02.212016 - bdiff: 3026560
There are two more mallocs in there. But this is probably not where
the problem lies.
> I can't interpret if this is excessive or not.
No, it's quite reasonable.
> I see the problem might be related to decompression. From the stack trace:
>
> File "/var/lib/python-support/python2.4/mercurial/revlog.py", line 66,
> in decompress
> if t == 'x': return zlib.decompress(bin)
>
> Is there a way to debug that?
Hmm, interesting. I missed this stack trace the first time around.
Zlib's memory usage should be quite reasonable. Worst case, I'd expect
it to use 2x the uncompressed file size.
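One way to see how far a compressed chunk expands, without holding the whole result in memory, is to decompress it incrementally and only add up the output sizes. A minimal sketch, assuming Python 3 and the standard zlib module; decompressed_size and the 64 KiB step are illustrative choices, not Mercurial code.

```python
import zlib


def decompressed_size(data, step=64 * 1024):
    """Return the uncompressed length of a zlib stream, producing at most
    `step` bytes of output per call so the full result is never held."""
    d = zlib.decompressobj()
    total = 0
    buf = data
    while buf:
        # max_length caps the output of this call; input that was not
        # consumed yet is left in d.unconsumed_tail for the next round.
        out = d.decompress(buf, step)
        total += len(out)
        buf = d.unconsumed_tail
    # Collect anything still buffered inside the decompressor.
    total += len(d.flush())
    return total


# Usage: a highly compressible 1 MB payload.
blob = zlib.compress(b"x" * 1000000)
print(len(blob), decompressed_size(blob))
```

If the reported size is far larger than the file being pushed, the chunk itself is suspect rather than zlib.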
remote: File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 1183, in addgroup
remote: text = self.revision(chain)
remote: File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 919, in revision
remote: bins.append(self.chunk(r, df=df))
remote: File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 875, in chunk
remote: return decompress(self.chunkcache[1][offset:offset + length])
remote: File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 66, in decompress
remote: if t == 'x': return zlib.decompress(bin)
remote: MemoryError
Can you turn that into something like:
    try:
        if t == 'x': return zlib.decompress(bin)
    except:
        file("/tmp/busted-bin", "w").write(bin)
        raise
This will dump a copy of the troublesome compressed chunk to disk. Then we
can try to decompress it manually:
>>> import zlib
>>> b = file("/tmp/busted-bin").read()
>>> a = zlib.decompress(b)
>>> len(a)
688890
>>> a[:50]
'[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,'
>>>
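On a current Python, where the file() builtin no longer exists, the same dump-and-reraise idea might look like the sketch below. It assumes Python 3; decompress_or_dump and its dump_path parameter are hypothetical names for illustration, not the actual revlog.py function, and the dump is opened in binary mode since the chunk is raw bytes.

```python
import zlib


def decompress_or_dump(data, dump_path="/tmp/busted-bin"):
    """Decompress a zlib chunk; if that fails for any reason (including
    MemoryError), save the raw compressed bytes and re-raise."""
    try:
        return zlib.decompress(data)
    except Exception:
        with open(dump_path, "wb") as f:
            f.write(data)
        raise


# Usage: a good chunk round-trips and nothing is written to disk.
print(decompress_or_dump(zlib.compress(b"[0, 1, 2, 3]")))
```

The dumped file can then be read back with open(dump_path, "rb").read() and fed to zlib.decompress by hand, as in the session above.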
--
Mathematics is the supreme nostalgia of our time.