Mercurial problem with large binary changesets

Matt Mackall mpm at selenic.com
Wed May 30 14:27:14 UTC 2007


On Wed, May 30, 2007 at 08:01:01AM -0400, Francois-Denis Gonthier wrote:
>  > It's possible that if your template contains a huge number of LF
>  > characters, memory usage could be especially ugly. The delta
>  > algorithm is "line-based", even for "binary" files. Can you count
>  > them?
> 
> wc -l reports 173312 lines.

Ok, that's in the expected range.
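
For files too large to pipe through wc comfortably, the same count can
be taken in fixed-size binary blocks. This is only a rough sketch: the
name count_lf and the 1 MB block size are assumptions made for
illustration, and it is written for a current Python rather than the
2.4 shown in the traceback.

```python
def count_lf(path, block_size=1 << 20):
    """Count LF (0x0a) bytes in a file, which is what `wc -l` reports."""
    total = 0
    # Binary mode, so no newline translation happens on any platform.
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += block.count(b"\n")
    return total
```

Since the delta algorithm splits on LF even for binary files, this
number is the "line" count it has to work with.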

>  > Alternately, you can hack mercurial/bdiff.c to report how big its
>  > various mallocs are. Line 78 is the most likely culprit.
> 
> Allocations in bdiff.c:78 do not correlate well with the behavior I
> observe.  There are a few medium-sized allocations there:
> 
> 2007-05-30 11:41:58.195043 - bdiff: 107560
> 2007-05-30 11:42:00.009550 - bdiff: 141280
> 2007-05-30 11:42:02.212016 - bdiff: 3026560

There are two more mallocs in there. But this is probably not where
the problem lies.

> I can't interpret if this is excessive or not.

No, it's quite reasonable.
 
> I see the problem might be related to decompression.  From the stack trace:
> 
> File "/var/lib/python-support/python2.4/mercurial/revlog.py", line 66,
> in decompress
>       if t == 'x': return zlib.decompress(bin)
> 
> Is there a way to debug that?

Hmm, interesting. I missed this stack trace the first time around.
Zlib's memory usage should be quite reasonable. Worst case, I'd expect
it to use 2x the uncompressed file size.
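
One way to check that expectation outside of Mercurial is to measure
the peak allocation of a single zlib.decompress call. The sketch below
assumes a current Python with the tracemalloc module (not available in
the 2.4 from the traceback); peak_decompress_memory is a name made up
for this example. It only sees allocations made through Python's
allocators, so treat the figure as approximate.

```python
import tracemalloc
import zlib

def peak_decompress_memory(compressed):
    """Return (decompressed_size, peak_traced_bytes) for one decompress call."""
    tracemalloc.start()
    try:
        data = zlib.decompress(compressed)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(data), peak
```

Comparing the two returned numbers for a given chunk shows how far the
peak is from the uncompressed size.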

remote:   File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 1183, in addgroup
remote:     text = self.revision(chain)
remote:   File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 919, in revision
remote:     bins.append(self.chunk(r, df=df))
remote:   File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 875, in chunk
remote:     return decompress(self.chunkcache[1][offset:offset + length])
remote:   File "/var/lib/python-support/python2.4/mercurial/revlog.py",
line 66, in decompress
remote:     if t == 'x': return zlib.decompress(bin)
remote: MemoryError

Can you turn that into something like:

   try:
      if t == 'x': return zlib.decompress(bin)
   except MemoryError:
      # save the compressed chunk that failed, then re-raise
      f = file("/tmp/busted-bin", "wb")
      f.write(bin)
      f.close()
      raise

This will dump a copy of the troublesome compressed chunk to disk. Then we
can try to decompress it manually:

>>> import zlib
>>> b = file("/tmp/busted-bin", "rb").read()
>>> a = zlib.decompress(b)
>>> len(a)
688890
>>> a[:50]
'[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,'
>>> 
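
If decompressing the dumped chunk in one call also runs out of memory,
it can be walked through piecewise instead. A minimal sketch, assuming
a current Python and that the dump is a complete zlib stream;
inspect_chunk and its block_size parameter are hypothetical names used
only here:

```python
import zlib

def inspect_chunk(data, block_size=64 * 1024):
    """Decompress a zlib stream piecewise.

    Returns (decompressed_size, lf_count) without ever holding the
    whole decompressed text in memory.
    """
    d = zlib.decompressobj()
    size = 0
    lines = 0
    buf = data
    while buf:
        # max_length caps the output of each call; input that was not
        # consumed yet is left in d.unconsumed_tail.
        out = d.decompress(buf, block_size)
        size += len(out)
        lines += out.count(b"\n")
        buf = d.unconsumed_tail
    # Collect whatever output is still pending inside the decompressor.
    out = d.flush()
    size += len(out)
    lines += out.count(b"\n")
    return size, lines
```

Calling inspect_chunk(open("/tmp/busted-bin", "rb").read()) gives the
decompressed size and the number of LF characters in the chunk, which
can be compared against the wc -l figure for the file.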

-- 
Mathematics is the supreme nostalgia of our time.


