Mercurial and very large files...

Matt Mackall mpm at selenic.com
Tue Mar 27 10:21:58 UTC 2007


On Tue, Mar 27, 2007 at 11:38:42AM +0200, Marcin Kasperski wrote:
> I tried loading a 1GB file into a mercurial repo. Killed the
> process when my machine started to swap extensively.
> Then noticed the following sentence in FAQ:
> 
>    Mercurial currently assumes that single files, indices, 
>    and manifests can fit in memory for efficiency.
> 
> How deep is this restriction? Is it inherent for mercurial
> implementation, or something likely to be changed in
> the future?

a) diff algorithms can only work efficiently when contents of both
   revisions fit in memory
b) our diff algorithm (like basically all modern textual diff
   algorithms) is O(n^2) worst-case, so it'll fare badly on very
   large files

There's not much that can be done about (a) aside from falling back to
non-delta storage for large files.
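
To illustrate what such a fallback could look like, here is a minimal
sketch. It is not Mercurial's storage code: the threshold value, the
function names and the line-based delta format are all made up for the
example. Small files get a delta against the previous revision; files at
or above the threshold are stored as a compressed full snapshot, so no
diff is ever computed for them. Note that the sketch still holds each
revision in memory as a single bytes object.

```python
import difflib
import zlib

# Hypothetical cutoff chosen for illustration; not a Mercurial setting.
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024


def make_delta(old: bytes, new: bytes):
    """Line-based delta: a list of (start, end, replacement_lines) against old."""
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new revision from the old one plus a delta."""
    a = old.splitlines(keepends=True)
    out = []
    pos = 0
    for start, end, lines in delta:
        out.extend(a[pos:start])   # unchanged lines before this hunk
        out.extend(lines)          # replacement lines
        pos = end
    out.extend(a[pos:])            # unchanged tail
    return b"".join(out)


def store_revision(old: bytes, new: bytes, threshold: int = LARGE_FILE_THRESHOLD):
    """Return ("delta", hunks) for small files, ("full", compressed) otherwise."""
    if len(old) >= threshold or len(new) >= threshold:
        return ("full", zlib.compress(new))
    return ("delta", make_delta(old, new))


def load_revision(old: bytes, entry) -> bytes:
    """Recover the stored revision, given the previous one."""
    kind, payload = entry
    if kind == "full":
        return zlib.decompress(payload)
    return apply_delta(old, payload)
```

The cost of the "full" path is repository size: every revision of a
large file takes up its whole compressed length.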

And meanwhile, everything else assumes files can be read into memory
in a single chunk, because the delta storage already requires it.

As for (b), we could fall back to a less powerful O(n) diff algorithm
for large files, but you'll still be bound by memory.
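
As one example of what a "less powerful O(n)" diff might be (my choice
for illustration, not something Mercurial does): strip the common prefix
and suffix and emit everything in between as a single replacement hunk.
It runs in linear time but misses any matching regions inside the changed
span, and both revisions still have to be in memory. The function names
are hypothetical.

```python
def trim_diff(old: bytes, new: bytes):
    """Return one hunk (start, end, replacement): old[start:end] becomes replacement."""
    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1

    return prefix, len(old) - suffix, new[prefix:len(new) - suffix]


def apply_trim_diff(old: bytes, hunk) -> bytes:
    """Apply a hunk produced by trim_diff to the old revision."""
    start, end, replacement = hunk
    return old[:start] + replacement + old[end:]
```

For an append-only file such as a log, the hunk is just the appended
tail. For a file with one changed byte near each end, the hunk is nearly
the whole file.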

> Also: are there any suggestions, what is the max file
> size mercurial can be expected to work efficiently with?
> Of course this depends on the RAM size and CPU, but 
> is the dependency linear, or worse?

I think it's probably around free memory / 4 or so, which means that on
a 32-bit box you're out of luck with a 1GB file.
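
The rule of thumb as arithmetic, in a small sketch. The divisor of 4 is
the rough guess from above, not a measured constant, and the function
names are invented for the example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimated_max_file_size(free_memory_bytes: int, divisor: int = 4) -> int:
    """Rough upper bound on a comfortably handled file size: free memory / divisor."""
    return free_memory_bytes // divisor


def fits(file_size_bytes: int, free_memory_bytes: int, divisor: int = 4) -> bool:
    """True if the file is within the rule-of-thumb bound."""
    return file_size_bytes <= estimated_max_file_size(free_memory_bytes, divisor)
```

With 2 GiB free this gives 512 MiB. Assuming a 32-bit process can
address about 3 GiB (an assumption; the exact figure depends on the OS),
the bound is 768 MiB, so fits(1 * GIB, 3 * GIB) is False.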
 
> PS Interesting that Git seems to be almost equivalent 
> to mercurial here. Experimenting, I just tried committing a 450MB
> file. hg allocated ~900MB while committing (I killed this process
> after some time). git-update-index allocated ~900MB while
> executing git add.
> 
> Git does not seem significantly better. git-update-index
> just allocated 900MB while adding a 600MB file...

Yep. And they'll both probably get -worse- when you add the second
revision.

-- 
Mathematics is the supreme nostalgia of our time.


