Mercurial and very large files...
Matt Mackall
mpm at selenic.com
Tue Mar 27 10:21:58 UTC 2007
On Tue, Mar 27, 2007 at 11:38:42AM +0200, Marcin Kasperski wrote:
> I tried loading a 1GB file into the Mercurial repo. I killed the
> process when my machine started to swap extensively.
> Then noticed the following sentence in FAQ:
>
> Mercurial currently assumes that single files, indices,
> and manifests can fit in memory for efficiency.
>
> How deep is this restriction? Is it inherent to the Mercurial
> implementation, or is it something likely to change in
> the future?
There are two constraints here:

a) diff algorithms can only work efficiently when the contents of both
revisions fit in memory
b) our diff algorithm (and basically all modern textual diff
algorithms) is O(n^2) worst-case, so it'll fare badly on very
large files
There's not much that can be done about (a) aside from falling back to
non-delta storage for large files.
And meanwhile, everything else assumes files can be read into memory
in a single chunk, because the delta storage already requires it.
As for (b), we could fall back to a less powerful O(n) diff algorithm
for large files, but you'll still be bound by memory.
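For illustration, the crudest linear-time fallback just trims the
common prefix and suffix and emits a single replacement hunk for
whatever is left. This is a made-up sketch of the idea, not the
algorithm Mercurial uses, and note that both revisions still have to
be in memory:

    # A minimal O(n) "diff": strip the common prefix and suffix, then
    # replace everything in between.  Far weaker than a real diff, but
    # never worse than linear work.
    def trivial_delta(old, new):
        limit = min(len(old), len(new))
        p = 0
        while p < limit and old[p] == new[p]:
            p += 1
        s = 0
        while s < limit - p and old[-1 - s] == new[-1 - s]:
            s += 1
        # Replace old[p:len(old) - s] with this piece of the new text.
        return p, len(old) - s, new[p:len(new) - s]

    def apply_delta(old, delta):
        start, end, fragment = delta
        return old[:start] + fragment + old[end:]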
> Also: are there any suggestions as to the maximum file
> size Mercurial can be expected to work with efficiently?
> Of course this depends on the RAM size and CPU, but
> is the dependency linear, or worse?
I think it's probably around free memory / 4 or so, which means that
on a 32-bit box (where a single process can only address a few GB at
most) you're out of luck for files this size.
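As a back-of-the-envelope check (the 2GB figure below is just an
example, and free memory / 4 is only the rule of thumb above):

    free_memory = 2 * 1024**3           # suppose ~2GB of free RAM
    largest_file = free_memory // 4     # ~512MB
    print(largest_file // 2**20, "MB")  # -> 512 MB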
> PS Interesting that Git seems to be almost equivalent
> to Mercurial here. Experimenting, I just tried committing a 450MB
> file. hg allocated ~900MB while committing (I killed this process
> after some time). git-update-index allocated ~900MB while
> executing git add.
>
> Git does not seem significantly better. git-update-index
> just allocated 900MB while adding a 600MB file...
Yep. And they'll both probably get -worse- when you add the second
revision.
--
Mathematics is the supreme nostalgia of our time.