hg pull runs out of memory
Hertroys A.
alban.hertroys at apollovredestein.com
Fri Apr 27 08:16:57 UTC 2012
> > > Because Mercurial is designed for dealing with _source code_ quickly.
> > > And it's massively more efficient to deal with small files typical of
> > > source code by reading, writing, and calculating deltas on them in
> > > memory.
> >
> > Of course, but isn't it kind of dumb to attempt the same with a large
> > binary file?
>
> The engineering effort to change the situation is huge and the demand is
> small.
Huge? You're the expert in this matter, of course, but that's quite contrary to what I expected.
As far as I can tell, the difference would be to add a check for the file's type (plain text or binary) before comparing it to an older version, and to decide which code path to take from there. For text data you'd (of course) take the current code path.
For binary data you would compare some file properties first (creation date, size and a CRC check seem a reasonable starting point; I realise the first two will differ between clones), and if they differ you store a new version of the file. That's how many source control systems handle it.
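A minimal sketch of the check proposed above, in Python (the language Mercurial is written in). This is not Mercurial's code or API: every function name here is hypothetical, the NUL-byte test for "binary" is an assumed heuristic, the 1 MiB chunk size is an arbitrary choice, and creation date is left out because it differs between clones. The file is read in chunks so that a large binary never has to sit in memory at once.

```python
import io
import zlib

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time (arbitrary, assumed value)


def is_binary(stream) -> bool:
    """Hypothetical helper: guess whether a file is binary by looking for a
    NUL byte in its first chunk (assumed heuristic). Rewinds the stream."""
    head = stream.read(CHUNK_SIZE)
    stream.seek(0)
    return b"\0" in head


def fingerprint(stream) -> tuple:
    """Hypothetical helper: return (size, CRC32) of a binary stream, reading
    it chunk by chunk so memory use stays bounded. Rewinds the stream."""
    size, crc = 0, 0
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        size += len(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    stream.seek(0)
    return size, crc


def store_new_version(stream, previous_fingerprint) -> bool:
    """Hypothetical binary code path: store a full new version only when
    size or CRC changed since the previous version; no delta is computed."""
    return fingerprint(stream) != previous_fingerprint
```

For example, `store_new_version(io.BytesIO(b"\0new"), fingerprint(io.BytesIO(b"\0old")))` returns `True`, while text files (where `is_binary` is false) would keep going through the existing delta code path.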
Yes, that way your repository grows fast if you change your binary files a lot, and yes, implementing a binary diff that could prevent that growth would be a huge effort. I seem to recall reading about some open source tools capable of binary diffs, though, so there's probably no need to duplicate that effort.
It's easy to control which binary files need to stay under version control if the only effect they have is to increase your repository's size. You can see that coming because it happens gradually, and more disk space is relatively easy to obtain (although it's of course still a pain if every developer's clones grow beyond reasonable sizes).
Compared to the current situation, where a single large file can suddenly push you past the memory allocation limits of 32-bit systems (which are still the majority), I think that's an improvement.
> I'll consider fixing it for about $100k. That same $100k of
> course will buy you lots of 64-bit systems.
I know the dollar is a bit low these days, but in euros that's about 5 to 6 years of decent pay! That seems a bit excessive.
It's possible my employer would be willing to pay a reasonable amount, but $100k is not going to happen. We rely on relatively old hardware here (and consequently old software too). This is a factory with process control systems (you still can't beat VMS for reliability); for example, we only recently replaced IE6 and Office 2000. None of that is free (or cheap) to maintain.
> Until then, there are more pressing matters.
I think this could be abused as a denial-of-service (DoS) vulnerability. It only takes one person pushing a largish binary file to the repo for every developer with a clone on a 32-bit system to run into problems, and it's kind of hard to fix the repo afterwards. Just saying...
Apollo Vredestein B.V. - P.O. Box 27 - 7500 AA Enschede - The Netherlands - Chamber of Commerce number 34223268 - http://www.apollovredestein.com
--------------------------------------------------------------------------
More information about the Mercurial mailing list