hg pull runs out of memory

Patrick Mézard patrick at mezard.eu
Fri Apr 27 09:25:02 UTC 2012


On 27/04/12 11:07, Isaac Jurado wrote:
> On Fri, Apr 27, 2012 at 10:16 AM, Hertroys A.
> <alban.hertroys at apollovredestein.com> wrote:
>>
>>>>> Because Mercurial is designed for dealing with _source code_
>>>>> quickly.  And it's massively more efficient to deal with small
>>>>> files typical of source code by reading, writing, and calculating
>>>>> deltas on them in memory.
>>>>
>>>> Of course, but isn't it kind of dumb to attempt the same with a
>>>> large binary file?
>>>
>>> The engineering effort to change the situation is huge and the demand
>>> is small.
>>
>> [...]
>>
>> For binary data you would first compare some file properties (creation
>> date, size and a CRC checksum seem a reasonable starting point, although
>> I realise the first two will differ between clones), and if they differ
>> you store a new version of the file. That's how many source control
>> systems handle it.
>>
>> Yes, that way your repository grows fast if you change your binary
>> files a lot, and yes, implementing a binary diff that could prevent
>> that growth would be a huge effort. ISTR reading about some open
>> source tools capable of binary diffs, so there's probably no need to
>> duplicate the effort.
>>
>> It's easy to control which binary files need to stay under version
>> control if their only effect is to increase your repository's size.
>> You can see that coming, as it happens gradually, and more disk space
>> is relatively easy to obtain (although it is of course still a pain if
>> the clones of all your developers grow beyond reasonable sizes).
>> Compared to the current situation, where a large file can suddenly
>> drive you past the memory allocation limits of 32-bit systems (which
>> are still the majority), I think that's an improvement.
>>
>> [...]
>>
>> I think this could be abused as a DoS vulnerability. It only takes
>> one person pushing a largish binary file to the repository, and every
>> developer with a clone on a 32-bit system runs into problems, and the
>> repository is kind of hard to fix afterwards. Just saying...
> 
> I don't get it.  Isn't this issue what the largefiles extension is
> designed for?
> 
>     http://mercurial.selenic.com/wiki/LargefilesExtension

I see largefiles more as a solution for avoiding the retrieval of a huge
data store made of binary files. The problem here is that even if you are
willing to download that history, you cannot check out the files, because
delta reconstruction (and generation) is done in memory.
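
To illustrate that last point, here is a deliberately simplified sketch
(plain Python, not Mercurial's actual revlog code) of rebuilding one
revision from a base text plus a chain of deltas. The base text, the
deltas and every intermediate result all live entirely in memory, which
is what breaks down for multi-gigabyte files on 32-bit systems:

def apply_delta(text, delta):
    """Apply one delta, given as a list of (start, end, replacement) edits."""
    pieces, pos = [], 0
    for start, end, data in delta:
        pieces.append(text[pos:start])   # copy the unchanged region
        pieces.append(data)              # then the replacement data
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)               # yet another full copy of the file

def reconstruct(base_text, delta_chain):
    """Rebuild a revision by applying each delta to the base, in memory."""
    text = base_text
    for delta in delta_chain:
        text = apply_delta(text, delta)  # peak usage is roughly 2x file size
    return text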
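
For reference, with Mercurial 2.0 or later largefiles ships as a bundled
extension; roughly, you enable it in your hgrc and then track big
binaries explicitly, so they are stored as standalone blobs instead of
going through the in-memory revlog delta handling:

[extensions]
largefiles =

$ hg add --large bigasset.bin
$ hg commit -m "track the binary as a largefile"

An existing repository can be migrated with the bundled "hg lfconvert"
command.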
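
And for comparison, a rough sketch of the metadata-comparison approach
suggested earlier in the thread (the helper below is hypothetical, not
part of any existing tool): decide whether a binary file needs a new
stored version by checking its size and a streamed CRC, so nothing
larger than one read buffer is ever held in memory.

import os
import zlib

def needs_new_version(path, last_size, last_crc, bufsize=1 << 20):
    """Return True if the file differs from the last stored version."""
    if os.path.getsize(path) != last_size:    # cheapest check first
        return True
    crc = 0
    with open(path, "rb") as f:               # stream the file in chunks
        for chunk in iter(lambda: f.read(bufsize), b""):
            crc = zlib.crc32(chunk, crc)
    return (crc & 0xffffffff) != last_crc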

--
Patrick Mézard


