hg pull runs out of memory

Isaac Jurado diptongo at gmail.com
Fri Apr 27 09:07:25 UTC 2012


On Fri, Apr 27, 2012 at 10:16 AM, Hertroys A.
<alban.hertroys at apollovredestein.com> wrote:
>
>> >> Because Mercurial is designed for dealing with _source code_
>> >> quickly.  And it's massively more efficient to deal with small
>> >> files typical of source code by reading, writing, and calculating
>> >> deltas on them in memory.
>> >
>> > Of course, but isn't it kind of dumb to attempt the same with a
>> > large binary file?
>>
>> The engineering effort to change the situation is huge and the demand
>> is small.
>
> [...]
>
> For binary data you would compare some file-properties first (creation
> date, size and a CRC check seem a reasonable starting point - although
> the first two will differ between clones, I realise) and if they're
> different you create a new version of the file. That's how many source
> control systems do this.
>
> Yes, that way your repository grows fast if you change your binary
> files a lot and yes, implementing a binary diff that could prevent
> that growth would be a huge effort. ISTR reading about some open
> source tools capable of binary diffs, there's probably no need to
> duplicate the effort.
>
> It's easy to control which binary files need to stay under version
> control if the only effect they have is to increase your repository's
> size. You can see that coming, as it happens gradually, while more
> disk space is relatively easy to obtain (although of course still a
> pain if all your clones for all your developers grow beyond reasonable
> sizes).  Compared to the current situation where a large file can
> suddenly drive you past the memory allocation limits of 32-bit systems
> (which are still the majority), I think that's an improvement.
>
> [...]
>
> I think this could be abused as a DoS vulnerability. It only requires
> one person pushing a largish binary file to the repo; every developer
> with a clone on a 32-bit system will then run into problems, and the
> repo is kind of hard to fix. Just saying...
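
For reference, the size-then-CRC check described in the quoted text
could be sketched roughly as follows.  This is only an illustration in
Python, not how Mercurial stores or compares files; the names
crc32_of_file, fingerprint and has_changed are made up for the example.
The creation date is left out because, as noted above, it differs
between clones.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in fixed-size chunks
    so the whole file never has to fit in memory at once."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def fingerprint(path):
    """Hypothetical per-file record: (size in bytes, CRC-32)."""
    return (os.path.getsize(path), crc32_of_file(path))


def has_changed(path, stored):
    """Compare a file against a stored fingerprint.

    The cheap size check runs first; the CRC is only computed when
    the sizes match.
    """
    if os.path.getsize(path) != stored[0]:
        return True
    return crc32_of_file(path) != stored[1]
```

A tool following this scheme would store a whole new copy of the file
whenever has_changed() returns True, instead of computing a delta.
CRC-32 is used here only because the quoted text mentions a CRC; it can
collide, so a stronger hash would be the safer choice in practice.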

I don't get it.  Isn't this the problem the largefiles extension was
designed to solve?

    http://mercurial.selenic.com/wiki/LargefilesExtension

-- 
Isaac Jurado

"The noblest pleasure is the joy of understanding"
Leonardo da Vinci


