Mercurial and very large files...
Marcin Kasperski
Marcin.Kasperski at softax.com.pl
Tue Mar 27 11:32:15 UTC 2007
Thank you for your comments. It seems the restriction runs rather
deep.
I will probably work around it by splitting the large files into
smaller parts (in my case this is possible). Nevertheless, I think
addressing the issue at some point in the future would make
sense: as far as I can tell, no revision-control tool handles
large files well at the moment. And, my own case aside,
there are tasks like sound or video editing...
The rest of this email is just loose discussion; feel free to
ignore it.
> a) diff algorithms can only work efficiently when contents of
> both revisions fit in memory
> (...)
> There's not much that can be done about (a) aside from falling
> back to non-delta storage for large files.
I am probably a bit naive, but large files do not necessarily mean
large changes. Even a stupid heuristic, like skipping the common
prefix of the two files and running the normal algorithm on the rest,
could do fairly well in many cases. More generally, I believe
that if we assume a limited change size (true in most cases),
one could design an implementation suited to this usage
scenario (if we cannot find common parts within - say -
10MB windows, the new revision can be treated as a total
replacement).
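
To make the prefix-skipping heuristic concrete, here is a minimal Python
sketch. It is only an illustration of the idea, not how Mercurial works:
the function name common_prefix_len and the block_size parameter are made
up for this example. It reads both revisions block by block, so neither
file has to fit in memory, and returns the offset at which they first
differ; only the data after that offset would need to go to the normal
diff algorithm.

```python
import io


def common_prefix_len(a, b, block_size=1 << 20):
    """Return the number of leading bytes shared by two binary streams.

    Hypothetical helper: reads block_size bytes at a time, so memory use
    stays bounded regardless of file size.
    """
    total = 0
    while True:
        block_a = a.read(block_size)
        block_b = b.read(block_size)
        if block_a == block_b:
            if not block_a:
                return total  # both streams ended: contents are identical
            total += len(block_a)
            continue
        # First differing block: count the bytes that still match.
        limit = min(len(block_a), len(block_b))
        i = 0
        while i < limit and block_a[i] == block_b[i]:
            i += 1
        return total + i


# Usage: in-memory streams stand in for two revisions of a large file.
old = io.BytesIO(b"header" * 1000 + b"old tail")
new = io.BytesIO(b"header" * 1000 + b"new tail, a bit longer")
print(common_prefix_len(old, new, block_size=4096))  # 6000
```

The same trick could be applied to a common suffix, at the cost of
reading the files backwards.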
> And meanwhile, everything else assumes files can be read into
> memory in a single chunk, because the delta storage already
> requires it.
Another stupid idea: internally, for the sake of the algorithms,
treat the large file as many small files (for instance bytes
1-50.000.000, bytes 50.000.001-100.000.000, etc.). Just silently
split it before adding/committing and join the parts when updating.
Then changes like appending data or a fixed-size
replacement will be handled perfectly. Changes like
inserting/deleting something will produce similar-sized changes
in many of the parts (if I add 3 bytes at position 3437, this
algorithm will add/remove 3 bytes in every successive part), but
nevertheless it would be significantly better than failing
altogether.
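
A rough Python sketch of the splitting idea, again purely illustrative:
the names chunk_digests and changed_chunks are invented here, and the
tiny chunk size is chosen only so the example runs instantly. It splits
two revisions into fixed-size parts and reports which parts differ,
which shows the behaviour described above: an append or a fixed-size
replacement touches one part, while an insertion shifts every part that
follows it.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Split data into fixed-size chunks and return one SHA-1 per chunk."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(old, new, chunk_size):
    """Return indices of chunks that were added, removed, or modified."""
    a = chunk_digests(old, chunk_size)
    b = chunk_digests(new, chunk_size)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


# 1024 bytes of non-repeating data, split into four 256-byte chunks.
old = b"".join(i.to_bytes(4, "big") for i in range(256))

appended = old + b"xyz"
replaced = old[:300] + b"\xff\xff\xff" + old[303:]
inserted = old[:10] + b"abc" + old[10:]

print(changed_chunks(old, appended, 256))  # [4]
print(changed_chunks(old, replaced, 256))  # [1]
print(changed_chunks(old, inserted, 256))  # [0, 1, 2, 3, 4]
```

In the insertion case every part is reported as changed, but each one
differs from its predecessor only by a few bytes shifting across the
boundary, so a per-part delta should still be small.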
Best regards