Are revlog diff calculated as "text" ALWAYS?
Matt Mackall
mpm at selenic.com
Tue May 13 20:43:54 UTC 2014
On Tue, 2014-05-13 at 20:20 +0000, Kastner Masilko, Friedrich wrote:
> > From: mercurial-bounces at selenic.com [mailto:mercurial-bounces at selenic.com] On Behalf Of Matt Mackall
> >
> > ..so if you can find a real-world 200k file that doesn't have enough
> > 0x0a bytes in it to be digestible, I'll be amazed.
>
> Hm. I have seen serialized XML files that exceeded that size while
> having no line-break in it. Although these certainly compress better
> than a random binary stream, it still shows a short-coming in the
> algorithm as I see it.
Ironically, that's not even a binary file. But then again, it might also
be what's inside a docx file.
> Would it be possible to artificially break "long lines" at the e.g. 4k
> mark to ease that situation? IMHO, that shouldn't make a big
> difference in performance, won't interfere with standard-usage, and
> should still be compatible with the binary patch implementation.
If someone wants to experiment with this, the relevant code should all
be in bdiff.c:splitlines() (find the two places that mention '\n').
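
To make the proposal concrete, here is a minimal Python sketch of the
splitting rule being suggested: break after every 0x0a byte, and also
force a break once a run reaches a cap without one. This is not the code
in bdiff.c; the function name split_lines_capped and the max_len
parameter are made up for illustration, and the 4096 default is just the
"4k mark" from the question above.

```python
def split_lines_capped(data: bytes, max_len: int = 4096) -> list[bytes]:
    # Hypothetical helper, not Mercurial's implementation.
    # Splits after each newline byte, and additionally forces a break
    # whenever max_len bytes pass without seeing a newline.
    if max_len < 1:
        raise ValueError("max_len must be positive")
    pieces = []
    start = 0
    n = len(data)
    while start < n:
        # Search for a newline only within the next max_len bytes.
        nl = data.find(b"\n", start, start + max_len)
        if nl != -1:
            end = nl + 1                    # keep the newline with its line
        else:
            end = min(start + max_len, n)   # artificial break at the cap
        pieces.append(data[start:end])
        start = end
    return pieces


if __name__ == "__main__":
    # A single-line "serialized XML" blob with no newlines at all.
    blob = b"<item/>" * 30000
    chunks = split_lines_capped(blob)
    print(len(blob), "bytes split into", len(chunks), "pieces")
```

One thing such an experiment would probably need to measure: with breaks
at fixed offsets, inserting a few bytes early in a long line shifts every
later break, so the pieces after the insertion point may no longer match
their old counterparts.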
--
Mathematics is the supreme nostalgia of our time.