Are revlog diffs ALWAYS calculated as "text"?

Matt Mackall mpm at selenic.com
Tue May 13 20:43:54 UTC 2014


On Tue, 2014-05-13 at 20:20 +0000, Kastner Masilko, Friedrich wrote:
> > From: mercurial-bounces at selenic.com [mailto:mercurial-bounces at selenic.com] On Behalf Of Matt Mackall
> > 
> > ..so if you can find a real-world 200k file that doesn't have enough
> > 0x0a bytes in it to be digestible, I'll be amazed.
> 
> Hm. I have seen serialized XML files that exceeded that size without
> containing a single line break. Although these certainly compress
> better than a random binary stream, they still expose a shortcoming
> in the algorithm, as I see it.

Ironically, that's not even a binary file. But then again, it might also
be what's inside a docx file.

> Would it be possible to artificially break "long lines" at, e.g., the
> 4k mark to ease that situation? IMHO, that shouldn't make a big
> difference in performance, won't interfere with standard usage, and
> should still be compatible with the binary patch implementation.

If someone wants to experiment with this, the relevant code should all
be in bdiff.c:splitlines() (find the two places that mention '\n').
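
For anyone trying the experiment, here is a rough, standalone sketch of
the idea: end a "line" at 0x0a or after a fixed number of bytes,
whichever comes first. It is not the code from bdiff.c; `struct piece`,
`split_capped` and the `max_len` parameter are hypothetical names made
up for illustration, and the count-then-fill layout is just one
plausible way to apply the same break rule in two places.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical piece record; NOT the struct used in bdiff.c. */
struct piece {
    const char *start; /* first byte of the piece */
    size_t len;        /* length in bytes, including a trailing '\n' if any */
};

/*
 * Split buf[0..size) into pieces that end right after a '\n' or after
 * max_len bytes, whichever comes first. The last piece may be shorter.
 * Returns the number of pieces, or -1 if allocation fails.
 * The caller owns *out and must free() it.
 */
long split_capped(const char *buf, size_t size, size_t max_len,
                  struct piece **out)
{
    size_t i, run = 0, count = 0, n = 0;
    const char *start = buf;
    struct piece *p;

    assert(max_len > 0);

    /* Pass 1: count the pieces so we can allocate once. */
    for (i = 0; i < size; i++) {
        run++;
        if (buf[i] == '\n' || run == max_len || i == size - 1) {
            count++;
            run = 0;
        }
    }

    /* Allocate at least one slot so an empty input still succeeds. */
    p = malloc((count ? count : 1) * sizeof(*p));
    if (!p)
        return -1;

    /* Pass 2: record each piece using exactly the same break rule. */
    run = 0;
    for (i = 0; i < size; i++) {
        run++;
        if (buf[i] == '\n' || run == max_len || i == size - 1) {
            p[n].start = start;
            p[n].len = run;
            n++;
            start = buf + i + 1;
            run = 0;
        }
    }

    *out = p;
    return (long)count;
}
```

Calling `split_capped(data, size, 4096, &pieces)` on a single 10000-byte
line yields three pieces instead of one. One caveat with a fixed cap:
a byte inserted early in a long line shifts every later break in that
line, so the pieces after the insertion no longer match their old
counterparts.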

-- 
Mathematics is the supreme nostalgia of our time.




