Are revlog diffs calculated as "text" ALWAYS?

Sean Farley sean.michael.farley at gmail.com
Tue May 13 23:52:47 UTC 2014


Jesus Cea <jcea at jcea.es> writes:

> On 13/05/14 22:20, Kastner Masilko, Friedrich wrote:
>> Hm. I have seen serialized XML files that exceeded that size while
>> having no line break in them. Although these certainly compress better
>> than a random binary stream, it still shows a shortcoming in the
>> algorithm as I see it.
>
> That is exactly my use-case.
>
>> Would it be possible to artificially break "long lines" at, e.g., the
>> 4k mark to ease that situation? IMHO, that shouldn't make a big
>> difference in performance, won't interfere with standard usage, and
>> should still be compatible with the binary patch implementation.
>
> You don't want fixed-size chunks (unless your source files are database
> files). Any insertion in the middle of the file will change all blocks
> after it.
>
> You want to find a suitable "linefeed" mark that doesn't change a lot
> when you make small changes to the file. For instance, for XML files, the
> "line ending" could be the character ">".
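
A minimal Python sketch contrasting the two chunking strategies discussed above. This is not Mercurial code: the helpers `fixed_chunks`, `delimiter_chunks` and `changed_chunks` are hypothetical, the XML sample is made up, and a 64-byte chunk size stands in for the 4k mark to keep the example small. It inserts one byte near the start of a single-line XML document and counts how many chunks of the edited data do not occur anywhere in the original, as a rough proxy for how much a chunk-based delta would have to store.

```python
from __future__ import annotations


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def delimiter_chunks(data: bytes, delim: bytes = b">") -> list[bytes]:
    """Cut data right after each `delim`, keeping the delimiter so that
    b"".join(chunks) == data (">" plays the role of a line ending)."""
    parts = data.split(delim)
    chunks = [part + delim for part in parts[:-1]]
    if parts[-1]:  # trailing bytes with no closing delimiter
        chunks.append(parts[-1])
    return chunks


def changed_chunks(old: list[bytes], new: list[bytes]) -> int:
    """Rough proxy for delta size: chunks of `new` not found in `old`."""
    known = set(old)
    return sum(1 for chunk in new if chunk not in known)


if __name__ == "__main__":
    # Hypothetical serialized XML with no "\n" anywhere.
    old = b"".join(b"<item id='%d'>value %d</item>" % (i, i) for i in range(200))
    new = old[:40] + b"X" + old[40:]  # one byte inserted near the start

    print("fixed 64-byte chunks changed:",
          changed_chunks(fixed_chunks(old, 64), fixed_chunks(new, 64)))
    print("'>'-delimited chunks changed:",
          changed_chunks(delimiter_chunks(old), delimiter_chunks(new)))
```

With fixed-size chunks, every chunk from the insertion point onward changes because all later boundaries shift by one byte; splitting after each ">" confines the change to the single chunk that received the new byte. Keeping the delimiter inside each chunk means joining the chunks restores the input exactly.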

If this were generic enough, one could implement a decent word-diff
feature, I believe.
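
A minimal sketch of that idea, assuming a regex tokenizer stands in for "lines": the standard library's `difflib.SequenceMatcher` is run over tokens instead of lines, and the token pattern is a parameter, so ">" can serve as the "line ending" for XML. The function names `tokenize` and `word_diff` are hypothetical, and the `[-removed-]{+added+}` markup only imitates the style of `git diff --word-diff=plain`; this is an illustration, not Mercurial's own code.

```python
from __future__ import annotations

import difflib
import re

# Default tokens: runs of word characters, runs of whitespace, or single
# punctuation characters. Any other regex can be passed instead.
WORDS = r"\w+|\s+|[^\w\s]"


def tokenize(text: str, pattern: str = WORDS) -> list[str]:
    """Split text into tokens; concatenating them gives back `text`
    as long as the pattern can match every character."""
    return re.findall(pattern, text)


def word_diff(old: str, new: str, pattern: str = WORDS) -> str:
    """Diff token sequences instead of lines and mark changes inline."""
    a, b = tokenize(old, pattern), tokenize(new, pattern)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # tokens removed (from "delete" or "replace")
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # tokens added (from "insert" or "replace")
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


if __name__ == "__main__":
    # prints: the quick [-brown-]{+red+} fox
    print(word_diff("the quick brown fox", "the quick red fox"))
    # Same machinery with ">" as the "line ending", as suggested for XML.
    # prints: <a><b>[-1</b>-]{+2</b>+}</a>
    print(word_diff("<a><b>1</b></a>", "<a><b>2</b></a>", r"[^>]*>|[^>]+"))
```

Making the token pattern a parameter is what keeps the approach generic: words for prose, ">"-terminated pieces for XML, or plain lines if the pattern is `r"[^\n]*\n|[^\n]+"`.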



More information about the Mercurial mailing list