Are revlog diff calculated as "text" ALWAYS?

Jesus Cea jcea at jcea.es
Tue May 13 22:59:50 UTC 2014


On 13/05/14 22:20, Kastner Masilko, Friedrich wrote:
> Hm. I have seen serialized XML files that exceeded that size while
> having no line-break in it. Although these certainly compress better
> than a random binary stream, it still shows a short-coming in the
> algorithm as I see it.

That is exactly my use-case.

> Would it be possible to artificially break "long lines" at the e.g.
> 4k mark to ease that situation? IMHO, that shouldn't make a big
> difference in performance, won't interfere with standard-usage, and
> should still be compatible with the binary patch implementation.

You don't want fixed-size chunks (unless your source files are database
files). Any insertion in the middle of the file shifts every block
after it, so all of those blocks change.
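
To illustrate the problem with fixed-size chunks, here is a minimal
Python sketch. It is not Mercurial code; the helper name fixed_chunks,
the sample data and the 8-byte block size are assumptions picked only
to keep the example small.

```python
def fixed_chunks(data, size):
    """Split data into consecutive blocks of `size` bytes (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical sample data: four 8-byte elements, then the same content
# with a 4-byte element inserted after the first one.
old = b"<a>1</a><b>2</b><c>3</c><d>4</d>"
new = b"<a>1</a><X/><b>2</b><c>3</c><d>4</d>"

old_blocks = fixed_chunks(old, 8)
new_blocks = fixed_chunks(new, 8)

# Blocks of the new version that also exist, byte for byte, in the old one.
known = set(old_blocks)
shared = [block for block in new_blocks if block in known]

print(len(old_blocks), len(new_blocks), len(shared))
```

Only the block before the insertion survives unchanged; everything
after it is shifted by 4 bytes and no longer lines up with any old
block.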

You want to find a suitable "linefeed" mark whose positions stay mostly
stable when you make small changes to the file. For instance, for XML
files, the "line ending" could be the character ">".
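
A rough sketch of that idea in Python, assuming we simply cut the
input after every ">" byte. The helper name split_after and the sample
data are hypothetical; this is not how bdiff is implemented, only an
illustration of why a content-based mark behaves better than a fixed
offset.

```python
def split_after(data, mark=b">"):
    """Split data into pieces that each end with `mark` (a trailing rest may not)."""
    parts = data.split(mark)
    # bytes.split() drops the delimiter, so re-attach it to every piece
    # except the last one, which is whatever followed the final mark.
    pieces = [part + mark for part in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


# Hypothetical sample data: one small element inserted near the start.
old = b"<a>1</a><b>2</b><c>3</c><d>4</d>"
new = b"<a>1</a><X/><b>2</b><c>3</c><d>4</d>"

old_pieces = split_after(old)
new_pieces = split_after(new)

# Pieces of the new version that were not present in the old one.
known = set(old_pieces)
added = [piece for piece in new_pieces if piece not in known]

print(added)
```

Here the only unknown piece is the inserted element itself; all the
other pieces are identical to old ones, so a block-based diff can
reuse them.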

I wonder whether the bdiff implementation requires a sequence of
"lines"/blocks as input, or whether it could work on the whole file
contents, like "xdelta" <http://xdelta.org/>.

Another issue would be when to use the current diff and when to use a
binary diff. The current implementation detects "binary" when a '\0'
byte is present in the input, but the XML described above doesn't
qualify. Maybe something like "'\0' present, or 'size / number of
linefeeds' exceeds a threshold". But that sounds quite magical to me.
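
For the sake of discussion, the heuristic could look roughly like this
in Python. The function name looks_binary and the 4096-byte threshold
are assumptions made up for this sketch (the threshold just echoes the
"4k mark" mentioned above); nothing here is taken from the Mercurial
source.

```python
def looks_binary(data, max_avg_line=4096):
    """Guess whether `data` should get a binary-style diff.

    True if a NUL byte is present, or if the average number of bytes
    per line (size / line count) exceeds `max_avg_line`.
    """
    if not data:
        return False
    if b"\0" in data:
        return True
    # A file with no linefeed at all still counts as one line.
    line_count = data.count(b"\n") + 1
    return len(data) / line_count > max_avg_line


print(looks_binary(b"hello\nworld\n"))  # ordinary text
print(looks_binary(b"\x89PNG\0\0"))     # contains a NUL byte
print(looks_binary(b"<x>" * 10000))     # 30000 bytes on a single line
```

The threshold is exactly the "magical" part: any fixed value will
misclassify some files on either side of it.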

An additional option would be to ALWAYS do a real binary, xdelta-like
diff, but then I wonder how this would interact with "hg diff" when the
file is actually text.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
