Are revlog diff calculated as "text" ALWAYS?

Jesus Cea jcea at jcea.es
Tue May 13 22:47:07 UTC 2014


On 13/05/14 21:21, Matt Mackall wrote:
> That's a rather unusual binary file, no? Indeed, it will completely
> defeat our delta algorithm.

This file is explicitly built to show the problem and get your attention.

> But a binary that contains any of the following will have plenty of
> linefeed bytes (0x0a):
> 
> - compressed data of any sort
> - raw binary data with non-contrived distributions
> - machine code
> - embedded text
> 
> ..so if you can find a real-world 200k file that doesn't have enough
> 0x0a bytes in it to be digestible, I'll be amazed.

a) Take an ODT file (OpenDocument/OpenOffice/LibreOffice).

b) Unpack the document with any ZIP tool. It is just a ZIP file. Really.

c) Check "content.xml".

It is a HUGE document with few or no linefeeds in it.
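
Steps a) to c) can also be checked from a script. What follows is only a
minimal sketch: the helper name linefeed_stats is made up for illustration,
and the in-memory archive is a synthetic stand-in for a real .odt, so the
numbers it prints say nothing about any particular document. To test a real
file, pass its path instead of the buffer.

```python
import io
import zipfile


def linefeed_stats(odt, member="content.xml"):
    """Return (size_in_bytes, linefeed_count) for one member of a ZIP archive.

    `odt` may be a path or a file-like object; an .odt is a plain ZIP file.
    """
    with zipfile.ZipFile(odt) as archive:
        data = archive.read(member)
    return len(data), data.count(b"\n")


# Synthetic stand-in for an .odt (assumption, not a real document):
# an XML header line followed by one very long line of markup.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    body = b"<text:p>hello</text:p>" * 10000
    archive.writestr("content.xml", b"<?xml version='1.0'?>\n" + body)
buf.seek(0)

size, linefeeds = linefeed_stats(buf)
print("content.xml: %d bytes, %d linefeed(s)" % (size, linefeeds))
```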

I detected this problem when using the hg-zipdoc extension
<http://mercurial.selenic.com/wiki/ZipdocExtension>, specifically
designed to help with ODT files (any ZIP file, actually), and I was
getting far worse deltas than expected. Looking into why, I discovered
this issue.
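
To illustrate why missing linefeeds hurt, here is a rough model of a delta
that works at line granularity. It is NOT Mercurial's actual delta code; it
uses Python's difflib as a stand-in, and the function name line_delta_size
and the sample data are invented for this example. The assumption is only
that changed content is stored in whole-line units.

```python
import difflib


def line_delta_size(old: bytes, new: bytes) -> int:
    """Bytes of new content a line-granular delta would have to store.

    Rough model only: every line that differs is counted in full.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            size += sum(len(line) for line in new_lines[j1:j2])
    return size


# Same markup twice: once as a single long line, once with linefeeds.
paragraphs = [b"<p>paragraph %d</p>" % i for i in range(2000)]
one_line = b"".join(paragraphs)
many_lines = b"\n".join(paragraphs)


def edit(data: bytes) -> bytes:
    """Apply the same tiny change to exactly one paragraph."""
    return data.replace(b"paragraph 1000<", b"PARAGRAPH 1000<")


big = line_delta_size(one_line, edit(one_line))
small = line_delta_size(many_lines, edit(many_lines))
print("single-line file delta: %d bytes" % big)
print("multi-line file delta:  %d bytes" % small)
```

In this model the single-line file pays for the whole document on every
edit, while the version with linefeeds pays only for the changed line.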

So yes, the file I proposed is specifically engineered to be pathological
and does not represent a real-life example. But yes, there are real-life
examples that exhibit the same problem. Easy to find and painful to manage.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
