Are revlog diff calculated as "text" ALWAYS?
Jesus Cea
jcea at jcea.es
Tue May 13 22:47:07 UTC 2014
On 13/05/14 21:21, Matt Mackall wrote:
> That's a rather unusual binary file, no? Indeed, it will completely
> defeat our delta algorithm.
This file is explicitly built to show the problem and get your attention.
> But a binary that contains any of the following will have plenty of
> linefeed bytes (0x0a):
>
> - compressed data of any sort
> - raw binary data with non-contrived distributions
> - machine code
> - embedded text
>
> ..so if you can find a real-world 200k file that doesn't have enough
> 0x0a bytes in it to be digestible, I'll be amazed.
a) Take an odt (OpenDocument/OpenOffice/LibreOffice).
b) Unpack the document as a ZIP archive. It is just a ZIP file. Really.
c) Check "content.xml".
It is a HUGE document with few or no linefeeds in it.
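
For anyone who wants to check their own documents, steps a) to c) can be sketched in Python roughly as follows. This is only a minimal sketch: `linefeed_stats` and `make_sample_odt` are hypothetical helper names, and the sample archive is a made-up stand-in for a real ODT so the snippet runs on its own. With a real document you would pass its path to `linefeed_stats` instead.

```python
import io
import zipfile


def linefeed_stats(odt_file, member="content.xml"):
    """Return (size_in_bytes, linefeed_count) for one member of a ZIP/ODT file."""
    with zipfile.ZipFile(odt_file) as archive:
        data = archive.read(member)
    return len(data), data.count(b"\n")


def make_sample_odt():
    """Build a tiny in-memory stand-in for an ODT (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        # One long run of XML with no linefeeds at all.
        archive.writestr("content.xml", "<doc>" + "<p>text</p>" * 1000 + "</doc>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    size, linefeeds = linefeed_stats(make_sample_odt())
    print(size, "bytes,", linefeeds, "linefeeds")
```

A real "content.xml" will of course differ in size, but the linefeed count is the number to look at.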
I detected this problem when using the hg-zipdoc extension
<http://mercurial.selenic.com/wiki/ZipdocExtension>, specifically
designed to help with ODT files (any ZIP file, actually), and I was
getting far worse deltas than expected. Investigating that, I discovered
this issue.
So yes, the file I proposed is specifically engineered to be pathological
and does not represent a real-life example. But there are real-life
examples that exhibit the same problem. They are easy to find and painful
to manage.
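
To illustrate why the missing linefeeds hurt, here is a rough sketch of a delta computed at line granularity. It is NOT Mercurial's actual delta code: it uses Python's `difflib` as a stand-in, and `line_delta_size` and `make_versions` are hypothetical names invented for this example. It only counts how many bytes of the new version fall in lines that changed.

```python
import difflib


def line_delta_size(old, new):
    """Bytes of `new` living in lines that a line-based diff marks as changed."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    total = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            total += sum(len(line) for line in new_lines[j1:j2])
    return total


def make_versions(separator):
    """Two versions of a made-up document that differ in a single record."""
    records = [b"<p>item %d</p>" % i for i in range(2000)]
    changed = list(records)
    changed[1000] = b"<p>ITEM 1000</p>"
    return separator.join(records), separator.join(changed)


if __name__ == "__main__":
    flat_old, flat_new = make_versions(b"")      # no linefeeds, like content.xml
    lined_old, lined_new = make_versions(b"\n")  # one record per line
    print("no linefeeds:  ", line_delta_size(flat_old, flat_new), "of", len(flat_new))
    print("with linefeeds:", line_delta_size(lined_old, lined_new), "of", len(lined_new))
```

Without linefeeds the whole file is a single "line", so a one-record edit makes the entire content count as changed. With one record per line, only the edited line does.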
--
Jesús Cea Avión _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
Twitter: @jcea _/_/ _/_/ _/_/_/_/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz