Are revlog diff calculated as "text" ALWAYS?

Kastner Masilko, Friedrich kastner-masilko at at.festo.com
Wed May 14 14:38:42 UTC 2014


> From: mercurial-bounces at selenic.com [mailto:mercurial-bounces at selenic.com] On Behalf Of Jesus Cea
> 
> The problem with your suggestion, Friedrich is that your proposal is
> only useful for block oriented content like databases. For my problem
> at hand, ODT/DOCX documents, it would be not useful at all. Completely
> ineffective.

That's not true. In the _example_ you gave, the suggested block-delimiting at 4k reduces the delta from 200k to at most 4k. The same holds in similar circumstances for deserialized XML files: if an XML file grows, or gets insertions in the middle, the resulting delta would be smaller than before. Sure, it would not change anything if the insertion is at the beginning of the file, but as I said: it is not a generic solution, just an enhancement of the current algorithm without too much risk. Any win, even if it is only skipping the first chunk, would be better than nothing, right?
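
To illustrate what I mean by block-delimiting, here is a rough Python sketch. It is only a model, not Mercurial code: it assumes the diff works on chunks that end at a newline, and it uses difflib.SequenceMatcher as a stand-in for bdiff. The names split_chunks and delta_size, the 4096-byte cap and the random test data are made up for this example.

```python
import difflib
import random


def split_chunks(data, max_len=None):
    """Split after every newline; optionally also cut chunks at max_len bytes."""
    chunks = []
    start = 0
    while start < len(data):
        nl = data.find(b"\n", start)
        end = len(data) if nl == -1 else nl + 1
        if max_len is not None:
            end = min(end, start + max_len)
        chunks.append(data[start:end])
        start = end
    return chunks


def delta_size(old, new, max_len=None):
    """Bytes of 'new' that fall into chunks without a match in 'old'.

    SequenceMatcher is only a stand-in for the real diff algorithm.
    """
    a = split_chunks(old, max_len)
    b = split_chunks(new, max_len)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            size += sum(len(chunk) for chunk in b[j1:j2])
    return size


# 200k of newline-free data (bytes 32..125), as in a typical binary blob.
rng = random.Random(0)
old = bytes(rng.randrange(32, 126) for _ in range(200 * 1024))

# Change a single byte in the middle of the file.
changed = bytearray(old)
changed[100000] = 0x7E
new = bytes(changed)

print(delta_size(old, new))        # newline-only chunks: 204800
print(delta_size(old, new, 4096))  # chunks capped at 4k: 4096
```

Without newlines the whole file is one chunk, so a one-byte change makes the whole 200k count as changed. With the 4k cap only the affected chunk does. An insertion that shifts the data still invalidates every chunk after it, which is why an insertion at the beginning of the file gains nothing.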

I really doubt that a general xdelta system will be as fast as the current bdiff implementation. I suspect (from the code I skimmed of that xdelta project) that it will be an order of magnitude slower. I'm pretty sure that the Mercurial maintainers would consider that an unacceptable change if it is made "only" to fix this special binary situation.

In a similar manner, every heuristic approach (counting markers, "guessing" binary status) will introduce "magic", as you already wrote. It therefore increases the risk of breakage deep down in the very backbone of the whole system. Such a patch would also have a low chance of finding its way into production deployments.

I know it is a very pragmatic approach, but have you already tried block-delimiting in your case? My quick tests on a simple Word document tell me that it reduces the store size by ~60% without much change in commit time. If this patch goes through because it is minimally invasive, would that not be appreciated?

regards,
Fritz



Development Software Systems
Festo Gesellschaft m.b.H.
Linzer Strasse 227
Austria - 1140 Wien

Firmenbuch Wien
FN 38435y
UID: ATU14650108

Tel: +43(1)91075-198
Fax: 
www.festo.at




