Filter for uncompressed storage of zipped document formats like docx (http://stackoverflow.com/questions/3298525/version-control-for-docx-and-pdf)

Didly didlybom at gmail.com
Thu May 5 10:00:34 UTC 2011


On Thu, May 5, 2011 at 10:11 AM, Martin Geisler <mg at aragost.com> wrote:
> Andreas Gobell <andreasgobell at gmx.de> writes:
>
>> Dear Mercurial team,
>>
>> I am setting up Mercurial as my Version Control System. In my case it
>> is not only meant to manage source code but also Microsoft Word
>> documents in the docx format and some binary files.
>>
>> I already wrote some scripts to handle diffing and merging of docx
>> files in Word. Another goal was to improve the delta compression in
>> the repository. To achieve this, I first tried putting directories
>> containing the extracted docx contents under version control. This
>> worked fine for the repository, but usage was cumbersome because of
>> the necessary conversion between the directories and the docx files. I
>> then stumbled upon the thread
>> http://stackoverflow.com/questions/3298525/version-control-for-docx-and-pdf
>> where Martin Geisler mentions Mercurial's Filter System. This seemed a
>> good solution as it is completely transparent to the user.
>>
>> As Martin stated that he is interested in a solution to this problem,
>> and I haven't found an extension on the internet, I am sending the
>> filter extension that I've written. I ran some tests comparing the
>> space required to store three variants: standard compressed docx
>> files, docx files manually re-saved without compression before each
>> commit, and docx files processed by my filter. The results show clear
>> space savings for the filter version (and, of course, for the manually
>> uncompressed docx). I also tested odt files created in LibreOffice
>> with the filter, and they work as well.
>>
>> I am new to Mercurial and I haven't written Python for a few years, so
>> I would be very glad to hear comments and suggestions for improvement.
>
> Great work, the extension looks good!
>
> You should definitely publish it somewhere more permanent: put it in a
> public repository (create one on bitbucket.org or code.google.com if you
> don't have one already) and create a wiki page for it:
>
>  http://mercurial.selenic.com/wiki/DoczipExtension?action=edit&template=ExtensionTemplate
>
> Then add a link to the new page here:
>
>  http://mercurial.selenic.com/wiki/UsingExtensions
>
> Also, send this to the TortoiseHg guys -- maybe Steve will bundle the
> extension with TortoiseHg since it seems particularly useful in
> Word-heavy (e.g., Windows) environments.
>
> --
> Martin Geisler
>

Please do send it (you can send an email to thg-dev at googlegroups.com).
I think this would be really useful for a lot of users!

Does your filter work for regular zip files too?

BTW, how does Mercurial's storage compression compare to the zip
format's compression of text files?
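
To make the idea under discussion concrete, here is a minimal sketch of
the re-packing step such a filter would need: read a zip container and
write it back with every member stored uncompressed. This is only an
illustration, not Andreas's extension. The function name
`store_uncompressed` is hypothetical, and the sketch does not show how
the step would be hooked into Mercurial's filter system.

```python
import io
import zipfile


def store_uncompressed(data):
    """Rewrite the zip archive in `data` (bytes) with all members stored.

    Hypothetical helper for illustration only. Member order, names,
    timestamps and external attributes are preserved; only the
    compression method changes.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Build a fresh entry so the original compression flags
            # are not carried over.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_STORED
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()
```

Since the sketch relies only on the zip container and not on anything
docx-specific, it accepts any zip archive. The member contents end up as
plain bytes in the output, which is what allows a small edit to the
document to show up as a small change between revisions.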

Angel


