Filter for uncompressed storage of zipped document formats like docx (http://stackoverflow.com/questions/3298525/version-control-for-docx-and-pdf)
Martin Geisler
mg at aragost.com
Thu May 5 08:11:38 UTC 2011
Andreas Gobell <andreasgobell at gmx.de> writes:
> Dear Mercurial team,
>
> I am setting up Mercurial as my Version Control System. In my case it
> is not only meant to manage source code but also Microsoft Word
> documents in the docx format and some binary files.
>
> I already wrote some scripts to handle diffing and merging of docx
> files in Word. Another goal was to improve the delta compression in
> the repository. To achieve this I had first tried putting directories
> containing the extracted docx contents under version control. This
> worked fine for the repository but the usage was cumbersome because of
> the necessary conversion between the directories and the docx files. I
> then stumbled upon the thread
> http://stackoverflow.com/questions/3298525/version-control-for-docx-and-pdf
> where Martin Geisler mentions Mercurial's Filter System. This seemed a
> good solution as it is completely transparent to the user.
>
> As Martin stated that he is interested in a solution to this problem,
> and I haven't found an extension on the internet, I am sending the
> filter extension that I've written. I ran some tests comparing the
> space required for storing three variants: standard compressed docx
> files, docx files created manually with no compression before a
> commit, and docx files processed by my filter. The results show clear
> space savings for the filter version (and, of course, for the manually
> uncompressed docx). I also tested odt files created in LibreOffice
> with the filter and they work as well.
>
> I am new to Mercurial and I haven't written Python for a few years, so
> I would be very glad to hear about improvements and comments.
Great work, the extension looks good!
You should definitely publish it somewhere more permanent: put it in a
public repository (create one on bitbucket.org or code.google.com if you
don't have one already) and create a wiki page for it:
http://mercurial.selenic.com/wiki/DoczipExtension?action=edit&template=ExtensionTemplate
Then add a link to the new page here:
http://mercurial.selenic.com/wiki/UsingExtensions
Also, send this to the TortoiseHg guys -- maybe Steve will bundle the
extension with TortoiseHg since it seems particularly useful in
Word-heavy (e.g., Windows) environments.
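For readers who have not seen the attachment, the core idea of storing
a docx uncompressed can be sketched roughly as below. This is not
Andreas's extension, only a minimal illustration built on Python's
standard zipfile module; the function name store_uncompressed is
hypothetical, and the wiring into Mercurial's filter system is left out.

```python
import io
import zipfile


def store_uncompressed(data: bytes) -> bytes:
    """Rewrite a zip container (docx, odt, ...) with every member stored.

    Hypothetical helper, not part of Mercurial or of the posted extension.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        # Keep the original member order: odt, for example, expects the
        # "mimetype" entry to come first.
        for info in src.infolist():
            # Reuse name and timestamp so that identical input always
            # produces identical output.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_STORED
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()
```

The stored archive is larger than the original, but its XML parts are
plain text, so a small edit in Word should change only a small part of
the file and deltas between revisions should stay small. A matching
step that compresses the file again on checkout is optional, since an
archive with stored members is still a valid zip file.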
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/