Hg for “Source Code” only?

Steve Barnes gadgetsteve at hotmail.com
Sun Apr 20 07:03:20 UTC 2014


On 19/04/14 20:44, Scott Palmer wrote:
> On Sat, Apr 19, 2014 at 11:09 AM, Augie Fackler <raf at durin42.com> wrote:
>> Subject: Re: Hg for “Source Code” only?
>> From: raf at durin42.com
>> Date: Sat, 19 Apr 2014 11:09:49 -0400
>> CC: mercurial at selenic.com
>> To: studio-pm at hotmail.com
>>
>>
>>
>> On Apr 19, 2014, at 6:14 AM, Pietro Moras <studio-pm at hotmail.com> wrote:
>>> Mercurial defines itself precisely as a “Source Code Management System”
>>> [see:  Mercurial Command Reference] instead of, say, as a more usual, and
>>> generic, “Version Control System”. Fine.
>>> I wonder if “Source Code” is here to be intended as a precise scope
>>> delimitation, that is “any collection of computer instructions (possibly
>>> with comments) written using some human-readable computer language, usually
>>> as plain text” only. Or not...
>> Mercurial works just fine on all sorts of formats, merging just tends to not
>> work as well for opaque binary formats. I've used it successfully on images
>> and word docs, albeit without ever trying to merge those formats (which I
>> suspect would end in frustration.)
>
> On Sat, Apr 19, 2014 at 1:01 PM, Pietro Moras <studio-pm at hotmail.com> wrote:
>> Your experience is in line with my current (i.e. provisional) conviction.
>> That is:
>> – Strictly speaking (that is: merging included) Mercurial is for plain text
>> files only (as is any other similar system, though);
>> – Most (but not all) Mercurial features will work just fine with most file
>> formats (worth a try);
>> – Therefore to define Mercurial precisely as a “Source Code Management
>> System” could be considered both fair and prudent.
>>
>> Other opinions welcome. Thanks.
>> - P.M.
>>
> As I understand it, Mercurial treats all data as binary.  That holds
> for all operations except the UI for displaying diffs, which is only
> done for text formats.  The only issues with non-text data would
> be related to larger files and data that doesn't diff well, like most
> compressed formats.  Mercurial will deal with them, but storage
> (and network data transfer) will be less efficient because the delta
> from one version to the next could be as large as the entire file.
> Also, it is my understanding that Mercurial performs all data
> transformation for a single file in memory, possibly requiring three
> times the size of the file in available memory.  The Largefiles
> extension was created for cases where that would be a problem, and
> also to deal with the storage issues of files that typically won't
> have small diffs, like ZIP files.
>
> Note that on Windows there are tools for doing smart diffs of office
> documents, e.g. Word documents.  I think the TortoiseHg distribution
> includes hooks to deal with them.
>
> Regards,
>
> Scott
> _______________________________________________
> Mercurial mailing list
> Mercurial at selenic.com
> http://selenic.com/mailman/listinfo/mercurial
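Scott's point that a delta between two versions of a compressed file can be as large as the whole file can be seen with a small experiment. The sketch below is only an illustration under stated assumptions: `crude_delta_size` is a hypothetical helper that counts the bytes outside the longest common prefix and suffix, a deliberately crude stand-in for Mercurial's real binary delta (bdiff), and the 200-line "document" is made-up data.

```python
import zlib


def crude_delta_size(old: bytes, new: bytes) -> int:
    """Bytes of `new` not covered by the common prefix and suffix with `old`.

    Hypothetical, deliberately crude stand-in for a real binary delta
    such as Mercurial's bdiff; it only shows the general effect.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Never let the suffix overlap the prefix already matched.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


# Made-up "document": 200 similar lines of plain text.
old_text = b"".join(b"line %d: some repeated content\n" % i for i in range(200))
# Change two words on the first line only.
new_text = old_text.replace(b"line 0: some repeated", b"line 0: SOME REPEATED", 1)

plain_delta = crude_delta_size(old_text, new_text)

# The same two versions, stored compressed (as .zip/.docx members would be).
old_zip = zlib.compress(old_text)
new_zip = zlib.compress(new_text)
packed_delta = crude_delta_size(old_zip, new_zip)

print(f"plain text: {plain_delta} of {len(new_text)} bytes differ")
print(f"compressed: {packed_delta} of {len(new_zip)} bytes differ")
```

For the plain text only the 13 changed bytes fall outside the common prefix and suffix, whereas for the compressed versions the differing region covers most of the stream, because a small edit changes the compressed bytes from that point on (and the trailing checksum as well).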
Note that the ZipDoc <http://mercurial.selenic.com/wiki/ZipdocExtension> 
extension can deal efficiently with the compressed files produced by 
modern office suites, such as .docx and .odt, and lets you diff the XML 
stored inside them on just about *every* operating system (it is 
implemented in pure Python); it can also handle *any .zip* compressed 
file.  It does this by storing the content of the .zip/.docx/etc. 
uncompressed and re-compressing it on checkout.  There is a note that 
office documents may change size after being checked in and then out 
again, because MS-Word uses low-efficiency zip compression.  ZipDoc 
ships with TortoiseHg and, once enabled, works from the command line as 
well; it can also be downloaded and added to command-line-only 
installations.
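The store-uncompressed, re-compress-on-checkout idea can be sketched with Python's standard `zipfile` module. This is not ZipDoc's actual code: `rezip` is a hypothetical helper, and the tiny archive below is a made-up stand-in for a .docx.

```python
import io
import zipfile


def rezip(data: bytes, compression: int) -> bytes:
    """Rewrite a ZIP archive so every member uses `compression`.

    Hypothetical helper: member names, timestamps and contents are kept;
    only the compression method changes.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            member = zipfile.ZipInfo(info.filename, info.date_time)
            dst.writestr(member, src.read(info.filename), compress_type=compression)
    return out.getvalue()


# Made-up stand-in for a .docx: a ZIP whose main member is XML.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("word/document.xml", "<w:p><w:t>Hello</w:t></w:p>" * 50)
original = buf.getvalue()

# What would be committed: members stored uncompressed, so the XML is
# present as plain bytes that deltas and diffs can work on.
stored = rezip(original, zipfile.ZIP_STORED)

# What would be written on checkout: members compressed again.
restored = rezip(stored, zipfile.ZIP_DEFLATED)
```

Because the re-compression step need not reproduce the compressor settings the office application originally used, the checked-out file can contain the same members yet have a different size from the file that was committed, which matches the note mentioned above.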

The BigFiles <http://mercurial.selenic.com/wiki/BigfilesExtension> 
extension works rather differently: it stores a straight copy of each 
version of the big file(s) in a specified, possibly remote, location, 
and at each check-in it records version information (something like an 
MD5 hash) for the current version of the "big" file.  When you revert 
or switch branches, it checks whether it needs to fetch a different 
version of any "big" files.
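The scheme described above (full copies kept elsewhere, only a hash kept in history, a fetch only when the working copy does not match) can be sketched as follows. This is a hypothetical illustration, not the BigFiles code: `BigFileStore` and `update_big_file` are made-up names, an in-memory dict stands in for the remote location, and MD5 is used only because the text mentions "something like the MD5".

```python
import hashlib


class BigFileStore:
    """Hypothetical stand-in for the (possibly remote) big-file location."""

    def __init__(self):
        self._blobs = {}  # hash -> full copy of one version

    def put(self, data):
        """Keep a straight copy of `data`; return the hash to be versioned."""
        key = hashlib.md5(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key):
        return self._blobs[key]


def update_big_file(store, working_data, target_key):
    """Fetch the target version only if the working copy differs from it.

    Returns (contents, fetched), where `fetched` says whether the store
    had to be contacted.
    """
    if working_data is not None and hashlib.md5(working_data).hexdigest() == target_key:
        return working_data, False
    return store.get(target_key), True


store = BigFileStore()
v1 = store.put(b"A" * 1000)  # first check-in: only v1's hash goes into history
v2 = store.put(b"B" * 1000)  # second check-in

# Switching from a checkout of v1 to the revision that recorded v2.
data, fetched = update_big_file(store, b"A" * 1000, v2)
# Updating again to the same revision needs no fetch.
data_again, fetched_again = update_big_file(store, data, v2)
```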

The Largefiles <http://mercurial.selenic.com/wiki/LargefilesExtension> 
extension, which ships with Mercurial 2.0 and later, does a similar job, 
using SHA-1 to identify the versions, but also adds a local cache for 
such files and is generally more transparent.
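Largefiles records the SHA-1 of each version in a small tracked "standin" file; the local cache then means a version already downloaded once does not have to be fetched again. The sketch below only illustrates that lookup order under assumptions of its own: `fetch_largefile` is a hypothetical name, and two dicts stand in for the on-disk cache and the server-side store. It is not the extension's code.

```python
import hashlib


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def fetch_largefile(sha1, local_cache, remote_store):
    """Return (contents, source) for `sha1`, preferring the local cache.

    `local_cache` and `remote_store` are plain dicts standing in for the
    real on-disk cache and remote store (hypothetical).
    """
    if sha1 in local_cache:
        return local_cache[sha1], "cache"
    data = remote_store[sha1]
    # The SHA-1 doubles as an integrity check on what was downloaded.
    if sha1_hex(data) != sha1:
        raise ValueError("corrupt download: hash mismatch")
    local_cache[sha1] = data  # populate the cache for next time
    return data, "remote"


blob = b"\x89PNG" + bytes(1000)  # made-up binary content
key = sha1_hex(blob)             # what the standin would record
remote = {key: blob}
cache = {}

first = fetch_largefile(key, cache, remote)   # not cached yet: goes remote
second = fetch_largefile(key, cache, remote)  # now served from the cache
```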

So while Mercurial is truly at heart a "source control system", it is not 
limited to such usage and, with a little care, works very well with 
files that are not traditional "ASCII-based source code"!

Hope that helps.

Gadget/Steve



