Dealing with binary files (was Re: [PATCH]Make hg diff go nice on binary files)
Matt Mackall
mpm at selenic.com
Wed Jul 27 20:26:16 UTC 2005
On Wed, Jul 27, 2005 at 11:20:21AM -0700, Bryan O'Sullivan wrote:
> On Wed, 2005-07-27 at 10:53 -0700, Matt Mackall wrote:
>
> > There are three ways to do it:
> >
> > a) by file contents
>
> > b) by file extension
>
> > c) by per-file flag
>
> I'd strongly, strongly, strongly prefer A, backed up by C at add time.
Let me play devil's advocate a bit...
> Using A alone has some properties that I do not like at all:
>
> * It will always be wrong for some kinds of file.
> * Its choice either cannot be overridden at all (which would be
> catastrophic), or
> * must be overridden *every time* a misclassified file is dealt
> with (which would be very annoying).
We'll still need to allow overriding at times other than commit for
the cases where the user got it wrong at commit time. Bear in mind
that such a flag will be per file revision, so you won't be able to go
back and correct it.
I suspect our automatic classification will be right well over 99% of
the time (most files are text, after all) and that of the remainder,
the user will fail to correctly classify them for us at commit time
well over 50% of the time as they'll just assume we'll get it right.
And they won't notice anything's wrong until they try to do a merge
quite a ways down the road, at which point they'll probably have
several incorrectly marked revisions that will need command-line
overrides every time they're touched.
So by doing c), we've made binary handling much more complicated and
fixed less than 50% of a problem that was very small to start with.
Looked at another way, there are exactly three things we'll use a
binary flag for:
- deciding whether we can diff/export/annotate
- deciding whether to merge
- deciding how to display something in hgweb
The first is the most important. And as that's generally something we
eyeball, it's easily fixed up with an override flag. So I think
automatic is just fine here.
The second is perhaps another problem altogether. And it's perhaps
best solved in hgmerge. Can we three-way merge PNG files? Maybe, if we
fire up GIMP. How about XML? We might want to handle that with a
special tool, even though it's text.
And the hgweb case is probably a separate problem too. Arguably we
should be doing some MIME magic but we might use is_binary as a hint
that we need to do that.
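To make the hgweb hint concrete, here's a rough sketch. content_type()
and looks_binary() are made-up names, not existing hgweb code, and the
NUL-byte check is only an assumed stand-in for whatever is_binary ends
up being. It also uses plain extension lookup (Python's stock
mimetypes.guess_type) rather than real content-based magic:

```python
import mimetypes

def looks_binary(data):
    # Assumed heuristic: a NUL byte in the first 4k means binary.
    return b"\0" in data[:4096]

def content_type(name, data):
    """Pick a Content-Type for serving a file (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(name)
    if guessed:
        return guessed
    # No extension match: fall back on the binary hint.
    if looks_binary(data):
        return "application/octet-stream"
    return "text/plain"
```

The binary test only comes into play when the filename tells us
nothing, which is about as much weight as a hint deserves.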
So really it comes down to "how do we decide what files we can diff?"
I think we ought to think about a), with command-line overrides, plus
possibly some hgrc-based regex overrides too.
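Roughly, the precedence I have in mind looks like the sketch below.
Everything in it is hypothetical: the is_binary() signature, the
OVERRIDES table (standing in for patterns read from an hgrc), and the
NUL-byte test, which is just one plausible content heuristic:

```python
import re

# Hypothetical override table, as it might be loaded from an hgrc
# section: (filename regex, is_binary) pairs, first match wins.
OVERRIDES = [
    (re.compile(r"\.(png|jpg|gz)$"), True),
    (re.compile(r"\.(txt|c|py)$"), False),
]

def is_binary(name, data, force=None, overrides=OVERRIDES):
    """Guess whether a file should be treated as binary.

    Precedence: command-line override, then regex overrides,
    then file contents.
    """
    if force is not None:
        # Explicit command-line override always wins.
        return force
    for pattern, flag in overrides:
        if pattern.search(name):
            return flag
    # Content heuristic (an assumption): a NUL byte in the first
    # few kilobytes means binary.
    return b"\0" in data[:4096]
```

So is_binary("notes.dat", b"hello\n") comes out False on contents
alone, while passing force=True flips it for a single command without
recording anything in the repository.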
--
Mathematics is the supreme nostalgia of our time.