Binary files, filenames, etc.
Matt Mackall
mpm at selenic.com
Tue Jul 12 10:20:23 UTC 2005
On Tue, Jul 12, 2005 at 10:38:16AM +0100, Stephen Darnell wrote:
> This may sound like a Windows support gripe, but it's not (well mostly not).
> I come from an environment that uses both Unix and Windows, with large
> code bases in Perforce. I think Mercurial is extremely promising, but
> there are a number of things that it will need to support IMHO if it is
> going to really take off, and some of these are much harder to add later on.
>
> - Filetype distinction
>
> The most important is the ability to record that a file is binary.
> Matt wrote:
> >Mercurial has no concept of binary files. All files are the same to
> >it. Nor is there a clear definition of what a binary file is anyway.
>
> There is no clear definition, which is why I think the SCM tool needs to
> record, and allow the user to specify the filetype. The tool also needs
> a way to transfer and express patches to binary files, and changes to the
> filetype. (Other metadata too, such as renames.)
We can already transfer binary file deltas by push and pull. That's
about 95% of what's needed in practice. And we can already store
renames as well.
> I gather Macs (and NTFS) also have files with multiple 'forks' of data.
> I'm not sure how critical support for this is though.
To support forks, you'll have to start by forking Mercurial. I think
they're the stupidest filesystem feature ever. Thankfully Macs are
rapidly moving away from forks now that they're based on UNIX.
> - Line ending handling
>
> For some platforms, Unix end of lines don't always work well. The key
> requirement here is to be able to distinguish text files from binary files,
> and then have some configuration to allow the user to choose which form
> files should be presented in. Sensible defaults for Unix, Windows and Mac
> are obvious but a user config override is needed too.
>
> Of course the world isn't perfect, and at least on Windows, some editors
> insist on inserting DOS end of lines into perfectly acceptable \n terminated
> files. Perforce has a handy solution to this problem, with a mode that
> outputs files with \n but strips \r's when submitting to prevent pollution.
I'm willing to accept solutions to this that don't uglify the main
code. But because this doesn't affect any systems I care about or use,
someone else will have to find that solution.
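As a rough illustration of the Perforce-style behaviour described in the quoted text (strip \r on the way in so the repository stays clean), here is a minimal Python sketch. The function name and the NUL-byte test for "binary" are assumptions made for this example; nothing in this thread says Mercurial does either.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Convert CRLF line endings to LF before content is stored.

    Assumption for this sketch: data containing a NUL byte is treated
    as binary and returned unchanged, since rewriting it could corrupt it.
    """
    if b"\0" in data:
        return data
    # Only CRLF pairs are rewritten; a lone CR is left alone.
    return data.replace(b"\r\n", b"\n")
```

A filter like this would sit outside the core code path, applied only when a user opts in, which keeps it in line with the "don't uglify the main code" constraint.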
> - Filenames with spaces
>
> Spaces. OK, they're the devil's spawn, but they need to be handled,
> both as versioned files, and in callouts.
>
> Other special shell characters can be an issue, but SCM tools
> should try to accept them. In my experience the key thing is to
> avoid using system() as much as possible.
Should mostly work, but untested.
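The advice about avoiding system() can be sketched in Python as follows: pass the command as an argument list so no shell ever parses the filename. The helper name run_callout is hypothetical and is not an existing Mercurial function.

```python
import subprocess
import sys


def run_callout(argv):
    """Run an external command given as a list of arguments.

    Because no shell is involved, spaces and shell metacharacters in
    filenames reach the child process unchanged as single arguments.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # The child just echoes back the one argument it received.
    echo = "import sys; print(sys.argv[1])"
    print(run_callout([sys.executable, "-c", echo, "my file & $HOME.txt"]))
```

With system() or a shell string, the same filename would need careful quoting on every platform; the list form sidesteps that entirely.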
> - Filename case sensitivity
>
> This is another tricky issue, particularly if the repository storage
> uses versioned filenames directly. At least Windows is case preserving.
> In my opinion there is no right answer, and the best solution is that
> the SCM tool should be able to be switched into a case sensitive mode
> on a repo basis.
There is a right answer, namely don't be case insensitive. Again, this
is a Windows-only problem and I'm not going to lose any sleep over it.
But I'm open to fixes that don't impact the code for sensible systems.
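One way such a fix could stay out of the main code is a separate check that reports names differing only by case, so a problem is caught before it reaches a case-insensitive filesystem. This is a sketch under that assumption; case_collisions is a hypothetical helper, not part of Mercurial.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(set)
    for path in paths:
        # casefold() gives a case-insensitive key for comparison.
        groups[path.casefold()].add(path)
    return [sorted(names) for _, names in sorted(groups.items()) if len(names) > 1]
```

For example, a manifest containing both README and Readme would produce one group listing those two names, while unrelated files produce nothing.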
> - Character encoding
>
> ASCII is not enough. Both filename and file content are increasingly stored
> in different encodings, although the most significant difference is that
> they are multi-byte.
>
> I would suggest considering this, in particular, the metadata consequences.
> I think maybe storing all text files as UTF-8 would probably do the trick,
> as ASCII encodes well, but there may be a performance impact.
UTF-8 is fine for filenames and commit text and the like (Mercurial
was designed with UTF-8 transparency in mind) but absolutely wrong for
anything that's version controlled.
> - Other wishes
>
> hg log should take a -m/--max option so that it lists only the last n
> changes (should work with a file and without a file)
Already does.
> Sometimes hg is too quiet. Apart from possibly a clone, I'd like to know
> when hg changes any of my files. Maybe I'm just paranoid, but learning
> a new SCM tool, using a tip version of hg, and using hg on non-Unix makes
> me nervous. In particular I think hg update should give some indication
> of the outcome...
There's a -v switch. hg update is quiet by default because it often
takes more time to scroll a line on the screen than to check out a
file.
> In a similar vein, hg should also give more information about what it
> will do. It has already been discussed on the ML, but a -n/--noexec
> option that outputs what would happen without actually doing it would be
> good. For hg pull it should indicate what changes/files would be
> pulled. For hg update, it should indicate the files that would be changed.
> This might reduce the need for undo.
>
> hg is too verbose in some circumstances though. In particular, the output
> from hg log and hgweb changelog is way too verbose IMHO. I'd suggest that
> the default should be a single or couple of lines per change, e.g.:
>
> 685 2005/07/12 Matt Mackall "Added tag 0.6b for changeset 4ccf3de52989..." [tip]
> 684 2005/07/12 Matt Mackall "Turn off signing with hgeditor by default" [0.6b]
> 683 2005/07/12 Matt Mackall "Revert hgeditor change to manifest bits"
>
> or
>
> 685:79fb7032739f0ef675af [tip] 2005/07/12 06:58 Matt Mackall
> Added tag 0.6b for changeset 4ccf3de52989b14c3d84e1097f59e39a992e00bd
>
> 684:4ccf3de52989b14c3d84 [0.6b] 2005/07/12 06:56 Matt Mackall
> Turn off signing with hgeditor by default
>
> 683:104d2aee3b442f2ee4b2 [] 2005/07/12 06:54 Matt Mackall
> Revert hgeditor change to manifest bits
>
>
> Is it possible to sync back to a given changeset number?
hg update <n>?
> Even on platforms where hard link cloning of the repo is pretty quick,
> I think that it is only human nature to be working on multiple independent
> things at a time in one directory. For these situations it would be nice
> to be able to flag (locally) that a few files are related to a pending
> change, e.g.:
> $ hg status
> [brief templates]
> C templates/changelog.tmpl
> C templates/changelogentry.tmpl
> [fix local clone on windows]
> C mercurial/commands.py
> C mercurial/util.py
I'll file this under "quilt-like features" or "fancy commit tool".
> Perforce has a nice wildcard facility where ... is like * but crosses /
> boundaries. It is really easy to say for example:
> $ hg log .../hgweb.py
> or
> $ hg add docs/adminguide/...html
We've got a plan shaping up for more powerful file specification.
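To make the Perforce wildcard semantics from the quoted text concrete, here is a small Python sketch that translates such a pattern into a regular expression: "..." matches anything including "/", while "*" stops at "/". It illustrates the quoted idea only; the function name is hypothetical and this is not the file specification plan mentioned above.

```python
import re


def p4_pattern_to_regex(pattern):
    """Translate a Perforce-style wildcard pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")      # crosses directory boundaries
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays within one path component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))
```

Matching with fullmatch(), the pattern .../hgweb.py accepts mercurial/hgweb.py, and docs/adminguide/...html accepts any .html file at any depth below docs/adminguide/.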
--
Mathematics is the supreme nostalgia of our time.