Mercurial-devel Digest, Vol 2, Issue 25
Guenther Brunthaler
spam_me_not_please_dont at gmx.nospam.net
Mon Nov 20 20:04:28 UTC 2006
> From: Matt Mackall <mpm at selenic.com>
> Subject: Re: User metadata support
> I think it might be useful, but I also think it can't be done right.
I'm sorry to hear that!
Certainly it's not a feature to be implemented overnight.
But, perhaps, in some future version?
> ACLs are a perfect example of things Mercurial shouldn't care about.
I agree. That's why I think user-defined hooks should take care of such
things, if users actually need them.
But hooks need metadata to operate on.
So, I'm *not* suggesting building ACLs, symlinks or other things into
Mercurial.
Instead, users should be provided with some means of setting and accessing
metadata for each version-controlled object.
User-defined hooks can then implement that functionality, all without
Mercurial having to know how user hooks make use of the metadata streams
it maintains.
> They're not portable from one user to another on the same box let
> alone one OS to another, there's no sane merge semantics, and they're
> arbitrarily complex.
That would actually not be a problem for Mercurial if something like the
suggested metadata stream feature were implemented: just because the
metadata is *there*, a client is under no obligation to install a hook or
otherwise make use of the metadata at all.
For instance, think of symlinks stored as metadata.
If the client is a UNIX box, it clearly has the capability of making use
of "symlink" metadata streams.
So the user who checks out the repository can install a hook which
creates or updates symlinks on checkout. Just the same way a user might
install a hook for keyword expansion.
If the same repository is checked out on a Windows box, where there are
no symlinks, the user will simply *not* install a hook which handles
symlink metadata streams.
In that case, no symlinks will be created and nothing happens. But the
metadata is still there and part of the repository; it's just dormant.
Or the Windows user may choose to install a hook for symlink metadata,
but one which creates copies of the symlinked files on checkout, or
resynchronizes changes back rsync-style on checkin.
It's completely up to the user: Mercurial need not care about it!
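A minimal sketch of such a symlink hook, to make the two scenarios concrete. Mercurial has no metadata-stream interface, so everything here is an assumption: the hook is imagined to receive, per checked-out path, a dict of stream names to contents; the stream name `hg_symlink` is taken from the layout proposed later in this mail; and `apply_symlink_streams` is a hypothetical name. On a symlink-capable system it creates the link; otherwise it copies the target, as in the Windows scenario above.

```python
import os
import shutil


def apply_symlink_streams(worktree, streams, use_symlinks=None):
    """Hypothetical checkout hook: materialize 'hg_symlink' metadata streams.

    streams maps a repository-relative path to a dict of stream name -> bytes.
    A path carrying an 'hg_symlink' stream is treated as a link whose target
    is the stream content (a path relative to the link's directory).
    Returns the list of paths that were materialized.
    """
    if use_symlinks is None:
        # Assumption: treat POSIX systems as symlink-capable, all others not.
        use_symlinks = os.name == "posix"
    created = []
    for path, named_streams in sorted(streams.items()):
        target = named_streams.get("hg_symlink")
        if target is None:
            continue  # no symlink metadata: the stream stays dormant
        target = target.decode("utf-8")
        link = os.path.join(worktree, path)
        parent = os.path.dirname(link)
        if parent:
            os.makedirs(parent, exist_ok=True)
        if os.path.lexists(link):
            os.remove(link)  # replace whatever a previous checkout left
        if use_symlinks:
            os.symlink(target, link)
        else:
            # Fallback for systems without symlinks: copy the target file.
            shutil.copyfile(os.path.join(parent, target), link)
        created.append(path)
    return created
```

A client that does not care about symlinks simply never calls such a hook; the `hg_symlink` stream stays in the repository untouched.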
For Mercurial, metadata streams are just files.
Files to be version controlled.
The only difference between the way Mercurial currently operates and the
suggested way is that Mercurial would then have to hide the files at the
leaf level of the file tree,
because those files will be *interpreted* as streams.
But for Mercurial, they are *not* streams; they are just files to be
kept under the control of a revlog.
So changes are needed only at the "high level"; no changes to the
underlying revlog or repository layout format are required.
Plus there is the bonus that it then becomes possible to version-control
directories as well, at no additional cost.
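A minimal sketch of the mapping just described, under stated assumptions: the store lives under `.hg/data`, each stream is one revlog data file named `<stream>.d` (Mercurial's `.i` index files are left out, as in this mail), and ordinary file contents use the stream name `hg_data` from the layout example later in the mail. Both helper names are hypothetical. Since an object's streams are the only leaf files directly inside its directory, a directory can carry streams exactly like a file can.

```python
import posixpath

# Assumption: name of the stream holding ordinary file contents.
DATA_STREAM = "hg_data"


def stream_revlog(path, stream=DATA_STREAM, store=".hg/data"):
    """Map a tracked object and one of its streams to a revlog data file.

    Every object (file or directory) becomes a directory in the store, and
    each of its streams becomes an ordinary revlog file inside it.
    """
    return posixpath.join(store, path, stream + ".d")


def object_streams(revlog_files, path, store=".hg/data"):
    """List the stream names stored for one object (hypothetical helper).

    Only direct leaf children ending in '.d' are streams of the object;
    anything deeper belongs to a child object and is skipped.
    """
    prefix = posixpath.join(store, path) + "/"
    names = []
    for name in revlog_files:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if "/" not in rest and rest.endswith(".d"):
            names.append(rest[:-2])
    return sorted(names)
```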
> If you're going to need a hook to deal with them anyway, just check in
> a .acl file that contains the information you need and have the hook
> process it.
I have thought of that already.
But the problem is: the contents of that file need to be synchronized
on file renames or moves.
Besides, that is exactly how Monotone works, using its .mtattr file (as
far as I can remember).
It did not work well.
Actually, the shortcomings of .mtattr were the main reason why I
abandoned Monotone (aside from that awful Lua scripting language) and
turned to SVK, which can do all that right out of the box.
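To make the synchronization problem concrete, here is a minimal sketch of the bookkeeping a hook would have to repeat after every rename when all attributes live in one path-keyed file such as the suggested .acl file. The file format is not specified anywhere in this discussion, so the sketch assumes the file has already been parsed into a dict keyed by repository path; `rename_in_attr_file` is a hypothetical name.

```python
def rename_in_attr_file(entries, old, new):
    """Rewrite a path-keyed attribute table after 'hg mv old new'.

    entries maps repository paths to attribute dicts. Renaming a directory
    must also rewrite every entry below it; if this step is skipped, the
    attributes silently stay attached to paths that no longer exist.
    """
    prefix = old + "/"
    updated = {}
    for path, attrs in entries.items():
        if path == old:
            updated[new] = attrs  # the renamed object itself
        elif path.startswith(prefix):
            updated[new + "/" + path[len(prefix):]] = attrs  # objects below it
        else:
            updated[path] = attrs  # unrelated entries stay as they are
    return updated
```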
But there is also a more fundamental reason: the metadata streams for a
file or directory are unrelated to each other, so it feels a bit
artificial to stuff them together into a single file.
Another aspect is versioning: if the metadata streams of different
filesystem objects are kept in different revlogs, they can also be
versioned independently of each other.
If all the metadata were stuffed into a single file, that file would
change in every revision in which the metadata of any object in the file
tree changed.
If metadata is kept in independent revlogs, it is versioned exactly
the same way file contents are (especially as file contents are just one
specific kind of metadata stream).
Another issue is performance: There is no reason to restrict the size of
metadata streams in any way.
Metadata streams may contain short text strings as well as long binary
data objects.
For Mercurial it's all just binary files; it won't care about the
contents of the streams at all: it checks them in and out, and creates
binary deltas from them, just as it already does for normal files.
So it might not be the best idea to merge them together into a single
file anyway.
So, in order to do things right, it would currently be necessary to
create two parallel subdirectory structures for each
project: one subtree for the file data, and another subtree for the
remaining data streams for the hooks to act upon.
For instance, instead of my suggested layout
.hg/data/somedir/someotherdir/somefile/hg_data.d
.hg/data/somedir/someotherdir/somefile/readonly.d
.hg/data/somedir/someotherdir/somelink/hg_symlink.d
("readonly" here is an example of a user-provided metadata stream with
no relevance to Mercurial itself), the following structure could be
used:
.hg/data/somedir/someotherdir/somefile.d
.hg/data/.metadata/somedir/someotherdir/somefile/readonly.d
.hg/data/.metadata/somedir/someotherdir/somelink/hg_symlink.d
which would check out a directory .metadata containing the metadata for
the files in the main tree.
This would work, but the problems are:
* Error-prone: if users rename a file or directory, they must rename
the .metadata subdirectory as well, or things will get out of sync.
* Easy to lose the overview. In this model, metadata and the
filesystem objects affected by it are completely decoupled. There is no
easy way to see which metadata streams are connected to which files or
directories, especially in large projects.
Consider the example above:
Using the do-it-yourself approach, the manifest will look something like:
.metadata/somedir/someotherdir/somefile/readonly <hexstuff>
.metadata/somedir/someotherdir/somelink/hg_symlink <hexstuff>
somedir/someotherdir/somefile <hexstuff>
If Mercurial supported metadata streams directly, this would instead read
something like:
somedir/someotherdir/somefile <hexstuff>
somedir/someotherdir/somefile [readonly] <hexstuff>
somedir/someotherdir/somelink [hg:symlink] <hexstuff>
In the first case, it's not easy to see that "somelink" is actually in
the same directory as "somefile". In the second case it is.
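A sketch of how the do-it-yourself manifest maps onto the proposed notation, assuming metadata lives under `.metadata/<object path>/<stream>` as in the example. The `hg_` to `hg:` renaming is inferred from the two listings in this mail (`hg_symlink.d` versus `[hg:symlink]`) and is an assumption, as is the function name `to_stream_manifest`. Sorting by object path is what puts each stream next to the object it belongs to.

```python
META_DIR = ".metadata"


def to_stream_manifest(entries):
    """Render a do-it-yourself manifest in the proposed '[stream]' notation.

    entries is a list of (path, node) pairs in which metadata lives under
    '.metadata/<object path>/<stream>'. Each metadata entry is rewritten as
    '<object path> [<stream>] <node>' and sorted right after its object.
    """
    rows = []
    for path, node in entries:
        if path.startswith(META_DIR + "/"):
            obj, _, stream = path[len(META_DIR) + 1:].rpartition("/")
            # Assumed convention: 'hg_' prefix on disk, 'hg:' in the manifest.
            if stream.startswith("hg_"):
                stream = "hg:" + stream[3:]
            rows.append((obj, 1, "%s [%s] %s" % (obj, stream, node)))
        else:
            # Plain file contents sort before the object's other streams.
            rows.append((path, 0, "%s %s" % (path, node)))
    rows.sort()
    return [line for _, _, line in rows]
```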
But the most obvious reason why metadata support should be built in is
the "hg mv" and "hg copy" operations.
For instance, a
$ hg mv somedir/someotherdir/somefile somedir/someotherdir/othername
would change the revlogs into:
.hg/data/somedir/someotherdir/othername/hg_data.d
.hg/data/somedir/someotherdir/othername/readonly.d
.hg/data/somedir/someotherdir/somelink/hg_symlink.d
Well, OK, it's not a big change... but with the do-it-yourself
method, more changes are needed, and in different places:
.metadata/somedir/someotherdir/othername/readonly <hexstuff>
.metadata/somedir/someotherdir/somelink/hg_symlink <hexstuff>
somedir/someotherdir/othername <hexstuff>
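The difference can be sketched with two hypothetical rename functions over a toy manifest (the manifest shapes and function names are assumptions for illustration, not Mercurial internals). With built-in streams, every stream of an object is keyed by the object's path and moves with it; with the parallel `.metadata` tree, a second, separate rewrite is required. Both handle only a single-file rename, as in the `hg mv` example above.

```python
def rename_streams(manifest, old, new):
    """'hg mv old new' with built-in streams: one path changes.

    manifest maps (object path, stream name) -> node; all streams of the
    object move with it because they share the object path.
    """
    return {((new if path == old else path), stream): node
            for (path, stream), node in manifest.items()}


def rename_diy(manifest, old, new, meta_dir=".metadata"):
    """The same rename with a parallel metadata tree: two places to update.

    manifest maps flat paths -> node. Forgetting the second rewrite leaves
    the metadata attached to a path that no longer exists.
    """
    meta_old = "%s/%s/" % (meta_dir, old)
    meta_new = "%s/%s/" % (meta_dir, new)
    result = {}
    for path, node in manifest.items():
        if path == old:
            path = new  # the file itself
        elif path.startswith(meta_old):
            path = meta_new + path[len(meta_old):]  # its separate metadata
        result[path] = node
    return result
```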
> Mercurial stays simple, and your metadata gets handled
> precisely the way your project needs.
I also like the idea that Mercurial stays simple.
So why implement symlinks, or special support for the executable bit?
Forget about it, and provide metadata support instead!
There is nothing more to be done, because then everything else can be
done by user hooks, which are not part of Mercurial.
It's much like the keyword expansion feature: Not built into Mercurial,
but available to clients through hook scripts.
So why not do the same trick for symlinks? Or directory attributes?
Or ACLs? Or line-ending conversion? Or NTFS streams?
> This is also a frequently asked question. From the wiki (BinaryFiles):
>
> - If you can't autodetect the file type, you will lose.
Actually I *did* read the FAQ before I posted.
But as I pointed out, there are cases when autodetection is simply not
enough.
And, since metadata support would allow all this to be implemented via
user hooks, the Mercurial developers would never need to care about it!
So, implementing metadata support can actually save you a lot of
hassle, because issues like "line ending conversion" or "character set
conversion" as well as "directory attributes" can all be solved using
metadata streams and hooks: you will never have to talk about them again.
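As an illustration of the line-ending case, a minimal sketch of such a hook's conversion step. It assumes a hypothetical per-file metadata stream named `eol` whose content is `lf`, `crlf` or `binary`, and assumes the repository copy is stored with LF endings; neither convention comes from Mercurial. Because the stream states the file's type explicitly, no autodetection is involved.

```python
def apply_eol_stream(content, eol_stream):
    """Convert file content on checkout according to an 'eol' stream.

    Hypothetical convention: the stream holds b'lf', b'crlf' or b'binary'
    (surrounding whitespace and case are ignored).
    """
    mode = eol_stream.strip().lower()
    if mode == b"binary":
        return content  # explicitly marked binary: never touch it
    normalized = content.replace(b"\r\n", b"\n")  # start from LF
    if mode == b"crlf":
        return normalized.replace(b"\n", b"\r\n")
    if mode == b"lf":
        return normalized
    raise ValueError("unknown eol mode: %r" % eol_stream)
```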
Think about keyword expansion: Problem solved, and it's not actually
part of Mercurial!
Why not do the same for symlink support?
>> * Stream metadata. Machines like the Apple Macintosh can use different
>> streams in a file, the so-called "data fork" and "resource fork".
>
> This is the biggest filesystem misfeature ever and even Apple had the
> good sense to deprecate them. Their primary purpose in Windows land is
I totally agree. It was just an example of what could be done with
metadata via user hooks.
Without implementing such support directly into Mercurial, that is.
If users *want* it, they can implement it themselves using metadata and
hooks.
> to introduce security holes. Now I'm going to have nightmares about
Yes. It's a braindamaged feature.
...but some users like it.
> Mercurial invisibly checking in trojans hiding in text files, thanks.
Mercurial will not do anything with such metadata (other than
version-controlling it).
It's the user hooks which will run the viruses if they are braindamaged
enough to do it.
Whatever happens in the user's hooks, it's not the responsibility of
Mercurial any more.
And, by the way: NTFS streams can store any data, including viruses. But
that does not *run* the viruses.
It's exactly the same as a normal file: It could also contain a virus,
and be version-controlled by Mercurial.
So what's the problem?
Greetings,
Guenther