Mercurial-devel Digest, Vol 2, Issue 29
Guenther Brunthaler
gbrun at gmx.at
Tue Nov 21 16:20:30 UTC 2006
Hi Matt,
mercurial-devel-request at selenic.com wrote:
> From: Matt Mackall <mpm at selenic.com>
> Subject: Re: Mercurial-devel Digest, Vol 2, Issue 25
> To: Guenther Brunthaler <spam_me_not_please_dont at gmx.nospam.net>
> No, that's only the tip of the iceberg. The semantic problems of
> dealing with metadata are endless.
I agree that some work would be necessary to implement it into Mercurial.
But the semantic problems are a completely different issue.
> How do I interpret this metadata in
> the context of another user?
That's easy: As the user's hook decides to do.
Mercurial itself only needs to deal with a single metadata stream type:
"hg:data". That's because it contains the contents of version controlled
files, which should be checked out.
> Or on a platform that where it's not
> relevant?
Easy: The hooks then will ignore it.
But the metadata will still be there, and users can examine it, view it,
change it - or just ignore it.
However, it will still remain part of the repository.
> What happens when a user on that platform commits?
If the user ignored the metadata, and did not change it, simply nothing
happens.
It's exactly the same as for a file the user had no interest in: No
editing - no change - no delta in the revlog.
> Do we
> destroy the old metadata?
That's something the hook must take care of. Mercurial does not need to
even think about this issue.
If the hook does not perform any special action on checkin or checkout
for metadata, then the metadata just stays as it is. Unchanged.
> What metadata should be inherited?
When I wrote "inherited" I actually meant "implicit inheritance".
This does not copy anything.
Think about some "hg:ignore" metadata item, which contains a list of
files to be ignored for the directory that property stream has been
attached to.
The hook which is called by Mercurial in order to determine whether an
existing file in the checkout directories should be ignored or not, will
first try to locate the hg:ignore stream for the directory each file in
question is in.
If that directory does not have an hg:ignore property, the hook will
look at the parent directory and try to get the hg:ignore property from
there. This continues until the root of the checkout directory tree has
been reached, in which case the hook will fall back to the contents of
the .hgignore file in the checkout root directory (if present).
This means "inheritance by search order".
Mercurial itself is not affected by this at all.
It's all up to the hooks: They decide whether to support a specific
property or not; and whether to (runtime-) "inherit" properties from the
parent directories or not.
Mercurial does not, and should not have to know anything about this. It
must only provide its hooks - and metadata.
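A minimal sketch of that search order, in Python. The "props" dictionary (which maps a repository-relative directory to its property streams) and the function name are assumptions made up for illustration; nothing like this exists in Mercurial today.

```python
import posixpath


def find_ignore_patterns(props, directory, fallback=()):
    """Return the hg:ignore patterns that apply to `directory`.

    props:    hypothetical mapping {directory: {stream_name: [patterns]}},
              using repository-relative paths, with "" as the checkout root.
    fallback: patterns read from the root .hgignore file, if present.
    """
    current = directory
    while True:
        streams = props.get(current, {})
        if "hg:ignore" in streams:
            # Nearest directory that carries the property wins.
            return list(streams["hg:ignore"])
        parent = posixpath.dirname(current)
        if current == "" or parent == current:
            # Root reached without finding the property.
            return list(fallback)
        current = parent
```

Nothing is copied here: a directory without its own hg:ignore stream simply gets whatever the lookup finds further up at the time the hook runs.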
> What if
> an historical piece of metadata is wrong? How can we correct it?
That's exactly the same question as: What if the historical contents of
a file are wrong? How can we correct it?
And the answer is, at least in Mercurial: As metadata is versioned
exactly in the same way as file contents are, they can't be changed for
that revision, because that change would result in a different hash and
thus cannot refer to the *same* revision.
But it is perfectly possible to check the old revision out, change the
metadata, and check it in as a new branch / revision.
So there is nothing special: Business as usual for Mercurial. Exactly as
for files.
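A rough illustration of why a changed stream cannot belong to the same revision. It assumes a node id is a SHA-1 over the two sorted parent ids followed by the content, which is how revlog hashes file revisions as far as I understand it; applying the same rule to a metadata stream is part of this proposal, not existing behaviour.

```python
import hashlib

NULL_ID = b"\0" * 20  # placeholder id for a missing parent


def node_id(content, p1=NULL_ID, p2=NULL_ID):
    """Hash the sorted parent ids followed by the stream content."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + content).digest()
```

Two streams with the same parents but different bytes get different ids, so a corrected stream necessarily becomes a new revision.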
> Should it be copied when a file gets copied?
Of course. Because the metadata is *part* of the file (or directory).
Think of metadata as a subfile. It's part of its contents.
If you copy a file, all of its contents will be copied.
Same for metadata.
> Is there some metadata
> that is mandatory to interpret?
Yes: The hg:data stream, because it represents the file contents.
No other metadata stream needs to be interpreted in any way by
Mercurial: That job will be delegated to the hooks.
And if the hooks don't handle it, then the metadata just does not have
any effect on that client's machine.
But it still is there, and the client will see it if he displays the
manifest.
> What happens with merge?
Also like files. With the difference that conflicts cannot be resolved
automatically, i. e. no automatic merging. Of course, users can provide
a hook to attempt automatic merging.
But even without hooks, there is a simple case that can be resolved: If
both parents have a stream that is byte-by-byte identical with that in
the other parent, it's "assumed" there is no conflict, as the streams
are obviously identical.
So what to do if no hooks are provided: Mercurial should create the
conflicting merge in a temporary file and require the user to resolve
the conflict and remove the temporary file.
> Arbitrary metadata by definition has no defined semantics.
Exactly. With the exception of "hg:data" of course, because this will
have a well-defined semantic - it contains the file data. (Or its
primary/default data stream, if a platform uses more streams.)
For instance, in a Macintosh, hg:data would refer to the file's "data
fork", while the checkin hook might have chosen to save the file's
"resource fork" into a custom property stream named "fork:resource".
But that's the hook's business.
And regarding the "what if a text file is checked out on a different
platform"-problem: Subversion has already solved that for the
line-ending conversion.
It uses a property svn:eol-style for this which can have 4 different values:
* "native": Local line ending conventions are used to convert the text
file on checkin/checkout; within the repository the file contents will
always use "lf".
* "cr": Conversion: Locally use "cr", repository uses "lf".
* "lf": No conversion: Repository uses same format as local.
* "crlf": Conversion: Locally use "cr/lf", repository uses "lf".
* (missing): No action: Binary file. (Same as "lf", but that's an
implementation detail).
User hooks in Mercurial could do exactly the same.
And if they won't, then things happen exactly the same way as they
happen now (i. e. autodetection/autoconversion is attempted).
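A sketch of what such a pair of hooks might do, following the list exactly as given in this post (repository contents always use "lf"). The function names and the idea of passing the property value in as a string are assumptions; the real hook interface would look different.

```python
import os

# Local line terminator for each fixed style named in the list above.
FIXED_EOL = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}


def local_eol(eol_style):
    """Return the local terminator, or None if no conversion applies."""
    if eol_style == "native":
        return os.linesep.encode("ascii")
    return FIXED_EOL.get(eol_style)  # None when the property is missing


def to_repository(data, eol_style):
    """Checkin direction: local line endings become "lf"."""
    eol = local_eol(eol_style)
    if eol is None or eol == b"\n":
        return data
    return data.replace(eol, b"\n")


def to_working_copy(data, eol_style):
    """Checkout direction: "lf" becomes the local line ending."""
    eol = local_eol(eol_style)
    if eol is None or eol == b"\n":
        return data
    return data.replace(b"\n", eol)
```

A file without the property passes through both functions untouched, which matches the "binary file" case.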
> All of the
> questions above have multiple answers.
Of course, there are many ways how metadata could possibly be supported
by an SCM.
I tried to suggest a minimal approach which does not restrict anything a
user might want to do with metadata.
> Doing it right is impossibly
I hope I dispelled that concern in this post. If you see more problems,
please tell me. I'm confident I can find a way to resolve them.
> complex with little reward
It's not too complex. The basic operation of Mercurial stays exactly the
same. No changes are required to the revlog, nodeid or manifest format.
Only the layout of the .hg/data subdirectory is augmented by an
additional directory level.
That's all - on the data structure side.
Of course, I cannot tell how big the changes in the user-interface code
might be - I'm not a Python programmer.
But the number of changes should not be too large: Of course, the copy,
move and rm commands need to be changed.
The manifest display command should display the leaf directory levels as
streams now.
And of course the checkin and checkout commands need to know that they
should ignore the leaf levels of the .hg/data directory tree except for
the .../hg_data.d and .../hg_data.i files which contain the file data to
be checked in or out.
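To make the proposed layout concrete, here is a small sketch of how a tracked path and a stream name could map to revlog files. Replacing ":" with "_" in the stream name is only inferred from the "hg_data" file names above, and the sketch leaves out any filename encoding the store applies.

```python
import posixpath


def stream_revlog_paths(tracked_path, stream="hg:data"):
    """Return the (index, data) revlog paths for one stream of a file.

    The tracked file becomes a directory below .hg/data, and each
    stream is a leaf inside it (assumed naming: ":" becomes "_").
    """
    leaf = stream.replace(":", "_")
    base = posixpath.join(".hg", "data", tracked_path, leaf)
    return base + ".i", base + ".d"
```

With this mapping, "src/main.c" keeps its contents in .hg/data/src/main.c/hg_data.i and .d, and a custom stream such as "fork:resource" would sit next to them as fork_resource.i and .d.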
So, certainly it's a rather big change - but not too big to be done.
It's not a question whether it can be done; it's the question whether it
shall be done.
> why the entire iceberg is staying out of Mercurial.
Basically a good philosophy, and I agree mostly.
However, staying simple should not exclude extensibility.
And while hooks could do nearly everything I have written about just
now, they are missing a common framework of version-controlled metadata
to act upon.
Of course, hooks could maintain such a framework by themselves.
But a minimum builtin support is still required, because hooks are optional.
For instance, the hooks could create and maintain a parallel
subdirectory tree for metadata streams in the .metadata subdirectory of
the checkout directory.
This will even work if a user on a machine without those hooks
checks out that repository.
But problems will arise if the user renames some of the files: In this
case, the metadata directory structure will no longer match the "files"
directory structure.
Same if files are copied.
Also the command to display the metadata will not show the metadata next
to the files (or directories) it belongs, but a couple of screen pages
away (in large manifests).
This will be a maintenance nightmare.
> Because they have well-understood semantics and wide applicability.
Try to tell that to a Windows user!
If I recall it rightly, it was suggested to create normal copies in that
case on Windows platforms.
Fine.
And now tell me what happens if such a symlink is a directory symlink
which refers to the same directory? Or its parent?
And what if the symlink is an absolute symlink and does not refer to any
file within the repository?
Good luck trying to find a good solution for this ;-)
Because the bad truth is: Windows does not have symlinks (the cygwin
symlink emulation only works in the cygwin environment).
And DOS has no symlinks at all.
But even on Linux symlinks are not available everywhere: Ever tried to
create a symlink on a partition which was mounted as -t vfat?
So it's true: Symlinks should be supported on platforms which support them.
But there must be a solution for what shall happen if a project is checked
out on a filesystem that does not support them.
The metadata-solution is one possible solution for that problem.
Just trying to emulate symlinks by copies will not work in the cases
illustrated above.
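A checkout hook would first have to find out whether the target filesystem can hold a symlink at all. One way to probe for that, as a sketch (the function name is made up, and a hook would still have to decide what to do when the answer is no):

```python
import os
import tempfile


def supports_symlinks(directory):
    """Try to create a throwaway symlink inside `directory`."""
    probe_dir = tempfile.mkdtemp(dir=directory)
    link = os.path.join(probe_dir, "link")
    try:
        # The target does not need to exist for the probe.
        os.symlink("target", link)
        return True
    except (OSError, NotImplementedError, AttributeError):
        # E.g. a vfat mount, or a platform without os.symlink.
        return False
    finally:
        if os.path.lexists(link):
            os.remove(link)
        os.rmdir(probe_dir)
```

If the probe fails, the link target could stay in a metadata stream and remain part of the repository, instead of being replaced by a copy.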
> They are free to download the source.
Yes.
But writing a hook is certainly easier than modifying the source code of
an SCM without being deeply involved with that project.
Actually, if Mercurial was written in a language I was accustomed to, I
would have tried to change the source code myself.
But I have zero experience with Python, so that's certainly a task too
daring for me at the moment.
Greetings,
Guenther