Mercurial-devel Digest, Vol 2, Issue 29

Guenther Brunthaler gbrun at gmx.at
Tue Nov 21 16:20:30 UTC 2006


Hi Matt,

mercurial-devel-request at selenic.com wrote:
> From: Matt Mackall <mpm at selenic.com>
> Subject: Re: Mercurial-devel Digest, Vol 2, Issue 25
> To: Guenther Brunthaler <spam_me_not_please_dont at gmx.nospam.net>

> No, that's only the tip of the iceberg. The semantic problems of
> dealing with metadata are endless.

I agree that some work would be necessary to implement it in Mercurial.

But the semantic problems are a completely different issue.

> How do I interpret this metadata in
> the context of another user?

That's easy: As the user's hook decides to do.

Mercurial itself only needs to deal with a single metadata stream type: 
"hg:data". That's because it contains the contents of version controlled 
files, which should be checked out.

> Or on a platform that where it's not
> relevant?

Easy: The hooks then will ignore it.

But the metadata will still be there, and users can examine it, view it, 
change it - or just ignore it.

However, it will still remain part of the repository.

> What happens when a user on that platform commits?

If the user ignored the metadata, and did not change it, simply nothing 
happens.

It's exactly the same as for a file the user had no interest in: No 
editing - no change - no delta in the revlog.

> Do we
> destroy the old metadata?

That's something the hook must take care of. Mercurial does not need to 
even think about this issue.

If the hook does not perform any special action on checkin or checkout 
for metadata, then the metadata just stays as it is. Unchanged.

> What metadata should be inherited?

When I wrote "inherited" I actually meant "implicit inheritance".

This does not copy anything.

Think about some "hg:ignore" metadata item, which contains a list of
files to be ignored for the directory that property stream has been 
attached to.

The hook which is called by Mercurial in order to determine whether an 
existing file in the checkout directories should be ignored or not, will 
first try to locate the hg:ignore stream for the directory each file in 
question is in.

If that directory does not have an hg:ignore property, the hook will 
look at the parent directory and try to get the hg:ignore property from 
there. This continues until the root of the checkout directory tree has 
been reached, in which case the hook will fall back to the contents of
the .hgignore file in the checkout root directory (if present).

This means "inheritance by search order".
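
The search order described above could be sketched roughly like this in
Python. It is only an illustration under assumptions: the `streams`
mapping (directory path -> {stream name: text}) and the function name
`find_ignore_patterns` are hypothetical and not part of Mercurial. A
real hook would read the property streams from the repository instead.

```python
import posixpath


def find_ignore_patterns(streams, directory, hgignore=None):
    """Return the ignore patterns that apply to `directory`.

    streams   -- hypothetical mapping: directory path -> {stream name: text}
    directory -- repository-relative directory ("" is the checkout root)
    hgignore  -- contents of the root .hgignore file, or None if absent
    """
    current = directory
    while True:
        props = streams.get(current, {})
        if "hg:ignore" in props:
            # The nearest directory carrying the property wins.
            return props["hg:ignore"].splitlines()
        if current == "":
            break
        # Not found here: look at the parent directory next.
        current = posixpath.dirname(current)
    # Root reached without a property: fall back to .hgignore.
    return hgignore.splitlines() if hgignore is not None else []
```

Nothing is copied between directories here; the "inheritance" exists
only in the lookup loop, which is the point of the proposal.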

Mercurial itself is not affected by this at all.

It's all up to the hooks: They decide whether to support a specific 
property or not; and whether to (runtime-) "inherit" properties from the 
parent directories or not.

Mercurial does not, and should not have to know anything about this. It 
must only provide its hooks - and metadata.

> What if
> an historical piece of metadata is wrong? How can we correct it?

That's exactly the same question as: What if the historical contents of 
a file are wrong? How can we correct it?

And the answer is, at least in Mercurial: As metadata is versioned
in exactly the same way as file contents are, it can't be changed for
that revision, because that change would result in a different hash and
thus cannot refer to the *same* revision.

But it is perfectly possible to check the old revision out, change the 
metadata, and check it in as a new branch / revision.

So there is nothing special: Business as usual for Mercurial. Exactly as 
for files.

> Should it be copied when a file gets copied?

Of course. Because the metadata is *part* of the file (or directory).

Think of metadata as a subfile. It's part of its contents.

If you copy a file, all of its contents will be copied.

Same for metadata.

> Is there some metadata
> that is mandatory to interpret?

Yes: The hg:data stream, because it represents the file contents.

No other metadata stream needs to be interpreted in any way by 
Mercurial: That job will be delegated to the hooks.

And if the hooks don't handle it, then the metadata just does not have 
any effect on that client's machine.

But it still is there, and the client will see it if he displays the 
manifest.
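
As a rough sketch of that division of labour (an illustration only; the
function `checkout_streams`, the `hooks` mapping and the `write_file`
callback are hypothetical names, not an existing Mercurial API):

```python
def checkout_streams(streams, hooks, write_file):
    """Apply the streams of one file on checkout.

    streams    -- hypothetical mapping: stream name -> bytes
    hooks      -- hypothetical mapping: stream name -> callable(bytes)
    write_file -- callable(bytes) that writes the working-copy file
    Returns the names of the streams that had no effect on this machine.
    """
    ignored = []
    for name, payload in sorted(streams.items()):
        if name == "hg:data":
            # The only stream the core itself has to interpret.
            write_file(payload)
        elif name in hooks:
            # Everything else is delegated to a hook, if one is installed.
            hooks[name](payload)
        else:
            # No hook: the stream stays in the repository, untouched.
            ignored.append(name)
    return ignored
```

A client without any hooks would simply get every stream except
"hg:data" back in the returned list.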

> What happens with merge?

Also like files. With the difference that conflicts cannot be resolved
automatically, i. e. no automatic merging. Of course, users can provide
a hook to attempt automatic merging.

But even without hooks, there is a simple case that can be resolved: If 
both parents have a stream that is byte-by-byte identical with that in 
the other parent, it's "assumed" there is no conflict, as the streams 
are obviously identical.

So what to do if no hooks are provided: Mercurial should create the
conflicting merge in a temporary file and require the user to resolve
the conflict and remove the temporary file.
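
The merge rule for a single metadata stream might look like the
following sketch. It is an assumption-laden illustration: `merge_stream`
and `merge_hook` are hypothetical names, and the conflict marker layout
is made up for this example.

```python
def merge_stream(local, other, merge_hook=None):
    """Merge one metadata stream coming from two parents.

    Returns (merged_bytes, resolved). `merge_hook` is a hypothetical
    callable(local, other) that returns merged bytes, or None on failure.
    """
    if local == other:
        # Byte-by-byte identical in both parents: no conflict.
        return local, True
    if merge_hook is not None:
        merged = merge_hook(local, other)
        if merged is not None:
            return merged, True
    # Unresolved: hand both versions to the user in one blob, which the
    # caller could write to a temporary file.
    conflict = (b"<<<<<<< local\n" + local + b"\n=======\n"
                + other + b"\n>>>>>>> other\n")
    return conflict, False
```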

> Arbitrary metadata by definition has no defined semantics.

Exactly. With the exception of "hg:data" of course, because this will
have a well-defined semantic - it contains the file data. (Or its
primary/default data stream, if a platform uses more streams.)

For instance, on a Macintosh, hg:data would refer to the file's "data
fork", while the checkin hook might have chosen to save the file's 
"resource fork" into a custom property stream named "fork:resource".

But that's the hook's business.

And regarding the "what if a text file is checked out on a different 
platform"-problem: Subversion has already solved that for the 
line-ending conversion.

It uses a property svn:eol-style for this which can have 4 different values:

* "native": Local line ending conventions are used to convert the text 
file on checkin/checkout; within the repository the file contents will 
always use "lf".
* "cr": Conversion: Locally use "cr", repository uses "lf".
* "lf": No conversion: Repository uses same format as local.
* "crlf":  Conversion: Locally use "cr/lf", repository uses "lf".
* (missing): No action: Binary file. (Same as "lf", but that's an 
implementation detail).

User hooks in Mercurial could do exactly the same.

And if they won't, then things happen exactly the same way as they 
happen now (i. e. autodetection/autoconversion is attempted).
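
A hook implementing the list above could be sketched as follows. This
follows the values as described in this post (it is not taken from
Subversion's implementation), and the function names `to_repository`
and `to_working_copy` are hypothetical.

```python
import os


def _local_eol(eol_style):
    """Local line ending for a property value; None means no conversion."""
    if eol_style == "native":
        return os.linesep.encode("ascii")
    # "lf" and a missing property both map to None (no conversion).
    return {"cr": b"\r", "crlf": b"\r\n"}.get(eol_style)


def to_repository(data, eol_style):
    """Convert working-copy bytes to the repository form ("lf")."""
    eol = _local_eol(eol_style)
    if eol is None or eol == b"\n":
        return data
    return data.replace(eol, b"\n")


def to_working_copy(data, eol_style):
    """Convert repository bytes ("lf") to the local convention."""
    eol = _local_eol(eol_style)
    if eol is None or eol == b"\n":
        return data
    return data.replace(b"\n", eol)
```

For example, `to_working_copy(b"a\nb\n", "crlf")` yields
`b"a\r\nb\r\n"`, and `to_repository` reverses it.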

> All of the
> questions above have multiple answers.

Of course, there are many ways in which metadata could possibly be
supported by an SCM.

I tried to suggest a minimal approach which does not restrict anything a 
user might want to do with metadata.

> Doing it right is impossibly

I hope I dispelled that concern in this post. If you see more problems,
please tell me. I'm confident I can find a way to resolve them.

> complex with little reward

It's not too complex. The basic operation of Mercurial stays exactly the 
same. No changes are required to the revlog, nodeid or manifest format.

Only the layout of the .hg/data subdirectory is augmented by an 
additional directory level.

That's all - on the data structure side.

Of course, I cannot tell how big the changes in the user-interface code 
might be - I'm not a Python programmer.

But the number of changes should not be too large: Of course, the copy,
move and rm commands need to be changed.

The manifest display command should display the leaf directory levels as 
streams now.

And of course the checkin and checkout commands need to know that they
should ignore the leaf levels of the .hg/data directory tree except for
the .../hg_data.d and .../hg_data.i files which contain the file data to
be checked in or out.
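
To make the proposed layout a bit more concrete, here is a small sketch.
It assumes that the additional directory level is named after the
tracked file and that ":" in a stream name becomes "_" on disk (as the
hg_data.d / hg_data.i names suggest). Both helper functions are
hypothetical.

```python
import posixpath


def stream_store_paths(tracked_path, stream):
    """Return the (index, data) revlog paths for one stream of one file."""
    # "hg:data" -> "hg_data", matching the file names mentioned above.
    leaf = stream.replace(":", "_")
    # Assumption: one directory per tracked file, one revlog per stream.
    base = posixpath.join(".hg", "data", tracked_path, leaf)
    return base + ".i", base + ".d"


def is_content_stream(store_path):
    """True only for the leaf files that checkin/checkout must read."""
    return posixpath.basename(store_path) in ("hg_data.i", "hg_data.d")
```

With these assumptions, `stream_store_paths("src/main.c", "hg:data")`
gives `.hg/data/src/main.c/hg_data.i` and `.hg/data/src/main.c/hg_data.d`,
while a "fork:resource" stream would live next to them as
`fork_resource.i` and `fork_resource.d`.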

So, certainly it's a rather big change - but not too big to be done.

It's not a question whether it can be done; it's the question whether it 
shall be done.

> why the entire iceberg is staying out of Mercurial.

Basically a good philosophy, and I agree mostly.

However, staying simple should not exclude extensibility.

And while hooks could do nearly everything I have written about just 
now, they are missing a common framework of version-controlled metadata 
to act upon.

Of course, hooks could maintain such a framework by themselves.

But a minimum of built-in support is still required, because hooks are optional.

For instance, the hooks could create and maintain a parallel 
subdirectory tree for metadata streams in the .metadata subdirectory of 
the checkout directory.

This will even work if a user on a machine without those hooks checks
out that repository.

But problems will arise if the user renames some of the files: In this 
case, the metadata directory structure will no longer match the "files" 
directory structure.

Same if files are copied.

Also the command to display the metadata will not show the metadata next 
to the files (or directories) it belongs, but a couple of screen pages 
away (in large manifests).

This will be a maintenance nightmare.

> Because they have well-understood semantics and wide applicability.

Try telling that to a Windows user!

If I recall correctly, it was suggested to create normal copies in that
case on Windows platforms.

Fine.

And now tell me what happens if such a symlink is a directory symlink
which refers to the same directory? Or its parent?

And what if the symlink is an absolute symlink and does not refer to any
file within the repository?

Good luck trying to find a good solution for this ;-)

Because the bad truth is: Windows does not have symlinks (the cygwin 
symlink emulation only works in the cygwin environment).

And DOS has no symlinks at all.

But even on Linux symlinks are not available everywhere: Ever tried to 
create a symlink on a partition which was mounted as -t vfat?

So it's true: Symlinks should be supported on platforms which support them.

But there must be a solution what shall happen if a project is checked 
out on a filesystem that does not support them.

The metadata-solution is one possible solution for that problem.

Just trying to emulate symlinks by copies will not work in the cases
illustrated above.

> They are free to download the source.

Yes.

But writing a hook is certainly easier than modifying the source code of 
an SCM without being deeply involved with that project.

Actually, if Mercurial was written in a language I was accustomed to, I
would have tried to change the source code myself.

But I have zero experience with Python, so that's certainly a task too 
daring for me at the moment.

Greetings,
Guenther



