Status of lfs and lock extensions

Daniele Benegiamo danielebenegiamo at fastwebnet.it
Sat Jan 23 17:12:26 UTC 2021


On 2021/01/22 06:11, Matt Harbison wrote:
> On Thu, Jan 21, 2021 at 7:12 AM Daniele Benegiamo
> <danielebenegiamo at fastwebnet.it> wrote:
>>
>> On 2021/01/19 19:33, Matt Harbison wrote:
>>>>> [...]
>>>>> The LFS extension is shipped with Mercurial.  Don't let the fact that it
>>>>> is marked experimental scare you off- I use it in production.  The
>>>>> experimental tag is mostly so that we can change some of the
>>>>> fileset/revset/template functionality without worrying about backward
>>>>> compatibility.  I have no intention of changing the storage layout in an
>>>>> incompatible way.  There is a TODO list of future plans that I hope to
>>>>> get to some day:
>>>>>
>>>>>       https://www.mercurial-scm.org/repo/hg/file/5.6.1/hgext/lfs/TODO.rst
>>
>> From a quick look at the history of that directory, it seems there
>> hasn't been much activity on the extension for about a year (at least on
>> that branch/repo). Are there any plans on if/when it will move out of
>> the "experimental" status? [just to understand the long-term plan for
>> this feature in the Mercurial team's vision]
> 
> Yeah, this was something started by Facebook, and then I did enough
> work on it to make it meet my needs for work.  I don't think the other
> developers use it, so they're only changing it when refactoring things
> requires it.  The "plan" is to complete most/all of the TODO list (or
> at least the parts that would be backwards incompatible changes like
> the filesets/templates) before removing the experimental label.  Right
> now I'm pretty busy with python3 porting on Windows and TortoiseHg, so
> I don't have a date in mind.  But again, don't let the label scare you
> off- it's just that there are planned behavior changes noted in the
> TODO.  The only sharp edge you may run into is with `hg grep` wanting
> to download the files to search, but it's probably pretty easy to make
> that command use the raw data instead (which will cause it to search
> the lfs pointer data instead of the blob, effectively skipping it).

Ok! I think it's normal that most development is driven by people's 
needs/interests. Thanks for your work on the extension and for the 
clarification!


>> Many devs working on multimedia/interactive projects (including
>> simulation, VR and video games of course - projects that often rely on
>> game engines and proprietary binary files) are migrating from Subversion
>> and Perforce to git+git-lfs because DVCSs have very pleasant advantages.
>> It would be great to have Mercurial in the set of the alternatives.
> 
> Agreed.  If you'd like to try your hand at implementing some of it,
> I'm happy to mentor/give advice.

This is something we've already discussed internally and it's a 
possibility for the future. We're a very small company and at the moment 
we don't have enough resources to maintain tools other than the products 
we're working on. But as our products become more complex and involve 
more people, the plan is to move to more efficient tools and so to add 
to Mercurial the features we need.
When we're able to step in, having someone point us in the right 
direction in the Mercurial code base would be very helpful! Thanks.


>>> [...]
>> Thanks! According to the docs, in case I need to specify the lfs.url
>> to test other git-lfs backends, I must set it in the global config file,
>> right?
> 
> You can set the config in any config file.  If you have multiple
> repos, you probably want it in the repo level .hg/hgrc, since the url
> is repo specific.

Thanks!
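
For anyone reading along, a repo-level setup can be sketched as below. This is only an illustration: the helper name lfs_hgrc_snippet and the example URL are my own placeholders, and the output is meant to be pasted into the repo's .hg/hgrc.

```python
import configparser
import io


def lfs_hgrc_snippet(lfs_url):
    """Render an hgrc fragment that enables lfs and sets lfs.url."""
    # interpolation=None so that a '%' in the URL is kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["extensions"] = {"lfs": ""}
    config["lfs"] = {"url": lfs_url}
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with the backend under test.
    print(lfs_hgrc_snippet("https://lfs.example.com/myrepo"))
```

Keeping the setting in the repo-level file means each clone can point at a different git-lfs backend while testing.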



> [...]
> 
>>> The protocol is basically a client request and reply sequence:
>>>
>>> HTTP POST:
>>> C: "I'd like to upload these blobs, here are the hashes"
>>> S: "OK, here is a list of URLs, one per blob that needs to be
>>> uploaded, ignoring stuff I already have"
>>>
>>> HTTP PUT:
>>> C: "Here's blob 1 content"
>>> S: success/failure status
>>> C: "Here's blob2 content"
>>> ....
>>
>> It sounds easy. But usually problems are in the details :)
> 
> Yeah, but this isn't something you need to worry about unless you're
> doing development work or looking at wireshark traces to debug.

I was already thinking about the development of the locking protocol ;)
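
To fix the details in my mind, the upload exchange quoted above can be sketched roughly as follows. The "objects/batch" endpoint, the media type and the JSON field names are taken from the git-lfs batch API document, not from this thread; the function names and example URLs are hypothetical, and no network access is done here.

```python
import json

# Media type used by the git-lfs batch API.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_upload_request(lfs_url, blobs):
    """Build the POST that announces the blobs we would like to upload.

    blobs is an iterable of (sha256_hex, size_in_bytes) pairs.
    Returns (url, headers, body) without sending anything.
    """
    body = {
        "operation": "upload",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in blobs],
    }
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    return lfs_url.rstrip("/") + "/objects/batch", headers, json.dumps(body)


def pending_uploads(batch_reply):
    """Map oid -> upload URL for the blobs the server still needs.

    Objects the server already has come back without an "upload"
    action, so they are skipped.
    """
    reply = json.loads(batch_reply)
    pending = {}
    for obj in reply.get("objects", []):
        upload = obj.get("actions", {}).get("upload")
        if upload:
            pending[obj["oid"]] = upload["href"]
    return pending
```

Each URL returned by pending_uploads would then receive one HTTP PUT with the blob content, which is the second half of the sequence.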


> [...]

> I wasn't thinking too deeply about how you could handle the locking
> outside of Mercurial.  I figured maybe write a quick and dirty tool
> that could either use the API to lock it, or maybe some of the fancier
> servers have a web interface to manage the locks?  Maybe that's too
> much hassle though, and it might just be easier to implement it (the
> client part anyway) in Mercurial.  It might be worth looking at the
> Facebook repo to see if they have locking support yet.  If so, it
> could be really easy to borrow.

I think that having the locking feature integrated into the VCS is the 
most effective way to handle it. Having locks managed with "external" 
tools can only be a source of issues. So I agree that having this basic 
feature integrated into Mercurial (at least on the client) is the best 
option. But reading more about the git-lfs protocol, I'm not sure it's a 
good locking protocol (more on this below).

About Facebook, I think this is their latest version of the lfs extension:

https://github.com/facebookexperimental/eden/tree/master/eden/scm/edenscm/hgext/lfs

But I don't see any reference to locking features.


> I did a quick read through the spec, and I'm a bit puzzled by it.  It
> seems much stripped down from the original proposal (which talked
> about checking out files read-only unless they were unlocked, among
> other things).  But otherwise, I couldn't figure out what the benefit
> was.  It lets you edit and commit the files locally, and only yells
> when you go to push.  But at that point, you've made the edits, and
> potentially have to redo them if someone else did as well.  If there
> were support (or you were using git), do you have a workflow in mind?

I think they implemented the "Single Branch Model" option 
(https://github.com/git-lfs/git-lfs/blob/master/docs/proposals/locking.md#what-not-to-do-single-branch-model): 
the scope of a lock extends to the entire repository, so until you push 
your changes you're sure no one else is able to lock the same file and 
edit it (also on other branches!). In addition, it requires locks to 
always be taken on the "latest version from the 'main' branch" of the file.
While simple, it imposes too many restrictions for real-world scenarios.

Reading the "Multi-Branch Model" option 
(https://github.com/git-lfs/git-lfs/blob/master/docs/proposals/locking.md#a-more-usable-approach-multi-branch-model), 
it seems that the locks extend only to the "descendant" version of a 
file (forcing you to work always on the latest version of it), and that 
you can "break" that connection with an ancestor if needed (with this 
revision becoming a new "root" for future locks on its descendants).
If my interpretation is correct, this should be a very good approach:

- people working on short-lived parallel branches would always work on 
mergeable versions of the binary files;
- while people working on experimental/feature/maintenance/variant 
branches would have the option to have a new, independent "locking 
context" to continue working on their "unmergeable changes" (but in a 
consistent way with its future descendant revisions).

One thing that I didn't find in the proposal is what happens when 
merging a file involving two distinct "locking contexts". For practical 
purposes, I think the user should have the option to pick one of the 
parents' "locking contexts" (and maybe also the option to create a 
new one). But I'm sure this has implications related to how 
git/Mercurial manage metadata on versioned files.

I think most real-world scenarios involving proprietary binary files 
should be correctly addressed by the "Multi-Branch Model" option of git-lfs.

BTW, it seems to be a good extension of the one documented already on 
https://www.mercurial-scm.org/wiki/LockExtension, where locks were 
scoped by named branch.

A simpler "intermediate" solution - for practical use-cases - could 
consist of the old hglock Mercurial proposal, with the addition of 
a "locking context" associated with the named branches. In this way, 
a file lock would extend to a specific "locking context", possibly 
involving multiple branches. This would support the two most common 
scenarios described above: people working on short-lived parallel 
branches that must keep unmergeable files in sync; and people working 
on branches where files can diverge from other "locking contexts" (but 
still be consistent in other descendant branches with the same 
"locking context").
A branch's "locking context" should then simply be preserved on 
merging by default.
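
To make the idea a bit more concrete, here is a minimal in-memory sketch of locks scoped by "locking context" rather than by single branch. Everything in it (the LockTable class, its method names, the rule that a branch without an explicit context is its own context) is my assumption for illustration; it is not taken from hglock or from git-lfs.

```python
class LockError(Exception):
    """Raised when a file is already locked by someone else."""


class LockTable:
    def __init__(self):
        self._context_of = {}  # branch name -> locking context name
        self._locks = {}       # (context, path) -> owner

    def context(self, branch):
        # Assumption: a branch with no explicit context is its own context.
        return self._context_of.get(branch, branch)

    def share_context(self, branch, with_branch):
        """Make `branch` use the same locking context as `with_branch`."""
        self._context_of[branch] = self.context(with_branch)

    def lock(self, branch, path, owner):
        key = (self.context(branch), path)
        holder = self._locks.get(key)
        if holder is not None and holder != owner:
            raise LockError("%s is locked by %s in context %s"
                            % (path, holder, key[0]))
        self._locks[key] = owner

    def unlock(self, branch, path, owner):
        key = (self.context(branch), path)
        if self._locks.get(key) != owner:
            raise LockError("%s is not locked by %s" % (path, owner))
        del self._locks[key]


if __name__ == "__main__":
    table = LockTable()
    table.share_context("hotfix", "default")      # short-lived branch
    table.lock("default", "art/level1.psd", "alice")
    try:
        table.lock("hotfix", "art/level1.psd", "bob")
    except LockError as exc:
        print("refused:", exc)
    # A variant branch keeps its own context, so the file may diverge there.
    table.lock("variant", "art/level1.psd", "bob")
```

Preserving the context on merge would then amount to leaving the branch's entry untouched, with an explicit call needed only when the user wants to switch contexts.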


> [...]
> 
>>>> I see the locking API is documented here:
>>>> https://github.com/git-lfs/git-lfs/blob/master/docs/api/locking.md
>>>>
>>>> If I have understood correctly, a hypothetical client "lfs-lock" extension on
>>>> Mercurial could use that API to manage locks on server (if supported by
>>>> the server of course). The server should then enforce the locks on push.
>>>
>>> Yes, though the functionality should probably just be built into the
>>> lfs extension.
>>
>> I don't know how Mercurial extensions work, but I suppose that an
>> extension can define distinct "hooks" for when it's running on the
>> client and the server?
> 
> Yeah, extensions have free rein to do what they want inside the
> Mercurial process.  I'm just saying if someone adds `hg lfs-lock`, `hg
> lfs-unlock`, etc, it should go into the existing extension.  Hooks are
> mostly for external things, but the extension already wraps some
> internal stuff for `hg push` for example.  So that's probably a
> natural place for the messages about things that were pushed that need
> to be unlocked.  And that's likely true of other areas that need
> support.
> 

Understood. I agree that a locking implementation of the git-lfs 
specification should be part of the main lfs extension.
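
As a rough idea of what the client side of such commands would have to send, here is a sketch of the request payloads. The "/locks", "/locks/verify" and "/locks/:id/unlock" endpoints and the "path", "ref", "force" and "theirs" fields come from the git-lfs locking API document linked above; the function names are hypothetical, and what a Mercurial client should pass as the ref name is an open question I'm not trying to answer here.

```python
import json
from urllib.parse import quote


def lock_request(lfs_url, path, ref_name=None):
    """Build the request that asks the server to lock `path`."""
    body = {"path": path}
    if ref_name is not None:
        # Assumption: how a Mercurial branch maps to a ref is undecided.
        body["ref"] = {"name": ref_name}
    return "POST", lfs_url.rstrip("/") + "/locks", json.dumps(body)


def unlock_request(lfs_url, lock_id, force=False):
    """Build the request that releases the lock with the given id."""
    url = "%s/locks/%s/unlock" % (lfs_url.rstrip("/"),
                                  quote(lock_id, safe=""))
    return "POST", url, json.dumps({"force": force})


def blocked_paths(verify_reply):
    """Paths locked by other users, from a /locks/verify reply."""
    reply = json.loads(verify_reply)
    return sorted(lock["path"] for lock in reply.get("theirs", []))
```

A push wrapper in the lfs extension could call something like blocked_paths before sending the blobs and report the files that need attention.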

Bests,
	Daniele.


