Status of lfs and lock extensions
Daniele Benegiamo
danielebenegiamo at fastwebnet.it
Sat Jan 23 17:12:26 UTC 2021
On 2021/01/22 06:11, Matt Harbison wrote:
> On Thu, Jan 21, 2021 at 7:12 AM Daniele Benegiamo
> <danielebenegiamo at fastwebnet.it> wrote:
>>
>> On 2021/01/19 19:33, Matt Harbison wrote:
>>>>> [...]
>>>>> The LFS extension is shipped with Mercurial. Don't let the fact that it
>>>>> is marked experimental scare you off- I use it in production. The
>>>>> experimental tag is mostly so that we can change some of the
>>>>> fileset/revset/template functionality without worrying about backward
>>>>> compatibility. I have no intention of changing the storage layout in an
>>>>> incompatible way. There is a TODO list of future plans that I hope to
>>>>> get to some day:
>>>>>
>>>>> https://www.mercurial-scm.org/repo/hg/file/5.6.1/hgext/lfs/TODO.rst
>>
>> From a quick look at the history of that directory, it seems there
>> hasn't been much activity on the extension for about a year (at least
>> on that branch/repo). Are there any plans for if/when it will move out
>> of the "experimental" status? [just to understand the long-term plan
>> for this feature in the Mercurial team's vision]
>
> Yeah, this was something started by Facebook, and then I did enough
> work on it to make it meet my needs for work. I don't think the other
> developers use it, so they're only changing it when refactoring things
> requires it. The "plan" is to complete most/all of the TODO list (or
> at least the parts that would be backwards incompatible changes like
> the filesets/templates) before removing the experimental label. Right
> now I'm pretty busy with python3 porting on Windows and TortoiseHg, so
> I don't have a date in mind. But again, don't let the label scare you
> off- it's just that there are planned behavior changes noted in the
> TODO. The only sharp edge you may run into is with `hg grep` wanting
> to download the files to search, but it's probably pretty easy to make
> that command use the raw data instead (which will cause it to search
> the lfs pointer data instead of the blob, effectively skipping it).
Ok! I think it's normal that most development is driven by people's
needs and interests. Thanks for your work on the extension and for the
clarification!
>> Many devs working on multimedia/interactive projects (including
>> simulation, VR and video games of course - projects that often rely on
>> game engines and proprietary binary files) are migrating from Subversion
>> and Perforce to git+git-lfs because DVCSs have very pleasant advantages.
>> It would be great to have Mercurial in the set of the alternatives.
>
> Agreed. If you'd like to try your hand at implementing some of it,
> I'm happy to mentor/give advice.
This is something we've already discussed internally, and it's a
possibility for the future. We're a very small company and at the moment
we don't have enough resources to maintain tools other than the products
we're working on. But as our products become more complex and involve
more people, the plan is to move to more efficient tools and so to add
to Mercurial the features we need.
When we are able to step in, having someone point us in the right
direction in the Mercurial code base would be very helpful! Thanks.
>>> [...]
>> Thanks! According to the docs, if I need to specify lfs.url to test
>> other git-lfs backends, I must set it in the global config file,
>> right?
>
> You can set the config in any config file. If you have multiple
> repos, you probably want it in the repo level .hg/hgrc, since the url
> is repo specific.
Thanks!
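For the record, a repo-level setting along those lines might look like
the fragment below. The URL is only a placeholder, not a real server;
the [extensions] entry is included just for completeness.

```ini
# .hg/hgrc of the repository (placeholder URL, adjust to your server)
[extensions]
lfs =

[lfs]
url = https://lfs.example.com/myrepo
```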
> [...]
>
>>> The protocol is basically a client request and reply sequence:
>>>
>>> HTTP POST:
>>> C: "I'd like to upload these blobs, here are the hashes"
>>> S: "OK, here is a list of URLs, one per blob that needs to be
>>> uploaded, ignoring stuff I already have"
>>>
>>> HTTP PUT:
>>> C: "Here's blob 1 content"
>>> S: success/failure status
>>> C: "Here's blob2 content"
>>> ....
>>
>> It sounds easy. But usually problems are in the details :)
>
> Yeah, but this isn't something you need to worry about unless you're
> doing development work or looking at wireshark traces to debug.
I was already thinking about the development of the locking protocol ;)
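To make the sequence quoted above a bit more concrete, here is a rough
Python sketch of the first step, as I understand the git-lfs batch API.
The function names are made up for illustration and are not part of
Mercurial's lfs extension. No network I/O is done: the code only builds
the POST body and picks the upload URLs out of a reply.

```python
import hashlib
import json

# Media type the git-lfs batch API expects in Accept and Content-Type.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_upload_batch(blobs):
    """Return the JSON body of a batch "upload" request.

    blobs: iterable of bytes objects (the blob contents).
    Each blob is identified by its SHA-256 hash and its size.
    """
    objects = [
        {"oid": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for data in blobs
    ]
    return json.dumps(
        {"operation": "upload", "transfers": ["basic"], "objects": objects}
    )


def pending_uploads(batch_response):
    """Map oid -> upload URL for the blobs the server still needs.

    Objects the server already has come back without "actions",
    so they are skipped here.
    """
    reply = json.loads(batch_response)
    return {
        obj["oid"]: obj["actions"]["upload"]["href"]
        for obj in reply.get("objects", [])
        if "upload" in obj.get("actions", {})
    }
```

The body would be POSTed to the "objects/batch" endpoint under lfs.url,
and each returned URL would then receive one HTTP PUT with the blob
content.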
> [...]
> I wasn't thinking too deeply about how you could handle the locking
> outside of Mercurial. I figured maybe write a quick and dirty tool
> that could either use the API to lock it, or maybe some of the fancier
> servers have a web interface to manage the locks? Maybe that's too
> much hassle though, and it might just be easier to implement it (the
> client part anyway) in Mercurial. It might be worth looking at the
> Facebook repo to see if they have locking support yet. If so, it
> could be really easy to borrow.
I think that having the locking feature integrated into the VCS is the
most effective way to handle it. Having locks managed with "external"
tools can only be a source of issues. So I agree that having this basic
feature integrated into Mercurial (at least on the client) is the best
option. But reading more about the git-lfs protocol, I'm not sure it's a
good locking protocol (more on this below).
About Facebook, I think this is the latest version of their lfs extension:
https://github.com/facebookexperimental/eden/tree/master/eden/scm/edenscm/hgext/lfs
But I don't see any reference to locking features.
> I did a quick read through the spec, and I'm a bit puzzled by it. It
> seems much stripped down from the original proposal (which talked
> about checking out files read-only unless they were unlocked, among
> other things). But otherwise, I couldn't figure out what the benefit
> was. It lets you edit and commit the files locally, and only yells
> when you go to push. But at that point, you've made the edits, and
> potentially have to redo them if someone else did as well. If there
> were support (or you were using git), do you have a workflow in mind?
I think they implemented the "Single Branch Model" option
(https://github.com/git-lfs/git-lfs/blob/master/docs/proposals/locking.md#what-not-to-do-single-branch-model):
the scope of a lock extends to the entire repository, so until you push
your changes you're sure no one else is able to lock the same file and
edit it (also on other branches!). In addition, it requires locks always
to be taken on the "latest version from the 'main' branch" of the file.
While simple, it imposes too many restrictions for real-world scenarios.
Reading the "Multi-Branch Model" option
(https://github.com/git-lfs/git-lfs/blob/master/docs/proposals/locking.md#a-more-usable-approach-multi-branch-model),
it seems that the locks extend only to the "descendant" version of a
file (forcing you to work always on the latest version of it), and that
you can "break" that connection with an ancestor if needed (with this
revision becoming a new "root" for future locks on its descendants).
If my interpretation is correct, this should be a very good approach:
- people working on short-lived parallel branches would always work on
mergeable versions of the binary files;
- while people working on experimental/feature/maintenance/variant
branches would have the option to have a new, independent "locking
context" to continue working on their "unmergeable changes" (but in a
consistent way with its future descendant revisions).
One thing that I didn't find in the proposal is what happens when
merging a file involving two distinct "locking contexts". For practical
purposes, I think the user should have the option to pick one of the
parents' "locking contexts" (and maybe also the option to create a new
one). But I'm sure this has implications for how git/Mercurial manage
metadata on versioned files.
I think most real-world scenarios involving proprietary binary files
should be correctly addressed by the "Multi-Branch Model" option of git-lfs.
BTW, it seems to be a good extension of the one already documented at
https://www.mercurial-scm.org/wiki/LockExtension, where locks were
scoped by named branch.
A simpler "intermediate" solution - for practical use-cases - could
consist of the old hglock Mercurial proposal, with the addition of a
"locking context" associated with the named branches. In this way, a
file lock would extend to a specific "locking context", possibly
spanning multiple branches. This would support the two most common
scenarios described above: people working on short-lived parallel
branches who must keep unmergeable files in sync; and people working on
branches where files can diverge from other "locking contexts" (but
still stay consistent with descendant branches in the same "locking
context").
By default, the "locking context" of a branch should then simply be
preserved on merge.
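The branch-scoped "locking context" idea described above could be
modelled roughly as in the toy Python sketch below. Everything in it
(class and method names, the in-memory storage) is hypothetical and only
meant to illustrate the scoping rule; it is not taken from hglock or
from any existing Mercurial code.

```python
class LockError(Exception):
    """Raised when a lock cannot be taken or released."""


class LockRegistry:
    """Toy model: locks are scoped by locking context, not by branch."""

    def __init__(self):
        self._context_of = {}  # branch name -> locking context name
        self._locks = {}       # (context, path) -> owner

    def add_branch(self, branch, context=None):
        # A branch without an explicit context starts its own one,
        # named after the branch itself.
        self._context_of[branch] = context or branch

    def lock(self, branch, path, owner):
        key = (self._context_of[branch], path)
        holder = self._locks.get(key)
        if holder is not None and holder != owner:
            raise LockError("%s is locked by %s" % (path, holder))
        self._locks[key] = owner

    def unlock(self, branch, path, owner):
        key = (self._context_of[branch], path)
        if self._locks.get(key) != owner:
            raise LockError("%s is not locked by %s" % (path, owner))
        del self._locks[key]
```

With a "hotfix" branch added to the context of "default", a lock taken
on "default" also blocks other users on "hotfix", while a "variant"
branch with its own context can lock the same path independently.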
> [...]
>
>>>> I see the locking API is documented here:
>>>> https://github.com/git-lfs/git-lfs/blob/master/docs/api/locking.md
>>>>
>>>> If I have understood correctly, a hypothetical client "lfs-lock"
>>>> extension for Mercurial could use that API to manage locks on the
>>>> server (if the server supports it, of course). The server should
>>>> then enforce the locks on push.
>>>
>>> Yes, though the functionality should probably just be built into the
>>> lfs extension.
>>
>> I don't know how Mercurial extensions work, but I suppose that an
>> extension can define distinct "hooks" for when it's running on the
>> client and on the server?
>
> Yeah, extensions have free rein to do what they want inside the
> Mercurial process. I'm just saying if someone adds `hg lfs-lock`, `hg
> lfs-unlock`, etc, it should go into the existing extension. Hooks are
> mostly for external things, but the extension already wraps some
> internal stuff for `hg push` for example. So that's probably a
> natural place for the messages about things that were pushed that need
> to be unlocked. And that's likely true of other areas that need
> support.
>
Understood. I agree that a locking implementation of the git-lfs
specification should be part of the main lfs extension.
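As a starting point, the client-side requests of the locking API linked
above could be built roughly as in this Python sketch. It reflects my
reading of the git-lfs locking document; the helper names are invented
for illustration, the endpoints are relative to lfs.url, and what a
Mercurial client should send as the "ref" name is an open question.

```python
import json


def create_lock_request(path, ref_name=None):
    """Return (method, endpoint, body) for locking one file."""
    body = {"path": path}
    if ref_name is not None:
        # git clients send something like "refs/heads/main" here.
        body["ref"] = {"name": ref_name}
    return "POST", "/locks", json.dumps(body)


def unlock_request(lock_id, force=False):
    """Return (method, endpoint, body) for releasing a lock by id."""
    return "POST", "/locks/%s/unlock" % lock_id, json.dumps({"force": force})


def verify_request():
    """Return (method, endpoint, body) for the pre-push lock check."""
    return "POST", "/locks/verify", json.dumps({})


def blocking_locks(verify_reply):
    """Paths locked by other users, taken from a /locks/verify reply.

    The reply splits the locks into "ours" and "theirs"; only the
    latter should stop a push that touches those paths.
    """
    reply = json.loads(verify_reply)
    return [lock["path"] for lock in reply.get("theirs", [])]
```

The wrapper that lfs already has around `hg push` would be the natural
place to call something like blocking_locks() and report the result.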
Bests,
Daniele.