Status of lfs and lock extensions
Matt Harbison
mharbison72 at gmail.com
Tue Jan 19 18:33:40 UTC 2021
On Tue, Jan 19, 2021 at 7:45 AM Daniele Benegiamo
<danielebenegiamo at fastwebnet.it> wrote:
>
> On 2021/01/19 03:54, Matt Harbison wrote:
> > > I would like to know if there is any news (or future plans) about the
> > > "lfs" extension and any sort of support for a "centralized file
> > > locking" feature.
> >
> > The LFS extension is shipped with Mercurial. Don't let the fact that it
> > is marked experimental scare you off; I use it in production. The
> > experimental tag is mostly so that we can change some of the
> > fileset/revset/template functionality without worrying about backward
> > compatibility. I have no intention of changing the storage layout in an
> > incompatible way. There is a TODO list of future plans that I hope to
> > get to some day:
> >
> > https://www.mercurial-scm.org/repo/hg/file/5.6.1/hgext/lfs/TODO.rst
>
> Hi Matt!
>
> Thanks a lot for all the shared information! And I'm very happy that
> development on it is going forward.
>
> I'll test it for sure. Can I follow the instructions in the extension
> documentation to set it up?
>
> https://www.mercurial-scm.org/repo/hg/file/5.6.1/hgext/lfs/__init__.py
Yes, that should work. I'd ignore the `lfs.track` config, and use a
committed `.hglfs` file to select tracked files instead. You won't need
`lfs.url` either if you use Mercurial as the server, but it sounds like
you may need to point to a git server if you want locking. (Unless you
get the lock extension working, and transition to lfs locking in the
future. But I don't know the state of that extension.)
> (a few questions about the server follow below)
>
> (sorry for the basic questions about the setup, but we're new to the
> lfs protocol)
>
>
> > [...]
> >
> > > While the "largefiles" extension is well-established, I would like to
> > > know if there's any update about the "lfs" extension
> > > (https://www.mercurial-scm.org/wiki/LfsPlan).
> >
> > If you are settled on using one or the other, I would *strongly*
> > recommend using LFS instead of largefiles. It is much simpler, and less
> > likely to break commands in subtle ways. You can convert between LFS
> > and a normal repo using the convert extension without changing hashes,
> > and the LFS and normal repo will be able to do exchanges with each
> > other. (The convert extension does have edge cases where it can change
> > hashes when it probably shouldn't, but I've been planning to try to use
> > the repo upgrade command to do a conversion that shouldn't change hashes.)
>
> Thanks!
>
> Could you give some more details about how the "largefiles" extension
> can "break commands in subtle ways"? We just started testing these
> extensions, so we don't yet have a complete and detailed list of pros/cons.
I don't have any examples offhand; it's been ~3-4 years since I converted away
from it. But basically the way largefiles works, if you tell it to
track `foo.bin`, it intercepts that and tells Mercurial that you
really want to track `.hglf/foo.bin` (which it creates). And then it
has to mind both files (if you change `foo.bin`, it needs to update
the hash it stores in `.hglf/foo.bin`) without any real help from core
Mercurial. If you look at the largefiles extension code, you'll see
there are tons of functions and commands that are wrapped. Some of
these are trivial (do a small thing and delegate to the core
functionality), but some are non-trivial or almost complete
replacements. Those things tend to get out of date, and/or missed
when new functionality is added to core Mercurial. Search the history
from ~2013-2017 to see all of the whack-a-mole fixes.
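To make the standin bookkeeping concrete, here is a rough Python sketch of what largefiles has to keep in sync. It assumes the standin lives at `.hglf/<path>` and holds the SHA-1 hex digest of the real file followed by a newline; the helpers (`standin_path`, `standin_content`, `refresh_standin`) are hypothetical names, not the extension's actual API.

```python
import hashlib
import posixpath
from typing import Dict

# Assumption: standins live under ".hglf" and store "<sha1 hex>\n".
STANDIN_DIR = ".hglf"


def standin_path(path: str) -> str:
    """The path Mercurial actually tracks when you ask for `path`."""
    return posixpath.join(STANDIN_DIR, path)


def standin_content(data: bytes) -> bytes:
    """Text committed in the standin: the content hash plus a newline."""
    return hashlib.sha1(data).hexdigest().encode("ascii") + b"\n"


def refresh_standin(path: str, data: bytes, standins: Dict[str, bytes]) -> None:
    """Hypothetical hook: every change to `path` must also rewrite its standin."""
    standins[standin_path(path)] = standin_content(data)


standins: Dict[str, bytes] = {}
refresh_standin("foo.bin", b"version 1", standins)
print(list(standins))  # ['.hglf/foo.bin']
```

Every core command that reads, writes or compares `foo.bin` has to be taught about this second file, which is why so many commands end up wrapped.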
The lfs extension uses a very low-level toggle such that if you ask
Mercurial to track `foo.bin`, that's the file that actually gets
tracked, and the expected content is returned when you ask Mercurial
to read that file. The low-level toggle allows the pointer data to be
read or written only in the few cases that it is needed, instead of
always. If you look at the lfs extension, you'll see there are many
fewer things wrapped (and some of the wrapping adds functionality not
available in largefiles, so it does more with less).
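For comparison, a minimal sketch of the lfs side, assuming the pointer layout from the git-lfs spec (a `version` line, then `oid sha256:<hex>` and `size <bytes>`). Mercurial's own pointers can carry extra keys, and the real toggle is a per-revision flag handled inside core; `make_pointer`, `parse_pointer` and `read_revision` are hypothetical names used only to show the pointer being consulted only for flagged revisions.

```python
import hashlib
from typing import Dict

# Pointer layout from the git-lfs spec; only the three required keys here.
SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(data: bytes) -> bytes:
    """Small text record stored in place of the blob's content."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        f"version {SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    ).encode("ascii")


def parse_pointer(pointer: bytes) -> Dict[str, str]:
    """Split each 'key value' line of a pointer into a dict entry."""
    fields = {}
    for line in pointer.decode("ascii").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def read_revision(stored: bytes, is_lfs: bool, blobs: Dict[str, bytes]) -> bytes:
    """Hypothetical read path: resolve the pointer only for flagged revisions."""
    if not is_lfs:
        return stored  # ordinary revision: the stored text is the content
    oid = parse_pointer(stored)["oid"].split(":", 1)[1]
    return blobs[oid]
```

The caller always asks for `foo.bin` and always gets the file's real content back; only the storage layer knows whether a pointer was involved.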
> If I understand correctly, the "convert" extension supports
> bi-directional conversions: normal <-> largefiles and normal <-> lfs. Is
> that right, or are there limitations? (It would be very helpful for
> running our tests.)
Correct. The important thing to understand is that since largefiles
tracks `.hglf/foo.bin` in place of `foo.bin`, the commit hashes will
*always* change when you convert between normal <--> largefiles.
Since lfs tracks the file you want, you can freely convert between
normal <--> lfs, and the hashes will stay the same (barring the
convert extension edge cases I mentioned). Because the hashes are the
same, you can freely push and pull between the normal/lfs repos. (You
need to enable the extension for the normal repo IIRC, but you don't
have to *commit* anything as LFS).
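Why the hashes survive an lfs conversion can be sketched with Mercurial's revision hash, which (simplified here, ignoring copy metadata) is the SHA-1 of the two parent ids in sorted order followed by the revision text. Because lfs hashes the real file content, the file node is the same with or without LFS; largefiles commits different text under a different path (`.hglf/foo.bin`), and since the changeset hash covers the manifest listing those paths, the hashes cannot match. `node_hash` is a hypothetical helper.

```python
import hashlib

NULLID = b"\0" * 20  # id of Mercurial's null (empty) revision


def node_hash(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> str:
    """SHA-1 over the sorted parent ids plus the revision text (simplified)."""
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).hexdigest()


content = b"binary payload"

# Normal repo and lfs repo: both hash the real content, so the nodes match.
normal_node = node_hash(content)
lfs_node = node_hash(content)

# largefiles: the tracked file is .hglf/foo.bin, whose text is the content hash.
standin_text = hashlib.sha1(content).hexdigest().encode("ascii") + b"\n"
largefiles_node = node_hash(standin_text)

print(normal_node == lfs_node)         # True
print(normal_node == largefiles_node)  # False
```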
> >
> > > The plan was to "re-use" the Git LFS protocol, and since that protocol
> > > supports file locking
> > > (https://github.com/git-lfs/git-lfs/wiki/File-Locking), it could be an
> > > ideal solution for us.
> >
> > It does use the git-lfs protocol, though I didn't implement the VERIFY
> > command. (That's not about checking the file, it's about supporting
> > uploading files to a 3rd party server.) It also doesn't support ssh.
>
> Thanks for listing them. Fortunately they're not important limitations
> for our current network setup.
>
>
> > Before the lfs serving functionality was added to Mercurial, I was
> > using a couple of 3rd party packages that implemented lfs for the
> > server. Most of the issues I had were around the server trying to be
> > clever with the User-Agent string it was sent, but I think it ended up
> > working with the two I tried. I have since moved to the native
> > Mercurial server, so I don't know the current state of things. The
> > intention is that it should be interoperable, so if you find something
> > that doesn't work, file a bug. (Though it's been a while since I did
> > LFS work, and the issues have generally been 3rd party, so no promises
> > to fix it.)
>
> I don't know git-lfs very well. It seems you need a dedicated
> server process that handles the protocol and (I suppose) mediates
> access to the underlying git repository. Or can it run independently of git?
Correct. If you use Mercurial for the server, you don't need a git
repo. It simply implements the protocol, and stores the files in the
underlying Mercurial repo. But since you need locking, you'll probably
be looking at 3rd party packages, and those might expect a git repo,
even an empty one. When I toyed with 3rd party stuff, I looked at
gitbucket and SCM Manager (because I use the latter to host hg repos).
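What "stores the files" amounts to on the server can be sketched as a content-addressed store keyed by each blob's SHA-256 oid. This is an illustrative in-memory stand-in, not Mercurial's actual blobstore; the `BlobStore` class and its methods are hypothetical. It shows the two things the exchange relies on: telling the client which blobs are missing, and checking that uploaded content matches the oid it was sent under.

```python
import hashlib
from typing import Dict, Iterable, List


class BlobStore:
    """Hypothetical in-memory stand-in for a server's lfs blob store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def missing(self, oids: Iterable[str]) -> List[str]:
        # Only these need an upload URL; the rest the server already has.
        return [oid for oid in oids if oid not in self._blobs]

    def put(self, oid: str, data: bytes) -> None:
        # Blobs are addressed by the SHA-256 of their content, so an
        # upload can be verified against the oid it claims to be.
        if hashlib.sha256(data).hexdigest() != oid:
            raise ValueError(f"content does not match oid {oid}")
        self._blobs[oid] = data

    def get(self, oid: str) -> bytes:
        return self._blobs[oid]
```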
The protocol is basically a client request and reply sequence:
HTTP POST:
    C: "I'd like to upload these blobs, here are the hashes"
    S: "OK, here is a list of URLs, one per blob that needs to be
        uploaded, ignoring stuff I already have"
HTTP PUT:
    C: "Here's blob 1 content"
    S: success/failure status
    C: "Here's blob 2 content"
    ...
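The POST step above can be sketched with the JSON shapes from the git-lfs batch API: the client lists `oid`/`size` pairs for an `upload` operation, and the reply carries an `actions.upload` entry (with `href` and optional `header`) only for objects the server still needs. The helper names are hypothetical, and the endpoint URL and authentication are left out.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Media type the git-lfs batch API uses for Accept and Content-Type headers.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def batch_upload_request(blobs: List[bytes]) -> str:
    """Body of the HTTP POST to <endpoint>/objects/batch: 'here are the hashes'."""
    objects = [{"oid": hashlib.sha256(b).hexdigest(), "size": len(b)} for b in blobs]
    return json.dumps({"operation": "upload", "transfers": ["basic"], "objects": objects})


def uploads_needed(reply_body: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Turn the server's reply into (oid, href, headers), one per HTTP PUT.

    Objects the server already has come back without an "upload" action and
    are skipped; per-object errors are raised rather than silently dropped.
    """
    reply = json.loads(reply_body)
    todo = []
    for obj in reply.get("objects", []):
        if "error" in obj:
            raise RuntimeError(f"{obj['oid']}: {obj['error'].get('message')}")
        upload = obj.get("actions", {}).get("upload")
        if upload is not None:
            todo.append((obj["oid"], upload["href"], upload.get("header", {})))
    return todo
```

Each triple would then drive one HTTP PUT of the raw blob, for example with `urllib.request.Request(href, data=blob, headers=headers, method="PUT")`, sending `LFS_MEDIA_TYPE` on the batch POST itself.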
> Just to be sure I understood correctly: the current version of
> "hgweb" can be used as a remote git-lfs endpoint, and so it can be used
> directly with the "lfs" extension on the clients? Or do we need other
> specialized software on the server to run the tests?
Correct. Obviously you also need to enable the extension on the
server. The only caveat is that it doesn't currently support the
locking protocol.
> > I didn't try any file locking (and it's not implemented in the Mercurial
> > client), but maybe if you've already got a 3rd party server and can send
> > it the right commands to lock it on the server, it will (mostly) work? I
> > didn't look at any of the lfs extensions or proposals when working on
> > the server, because I needed the bare bones implementation.
>
> So the "lfs" extension could work with the current git-lfs v2 reference
> implementation (https://git-lfs.github.com)?
I don't remember a v2 reference when I implemented the server support
in early 2018. The batch command and basic-transfers specs I used
look basically unchanged since then:
https://github.com/git-lfs/git-lfs/tree/master/docs/api
> I see the locking API is documented here:
> https://github.com/git-lfs/git-lfs/blob/master/docs/api/locking.md
>
> If I understand correctly, a hypothetical client "lfs-lock" extension on
> Mercurial could use that API to manage locks on server (if supported by
> the server of course). The server should then enforce the locks on push.
Yes, though the functionality should probably just be built into the
lfs extension.
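If that were built into the lfs extension, the client side might look roughly like this sketch. It follows the request and response shapes in the git-lfs locking spec (`POST /locks` with a `path` and an optional `ref`, and `POST /locks/verify` returning `ours` and `theirs` lists); `lock_request` and `locked_by_others` are hypothetical helpers, nothing like this exists in Mercurial today, and a real client would also follow `next_cursor` for paginated replies. The same check is what a server would apply when enforcing locks on push.

```python
import json
from typing import Iterable, List, Optional


def lock_request(path: str, ref: Optional[str] = None) -> str:
    """Body for POST <endpoint>/locks, per the git-lfs locking spec."""
    body = {"path": path}
    if ref is not None:
        # "ref" is optional in the spec; git clients send a branch ref here.
        body["ref"] = {"name": ref}
    return json.dumps(body)


def locked_by_others(verify_reply: str, pushed_paths: Iterable[str]) -> List[str]:
    """Paths in an outgoing push that someone else holds a lock on.

    `verify_reply` is the JSON from POST <endpoint>/locks/verify, which
    splits locks into "ours" and "theirs". Pagination is ignored here.
    """
    reply = json.loads(verify_reply)
    theirs = {lock["path"] for lock in reply.get("theirs", [])}
    return sorted(p for p in set(pushed_paths) if p in theirs)
```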
> Thanks for all the shared info!
> Daniele.