Any advice on shared access via ssh as well as large files?
Diab Jerius
djerius at cfa.harvard.edu
Tue Jan 12 19:10:15 UTC 2021
Hi!
I'm designing a centralized system which will be used to version
control text and binary content for a web site.
My preference is to use Mercurial (the alternative is git + gitolite +
git-annex), but there are a few pain points I'd like to work around.
I'm hoping someone who has experience with both shared access via ssh
and large files can help me sort them out.
The design constraints are:
 1. The users are not sophisticated.
 2. Authors should have read-write access to their own content,
    and read-only access to others' (to allow
    intra-site link validation during local builds).
 3. Some of the content consists of large (>500 MB) binary files.
 4. Disks (including home directories) are remote NFS (NetApp) mounts.
 5. Users already have general purpose accounts and local policy
    forbids having more than one account.
 6. Remote access to the network is ssh only.
My design looks like this:
* The actual central repository layout is simple, one repository per
 content area.
* For access control, I'd prefer something like gitolite [1], which
 uses a single ssh account and an ACL based on the authenticated
 remote user. Sharing the repositories directly on NFS disk space is
 not reliable because of problems with file locks.
 My first preference would be to use hg's acl extension, but its use
 over ssh (using hgssh) requires a separate, dedicated account for
 each user. As users already have a general purpose account and are
 limited to one account, that won't work.
 The closest third-party setup similar to gitolite that I've found is
 mercurial-server [3], but it seems abandoned [4].
 The last option that I believe will work is hgssh4 [2], which
 provides a simple ACL on top of hg over ssh with a single account
 managing the repositories. The drawback is that it's not shipped
 with mercurial (or available from CentOS), which might
 run into some internal support policy issues.
* The best option for large files seems to be the Largefiles
 extension, even though it is documented as being a "feature of last
 resort". The pain point with it is that it tries to hard-link files
 into the usercache and, when that fails, copies them. The default
 usercache is in the user's home directory, which on our network is
 an NFS mount separate from the disk the repo is on, so the link will
 fail and the files will be copied. That will create a lot of network
 I/O and will fill up the (usually small) home directories. To avoid
 this, users will have to set a repo-specific largefiles.usercache
 config value pointing to somewhere on the disk their repo is on,
 which is not optimal.
 An alternative would be to use the narrow extension and avoid the
 large files. Unfortunately it is
   a) experimental; and
   b) undocumented.
 I'm also concerned that when a repo is widened to provide access to
 a large file the entire history of the file might get transferred.
 Unless there is some way of limiting the history to just the latest
 version of the file, that could be a significant load.
 Another alternative might be to use git-lfs. For remote users, ssh
 port forwarding to the git-lfs server is possible, but awkward.
 More importantly, none of the standalone LFS servers that I've
 looked at have authentication [5].
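
To illustrate the largefiles usercache concern above: hard links only work
within a single filesystem, so one rough way to vet a candidate
largefiles.usercache location is to compare device IDs. The following is a
minimal sketch, not part of Mercurial; the function names are hypothetical,
and matching device IDs are a necessary but not always sufficient condition
for hard-linking to succeed.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def usercache_hint(repo_root, cache_dir):
    """Hypothetical helper: say whether cache_dir looks hard-linkable
    from repo_root. Both directories must already exist."""
    if same_filesystem(repo_root, cache_dir):
        return "ok: same filesystem, hard links should be possible"
    return "warning: different filesystem, large files will be copied"
```

A user could call usercache_hint() with the repository root and the
directory they intend to use for largefiles.usercache before committing
the setting to the repo's hgrc.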
Installing something like RhodeCode would solve many of these issues,
and I've performed that experiment, but the cost/benefit analysis for
system support required to maintain it doesn't work out for this
project.
If anyone has any thoughts on these issues, I'd appreciate your input.
Thanks,
Diab
[1] https://gitolite.com/gitolite/index.html
[2] https://hg.sr.ht/~xaltsc/hgssh4
[3] http://www.lshift.net/mercurial-server.html
[4] Its website redirects to a seemingly unconnected corporate
   site. Debian has removed it from upcoming releases
   (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953483). The
   most up-to-date code I could find is at
https://mageia.pkgs.org/cauldron/mageia-core-release-aarch64/mercurial-server-1.3-17.mga8.noarch.rpm.html,
   which has conversions for Python 3.
[5] https://github.com/git-lfs/git-lfs/wiki/Implementations