Any advice on shared access via ssh as well as large files?

Diab Jerius djerius at cfa.harvard.edu
Tue Jan 12 19:10:15 UTC 2021


Hi!

I'm designing a centralized system which will be used to version
control text and binary content for a web site.

My preference is to use Mercurial (the alternative is git + gitolite +
git-annex), but there are a few pain points I'd like to work around.
I'm hoping someone with experience of both shared access via ssh and
large files can help me sort them out.

The design constraints are:

   1. The users are not sophisticated.

   2. Authors should have read-write access to their own content,
      and read-only access to others' (to allow
      intra-site link validation during local builds).

   3. Some of the content consists of large (>500 MB) binary files.

   4. Disks (including home directories) are remote NFS (NetApp) mounts.

   5. Users already have general purpose accounts and local policy
      forbids having more than one account.

   6. Remote access to the network is ssh only.

My design looks like this:

* The actual central repository layout is simple, one repository per
   content area.

* For access control, I'd prefer something like gitolite [1], which
   uses a single ssh account and an ACL based on the authenticated
   remote user.  Sharing the repositories directly on NFS disk space
   is not reliable because of file locking problems.

   My first preference would be to use hg's acl extension, but its use
   over ssh (using hg-ssh) requires a separate, dedicated account for
   each user.  As users already have a general purpose account and
   are limited to one account each, that won't work.

   The closest third-party equivalent to gitolite that I've found is
   mercurial-server [3], but it seems abandoned [4].

   The last option that I believe will work is hgssh4 [2], which
   provides a simple ACL on top of hg over ssh with a single account
   managing the repositories.  The drawback is that it's not shipped
   with mercurial (or available from CentOS), which might
   run into some internal support policy issues.

* The best option for large files seems to be the largefiles
   extension, even though it is documented as being a "feature of last
   resort".  The pain point is that it tries to hard-link files into
   the usercache and, when that fails, copies them.  Because the
   default usercache lives in the user's home directory, which on our
   network is NFS mounted, the link will fail and the files will be
   copied.  That creates a lot of network I/O and fills up the
   (usually small) home directories.  To avoid this, users will have
   to set a repo-specific largefiles.usercache config value pointing
   somewhere on the disk their repo is on, which is not optimal.

   An alternative would be to use the narrow extension so that clones
   can leave out the large files.  Unfortunately it is

     a) experimental; and
     b) undocumented.

   I'm also concerned that when a repo is widened to provide access to
   a large file, the entire history of the file might get transferred.
   Unless there is some way of limiting the history to just the latest
   version of the file, that could be a significant load.

   Another alternative might be to use git-lfs.  For remote users, ssh
   port forwarding to the git-lfs server is possible, but awkward.
   However, more importantly, none of the standalone LFS servers that
   I've looked at have authentication [5].
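
On the largefiles usercache point above: hard links cannot cross
filesystems, so whether a candidate cache location would actually be
hard-linkable from a given repo can be checked before setting
largefiles.usercache.  The following is only a minimal sketch.  The
function names, the ".hg-largefiles-cache" directory name and the
example path are hypothetical, and comparing device IDs is a heuristic,
not something Mercurial itself does.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID.

    Hard links cannot cross filesystems, so a differing st_dev means a
    hard link between the two locations would fail.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def suggest_usercache(repo_root, dirname=".hg-largefiles-cache"):
    """Hypothetical convention: a cache directory next to the repo."""
    parent = os.path.dirname(os.path.abspath(repo_root))
    return os.path.join(parent, dirname)


# The suggested path would then go into the repo's .hg/hgrc, e.g.:
#   [largefiles]
#   usercache = /data/site/.hg-largefiles-cache
```

A wrapper run once per clone could call suggest_usercache(), create
the directory, confirm same_filesystem() against the repo root, and
only then write the setting, so that users do not have to edit it by
hand.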


Installing something like RhodeCode would solve many of these issues,
and I've performed that experiment, but the cost/benefit analysis for
system support required to maintain it doesn't work out for this
project.
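
To make the lighter-weight single-account approach described above
more concrete, here is a minimal sketch of the kind of check such a
wrapper performs.  It assumes the wrapper is started from an
authorized_keys command="..." entry that passes the user name, and
that it reads the client's request from SSH_ORIGINAL_COMMAND.  The ACL
table, repo names and function names are hypothetical; this is not
the code of hgssh4 or mercurial-server, and it only decides on access
without enforcing read-only mode.

```python
import shlex

# Hypothetical ACL: repo path -> {user: "rw" or "r"}; "*" is any user.
ACL = {
    "site/alice": {"alice": "rw", "*": "r"},
    "site/bob": {"bob": "rw", "*": "r"},
}


def parse_serve_command(original_command):
    """Return the repo from 'hg -R <repo> serve --stdio', else None."""
    try:
        words = shlex.split(original_command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if (len(words) == 5 and words[:2] == ["hg", "-R"]
            and words[3:] == ["serve", "--stdio"]):
        return words[2]
    return None


def access_for(user, repo, acl=ACL):
    """Return "rw", "r", or "" (no access) for this user and repo."""
    perms = acl.get(repo, {})
    return perms.get(user, perms.get("*", ""))
```

For example, access_for("bob", "site/alice") gives "r", which matches
the constraint that authors can read but not write each other's
content.  A real wrapper would also have to normalise the repo path
and reject pushes for read-only users, for instance through hooks.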

If anyone has any thoughts on these issues, I'd appreciate your input.

Thanks,

Diab


[1] https://gitolite.com/gitolite/index.html

[2] https://hg.sr.ht/~xaltsc/hgssh4

[3] http://www.lshift.net/mercurial-server.html

[4] Its website redirects to a seemingly unconnected corporate
     site.  Debian has removed it from upcoming releases
     (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953483).  The
     most up-to-date code I could find is at
     https://mageia.pkgs.org/cauldron/mageia-core-release-aarch64/mercurial-server-1.3-17.mga8.noarch.rpm.html
     which has conversions for Python 3.

[5] https://github.com/git-lfs/git-lfs/wiki/Implementations


