Regular repository corruption -- help needed.
Alexander Krauss
krauss at in.tum.de
Wed Dec 19 23:50:39 UTC 2012
Dear list,
In the Isabelle project (http://isabelle.in.tum.de) we have been using
Mercurial without problems since 2008, but since this summer we have
been experiencing regular corruption of our central push/pull area.
I am looking for advice on how to investigate this issue. It happens
sporadically, but often enough to be really worrying: each time, we
must re-clone the whole repository, which is a stop-the-world
administrative operation.
The setup:
- The central repository sits on an NFS mount, which is accessed from
  a number of machines. (I know that this is not ideal, but it is not
  easy to change at the moment.)
- Developers usually push via ssh, connecting to one of the machines
  that have access to the NFS mount, e.g.:
      hg push ssh://somemachine//nfs/central/repos
  However, today I also saw the issue occur on a plain local push.
- Before the push, the repository is fine; afterwards it is
  corrupted:
      $ hg log
      abort: integrity check failed on 00changelog.i:50603!
  hg verify reports a "first damaged changeset" n. Notably, n is a
  revision that was already present before the push, not one of the
  newly pushed revisions. We must then re-clone up to revision n - 1.
- For analysis, I can provide tarballs (130M each) of
  (a) the corrupted repository:
      http://www21.in.tum.de/~krauss/isabelle-corrupt-2012-12-19.tar.gz
  (b) the (intact) origin of the push:
      http://www21.in.tum.de/~krauss/isabelle-push-origin-2012-12-19.tar.gz
  Unfortunately, I no longer have the original intact state of the
  push destination.
- Because of NFS, concurrent operations may be part of the problem.
  However, I am fairly sure that there were no concurrent pushes or
  other write attempts, although some automated tools regularly pull
  from this repository.
- Some more info:
  - hg version: 2.4, Python 2.7.3, Linux 3.6.10 (some SuSE version)
  - We have an older repository format:
      $ cat /nfs/central/repos/.hg/requires
      revlogv1
      fncache
      store
  - Active extensions (from ~/.hgrc):
      [extensions]
      extdiff =
      transplant=
      color =
      hgext.graphlog =
      hgext.record =
      hgext.convert=
      mq =
      share =
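For context on the "integrity check failed" error quoted above: as far as I understand the revlog format, each revision's node id is the SHA-1 hash of its two parent ids (smaller first) followed by the full revision text, and the abort means that the text reconstructed from the stored data no longer hashes to the stored id. The following Python sketch illustrates that check in simplified form. It is not Mercurial's actual code; the names revlog_node and check_integrity are hypothetical, chosen for illustration only.

```python
import hashlib

# All-zero id that stands in for a missing parent.
NULLID = b"\0" * 20


def revlog_node(text, p1, p2=NULLID):
    """Simplified revlog node id (assumption: revlogv1 hashing scheme):
    SHA-1 over both parent ids, smaller first, then the revision text."""
    lo, hi = sorted([p1, p2])
    return hashlib.sha1(lo + hi + text).digest()


def check_integrity(text, p1, p2, stored_node):
    """True if the reconstructed text still matches its stored node id."""
    return revlog_node(text, p1, p2) == stored_node


# Usage: one intact revision and one copy with a single changed byte.
parent = revlog_node(b"initial revision\n", NULLID)
text = b"changeset metadata\n"
node = revlog_node(text, parent)

print(check_integrity(text, parent, NULLID, node))     # True
damaged = b"chang3set metadata\n"
print(check_integrity(damaged, parent, NULLID, node))  # False
```

Because the hash covers the full text, a single damaged byte anywhere in a stored revision is enough to trigger the abort, even for a revision that was written long before the push.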
I would appreciate any help in getting to the source of the problem.
We are also considering moving to a hosting service such as Bitbucket
to eliminate potential NFS issues, but I think this issue is worth
pursuing in its own right.
Alex