Regular repository corruption -- help needed.

Matt Mackall mpm at selenic.com
Thu Dec 20 17:00:41 UTC 2012


On Wed, 2012-12-19 at 16:49 -0800, Bryan O'Sullivan wrote:
> The forensics that I see are 100% consistent with a situation where your
> NFS client happily says "here's your file full of zeroes" when you read
> back a file (not full of zeroes) that you wrote earlier. You are not
> guaranteed write-to-read consistency even on a single node (whatever the
> specs may say).

Note that it could also be a problem on the append side: we could
successfully read back the temporary file and then have the append fail
(while still successfully updating the file length). Equally, the data
could have never made it to
the temp file. We do know that it was successfully read from the client
repo and successfully parsed before we tried to write it to the temp
file.
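
To make the two hops concrete, here is a minimal sketch of "stage in a
temp file, read it back, append, then check both length and content".
This is not Mercurial's actual code; the function name
append_with_verify and the choice of SHA-256 are assumptions made purely
for illustration.

```python
import hashlib
import os
import tempfile


def append_with_verify(dest_path, payload):
    """Stage payload in a temp file, append it to dest_path, check each hop.

    Hypothetical helper for illustration only. Returns the offset at which
    the payload was appended, or raises IOError if a hop lost data.
    """
    expected = hashlib.sha256(payload).hexdigest()

    # Hop 1: write the data to a temp file next to the destination,
    # then read it back and compare against what we meant to write.
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        with open(tmp_path, "rb") as tmp:
            staged = tmp.read()
    finally:
        os.unlink(tmp_path)
    if hashlib.sha256(staged).hexdigest() != expected:
        raise IOError("temp file read-back does not match what was written")

    # Hop 2: append the staged bytes to the destination file.
    start = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(dest_path, "ab") as dest:
        dest.write(staged)
        dest.flush()
        os.fsync(dest.fileno())

    # The length can be updated even when the data is wrong, so check both.
    if os.path.getsize(dest_path) != start + len(payload):
        raise IOError("destination length is wrong after append")
    with open(dest_path, "rb") as dest:
        dest.seek(start)
        tail = dest.read(len(payload))
    if hashlib.sha256(tail).hexdigest() != expected:
        raise IOError("length was updated but the appended bytes differ")
    return start
```

A read-back on the same machine may well be served from the pagecache,
so passing these checks does not prove the bytes reached the disk. It
only narrows down which hop to suspect.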

I think the only thing we can say with confidence here is: not
Mercurial's fault. It's clear a write was done and data was lost in
transit somewhere between the client unpacking the bundle and the data
arriving at its final resting place on the server's disk. That leaves:

- client DRAM
- client pagecache
- NFS coherency
- server DRAM
- server pagecache
- server filesystem (ext3? ext4?)
- DMA engine
- storage device

(because we read back from a temp file that may have hit the disk, we
can have disk corruption that manifests without alignment)

The idea that Linux would have an undiagnosed write-to-read NFS
consistency bug would not top my list, as massive numbers of apps like
gcc and sendmail depend on it. I think it's more likely to be flaky RAM
or similar. "Spans of zeros showed up in my files" is one of the most
common forms of corruption and doesn't seem to be correlated with much.
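
For anyone wanting to look for this pattern in a damaged file, a rough
sketch of a zero-span scanner follows. The name find_zero_runs, the
512-byte minimum, and the 4096-byte alignment are all assumptions picked
for illustration; legitimate binary data can contain runs of zeros too,
so treat any hit as a hint rather than proof.

```python
import re


def find_zero_runs(data, min_len=512, align=4096):
    """Return (offset, length, aligned) for each run of NUL bytes in data.

    Only runs of at least min_len bytes are reported. 'aligned' is True
    when both the offset and the length are multiples of align, which
    would hint at page- or block-sized loss. Hypothetical helper.
    """
    pattern = re.compile(b"\\x00{%d,}" % min_len)
    runs = []
    for match in pattern.finditer(data):
        start = match.start()
        length = match.end() - start
        aligned = start % align == 0 and length % align == 0
        runs.append((start, length, aligned))
    return runs


def scan_file(path, min_len=512, align=4096):
    """Read a whole file into memory and report its zero runs."""
    with open(path, "rb") as f:
        return find_zero_runs(f.read(), min_len, align)
```

An unaligned run does not rule out the disk, for the reason given above:
data read back from a temp file that was already damaged can land at any
offset in the final file.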

> When are you going to switch to bitbucket? You know it's free, right?

Last I heard, Bitbucket was implemented on top of NFS. We actually have
tons of people using Mercurial on top of NFS with very few problems. The
only bug we've ever conclusively pinned on NFS was:

http://bz.selenic.com/show_bug.cgi?id=382

That's not to say that I think NFS is the best way to deploy Mercurial.
Giving clients direct write access to the repo exposes you to the
superset of all clients' software, hardware, and wetware problems.
Sticking hgweb in between means it's much harder for clients to scribble
on your main repo.

-- 
Mathematics is the supreme nostalgia of our time.




