High Availability of hg web server through NFS share

Jesper Noehr jesper at noehr.org
Thu Aug 11 00:48:51 UTC 2011


On Thu, Aug 11, 2011 at 8:34 AM, Brodie Rao <brodie at bitheap.org> wrote:
> On Wed, Aug 10, 2011 at 1:09 PM, Christophe Furmaniak
> <christophe.furmaniak.ml at gmail.com> wrote:
>> Thanks Isaac for the pointer! (I may have searched with the wrong keywords
>> on the bitbucket blog).
>>
>> These 2 articles give more information:
>>
>> http://blog.bitbucket.org/2010/08/25/bitbucket-downtime-for-a-hardware-upgrade/
>> http://blog.bitbucket.org/2010/09/16/outage-incident-and-our-new-monitoring-setup/
>>
>> From what I understand/guess, they seem to have at least 2 front end
>> machines and a shared storage (a Storage, Dell MD1120 DAS array).
>> I don't know about Dell DAS array, I'll keep on searching.
>>
>> Anybody from Bitbucket on the mailing list?
[…]

To give a quick answer to the NFS question:

When we were on EC2, we had to segregate our storage across several
block devices to obtain enough I/O to run the site, and thus we didn't
have this problem.
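
For readers unfamiliar with the idea, here is a minimal sketch of what
such segregation can look like: each repository is mapped to one of
several mount points by hashing its name. This is not our actual code.
The mount points in VOLUMES and the names volume_for and repo_path are
made up for illustration, and the hash-based scheme is an assumption.

```python
import hashlib

# Hypothetical mount points, one per block device (assumption).
VOLUMES = ["/mnt/vol0", "/mnt/vol1", "/mnt/vol2", "/mnt/vol3"]


def volume_for(repo, volumes=VOLUMES):
    """Pick a volume for a repository by hashing its name.

    The same name always maps to the same volume, and names spread
    roughly evenly across the volumes.
    """
    digest = hashlib.md5(repo.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(volumes)
    return volumes[index]


def repo_path(repo, volumes=VOLUMES):
    """Return the on-disk path of a repository under its volume."""
    return f"{volume_for(repo, volumes)}/{repo}"
```

One caveat of a plain modulo scheme like this: changing the number of
volumes remaps most repositories, so they would have to be moved.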

However, when we moved from EC2 to Contegix, the infrastructure was
re-architected to run off a single NFS server. We were quite worried
that we'd run into various locking and/or corruption issues, so we
kept our segregation logic in place, even though everything was still
stored on a single logical block device.

After a while, we decided to run some tests and see if we could
actually provoke any of the errors we were worried about, so I wrote a
little bomb-nfs.py script and ran it for a few days. Surprisingly,
nothing got messed up, so we removed the segregation logic, and it's
been like that for almost a year now. We haven't seen any issues.
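
The original bomb-nfs.py is not reproduced here. As a rough idea of
what a test of this kind can look like, the sketch below has several
threads repeatedly rewrite a small set of files on a shared directory
and verify a checksum stored with each file. Everything in it (the
names write_record, check_record and hammer, the file layout, the use
of SHA-1) is an assumption for illustration, not the real script. To
exercise NFS properly you would run it at the same time from several
client machines against the same mount.

```python
import hashlib
import os
import random
import socket
import threading


def write_record(path, payload):
    """Write the payload's SHA-1 digest, a newline, then the payload.

    The data goes to a temporary file first and is renamed into place,
    so readers should only ever see complete records.
    """
    digest = hashlib.sha1(payload).hexdigest().encode("ascii")
    tmp = "{}.{}.{}.{}.tmp".format(
        path, socket.gethostname(), os.getpid(), threading.get_ident()
    )
    with open(tmp, "wb") as f:
        f.write(digest + b"\n" + payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def check_record(path):
    """Return True if the stored digest matches the stored payload."""
    with open(path, "rb") as f:
        digest, _, payload = f.read().partition(b"\n")
    return hashlib.sha1(payload).hexdigest().encode("ascii") == digest


def hammer(root, workers=4, rounds=200, files=8):
    """Rewrite and verify files under root; return the paths that failed."""
    errors = []

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(rounds):
            target = os.path.join(root, "file-%d" % rng.randrange(files))
            write_record(target, os.urandom(rng.randrange(1, 4096)))
            probe = os.path.join(root, "file-%d" % rng.randrange(files))
            try:
                if not check_record(probe):
                    errors.append(probe)
            except FileNotFoundError:
                pass  # nobody has written this file yet

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling hammer("/path/to/nfs/mount/scratch") returns a list of files
whose contents did not match their checksum; an empty list means no
corruption was observed during that run. The path is a placeholder.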

Hope that helps,


Jesper


