Using a DVCS to distribute files across a cluster
John McGowan
john at lynch2.com
Tue Oct 21 13:34:11 UTC 2008
Hi,
I'm brand new to this list. In fact, I just heard about this project
yesterday. Mercurial came up because I was discussing a problem I
have to come up with a solution for, and a VCS isn't the first thing
we thought of. When we started to think about it, though, it seemed
like *maybe* a VCS could be a really slick solution to the problem we
have. Here's the problem.
We have a cluster of web servers, and we currently use rsync to keep
all the content across them synchronized. Right now, our rsync cron
job is "dumb": it just syncs everything every 5 minutes or so. This
worked nicely when we had 3 servers in the cluster (1 primary and 2
secondary), but now that we have 9 servers in the cluster, I'm not so
happy with the rsync solution. The CPU cost of doing the rsyncs is
high, and the usability of the cluster is going down: since we can't
run too many rsyncs simultaneously, it now takes over half an hour
before a new file makes it to all the servers.
We thought about solving this a couple of ways.
1. Use a centralized file server. I don't want to complicate the
setup, worry about network traffic, or add another potential
failure point/bottleneck to the equation.
2. Continue to use rsync, but be smarter about it: never rsync
the entire tree, just rsync individual files as we find out they
have been updated.
3. Use a (D)VCS to hold all of the files that are getting synced, and
do commits/updates (pushes/pulls?) to keep the other machines in the
cluster up to date.
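
To make option 2 concrete, here is a minimal sketch of what "smarter"
rsync could look like, using rsync's --files-from option so that only
the listed paths are transferred. How the list of changed files gets
discovered is left open, and the function names, host names and paths
are all hypothetical, not part of our current setup.

```python
import subprocess
import tempfile
from pathlib import Path


def build_rsync_command(root, list_file, host, dest):
    """Return an rsync argv that copies only the paths named in list_file."""
    return [
        "rsync", "-a",
        f"--files-from={list_file}",   # paths are relative to the source dir
        f"{root.rstrip('/')}/",
        f"{host}:{dest.rstrip('/')}/",
    ]


def push_changed_files(root, changed, hosts, dest, run=subprocess.run):
    """Send the changed paths (relative to root) to every host, one by one."""
    if not changed:
        return []
    # Write the list once and reuse it for every secondary server.
    with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as fh:
        fh.write("\n".join(changed) + "\n")
        list_file = fh.name
    commands = []
    try:
        for host in hosts:
            cmd = build_rsync_command(root, list_file, host, dest)
            run(cmd, check=True)
            commands.append(cmd)
    finally:
        Path(list_file).unlink()
    return commands
```

The run parameter exists only so the commands can be inspected or
dry-run without touching the network.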
I don't know anything about Mercurial. I could imagine a way to do
what I need to do with SVN, because I'm familiar with SVN. However,
I'm under the impression that Git and Mercurial (the new "cool" kids
on the block) are designed for speed, second only to correctness.
I'm not really interested in working with Git, so here I am.
A side effect of using the VCS would of course be that changes to the
tree would be versioned, which is pretty cool. We already use a VCS
for our code, which gets added to the web tree, but versioning the
tree itself would give us some functionality we've never had in the
past. A client calls and says, "Uh... I accidentally deleted a
directory full of images using FTP, or the CMS... can you get it back
for me?"
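
As a rough illustration of that restore scenario, the sketch below
builds an "hg revert" command that brings a path back as it was at an
earlier revision. It assumes the deletion was already committed and
that the last good revision is known; the repository path, revision
and directory name are hypothetical.

```python
import subprocess


def restore_command(repo, rev, path):
    """Build an hg argv that restores `path` as it was at revision `rev`."""
    # --cwd runs hg from the repository root, so `path` is interpreted
    # relative to the root of the versioned tree.
    return ["hg", "--cwd", repo, "revert", "--rev", str(rev), path]


def restore(repo, rev, path, run=subprocess.run):
    """Restore files into the working directory; a commit is still needed."""
    cmd = restore_command(repo, rev, path)
    run(cmd, check=True)
    return cmd
```

For example, restore("/var/www", 41, "site-a/images") would put the
images directory back as of revision 41, ready to be committed and
distributed like any other change.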
The directory tree I'm thinking about doing this with is around 65 GB
(roughly 320,000 files). I could also create a separate repo for each
of the sites in that tree if I needed to.
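
A sketch of how the per-site variant of option 3 might be driven,
assuming one Mercurial repository per site directory: the primary
commits whatever changed, and each secondary pulls and updates. The
site names, paths and the ssh URL are made up for illustration, and
whether this performs acceptably at this size is exactly what I'm
asking about.

```python
import subprocess


def commit_command(site_repo, message):
    """Primary side: record new, changed and deleted files in one commit."""
    # --addremove adds untracked files and forgets missing ones first.
    return ["hg", "--cwd", site_repo, "commit", "--addremove", "-m", message]


def pull_command(site_repo, source):
    """Secondary side: fetch new changesets and update the working copy."""
    return ["hg", "--cwd", site_repo, "pull", "--update", source]


def commit_sites(root, sites, message, run=subprocess.run):
    """Commit every site repository under root; return the commands run."""
    commands = []
    for site in sites:
        cmd = commit_command(f"{root.rstrip('/')}/{site}", message)
        # No check=True here: hg commit exits with status 1 when there
        # is nothing to commit, which is not an error for a sync job.
        run(cmd)
        commands.append(cmd)
    return commands
```

A secondary would then run something like
pull_command("/var/www/site-a", "ssh://primary//var/www/site-a") for
each site it hosts.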
Thoughts? Is this a feasible solution to this problem? Is anybody
doing anything like this now?
/John