Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive



[Discuss] Redundant array of inexpensive servers: clustering?



Bill wrote:
> ...The nicest HA solutions available today do
> require apps be "cloud" enabled, which is to say fully virtualized;

Quick response, thanks!  Yes, I do virtualize things here (using VirtualBox)
but that doesn't solve much of the problem for a home user.  I've actually
been de-virtualizing lately, switching most of my instances to the
easier/lighter-weight LXC containers.

VMware does have a tool called vMotion which provides the full-virtualization
SAN-backed design that some data centers use.  However, it's not for home use,
so far as I know, unless you want to 2nd-mortgage the house.

> Choice 1 is whether storage is replicated or shared. Shared can be a
> cluster FS or a backend storage system

I make the assumption that clustering requires a SAN approach of some sort.
(My home solution thus far has been eventual-consistency tools like unison and
rsync; what I really want is a shared/guaranteed consistent setup like the one
I suggested with AoE/DRBD with a clustered filesystem like OCFS2).  The Galera
clustering approach looks like the one I'd really like to promote further:
instead of putting consistency-verification underneath the app, which led to
kludges like statement-level replication (that breaks all the time) or
NDBCLUSTER (which requires a whole different codebase that's always out of
sync or missing features of the dominant InnoDB), it sits between the user and
the app, running two or more copies of the app and verifying write-through
consistency at every operation.  However, I don't want to have to craft
app-specific Galera-type tools for each of my apps either.
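
To make the "sits between the user and the app" idea concrete, here is a toy
sketch in Python.  It is not Galera's actual protocol (Galera uses its own
replication scheme); it only illustrates a proxy that applies a write to every
copy or to none.  The Replica and WriteThroughProxy names are hypothetical,
invented for this illustration.

```python
class Replica:
    """Hypothetical in-memory stand-in for one running copy of the app's data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True

    def prepare(self, key, value):
        # A real replica would validate and stage the write here.
        return self.healthy

    def commit(self, key, value):
        self.data[key] = value


class WriteThroughProxy:
    """Accepts a write only if every replica agrees to take it."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Phase 1: ask every replica whether it can accept the write.
        if not all(r.prepare(key, value) for r in self.replicas):
            return False  # rejected; no replica has been modified
        # Phase 2: apply the write everywhere so all copies stay identical.
        for r in self.replicas:
            r.commit(key, value)
        return True

    def read(self, key):
        # Any replica will do, since committed writes reached all of them.
        return self.replicas[0].data.get(key)
```

The point of the design is that the consistency check lives in one generic
layer rather than inside each app, which is exactly the part that is hard to
generalize beyond a database.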

> Choice 2 is running on OS-on-Iron or in managed VM's/cloud etc.

I prefer to have it at home on my own servers for cost-sanity reasons.
Opening an AWS account for the house just isn't gonna happen.

> Choice 3 is whether Apps run hot-hot, hot-cold, or hot-warm on the multiple
> nodes.

Ideally hot-hot.  But since most don't natively support clustering, you can
assume a hot-warm design is suitable for some, hot-cold failover for most.

> Hot-cold failover with restart on 2nd server when failure is detected is
> annoying config management but perfectly doable, but must be configured
> [ and managed and tweaked from now until doomsday ]

That's what I'm trying to leave behind in the 1990s where it belongs.
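
For reference, the hot-cold pattern described above boils down to roughly the
following.  This is a minimal sketch, assuming a health-check callable and a
"start the standby" callable that you supply yourself; FailoverMonitor and its
parameters are hypothetical names, not part of any existing tool.

```python
class FailoverMonitor:
    """Starts the cold standby after several consecutive failed health checks."""

    def __init__(self, check, start_standby, max_failures=3):
        self.check = check                  # callable returning True if primary is up
        self.start_standby = start_standby  # callable that starts the service on node 2
        self.max_failures = max_failures
        self.failures = 0
        self.failed_over = False

    def poll(self):
        """Run one health check; return True if this poll triggered failover."""
        if self.failed_over:
            return False  # already failed over; nothing more to do
        if self.check():
            self.failures = 0  # primary answered, so reset the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.start_standby()
            self.failed_over = True
            return True
        return False
```

In practice poll() would run from a timer or cron job on the standby node.
The loop itself is trivial; the per-service checks, start scripts and data
hand-off are where the ongoing tweaking comes from.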

> If you make both your servers into VM hosts of any VM brand that allows
> load-balancing, and package each of your services into VM images, you can
> script auto(re)start of services.

I don't think it's quite that simple because of the data consistency issue.
First I have to figure out the SAN (and if OCFS2 or GlusterFS aren't it, then
what is?), then I have to figure out how to get the data separate from the
services...you get the picture.  This is months of design effort which is
worth it if you've got a $20 million+ service to run, but I'm looking at this
from the little-guy perspective: aren't the tools getting better for the
little guys?  So far it seems like "no".

> I note the upcoming BLU presentation on OpenStack, which is one of many
> potential toolkits of possible use.

Yes, I've been holding my breath for OpenStack for a few years.  Maybe a
two-server installation will eventually be able to make use of it, or at least
a subset of it.

What I'm hoping to see is something like vMotion for the home setup (there's a
kinda-sorta version of that available for VirtualBox, and I'd love to hear
from anyone who has tried using the teleport feature for HA), and a nonstop
software-SAN filesystem that can run on two nodes (possibly with a third for
split-brain arbitration) without getting a PhD in block storage first.  My
OCFS2-over-iSCSI/AoE experiment ran aground with this unresolvable error
message:

 Starting global heartbeat for cluster "cicluster": Failed
 o2cb: Heartbeat region could not be found 65344A72790E4C2C916AD8B22FBA20DC

If anyone else has OCFS2 running (with the global heartbeat feature) then
perhaps I could give it another try.
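
On the third node for split-brain arbitration mentioned above: the usual rule
is that a node may keep serving only while it can see a strict majority of the
voters.  A minimal sketch of that rule, with hypothetical function and node
names, and assuming the node doing the check is itself one of the voters:

```python
def has_quorum(reachable_votes, total_votes):
    """True when the reachable votes form a strict majority."""
    return reachable_votes > total_votes // 2


def may_stay_active(node, reachable, voters):
    """Decide whether `node` may keep serving, given the peers it can reach.

    `node` is assumed to be a member of `voters` and always counts itself.
    """
    visible = {node} | (set(reachable) & set(voters))
    return has_quorum(len(visible), len(voters))
```

With only two voters, a node that loses sight of its peer has 1 of 2 votes and
must stop, so neither side survives a network split.  Adding an arbiter lets
whichever side still reaches it carry on with 2 of 3.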

Thanks again for the response!

-rich





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org