High availability and failover

  • gde
    Junior Member
    Zabbix Certified Specialist
    • Mar 2011
    • 5

    #1

    High availability and failover

    Hello,

    I haven't started installing anything yet, but I've been thinking about which tools to use for HA and failover handling.

    For Zabbix, we've decided to use corosync+pacemaker, as described in the wiki.
    For the database, we'll be using PostgreSQL 9.0 and the new streaming replication feature. We'll also use pgpool to handle failover and recovery of nodes.
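
    (For illustration, a minimal sketch of the kind of role check that could be hung off pacemaker or pgpool in this setup; the DSN, the exit-code convention and the psycopg2 dependency are assumptions, not part of the wiki recipe.)

    Code:
    #!/usr/bin/env python
    # Hypothetical helper: report whether the local PostgreSQL 9.0 node is the
    # primary or a streaming-replication standby, e.g. as input for a
    # pacemaker/pgpool health check. Connection details are made up.
    import sys
    import psycopg2

    def node_role(dsn="dbname=zabbix user=zabbix host=127.0.0.1"):
        conn = psycopg2.connect(dsn)
        try:
            cur = conn.cursor()
            # pg_is_in_recovery() returns true on a hot-standby node (9.0+)
            cur.execute("SELECT pg_is_in_recovery()")
            in_recovery = cur.fetchone()[0]
            return "standby" if in_recovery else "primary"
        finally:
            conn.close()

    if __name__ == "__main__":
        role = node_role()
        print(role)
        # non-zero exit on a standby so a resource agent or script can act on it
        sys.exit(0 if role == "primary" else 1)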

    Here's my question: what solution have you folks been using to ensure high availability?

    I don't think there are many approaches to this; corosync+pacemaker seems like the only choice, but right now all our ideas are pretty much theory. I'm interested in hearing any "field experience" with this: how well it works in practice, what problems have been encountered, etc.

    Thanks for your feedback!
  • guesommer
    Junior Member
    • Feb 2009
    • 4

    #2
    Re: High availability and failover

    We're using Linux-HA (Pacemaker) as our HA solution.

    The database in use is MySQL (for performance reasons).

    We are doing the replication via DRBD (and Linux-HA is controlling this).

    DRBD works absolutely fine; most of the trouble we've had is that the cluster stack sometimes does not recover properly after a full crash of the nodes (induced by hand).
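
    (For illustration, a minimal sketch of a check that reads /proc/drbd and flags any resource whose connection state is not "Connected" - for example StandAlone after a split brain. The exit-code convention is an assumption.)

    Code:
    #!/usr/bin/env python
    # Hypothetical watchdog: parse /proc/drbd and report any resource that is
    # not in the healthy "Connected" state, so the result can be fed into
    # monitoring or the cluster stack.
    import re
    import sys

    HEALTHY_CS = "Connected"

    def drbd_states(path="/proc/drbd"):
        """Return {minor number: connection state} for all DRBD resources."""
        states = {}
        with open(path) as fh:
            for line in fh:
                m = re.search(r"^\s*(\d+): cs:(\S+)", line)
                if m:
                    states[int(m.group(1))] = m.group(2)
        return states

    if __name__ == "__main__":
        bad = dict((minor, cs) for minor, cs in drbd_states().items()
                   if cs != HEALTHY_CS)
        if bad:
            for minor, cs in sorted(bad.items()):
                print("drbd minor %d: connection state %s" % (minor, cs))
            sys.exit(1)
        print("all DRBD resources Connected")
        sys.exit(0)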

    • dnshat
      Junior Member
      • Jul 2011
      • 1

      #3
      HA setup at dnshat.com

      Originally posted by gde
      Here's my question: what solution have you folks been using to ensure high availability?
      Hi gde - just saw a Twitter post that led me here. I'm using Zabbix at the core of dnshat.com to provide DNS failover and automated DNS load-balancing solutions on a subscription basis for client websites. Basically, I use Zabbix to monitor client websites for specific content strings; if the content strings are not found, triggered actions update MySQL records in a replicated backend database for a redundant PowerDNS setup.

      I have two core monitoring locations, and only one is active at a given time. I use MySQL master/master replication between these two locations for the Zabbix database. If the primary cloud site fails, custom scripts on the secondary cloud site see that the primary is down and start Zabbix on the secondary system, where it resumes site monitoring; when the primary is restored, my scripts shut down the Zabbix server on the secondary so that only the Zabbix server on the primary is running. Because of the replicated MySQL backend, everything stays in sync. If I lose both the primary and the secondary monitoring location, I have manual procedures in place to activate a third, slave-only MySQL instance in another datacenter (promoting it to a master and starting the Zabbix server processes there by hand). I have never had to use it in production, but it's nice to know that if I lose my primary and secondary, I have a third system ready to take over.
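
      (For illustration, a rough sketch of such a watchdog in Python - the primary's address, the trapper-port check, the init-script path and the timings are assumptions, not the actual scripts described above.)

      Code:
      #!/usr/bin/env python
      # Hypothetical watchdog for the secondary site: if the primary's Zabbix
      # server stops answering on its trapper port, start the local
      # zabbix_server; once the primary is back, stop the local one again.
      import socket
      import subprocess
      import time

      PRIMARY = ("primary.example.com", 10051)   # assumed address of the primary
      CHECK_INTERVAL = 60                        # seconds between checks
      FAIL_THRESHOLD = 3                         # consecutive failures before failover
      INIT_SCRIPT = "/etc/init.d/zabbix_server"  # assumed init script on the secondary

      def primary_up(addr=PRIMARY, timeout=5):
          try:
              s = socket.create_connection(addr, timeout)
              s.close()
              return True
          except socket.error:
              return False

      def local_server(action):
          # action is "start" or "stop"
          subprocess.call([INIT_SCRIPT, action])

      if __name__ == "__main__":
          failures = 0
          local_running = False
          while True:
              if primary_up():
                  failures = 0
                  if local_running:
                      local_server("stop")   # primary is back, stand down
                      local_running = False
              else:
                  failures += 1
                  if failures >= FAIL_THRESHOLD and not local_running:
                      local_server("start")  # take over monitoring here
                      local_running = True
              time.sleep(CHECK_INTERVAL)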

      I usually serve the dnshat website from the secondary monitoring system, with the Zabbix PHP web interface used on the secondary (writing to the secondary database, which flows through replication to the primary database where the zabbix_server binaries are running). The primary Zabbix watches the secondary webserver: if the secondary fails, the primary shifts DNS resolution, sending web traffic to the primary server. If the primary fails, the secondary is already active in DNS for web traffic, so no shift is needed (just the scripts to start up the zabbix_server binary).

      This arrangement works well for me because I am really only using the Zabbix web monitoring pieces for my DNS failover services. It would be more complicated if I were connecting to Zabbix agents and needed the monitoring to originate only from a single IP preconfigured in the agents' conf files, since as far as I know they only allow one IP for the authorized Zabbix server. (I could see a script-based system fired off from the secondary that connects to a list of agents, changes their config files and restarts them to allow polling from the secondary's IP - tricky to build, but it could be done. A much better solution would be for the Zabbix agent config to allow multiple source Zabbix server IPs.)

      Just sharing what I'm doing - if you "master" (pun intended) MySQL replication - it opens up new possibilities for how you can architect HA capabilities using Zabbix.

      • DSon
        Member
        • Sep 2009
        • 44

        #4
        dnshat: FYI..

        Just read your HA solution and this sounds very flexible.

        One thing I noticed is that you mentioned the possibility of adding multiple Zabbix servers in the agent.conf.

        Well, you might be pleased to know that you can already do this (the addresses need to be separated by commas).

        There is unfortunately a small caveat to this function, namely that only the first IP address can be used for active checks. This probably won't be a problem for you, however, since you mentioned that you don't need to monitor agents.

        Other than that, you may find this function useful.

        Hope this helps,
        Danny.

        • r3dn3ck
          Member
          • Jul 2008
          • 43

          #5
          mysql + heartbeat + stonith + shared storage. Simple and effective. The only failover event to occur to date did so seamlessly.

          • DSon
            Member
            • Sep 2009
            • 44

            #6
            Stonith, or not? (split brain)

            re: Stonith - I have thus far read mixed opinions on whether or not this is needed.

            e.g. YES - if more than 2 nodes in a cluster, otherwise - NO.

            Having been running several two-node (Heartbeat/Pacemaker) clusters for a while now, I have already observed several "split brain" occurrences (DRBD for shared storage).

            Each time, manual recovery was needed (using drbdadm - nothing to do with Heartbeat, from what I could tell).

            What are other people's experiences in this area?

            i.e. can Stonith be used to avoid DRBD split-brain with only 2 nodes?

            Danny.

            • richlv
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Oct 2005
              • 3112

              #7
              STONITH is needed if two nodes running some service at the same time can cause problems. Node count does not matter; this can be true even if you only have two nodes.
              Zabbix 3.0 Network Monitoring book

              • frankymryao
                Member
                • Oct 2011
                • 52

                #8
                A concept: update_percent - the percentage of a host's items that have been updated in the last few minutes. It is very accurate.
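
                (One way to read this idea - an untested sketch that assumes a Zabbix 1.8-style schema where the items table still carries a lastclock column and status = 0 means an enabled item; the connection details and host name are made up.)

                Code:
                #!/usr/bin/env python
                # Hypothetical "update_percent": for a given host, the share of its
                # enabled items whose lastclock falls inside the last few minutes.
                import time
                import MySQLdb

                WINDOW = 300  # "last few minutes", in seconds

                def update_percent(hostname, window=WINDOW):
                    conn = MySQLdb.connect(host="localhost", user="zabbix",
                                           passwd="secret", db="zabbix")
                    try:
                        cur = conn.cursor()
                        cur.execute(
                            "SELECT COUNT(*), SUM(i.lastclock > %s) "
                            "FROM items i JOIN hosts h ON h.hostid = i.hostid "
                            "WHERE h.host = %s AND i.status = 0",
                            (int(time.time()) - window, hostname))
                        total, recent = cur.fetchone()
                        if not total:
                            return 0.0
                        return 100.0 * float(recent or 0) / float(total)
                    finally:
                        conn.close()

                if __name__ == "__main__":
                    print("%.1f%% of items updated in the last %d seconds"
                          % (update_percent("example-host"), WINDOW))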
