Ad Widget

**zalex_ua** · 06-05-2012, 20:50

Just a nice article http://www.zabbix.org/wiki/Docs/howto/high_availability

**dotneft** · 11-05-2012, 21:08

Additionally you can try these ways:
MMM - http://mysql-mmm.org/
or
http://www.mysqlperformanceblog.com/...-availability/ - it's very interesting way... but i don't know how it works (in production).

**mcmyst** · 14-05-2012, 23:15

There is a point that worries me about how people manage automated failover with database in master-master replication ?

I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

Thank you for you replies !

**dbarton** · 17-09-2012, 10:47

Originally posted by mcmyst

There is a point that worries me about how people manage automated failover with database in master-master replication ?

I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

Thank you for you replies !

Hi

I don't know if this topic is still watched, but I might have some informations to the master-master replication.

If you're running a MySQL master/master (active/active) setup and the link between the both of them will break, there's no "auto" startup of the secondary zabbix server. You've to do that w/ separate tools like heartbeat and pacemaker - most likely you'll already know that

We're talking here about a master/master (active/active) MySQL setup in a active/passive cluster setup. This means the MySQL replication between the active and passive cluster nodes is running all the time, while the Zabbix services are only running on the active cluster node.

Now expect that the required configuration of a cluster (e.g. heartbeat + pacemaker) is already done and everything works as designed. Now the link between both data centers is going down and heartbeat reports a failure on both nodes. At this moment, both nodes are "thinking": "hey I'm alive and my partner's dead" - we call this a split-brain syndrome.

While all services on the active cluster node are still running, the passive cluster node is starting all Zabbix related services. So at this point, both cluster nodes are active and writing to their own DB (now w/o replication).

Now after a while the link between both data centers is coming back online, which means both cluster nodes can see each other again. Depending on the configuration, an alert is sent or a cluster node is going passive again (i.e. shutting down all Zabbix related services). MySQL is keeping up & running on both nodes, since it's not a cluster related service.

And here's where the MySQL replication comes into play. In a replication setup, there's always a master and a slave. The master is logging all transactions (called binary log) and the slave is registered to that binary log. A slave always knows it's position in this log and can catch up w/ the master, even if it's disconnected for a while (e.g. if slave has position 20 and master has position 25, then the slave has to catch up all transactions from 20 to 25). Now in a classic master/slave setup, only the master is allowed to write, because the slave can never be ahead of the master (because it's only one way replication). But in a master/master setup, both MySQL servers are masters and both MySQL servers are slave from each other. So they're able to write both to their databases and catch up, as soon as they're connected again (because every server has it's own binary log).

BUT THIS IS IMPORTANT: If both severs will write to the same tables, then you might have a problem w/ auto-increment values. To avoid that problem, you've to set the following configuration variables on the servers:

server 1:

Code:

auto_increment_increment=2
auto_increment_offset=1

server 2:

Code:

auto_increment_increment=2
auto_increment_offset=2

This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

Cheers
Domi

**whowd** · 17-09-2012, 23:48

Originally posted by dbarton

Hi
server 1:

Code:

auto_increment_increment=2
auto_increment_offset=1

server 2:

Code:

auto_increment_increment=2
auto_increment_offset=2

This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

Cheers
Domi

Zabbix does not use the database auto_increment functionality. It maintains it's own set of ids and as such this feature in MySQL will not help you.

w

**dbarton** · 19-09-2012, 09:13

Originally posted by whowd

Zabbix does not use the database auto_increment functionality.

This isn't true! Zabbix uses auto_increment in some tables!

Originally posted by whowd

It maintains it's own set of ids and as such this feature in MySQL will not help you.

You're right w/ that BUT my recommendation is working perfectly in a active/passive setup like mcmyst mentioned.

On a failover you can switch to the datacenter 2 and start all zabbix services w/o any further configuration (since both MySQL servers are in sync). As soon as datacenter 1 is available again, the MySQL (DC1) will catch up w/ it's slave (DC2) and it will be in sync again - so you shutdown all zabbix services on DC2 and startup all zabbix services on DC1 again.

BUT you're right, there is a "problem": if both data centers running split brain, then my setup might not work because of the Zabbix internal IDs. So if there's a split brain, you've to reset one MySQL...

Sorry didn't was aware of the internal ID management.

Cheers

**mcmyst** · 19-09-2012, 20:05

For now I have decided to set up the architecture like this:

http://demo.ovh.com/fr/b50d6770d07e14992c8d178a5e689d9a/

Do you know what will happen if both proxies are trying to send there data back to the server after a split brain situation ? (proxies in DC1 and DC2 were both running during the split brain)

The idea is to, FIRST, avoid the data loss, even if there is no alert during the failure of the physical node in the DC1 due to the manual failover (if nobody has seen the problem).

And then when this will work correctly, I will place a third node in a third datacenter, and use the "quorum" functionality to avoid the split brain situation (only the part of the cluster with 2 node will start the resources) to set up a 3 node cluster for zabbix server.

**Rawlings** · 24-09-2012, 17:15

Hello

How do you calculate the volume (for define RAM / CPU /DISK) on your servers . especially for database server

kind Regards

Ps : I Using PostgreSql with Pgpool for HA . This work very good

**dotneft** · 28-09-2012, 15:45

You can try MySQK Galera replication. But it needs 3 nodes. Of course, You can create one virtual node (arbiter).

**burn1024** · 26-10-2012, 20:56

I always choose Postresql over Mysql if I have the choice. I recommend the same to you.

Ad Widget

Zabbix database choice, and HA solution

Zabbix database choice, and HA solution

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment