ZABBIX Forums  
  #1  
Old 04-05-2012, 20:28
mcmyst mcmyst is offline
Member
 
Join Date: Feb 2012
Location: France
Posts: 69
Default Zabbix database choice, and HA solution

Hello,

I am currently setting up a monitoring solution for the company where I am working based on zabbix.
Firstable I have to choose the database engine that will be able to handle aproximatively between 1500 and 2000 nvps. I see lots of you are using mysql+innodb or mysql-percona+xtraDB, often saying that they have chosen mysql for performance reason.

Is mysql faster than postgresql in a zabbix environment ?

The second point that I am working on is a HA solution. But for now it would not be a full automated HA solution, I prefer for the beginning an HA solution with a human failover.

Here is a solution that I would like to set up:
Datacenter 1:
- mysql/postgresql server ACTIVE
- zabbix server ACTIVE
- zabbix proxies N ACTIVE

Datacenter 2/
- mysql/postgresql server ACTIVE
- zabbix spare server SHUTDOWN
- zabbix proxies SHUTDOWN

In case we see that the datacenter 1 have a problem, we just start spare servers manually to retrieve the monitoring solution.

So questions :
- Is this kind of architecture is suitable for you ?
- Why some people are using DRBD instead of mysql/postgresql/oracle replication ?

EDIT:
(1.1.1.1 to 1.1.1.4 are ip addresses of the different servers)
Attached Images
 

Last edited by mcmyst; 04-05-2012 at 21:02.
Reply With Quote
  #2  
Old 06-05-2012, 20:50
zalex_ua zalex_ua is offline
Senior Member
Zabbix Certified Specialist
 
Join Date: Oct 2009
Location: Kyiv, Ukraine. GMT +02
Posts: 1,271
Send a message via Skype™ to zalex_ua
Default

Just a nice article http://www.zabbix.org/wiki/Docs/howto/high_availability
Reply With Quote
  #3  
Old 11-05-2012, 21:08
dotneft dotneft is offline
Senior Member
 
Join Date: Nov 2008
Location: Riga, Latvia
Posts: 686
Send a message via ICQ to dotneft
Default

Additionally you can try these ways:
MMM - http://mysql-mmm.org/
or
http://www.mysqlperformanceblog.com/...-availability/ - it's very interesting way... but i don't know how it works (in production).
Reply With Quote
  #4  
Old 14-05-2012, 23:15
mcmyst mcmyst is offline
Member
 
Join Date: Feb 2012
Location: France
Posts: 69
Default

There is a point that worries me about how people manage automated failover with database in master-master replication ?

I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

Thank you for you replies !

Last edited by mcmyst; 15-05-2012 at 19:45.
Reply With Quote
  #5  
Old 17-09-2012, 10:47
dbarton dbarton is offline
Junior Member
 
Join Date: Aug 2012
Posts: 15
Default

Quote:
Originally Posted by mcmyst View Post
There is a point that worries me about how people manage automated failover with database in master-master replication ?

I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

Thank you for you replies !
Hi

I don't know if this topic is still watched, but I might have some informations to the master-master replication.

If you're running a MySQL master/master (active/active) setup and the link between the both of them will break, there's no "auto" startup of the secondary zabbix server. You've to do that w/ separate tools like heartbeat and pacemaker - most likely you'll already know that

We're talking here about a master/master (active/active) MySQL setup in a active/passive cluster setup. This means the MySQL replication between the active and passive cluster nodes is running all the time, while the Zabbix services are only running on the active cluster node.

Now expect that the required configuration of a cluster (e.g. heartbeat + pacemaker) is already done and everything works as designed. Now the link between both data centers is going down and heartbeat reports a failure on both nodes. At this moment, both nodes are "thinking": "hey I'm alive and my partner's dead" - we call this a split-brain syndrome.

While all services on the active cluster node are still running, the passive cluster node is starting all Zabbix related services. So at this point, both cluster nodes are active and writing to their own DB (now w/o replication).

Now after a while the link between both data centers is coming back online, which means both cluster nodes can see each other again. Depending on the configuration, an alert is sent or a cluster node is going passive again (i.e. shutting down all Zabbix related services). MySQL is keeping up & running on both nodes, since it's not a cluster related service.

And here's where the MySQL replication comes into play. In a replication setup, there's always a master and a slave. The master is logging all transactions (called binary log) and the slave is registered to that binary log. A slave always knows it's position in this log and can catch up w/ the master, even if it's disconnected for a while (e.g. if slave has position 20 and master has position 25, then the slave has to catch up all transactions from 20 to 25). Now in a classic master/slave setup, only the master is allowed to write, because the slave can never be ahead of the master (because it's only one way replication). But in a master/master setup, both MySQL servers are masters and both MySQL servers are slave from each other. So they're able to write both to their databases and catch up, as soon as they're connected again (because every server has it's own binary log).

BUT THIS IS IMPORTANT: If both severs will write to the same tables, then you might have a problem w/ auto-increment values. To avoid that problem, you've to set the following configuration variables on the servers:

server 1:
Code:
auto_increment_increment=2
auto_increment_offset=1
server 2:
Code:
auto_increment_increment=2
auto_increment_offset=2
This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

Cheers
Domi

Last edited by dbarton; 17-09-2012 at 10:50.
Reply With Quote
  #6  
Old 17-09-2012, 23:48
whowd whowd is offline
Junior Member
 
Join Date: Jul 2007
Posts: 26
Default

Quote:
Originally Posted by dbarton View Post
Hi
server 1:
Code:
auto_increment_increment=2
auto_increment_offset=1
server 2:
Code:
auto_increment_increment=2
auto_increment_offset=2
This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

Cheers
Domi
Zabbix does not use the database auto_increment functionality. It maintains it's own set of ids and as such this feature in MySQL will not help you.

w
Reply With Quote
  #7  
Old 19-09-2012, 09:13
dbarton dbarton is offline
Junior Member
 
Join Date: Aug 2012
Posts: 15
Default

Quote:
Originally Posted by whowd View Post
Zabbix does not use the database auto_increment functionality.
This isn't true! Zabbix uses auto_increment in some tables!

Quote:
Originally Posted by whowd View Post
It maintains it's own set of ids and as such this feature in MySQL will not help you.
You're right w/ that BUT my recommendation is working perfectly in a active/passive setup like mcmyst mentioned.

On a failover you can switch to the datacenter 2 and start all zabbix services w/o any further configuration (since both MySQL servers are in sync). As soon as datacenter 1 is available again, the MySQL (DC1) will catch up w/ it's slave (DC2) and it will be in sync again - so you shutdown all zabbix services on DC2 and startup all zabbix services on DC1 again.

BUT you're right, there is a "problem": if both data centers running split brain, then my setup might not work because of the Zabbix internal IDs. So if there's a split brain, you've to reset one MySQL...

Sorry didn't was aware of the internal ID management.

Cheers
Reply With Quote
  #8  
Old 19-09-2012, 20:05
mcmyst mcmyst is offline
Member
 
Join Date: Feb 2012
Location: France
Posts: 69
Default

For now I have decided to set up the architecture like this:
http://demo.ovh.com/fr/b50d6770d07e1...d178a5e689d9a/

Do you know what will happen if both proxies are trying to send there data back to the server after a split brain situation ? (proxies in DC1 and DC2 were both running during the split brain)

The idea is to, FIRST, avoid the data loss, even if there is no alert during the failure of the physical node in the DC1 due to the manual failover (if nobody has seen the problem).

And then when this will work correctly, I will place a third node in a third datacenter, and use the "quorum" functionality to avoid the split brain situation (only the part of the cluster with 2 node will start the resources) to set up a 3 node cluster for zabbix server.
Reply With Quote
  #9  
Old 24-09-2012, 17:15
Rawlings Rawlings is offline
Junior Member
 
Join Date: Sep 2012
Location: Paris
Posts: 24
Default

Hello

How do you calculate the volume (for define RAM / CPU /DISK) on your servers . especially for database server

kind Regards

Ps : I Using PostgreSql with Pgpool for HA . This work very good
Reply With Quote
  #10  
Old 28-09-2012, 15:45
dotneft dotneft is offline
Senior Member
 
Join Date: Nov 2008
Location: Riga, Latvia
Posts: 686
Send a message via ICQ to dotneft
Default

You can try MySQK Galera replication. But it needs 3 nodes. Of course, You can create one virtual node (arbiter).
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 04:25.