Announcement

Collapse
No announcement yet.

Zabbix database choice, and HA solution

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

    Zabbix database choice, and HA solution

    Hello,

    I am currently setting up a monitoring solution for the company where I am working based on zabbix.
    Firstable I have to choose the database engine that will be able to handle aproximatively between 1500 and 2000 nvps. I see lots of you are using mysql+innodb or mysql-percona+xtraDB, often saying that they have chosen mysql for performance reason.

    Is mysql faster than postgresql in a zabbix environment ?

    The second point that I am working on is a HA solution. But for now it would not be a full automated HA solution, I prefer for the beginning an HA solution with a human failover.

    Here is a solution that I would like to set up:
    Datacenter 1:
    - mysql/postgresql server ACTIVE
    - zabbix server ACTIVE
    - zabbix proxies N ACTIVE

    Datacenter 2/
    - mysql/postgresql server ACTIVE
    - zabbix spare server SHUTDOWN
    - zabbix proxies SHUTDOWN

    In case we see that the datacenter 1 have a problem, we just start spare servers manually to retrieve the monitoring solution.

    So questions :
    - Is this kind of architecture is suitable for you ?
    - Why some people are using DRBD instead of mysql/postgresql/oracle replication ?

    EDIT:
    (1.1.1.1 to 1.1.1.4 are ip addresses of the different servers)
    Attached Files
    Last edited by mcmyst; 04-05-2012, 21:02.

    #2
    Just a nice article http://www.zabbix.org/wiki/Docs/howto/high_availability

    Comment


      #3
      Additionally you can try these ways:
      MMM - http://mysql-mmm.org/
      or
      http://www.mysqlperformanceblog.com/...-availability/ - it's very interesting way... but i don't know how it works (in production).

      Comment


        #4
        There is a point that worries me about how people manage automated failover with database in master-master replication ?

        I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

        Thank you for you replies !
        Last edited by mcmyst; 15-05-2012, 19:45.

        Comment


          #5
          Originally posted by mcmyst View Post
          There is a point that worries me about how people manage automated failover with database in master-master replication ?

          I mean, if we look at my first idea, we suppose that the link between the 2 data centers break down : the zabbix primary server will keep running AND the second one will start up !! Here start the problem : 2 zabbix server writing on the database in the same time. When the replication will be able to synchronize (when the link between the 2 places will be ok) the database will be completely corrupted...

          Thank you for you replies !
          Hi

          I don't know if this topic is still watched, but I might have some informations to the master-master replication.

          If you're running a MySQL master/master (active/active) setup and the link between the both of them will break, there's no "auto" startup of the secondary zabbix server. You've to do that w/ separate tools like heartbeat and pacemaker - most likely you'll already know that

          We're talking here about a master/master (active/active) MySQL setup in a active/passive cluster setup. This means the MySQL replication between the active and passive cluster nodes is running all the time, while the Zabbix services are only running on the active cluster node.

          Now expect that the required configuration of a cluster (e.g. heartbeat + pacemaker) is already done and everything works as designed. Now the link between both data centers is going down and heartbeat reports a failure on both nodes. At this moment, both nodes are "thinking": "hey I'm alive and my partner's dead" - we call this a split-brain syndrome.

          While all services on the active cluster node are still running, the passive cluster node is starting all Zabbix related services. So at this point, both cluster nodes are active and writing to their own DB (now w/o replication).

          Now after a while the link between both data centers is coming back online, which means both cluster nodes can see each other again. Depending on the configuration, an alert is sent or a cluster node is going passive again (i.e. shutting down all Zabbix related services). MySQL is keeping up & running on both nodes, since it's not a cluster related service.

          And here's where the MySQL replication comes into play. In a replication setup, there's always a master and a slave. The master is logging all transactions (called binary log) and the slave is registered to that binary log. A slave always knows it's position in this log and can catch up w/ the master, even if it's disconnected for a while (e.g. if slave has position 20 and master has position 25, then the slave has to catch up all transactions from 20 to 25). Now in a classic master/slave setup, only the master is allowed to write, because the slave can never be ahead of the master (because it's only one way replication). But in a master/master setup, both MySQL servers are masters and both MySQL servers are slave from each other. So they're able to write both to their databases and catch up, as soon as they're connected again (because every server has it's own binary log).

          BUT THIS IS IMPORTANT: If both severs will write to the same tables, then you might have a problem w/ auto-increment values. To avoid that problem, you've to set the following configuration variables on the servers:

          server 1:
          Code:
          auto_increment_increment=2
          auto_increment_offset=1
          server 2:
          Code:
          auto_increment_increment=2
          auto_increment_offset=2
          This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

          Cheers
          Domi
          Last edited by dbarton; 17-09-2012, 10:50.

          Comment


            #6
            Originally posted by dbarton View Post
            Hi
            server 1:
            Code:
            auto_increment_increment=2
            auto_increment_offset=1
            server 2:
            Code:
            auto_increment_increment=2
            auto_increment_offset=2
            This will increment all auto-increment fields by 2 instead of 1. On the server 1 you've an offset of 1 (i.e. 27, 29, 31, ...) and on the server 2 you've an offset of 2 (i.e. 28, 30, 32, ...).

            Cheers
            Domi
            Zabbix does not use the database auto_increment functionality. It maintains it's own set of ids and as such this feature in MySQL will not help you.

            w

            Comment


              #7
              Originally posted by whowd View Post
              Zabbix does not use the database auto_increment functionality.
              This isn't true! Zabbix uses auto_increment in some tables!

              Originally posted by whowd View Post
              It maintains it's own set of ids and as such this feature in MySQL will not help you.
              You're right w/ that BUT my recommendation is working perfectly in a active/passive setup like mcmyst mentioned.

              On a failover you can switch to the datacenter 2 and start all zabbix services w/o any further configuration (since both MySQL servers are in sync). As soon as datacenter 1 is available again, the MySQL (DC1) will catch up w/ it's slave (DC2) and it will be in sync again - so you shutdown all zabbix services on DC2 and startup all zabbix services on DC1 again.

              BUT you're right, there is a "problem": if both data centers running split brain, then my setup might not work because of the Zabbix internal IDs. So if there's a split brain, you've to reset one MySQL...

              Sorry didn't was aware of the internal ID management.

              Cheers

              Comment


                #8
                For now I have decided to set up the architecture like this:
                http://demo.ovh.com/fr/b50d6770d07e1...d178a5e689d9a/

                Do you know what will happen if both proxies are trying to send there data back to the server after a split brain situation ? (proxies in DC1 and DC2 were both running during the split brain)

                The idea is to, FIRST, avoid the data loss, even if there is no alert during the failure of the physical node in the DC1 due to the manual failover (if nobody has seen the problem).

                And then when this will work correctly, I will place a third node in a third datacenter, and use the "quorum" functionality to avoid the split brain situation (only the part of the cluster with 2 node will start the resources) to set up a 3 node cluster for zabbix server.

                Comment


                  #9
                  Hello

                  How do you calculate the volume (for define RAM / CPU /DISK) on your servers . especially for database server

                  kind Regards

                  Ps : I Using PostgreSql with Pgpool for HA . This work very good

                  Comment


                    #10
                    You can try MySQK Galera replication. But it needs 3 nodes. Of course, You can create one virtual node (arbiter).

                    Comment


                      #11
                      I always choose Postresql over Mysql if I have the choice. I recommend the same to you.

                      Comment

                      Working...
                      X