Making the zabbix server redundant

  • walterheck
    Senior Member
    • Jul 2009
    • 153

    #1

    Making the zabbix server redundant

    Hey guys,

    We have been looking at providing a redundant zabbix setup for our environment. We would like every tier of the server setup to be redundant, preferably with automatic and transparent failover.

    As far as I see it, there are three tiers to the server side of zabbix:

    1) The database. In our case, this is MySQL. That is easily made redundant by replicating the database with standard MySQL replication.

    2) The web frontend. This is also very easy to make redundant by just putting it on two different servers.

    3) The zabbix binary. This one is a bit more tricky, and I was wondering how best to achieve redundancy. As far as I can see now, the best approach is probably to install it on two different servers and then use something like keepalived to get automatic failover in case one of the servers dies.
    I searched the forum and the wiki, but many solutions use either old technology (i.e. for older zabbix releases) or unnecessarily complicated/slow technology (e.g. DRBD).

    Has anybody done something like this?
    Free and Open Source Zabbix Templates Repository | Hosted Zabbix @ Tribily (http://tribily.com)
  • js1
    Member
    • Apr 2009
    • 66

    #2
    Originally posted by walterheck
    3) The zabbix binary. This one is a bit more tricky, and I was wondering how best to achieve redundancy. As far as I can see now, the best approach is probably to install it on two different servers and then use something like keepalived to get automatic failover in case one of the servers dies.
    I searched the forum and the wiki, but many solutions use either old technology (i.e. for older zabbix releases) or unnecessarily complicated/slow technology (e.g. DRBD).

    Has anybody done something like this?
    You can always use heartbeat to manage the zabbix process. DRBD would only need to be used to sync the configs. A friend of mine uses DRBD on a file server that he manages. A zabbix config directory won't have that much i/o.

    You're also going to need to share an IP address between the nodes that run the zabbix process.

    • krimson
      Member
      • Sep 2008
      • 49

      #3
      We use the RedHat clustersuite here. You should be able to do a similar thing with Fedora.

      Of course, you will need shared storage. Also keep in mind that the zabbix server will terminate if the MySQL server becomes unreachable.

      • nelsonab
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Sep 2006
        • 1233

        #4
        Originally posted by walterheck
        3) The zabbix binary. This one is a bit more tricky, and I was wondering how best to achieve redundancy. As far as I can see now, the best approach is probably to install it on two different servers and then use something like keepalived to get automatic failover in case one of the servers dies.
        I searched the forum and the wiki, but many solutions use either old technology (i.e. for older zabbix releases) or unnecessarily complicated/slow technology (e.g. DRBD).

        Has anybody done something like this?
        The solution I posted to the wiki was done with an older version but will still work with the current version. Yes, DRBD can be slow initially, but it does quite well once it's in sync; if it goes out of sync, though, that can be a problem. The only reason I went with it was simplicity: I didn't want to have to rework the DB schema to have unique IDs for every row, where inserts on one host were odd and on the other even. Yes, a normal setup has one DB as master, but I was looking at master-master replication to allow for complete failover.

        As for the frontend, there's no real way around using a cluster management program like Linux-HA, Veritas Cluster, or something else. If you convert to a fully (100%) active items setup, then you might be able to get away with two active zabbix servers with a load balancer in front of them.

        Good luck!
        RHCE, author of zbxapi
        Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
        Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

        • walterheck
          Senior Member
          • Jul 2009
          • 153

          #5
          Hey guys,

          thanks for the suggestions!

          I was thinking a bit more about this, and thought that we could actually use puppet to keep the config files identical on both servers. If that is enough, it would be a good way to avoid extra technology, as implementing puppet was on our wishlist anyway.

          Then, as long as a virtual IP is used for the server, it shouldn't matter which one is handling the reports from the agents, right?

          Walter

          • nelsonab
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2006
            • 1233

            #6
            I retract my earlier comment about it working with a fully active setup, and reinstate what I've been saying all along. You can only have one Zabbix server running at a time.

            If you run two servers and have a 100% pure active setup, the agents will then connect to both servers listed in their config file. They will then push data to both servers, which in turn push data into the DB. Which DB? The same one, or different ones? If you have a master-master setup with the DB, you'll then need to modify the Zabbix schema to add MySQL-generated IDs for tables that do not have them, such as history, history_str and so forth. One server would then be set up to generate even-ID'd rows and the other odd. This is required so that MySQL knows which rows to propagate to the other server. Also, the behavior of the Zabbix agent may be strange if it's getting two lists of active checks; I don't know if that is even supported or behaves as one would expect. It's possible the agent would get a list of active checks from both servers and then send that doubled-up list back to both servers, generating 4 times the data in the DB and MUCH duplication.
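
The odd/even ID split mentioned in this post can be illustrated with a small sketch. It is plain Python, not Zabbix or MySQL code, and the function names are made up for the example. MySQL's `auto_increment_increment` and `auto_increment_offset` server variables give this kind of interleaving for auto-increment columns; whether that alone is enough for the Zabbix schema is not something the sketch shows.

```python
from itertools import islice


def id_sequence(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ...

    Hypothetical helper: models how each master would hand out row IDs
    so that the two masters never pick the same value.
    """
    current = offset
    while True:
        yield current
        current += increment


def first_ids(offset, increment, count):
    """Return the first `count` IDs of one master's sequence as a list."""
    return list(islice(id_sequence(offset, increment), count))


# Two masters, step of 2: one generates odd IDs, the other even IDs.
server_a_ids = first_ids(offset=1, increment=2, count=5)
server_b_ids = first_ids(offset=2, increment=2, count=5)

print(server_a_ids)
print(server_b_ids)
```

Because the two sequences never overlap, rows inserted on either master can be replicated to the other without an ID collision.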

            Also when the server does passive checks, both servers are going to be sending the passive checks to agents. This will double the network bandwidth.

            As you can see, the main challenge here is the Zabbix server. You need to have only one running at a time, and have it running with a virtual IP. Linux HA did a very good job with this. For the MySQL back end you'll want to do something similar, with a virtual IP active on your "master" node. This way you can maintain the same server config file on both systems. Using puppet to update this is not very feasible, as the refresh times would have to be ridiculously short. I think puppet is more capable at this than CFengine, however, as you'd need to write a script that can poll what your environment is like (which DB is master, etc.) and generate the config accordingly.

            • walterheck
              Senior Member
              • Jul 2009
              • 153

              #7
              Originally posted by nelsonab
              I retract my earlier comment about it working with a fully active setup, and reinstate what I've been saying all along. You can only have one Zabbix server running at a time.
              That was my understanding as well, and I agree with your further explanation. 2 active servers is going to get messy or it's just going to be a lot of work to get it running.

              Originally posted by nelsonab
              Also the MySQL back end you'll want to do something similar where you have a virtual IP active on your "master" node.
              We are for now satisfied with just having replication for the mysql backend.

              Originally posted by nelsonab
              This way you can maintain the same server config file on both systems. Using puppet to update this is not very feasible as the refresh times would have to be ridiculous. I think puppet is better capable at this than CFengine however as you'd need to write a script that can poll what your environment is like (which db is master etc) and generate the config accordingly.
              I wouldn't want to use puppet for updating active master or actually anything related to the HA part. I thought of using it for the config file of zabbix server and maybe even mysql as well.

              • NOB
                Senior Member
                Zabbix Certified Specialist
                • Mar 2007
                • 469

                #8
                Hi

                we are using MySQL Master-Master replication, a virtual IP
                for the ZABBIX server which is switched either manually for, e.g.,
                patch updates, or via UCARP software automatically.
                UCARP switches the virtual IP address, announces it on the network
                and stops/starts the zabbix_server application listening only on the
                virtual IP.
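
A rough sketch of the order of operations such a switch needs on each node: bring the virtual IP up before starting the server (which listens only on that IP), and stop the server before releasing the IP. This is not the actual script used here. The function name, the `/32` prefix and the init script path are assumptions for illustration, and the helper only builds command strings; it does not execute anything.

```python
def failover_commands(event, vip, interface):
    """Return the shell commands a node would run for a failover event.

    Hypothetical helper: `event` is "up" when this node takes over the
    virtual IP and "down" when it gives it up.
    """
    add_ip = f"ip addr add {vip}/32 dev {interface}"
    del_ip = f"ip addr del {vip}/32 dev {interface}"
    # Assumed init script location; adjust to the local installation.
    start_server = "/etc/init.d/zabbix_server start"
    stop_server = "/etc/init.d/zabbix_server stop"

    if event == "up":
        # The IP must exist before the server can bind to it.
        return [add_ip, start_server]
    if event == "down":
        # Stop the server first so it never runs without the IP.
        return [stop_server, del_ip]
    raise ValueError(f"unknown event: {event!r}")


for command in failover_commands("up", "10.100.47.11", "eth0"):
    print(command)
```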

                Be aware, that the agents are not pushing data to two servers if you configure
                them with a line like
                Code:
                Server=10.0.0.1,10.0.0.2
                They send active check data only to the first one!
                The other servers are allowed to request data for passive checks, though.

                That's why we use a virtual IP, say, 10.100.47.11 for active checks
                and the two physical IPs of the servers, say, 10.100.47.33 and 10.100.47.34 for the passive checks.
                So the agent configuration contains a line like
                Code:
                Server=10.100.47.11,10.100.47.33,10.100.47.34
                and all works as expected. All active checks are sent to the virtual IP,
                wherever it is, the list of active checks will be retrieved from the same
                virtual IP and both servers are allowed to do passive checks.
                It's as easy as that!
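
To make the rule concrete, here is a minimal Python sketch that splits such a `Server=` line the way this post describes the agent treating it: the first address is where active check data goes, and every listed address may run passive checks. The function name and the returned dictionary keys are invented for the example; it models the behaviour described in this post, not the agent's actual parser.

```python
def parse_server_line(line):
    """Split a 'Server=a,b,c' agent config line into its two roles.

    Hypothetical helper following the behaviour described in this post:
    the first entry receives active check data, and all entries are
    allowed to request passive checks.
    """
    key, _, value = line.partition("=")
    if key.strip() != "Server":
        raise ValueError("not a Server= line")
    addresses = [part.strip() for part in value.split(",") if part.strip()]
    if not addresses:
        raise ValueError("Server= line lists no addresses")
    return {
        "active_target": addresses[0],
        "passive_allowed": addresses,
    }


roles = parse_server_line("Server=10.100.47.11,10.100.47.33,10.100.47.34")
print(roles["active_target"])
print(roles["passive_allowed"])
```

With the virtual IP listed first, active check data follows the virtual IP to whichever node currently holds it, while the physical addresses stay in the list for passive checks.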

                Starting with the 1.6.x agents, the agents will cache data if the virtual IP
                is not available during a switch.

                Instead of using MySQL Master-Master replication you could put
                the DB on external storage. Disadvantages: it is another single point
                of failure, the filesystem can get corrupted, and you must ensure that
                only one system has the filesystem mounted at any time.

                HTH and YMMV

                Norbert.
                Last edited by NOB; 07-08-2009, 09:17. Reason: More detailed explanation how it works

                • Takanori Suzuki
                  Junior Member
                  • Jun 2008
                  • 11

                  #9
                  Hi, I'm interested in redundant monitoring.
                  Recently, I made a patch to support multiple servers in active check mode, though it has not been accepted yet.
                  I think this feature may help to make active/active redundant monitoring.

                  • qix
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional
                    • Oct 2006
                    • 423

                    #10
                    Could it be a solution to do a "semi loadbalanced" zabbix server solution?

                    What I mean is the following:

                    You could use multiple zabbix server instances running Linux HA on multiple servers, each with their own virtual IP.

                    The load could then be shared over the physical servers by predefining one zabbix server instance as SNMP, Pinger and IPMI and the other as an Active/passive agent instance.

                    If one server fails, the other could take over the missing instance via Linux HA.

                    I haven't tried it, but from what I've seen I think it might work.
                    With kind regards,

                    Raymond

                    • alixen
                      Senior Member
                      • Apr 2006
                      • 474

                      #11
                      Hi,

                      Originally posted by qix
                      You could use multiple zabbix server instances running Linux HA on multiple servers, each with their own virtual IP.
                      Zabbix already supports some kind of load balancing with distributed monitoring.
                      If Zabbix nodes are configured on HA clusters, we get high availability and load balancing.

                      Regards,
                      Alixen
                      http://www.alixen.fr/zabbix.html

                      • qix
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional
                        • Oct 2006
                        • 423

                        #12
                        Zabbix DM is a bit 'touchy' in my opinion.
                        I don't really think it's stable.
                        Plus, you need to configure all your templates and stuff twice, since it doesn't replicate them...which is a drag (export, import, etc.).

                        If you have had a different experience, I'd love to hear about it.
                        With kind regards,

                        Raymond

                        • misch42
                          Junior Member
                          • Nov 2010
                          • 8

                          #13
                          Hi,

                          Linux-HA is dead. Please consider using pacemaker. See: www.clusterlabs.org

                          Michael Schwartzkopff.
