High Availability Zabbix

  • nelsonab
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2006
    • 1233

    #1

    Well, after much work I've had a breakthrough! As I write this I have a two-node Zabbix-HA cluster up and running with failover! I still have to do some more testing, then blow away the configuration and rebuild from scratch so I can document a full end-to-end procedure. I hope to have something posted in the Wiki in about a week or two. I still have some things to iron out with the MySQL back end, and I need to fully test an agent, but things are looking very good! I also hope to include the Apache frontend, but have not done so yet.

    I've been updating my blog on here and will continue to do so in the meantime.
    RHCE, author of zbxapi
    Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
    Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

  • noxis
    Senior Member
    • Aug 2007
    • 145

    #2
    Interesting challenge. I would expect that, as long as you can control database access in the master-master configuration using HSRP or a hardware load balancer, things would be very simple.

    Edit: In fact, if we assume there are no active checks, HA should be very simple indeed.

    • nelsonab
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2006
      • 1233

      #3
      True, but the HA setup I have should also work with active checks. Only one server process runs at a time: the HA framework takes care of IP sharing and failover, then brings up Zabbix and Apache, which both need the shared IP because they bind to that specific address.
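
A minimal Python sketch of the ordering described above, assuming the shared IP, Zabbix server and Apache are managed as one resource group (the class, resource names and node names are hypothetical, not taken from the actual setup). Resources start in list order on the active node and stop in reverse, so no service is left running without the IP it binds to.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical model of resources that must run together on one node."""

    name: str
    resources: list[str]  # start order: the shared IP comes first
    log: list[str] = field(default_factory=list)

    def start(self, node: str) -> None:
        # Start in list order: the shared IP must exist before anything binds to it.
        for res in self.resources:
            self.log.append(f"start {res} on {node}")

    def stop(self, node: str) -> None:
        # Stop in reverse so services release the IP before it is removed.
        for res in reversed(self.resources):
            self.log.append(f"stop {res} on {node}")

    def migrate(self, from_node: str, to_node: str) -> None:
        # Clean move: only one node ever runs the group at a time.
        self.stop(from_node)
        self.start(to_node)


group = ResourceGroup("zabbix", ["shared-ip", "zabbix-server", "apache"])
group.migrate("node1", "node2")
for entry in group.log:
    print(entry)
```

This models a clean migration only; after a hard failure the cluster would instead fence the failed node before starting the group on the survivor.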

      I'll know more tomorrow when I get into full testing. :-)

      • noxis
        Senior Member
        • Aug 2007
        • 145

        #4
        Well, good luck with the work; I am watching with interest. In my mind, an enterprise-level monitoring solution should be HA by default.

        • nelsonab
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Sep 2006
          • 1233

          #5
          It works. :-D

          If I get a chance this week, I hope to post some material on this. I still have one last piece to sort out, and that is "soft" failover, i.e. failing over when there is a network-level failure such as one node losing its link. Otherwise I feel confident about this being ready for near prime time. :-)
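
One common way to approach "soft" failover is connectivity scoring, similar in spirit to Linux-HA's pingd: each node counts how many reference hosts it can still reach, and resources prefer the node with the higher score. This is a sketch under assumptions (hypothetical node and target names, a multiplier of 100), not the poster's configuration.

```python
def connectivity_score(reachable: dict[str, bool], multiplier: int = 100) -> int:
    """Score a node by how many reference hosts it can still reach."""
    return multiplier * sum(1 for ok in reachable.values() if ok)


def preferred_node(current: str, scores: dict[str, int]) -> str:
    """Pick where resources should run, staying put on ties to avoid flapping."""
    best = max(scores.values())
    if scores[current] == best:
        return current
    return max(scores, key=scores.get)


# node1 has lost its gateway link; node2 still reaches both targets.
scores = {
    "node1": connectivity_score({"gateway": False, "dns": True}),
    "node2": connectivity_score({"gateway": True, "dns": True}),
}
print(scores, preferred_node("node1", scores))
```

Staying on the current node when scores tie acts like resource stickiness, so a brief blip that affects both nodes equally does not cause a failover.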

          For now, though, I have database replication working using DRBD, with Linux-HA controlling the whole stack. I can pull the power on the master node, and within 30 seconds all of Zabbix is back up and running on the "secondary" node. I can reduce the failover time by reducing my timeouts. I would think 10-15 seconds is the lowest "safe" timeout for failover; anything lower might result in too many false positives and inadvertent fencing.
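
To illustrate the timeout tradeoff, here is a minimal sketch of deadtime-based peer detection, assuming heartbeats carry timestamps in seconds (the class and its parameters are hypothetical, not Linux-HA's API). A short deadtime fails over sooner, but any transient stall longer than the deadtime is treated as a dead peer, which is where false positives and unnecessary fencing come from.

```python
class PeerMonitor:
    """Declare the peer dead once no heartbeat has arrived for `deadtime` seconds."""

    def __init__(self, deadtime: float, now: float) -> None:
        self.deadtime = deadtime
        self.last_seen = now  # treat startup as a fresh heartbeat

    def heartbeat(self, now: float) -> None:
        self.last_seen = now

    def peer_dead(self, now: float) -> bool:
        return now - self.last_seen > self.deadtime


# Same event in both cases: last heartbeat at t=5, then a 12-second stall.
for deadtime in (10.0, 30.0):
    monitor = PeerMonitor(deadtime, now=0.0)
    monitor.heartbeat(5.0)
    print(deadtime, monitor.peer_dead(17.0))
```

With the 10-second deadtime the 12-second stall counts as a failure; with 30 seconds it does not, at the cost of waiting longer when the peer really is gone.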

          • cstackpole
            Senior Member
            Zabbix Certified Specialist
            • Oct 2006
            • 225

            #6
            How goes the documentation? I am very interested in seeing how you have set this up.

            Thanks for your hard work!
            Have Fun!
            ~S~

            • nelsonab
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional
              • Sep 2006
              • 1233

              #7
              I haven't had much chance to work on the documentation; things have been a lot busier than I expected.

              I've attached a Visio of the current setup. Most of my personal systems tend to use Simpsons names. :-)

              I'm also working on a 4-node version of this cluster, where the Zabbix frontend and server run on a node separate from the back end. That setup is proving to be a little tougher. :-)

              • bbrendon
                Senior Member
                • Sep 2005
                • 870

                #8
                Why are you using DRBD and not mysql replication?
                Unofficial Zabbix Expert
                Blog, Corporate Site

                • nelsonab
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional
                  • Sep 2006
                  • 1233

                  #9
                  Faster development time. Plus, I didn't want to deal with MySQL master-master index collisions, which I did run into when I first started. I'm going to try to set up a master-slave promotion configuration.
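
A hypothetical illustration of how master-master setups collide: two masters start from the same replicated rows, and each allocates "max + 1" before replication catches up. MySQL's auto_increment_increment and auto_increment_offset settings avoid this for AUTO_INCREMENT columns by giving each master its own interleaved series; the sketch below models that rule in Python. Whether this would also cover how Zabbix allocates its own IDs is an assumption not checked here.

```python
def next_id_naive(existing: set[int]) -> int:
    """Each master independently takes max + 1."""
    return max(existing, default=0) + 1


def next_id_interleaved(existing: set[int], offset: int, increment: int) -> int:
    """Smallest value in offset, offset + increment, ... above the current max."""
    current = max(existing, default=0)
    k = max(0, (current - offset) // increment + 1)
    return offset + k * increment


replicated = {1, 2, 3}  # rows both masters have already seen
# Both masters insert before replication catches up.
print(next_id_naive(replicated), next_id_naive(replicated))  # same ID twice
print(next_id_interleaved(replicated, offset=1, increment=2),
      next_id_interleaved(replicated, offset=2, increment=2))
```

With an increment of 2, one master only ever hands out odd IDs and the other only even ones, so concurrent inserts cannot pick the same value.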

                  • rthomson
                    Junior Member
                    • May 2008
                    • 11

                    #10
                    I've got the MySQL portion of Zabbix running on my Linux-HA cluster, also using DRBD instead of MySQL replication. I'm not sure yet whether I'll move the Zabbix server over to the cluster, but it seems like an interesting possibility. I think I'll need one more physical server in the cluster before that makes sense, though (it currently has two).

                    Did you do anything special besides using the LSB init scripts and setting up any necessary groups/constraints? Any noteworthy configuration that wouldn't be more or less obvious to someone with some Linux-HA experience?

                    By the way, I've also got Zabbix monitoring my Linux-HA cluster with some handy shell scripting, as my HA packages from CentOS didn't include hbagent (SNMP) for whatever reason...

                    • nelsonab
                      Senior Member
                      Zabbix Certified Specialist, Zabbix Certified Professional
                      • Sep 2006
                      • 1233

                      #11
                      The only "crazy" thing I did was write an OCF script to handle Zabbix rather than use LSB. That, and set up a back-end link to prevent split brain.
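
The value of a second, back-end link is that losing one network path should not look like a dead peer. A minimal sketch, assuming per-link heartbeat timestamps in seconds (the link names and the function are hypothetical): the peer is only declared dead when it has been silent on every link for longer than the deadtime.

```python
def peer_considered_dead(last_seen: dict[str, float], now: float, deadtime: float) -> bool:
    """Declare the peer dead only if it has been silent on every heartbeat link."""
    if not last_seen:
        raise ValueError("at least one heartbeat link is required")
    return all(now - seen > deadtime for seen in last_seen.values())


# The front-end link went quiet at t=0, but the back-end link still heard the peer at t=18.
links = {"frontend": 0.0, "backend": 18.0}
print(peer_considered_dead(links, now=20.0, deadtime=10.0))
```

A single failed link then shows up as degraded connectivity rather than a reason for both nodes to claim the resources.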

                      • rthomson
                        Junior Member
                        • May 2008
                        • 11

                        #12
                        Thanks for sharing that. Any specific reason you decided to write an OCF agent instead of using the LSB script?

                        • nelsonab
                          Senior Member
                          Zabbix Certified Specialist, Zabbix Certified Professional
                          • Sep 2006
                          • 1233

                          #13
                          I wanted to be able to do full availability testing of the Zabbix server through the monitor facility. I don't do that yet, but eventually I want to. That way, if the server stops responding appropriately, the cluster can fail it over.
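
OCF resource agents are executables that the cluster manager calls with an action such as start, stop or monitor, and they report state through exit codes (0 is OCF_SUCCESS, 1 OCF_ERR_GENERIC, 3 OCF_ERR_UNIMPLEMENTED, 7 OCF_NOT_RUNNING). The sketch below covers only a monitor action and assumes that accepting a TCP connection on the Zabbix server's default port 10051 counts as "responding". Because Zabbix binds to the shared IP in this setup, the address is passed in; the OCF_RESKEY_ip and OCF_RESKEY_port parameter names are hypothetical, and a real agent must also implement start, stop and meta-data.

```python
import os
import socket

# Standard OCF exit codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def monitor(host: str, port: int = 10051, timeout: float = 5.0) -> int:
    """Report the Zabbix server as healthy if it accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OCF_SUCCESS
    except ConnectionRefusedError:
        # Nothing listening on that address: most likely the server is not running.
        return OCF_NOT_RUNNING
    except OSError:
        # Timeouts and other socket errors: possibly running but not responding properly.
        return OCF_ERR_GENERIC


def main(argv: list[str]) -> int:
    action = argv[1] if len(argv) > 1 else ""
    if action == "monitor":
        # Hypothetical instance parameters; OCF passes parameters as OCF_RESKEY_* variables.
        host = os.environ.get("OCF_RESKEY_ip", "127.0.0.1")
        port = int(os.environ.get("OCF_RESKEY_port", "10051"))
        return monitor(host, port)
    # start, stop, meta-data, etc. are omitted from this sketch.
    return OCF_ERR_UNIMPLEMENTED


# As an installed agent, the entry point would be: sys.exit(main(sys.argv))
```

Mapping a timeout to a generic error rather than "not running" is what would let the cluster recover a hung server, not just a crashed one.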

                          • sunnyfedora99
                            Junior Member
                            • May 2010
                            • 2

                            #14
                            Any Luck

                            Hi nelsonab,

                            Any luck with the documentation of how you set up HA with Zabbix? I could really use it; it's urgent.
                            Even a quick step-by-step would help.

                            • 0siris
                              Member
                              Zabbix Certified Specialist
                              • Nov 2010
                              • 76

                              #15
                              Right now, we're busy with this too.
                              We've chosen DRBD; you can find good info over here.
                              Our setup is asynchronous, since it's OK for us to have some downtime or miss some data, and synchronous replication may slow things down.
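
A toy model of that tradeoff, not DRBD's actual implementation: with asynchronous replication (DRBD's protocol A) a write completes once it is on the local disk and queued for the peer, so anything still queued is lost if the primary dies; with synchronous replication (protocol C) a write completes only after the peer has it, at the cost of waiting on the network for every write. The class and method names are hypothetical.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary replicating writes to a peer (not DRBD's code)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.local: list[int] = []
        self.remote: list[int] = []
        self.queued: deque[int] = deque()  # completed locally, not yet on the peer

    def write(self, block: int) -> None:
        self.local.append(block)
        if self.synchronous:
            # Protocol C style: the write completes only once the peer has it.
            self.remote.append(block)
        else:
            # Protocol A style: the write completes once queued for the peer.
            self.queued.append(block)

    def ship(self, n: int) -> None:
        """Deliver up to n queued blocks to the peer."""
        for _ in range(min(n, len(self.queued))):
            self.remote.append(self.queued.popleft())

    def crash_primary(self) -> list[int]:
        """Return completed writes that the surviving peer never received."""
        lost = list(self.queued)
        self.queued.clear()
        return lost


for synchronous in (False, True):
    vol = ReplicatedVolume(synchronous)
    for block in range(5):
        vol.write(block)
    vol.ship(3)  # the network only got three blocks across before the crash
    print("sync" if synchronous else "async", "lost:", vol.crash_primary())
```

In the asynchronous run the last two writes never reach the peer, which matches the "OK to miss some information" tradeoff described above.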

                              I think it might even be possible to get everything up and running using DRBD's Java Management Console (download from here).

                              For now, I have a running testbed that can fail over, but sometimes won't for some @#)(!@ reason, and I still don't know what causes that. I'm also having trouble adding services other than MySQL to DRBD (e.g. Apache and Zabbix server/agent).

                              As soon as I get that figured out, it'll go into production.
