High Availability

  • nmail_uk
    Member
    • May 2009
    • 65

    #1

    High Availability

    I did a search of the forums and saw that somebody had been running some development for Zabbix HA but this required the 2 ZABBIX servers to be able to share an IP address.

    My setup is basically provided by a hosting company: 2 CentOS servers set up to be physically identical (just with different IP addresses). I have no control over this network, so I cannot transfer a shared IP between these servers; each server has to keep its own.

    Before I migrated to ZABBIX I was going to try and get this running in Nagios but never got around to it. Nagios's way of thinking was to disable active checks on the slave and set up a script to copy each check result over to the slave as it came in (Nagios doesn't use a centralised database, so the slave was always up to date).

    If the Nagios slave detects the master has failed, it can then enable active checks on itself and as soon as it sees the master, disable them again.

    I'm thinking I can set up an agent check on my ZABBIX slave server to monitor the master. If the master goes down, the slave can then run a client-side script through the agent to start the ZABBIX server.

    Similarly, as soon as the master comes back up, it can stop the ZABBIX server again.
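
    Something along these lines is what I have in mind for the slave - just a rough sketch; the master's IP, paths and init script name are placeholders for my setup, and the master's agent would need to allow queries from the slave:

        #!/bin/sh
        # Hypothetical failover check, run regularly (e.g. from cron) on the slave.
        # 192.0.2.10 stands in for the master server's IP address.
        MASTER=192.0.2.10

        # Ask the master's agent how many zabbix_server processes it can see.
        if zabbix_get -s "$MASTER" -k 'proc.num[zabbix_server]' 2>/dev/null | grep -q '^[1-9]'; then
            # Master is healthy - make sure our local server stays stopped.
            /etc/init.d/zabbix_server stop >/dev/null 2>&1
        else
            # Master unreachable or its server has died - take over.
            /etc/init.d/zabbix_server start
        fi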

    My question is, does this sound feasible? Or is there an easier way of doing this? Can two instances of the ZABBIX server happily co-exist in the same database? I'm talking 2 stand-alone instances (i.e. both node 0) - the idea being that 99% of the time only one will be running, but there may be a small window when they both run together until the failover detection kicks in.

    My other question is, if the above is feasible, is it possible to utilise the slave server while it's lying dormant to run some of the master's checks on its behalf? (Perhaps using the proxy?)
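
    (For the proxy idea, I imagine the dormant slave would just run zabbix_proxy pointing at whichever address the active server answers on - an excerpt with made-up values, purely to illustrate:)

        # zabbix_proxy.conf (excerpt) - illustrative values only
        Server=192.0.2.10            # address the active Zabbix server answers on
        Hostname=zabbix-slave-proxy  # must match the proxy name defined in the frontend
        DBName=zabbix_proxy          # the proxy keeps its own local database
        DBUser=zabbix
        DBPassword=secret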

    Any thoughts?

    Many thanks,
    Andy
  • nelsonab
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2006
    • 1233

    #2
    I don't think it would be that easy to do what you are wanting to do with Zabbix. When I looked into the HA solution I ran into many of the same questions. In the end it seemed too complicated to have a master/slave Zabbix server setup. Yes, using a shared IP address can seem complicated; however, if both servers sit on the same network segment, using a shared IP becomes very trivial (if IT/Net Eng tells you otherwise, politely suggest they go back to networking 101). If you are on different subnets or geographically separated, then things can become much more complicated.

    The main reason I settled on using a shared IP was that I'm not sure how the clients work should you give them two servers. Do they push active checks to both systems? How reliable is this? Also, back-end configuration becomes more complicated: if you make a change on one server, how do you propagate it? Yes, you could use MySQL Master/Master, but what do you do when the client sends data point A to server 1 and data point A(+1sec) to server 2? Alerting and triggering become another area to question. What if you have a trigger that activates an action which kicks a process such as Apache on a server? What happens when both servers try to do this at the same time, or a few seconds apart from each other?

    Bottom line: Zabbix HA is needed, but right now one of the best ways to do it is to use a framework like Linux HA and a single shared IP. It becomes the job of Linux HA to keep only one Zabbix server running at a time and to keep the storage backend synchronized, either through MySQL Master/Slave or DRBD.
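
    As a rough illustration only (host names, IPs and interface names are made up), a Heartbeat-style setup boils down to something like:

        # /etc/ha.d/haresources (same file on both nodes) - example only
        # zabbix1 is the preferred node; the shared IP and the zabbix_server
        # init script always fail over together to whichever node is active.
        zabbix1 IPaddr::192.0.2.50/24/eth0 zabbix_server

        # /etc/ha.d/ha.cf (excerpt)
        ucast eth1 10.0.0.2     # heartbeat over a dedicated link, not the main interface
        auto_failback on
        node zabbix1
        node zabbix2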

    Hope this helps give you some insight and things to consider. :-)
    RHCE, author of zbxapi
    Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
    Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM


    • nmail_uk
      Member
      • May 2009
      • 65

      #3
      Many thanks for your detailed response, nelsonab. It was of course you I was referring to as the "somebody" on the forums.

      My main restriction seems to be the shared IP address. The problem is I'm using a third-party company to provide the servers. I can get another IP address (which would be on the same subnet), but it would be assigned to one or the other of the servers, and I don't really want to mess around re-assigning IP addresses on their network as I don't know what systems they run in their DC that may be affected if I do.

      The database isn't that much of an issue for me as I have a separate PostgreSQL database server (which is replicated); my 2 servers in question are purely ZABBIX servers (they don't even host the frontend, that's on a dedicated, separate box - this is one of the reasons I love ZABBIX and hated Nagios, the way you can separate all the components!)

      I did think it would be a bit of a complicated setup. I might have to put the question to my hosting provider, something I was hoping I wouldn't have to do; I'm sure they see £££ when they sense clustering!

      Many thanks,
      Andy


      • nelsonab
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Sep 2006
        • 1233

        #4
        Try going the shared IP route. It simplifies a lot of things with the Zabbix server. Most networking equipment should not be affected by it unless there are ACLs or other security features on the network limiting which MAC address is associated with an IP address.

        There is one thing to consider, however: if you do go the shared IP route, you should make sure both servers are interconnected directly to each other, or use a different path than the main interface. Two ways to accomplish this are a null modem serial cable between the boxes (best overall) or a crossover network cable between the boxes with a separate private IP space for this interconnect. This is to help ensure "split brain" does not occur.

        If you really want to go for the coup de grâce, plug both servers into a network-controlled switching PDU where one host can physically power off the other host, aka STONITH (Shoot The Other Node In The Head).
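
        In Heartbeat terms the redundant interconnect ends up looking something like this (device and interface names are only examples):

            # /etc/ha.d/ha.cf (excerpt) - redundant heartbeat paths, example only
            serial /dev/ttyS0       # null modem cable between the boxes
            baud   19200
            ucast  eth1 10.0.0.2    # crossover cable on its own private subnet
            # A switched PDU for STONITH would be declared here as well; the exact
            # stonith directive depends on the PDU model.
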
        RHCE, author of zbxapi
        Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
        Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM


        • js1
          Member
          • Apr 2009
          • 66

          #5
          If you're going to use Linux HA to make Zabbix highly available, then consider setting it up with a quorum node, which will prevent the split brain scenario.

          With Linux HA in quorum configuration, geographic separation isn't a problem, and is almost encouraged.
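
          In the newer CRM/Pacemaker style that is roughly the following (resource and node names are invented, and this is only a sketch):

              # crm configure - three nodes, where "quorumnode" only votes
              property no-quorum-policy="stop"
              primitive zbx_ip ocf:heartbeat:IPaddr2 params ip="192.0.2.50" cidr_netmask="24"
              primitive zbx_srv lsb:zabbix_server
              group zbx_group zbx_ip zbx_srv
              location not_on_quorum_node zbx_group -inf: quorumnode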


          • nmail_uk
            Member
            • May 2009
            • 65

            #6
            @nelsonab: That would be the ideal scenario; unfortunately, I don't have any more control over the boxes than SSH (as they're remotely hosted and not even my own hardware). They're actually virtual machines anyway.

            I put the question to my provider and was quite pleasantly surprised that they've agreed to do this and will only charge an hour's engineer time to set it up.

            What they're going to do is install VRRPd on both servers, set up a shared IP address that points to the primary node - all my Zabbix agents will connect to this shared IP. If the primary node goes down, the secondary will detect this and re-route the shared IP to itself.
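
            For reference, the equivalent with keepalived (which also speaks VRRP) would look roughly like the excerpt below; the values are just examples, and my provider's VRRPd configuration may differ:

                # /etc/keepalived/keepalived.conf - rough equivalent, example values only
                vrrp_instance ZABBIX {
                    state MASTER             # BACKUP on the secondary node
                    interface eth0
                    virtual_router_id 51
                    priority 150             # lower (e.g. 100) on the secondary
                    advert_int 1
                    virtual_ipaddress {
                        192.0.2.50/24        # the shared IP all my agents will point at
                    }
                }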

            They've said, obviously, that if the server remains up but Zabbix crashes, VRRPd can't help me there, so I'm hoping to be able to set Zabbix up so it can check the health of the service on the master node; if it fails (but the agent is still up), it can attempt to restart the service, and if that fails, it can power down the server to force the IP fail-over.
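
            My rough plan for that last part is a small watchdog on the primary node, something like this (paths and timings are made up):

                #!/bin/sh
                # Hypothetical watchdog, run from cron on the primary node.
                if pgrep -x zabbix_server >/dev/null; then
                    exit 0                    # server is healthy, nothing to do
                fi

                # First attempt: restart the service and give it a moment.
                /etc/init.d/zabbix_server restart
                sleep 30

                # Still dead? Power the box down so VRRP moves the shared IP across.
                if ! pgrep -x zabbix_server >/dev/null; then
                    /sbin/shutdown -h now
                fi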

            As far as inter-connecting the nodes, they will both be on the same VLAN, with a separate IP address each on this VLAN.

