Distributed monitoring and failover

  • qix
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2006
    • 423

    #1

    Distributed monitoring and failover

    Hello all,

    I'm trying to figure out how I can use a distributed monitoring setup to ensure redundancy.

    We have 2 data centers and both locations need to get their own Zabbix server.
    I've been planning to use distributed monitoring for this and I think I know how to set this up.
    One server will be the master node and the other will be the slave node, which eases administration because everything can be managed from the master node.

    This way, if the connection between the locations gets severed for some reason, monitoring will still continue to work.

    What I would like to do is set up a mechanism so I can use either server as a fallback for the other in case of a hardware failure (let's assume we don't run into both problems at once).

    The best solution I can come up with is to set up MySQL replication of the databases. The database on the master node gets replicated to the slave node and vice versa.

    I think I will need to use a different MySQL process for this. So I was thinking of using a process running on port 3306 for the master node and port 3307 for the slave node.

    Then I will need to set up 2 Zabbix config files on both servers, one with node ID 1 and the database running on port 3306, and one with node ID 2 and the database running on port 3307.

    In the init scripts I would start both MySQL instances and just the primary zabbix process for the specific machine. If there is a failure on one of the zabbix servers or it needs to be brought down for maintenance, I can just activate the second zabbix process on the remaining machine.
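As an illustration of the two-config idea above, here is a minimal Python sketch that renders one server config per node. The parameter names NodeID, DBHost, DBPort and DBName are assumed to match the zabbix_server.conf of the version in use (check the sample config shipped with it); the database name, the host, and the NodeConfig / render_server_conf helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    node_id: int                 # Zabbix node ID (1 = master node, 2 = slave node)
    db_port: int                 # port of the MySQL instance holding this node's database
    db_name: str = "zabbix"      # hypothetical database name
    db_host: str = "127.0.0.1"   # assumes MySQL runs on the same machine


def render_server_conf(cfg: NodeConfig) -> str:
    """Return the text of a minimal server config for one node.

    Parameter names are assumptions; verify them against your version.
    """
    lines = [
        f"NodeID={cfg.node_id}",
        f"DBHost={cfg.db_host}",
        f"DBPort={cfg.db_port}",
        f"DBName={cfg.db_name}",
    ]
    return "\n".join(lines) + "\n"


# Both configs exist on both machines; only one server process is started
# per machine in normal operation, the other only on failover.
NODES = {
    "master": NodeConfig(node_id=1, db_port=3306),
    "slave": NodeConfig(node_id=2, db_port=3307),
}

if __name__ == "__main__":
    for role, cfg in NODES.items():
        print(f"# zabbix_server_{role}.conf")
        print(render_server_conf(cfg))
```

Generating both files from one definition keeps the two machines identical, so the only thing that differs at failover time is which server process gets started.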

    When the original machine comes back up, I need to replicate the SQL data back to this machine and restore the state of all the processes.

    Will this setup work? Is there an easier way, perhaps master-master replication of just the master node database? (saves a lot of space)
    Is there anybody on the forum that has experience with a fail over scenario for Zabbix?

    Any hints and tips on how I could achieve this would be most welcome!

    Thanks in advance,
    With kind regards,

    Raymond
  • qix
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2006
    • 423

    #2
    *bump*

    Is there anybody who can give me a few pointers?
    Alexei, can you tell me what you would recommend?
    With kind regards,

    Raymond


    • qix
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Oct 2006
      • 423

      #3
      *bump*

      I really need some input from somebody that has done something like this before.

      Thanks in advance,
      With kind regards,

      Raymond


      • just2blue4u
        Senior Member
        • Apr 2006
        • 347

        #4
        please see my post http://www.zabbix.com/forum/showpost...13&postcount=9
        in this thread: http://www.zabbix.com/forum/showthread.php?t=4104

        My model shows 2 independent Zabbix instances. I didn't test it, but maybe this works combined with the master/slave node model?
        Big ZABBIX is watching you!
        (... and my 48 hosts, 4513 items, 1280 triggers via zabbix v1.6 on CentOS 5.0)


        • NOB
          Senior Member
          Zabbix Certified Specialist
          • Mar 2007
          • 469

          #5
          Hi,

          we have the same scenario (more or less), i.e. two Datacenters (or more).
          With more datacenters you have to distribute the servers more intelligently over the datacenters to cover for HW problems of one server.

          One of our main concerns is not just a HW problem with one
          of the servers but a complete loss of a data center, too.
          This has happened in the past and we want to have a monitoring solution
          covering both cases.

          So my proposal is to use the following:

          Build up one active ZABBIX server per datacenter with a passive ZABBIX server in the other datacenter.
          Use a virtual IP address which will get switched (either automatically
          or manually) in case of a HW problem and MySQL replication between these
          two servers. The ZABBIX server processes are switched, too, if necessary.
          All these ZABBIX servers are completely installed, i.e. with the Web frontend.
          Instead of DB replication one could use mirrored SAN connections
          as well, but this is expensive and the availability as well as the performance might be lower than using local (mirrored/striped) disks.

          So, that means four servers will cover the case of loss of datacenter and a HW problem in one of the servers.

          For convenience of our operation people (one view for all monitored servers/services) we are thinking about adding one global ZABBIX master
          server which will gather the data from the ZABBIX servers per datacenter.
          Of course, for redundancy we need two of those as well, but this is
          just to cover HW problems.
          If one datacenter is lost, the operation people just use the frontend
          in the other (remaining) datacenter or use the virtual IP until
          the first datacenter is back up, again.

          As always, the tricky part is to get the switching of the application and virtual IP address right.
          I've seen several cases where either both servers were trying to be active,
          or the first server went down and a short time later the other
          went down, too ...
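To make the switching problem above concrete, here is a minimal Python sketch of the decision logic on the passive server. It only models the decision, not the actual IP or process switch. The class name, the threshold of 3 checks and the peer_fenced flag are all assumptions for illustration; how you confirm that the peer has really released the virtual IP is site specific and not shown.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Runs on the passive server and decides when it should take over."""

    threshold: int = 3    # consecutive failed checks required (assumed value)
    failures: int = 0     # current run of failed checks
    active: bool = False  # True once this server has taken over

    def observe(self, peer_alive: bool, peer_fenced: bool = False) -> bool:
        """Record one health check; return True if this server should be active.

        peer_fenced means we have confirmed the peer no longer holds the
        virtual IP and is not running its ZABBIX server.
        """
        if self.active:
            # Stay active until an operator switches back by hand, so the
            # two servers never flap back and forth.
            return True
        if peer_alive:
            self.failures = 0
            return False
        self.failures += 1
        # Take over only after several misses AND confirmed fencing; this is
        # what prevents both servers from being active at the same time.
        if self.failures >= self.threshold and peer_fenced:
            self.active = True
        return self.active
```

Requiring several consecutive failures guards against a single lost packet triggering a switch, and requiring fencing guards against the "both active" case when only the link between the data centers is down.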

          This should work, I hope.

          What is your opinion ?

          I know my proposal is not complete. Of course, you want to define triggers covering systems in both data centers - like distributed clusters.
          This has to be done on one ZABBIX server getting the data from servers
          in both datacenters. For this purpose even more ZABBIX servers (active
          and passive ones) are required. But those do not necessarily need a
          complete installation with Apache, PHP and all that stuff for the frontend.

          In addition, if you want to use distributed monitoring inside the datacenters
          to cover several customers with own networks not directly reachable from
          the central ZABBIX servers it gets even more complicated.

          But to propose a solution for that is our work, isn't it !
          ZABBIX 1.6 (one GUI for all servers, latest data included)
          will help solving this, I hope.

          Regards

          Norbert.


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            I am not sure that anything has to be done by ZABBIX software itself. All this can be achieved by database replication, virtual IPs and (or) using cluster solution for switch over and high availability.
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga


            • NOB
              Senior Member
              Zabbix Certified Specialist
              • Mar 2007
              • 469

              #7
              Originally posted by Alexei
              I am not sure that anything has to be done by ZABBIX software itself. All this can be achieved by database replication, virtual IPs and (or) using cluster solution for switch over and high availability.
              Yes, you are right.

              Except for the single point mentioned:

              All latest data in the central, global ZABBIX-Server to allow just one frontend
              for all servers !

              And, what would be a big plus:

              Remove the requirement to have an Apache / PHP frontend on every server just to configure the master/slave relationship. This can be done by using
              a script which does the same as the frontend would do.

              Regards,

              Norbert.


              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                Originally posted by NOB
                Yes, you are right.

                Except for the single point mentioned:

                All latest data in the central, global ZABBIX-Server to allow just one frontend
                for all servers !

                And, what would be a big plus:

                Remove the requirement to have an Apache / PHP frontend on every server just to configure the master/slave relationship. This can be done by using
                a script which does the same as the frontend would do.

                Regards,

                Norbert.
                The central ZABBIX server has all data available. It is ONE frontend for all servers.

                Currently the GUI is required for initial configuration of nodes only. It is quite straight forward to automate installation of nodes without the GUI as well. It just requires population of table 'nodes', nothing else.
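A rough Python sketch of what "populating table 'nodes'" from a script could look like. This is not the real ZABBIX schema: the column list below is a simplified, hypothetical stand-in, SQLite is used instead of MySQL so the example runs anywhere, and the node names and addresses are made up. Take the real columns from the schema of your ZABBIX version.

```python
import sqlite3

# Hypothetical, simplified stand-in for the 'nodes' table.
# The real column list depends on the ZABBIX version.
SCHEMA = """
CREATE TABLE nodes (
    nodeid   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    ip       TEXT NOT NULL,
    port     INTEGER NOT NULL,
    masterid INTEGER
)
"""


def register_node(conn, nodeid, name, ip, port=10051, masterid=None):
    """Insert one node row; masterid=None marks the top-level master."""
    conn.execute(
        "INSERT INTO nodes (nodeid, name, ip, port, masterid) "
        "VALUES (?, ?, ?, ?, ?)",
        (nodeid, name, ip, port, masterid),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
register_node(conn, 1, "dc1-master", "192.0.2.10")
register_node(conn, 2, "dc2-slave", "198.51.100.10", masterid=1)

rows = conn.execute(
    "SELECT nodeid, name, masterid FROM nodes ORDER BY nodeid"
).fetchall()
print(rows)
```

The same rows would have to be inserted on every node involved, which is what makes a script attractive compared with clicking through the frontend on each server.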

                ZABBIX 1.6 will support GUI-less installation (autoregistration?) of nodes. There are already many significant improvements made in the latest code related to DM.
                Alexei Vladishev
                Creator of Zabbix, Product manager
                New York | Tokyo | Riga


                • qix
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional
                  • Oct 2006
                  • 423

                  #9
                  Thanks for the reply all.
                  I'm afraid I cannot use virtual IPs because the subnets at each location are different.

                  What I have conceived is the following setup (see attached picture).
                  The Primary server is node 1, the secondary server is node 2. So this is a distributed setup.
                  I will use a third server where my databases are being replicated to (master-slave).

                  This will allow me to make backups of our (large) databases (+40GB / 2.5GB compressed) without database locks on the zabbix servers.

                  Secondly, this also allows me to schedule detailed reports generated from the zabbix database without performance loss on the monitoring servers.

                  Thirdly, when the s*** hits the fan, I can always use the reporting server as a spare zabbix server in case of failures, the database is already there, so it should be easy to get it up and running.
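Since the spare is only useful if its copy of the database is current, a routine check of the replication state on the reporting server seems worthwhile. Below is a minimal Python sketch that evaluates one row of MySQL's SHOW SLAVE STATUS output, passed in as a dict. The field names Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master come from MySQL; the function name and the 60 second limit are assumptions, and fetching the row from the server is left out.

```python
from typing import Mapping


def replica_is_current(status: Mapping[str, object],
                       max_lag_seconds: int = 60) -> bool:
    """Return True if the replica is replicating and not too far behind.

    `status` is one row of SHOW SLAVE STATUS as a column-name -> value dict.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")
    # Seconds_Behind_Master is NULL (None) when replication is not running.
    if not (io_ok and sql_ok) or lag is None:
        return False
    return int(lag) <= max_lag_seconds
```

A check like this could itself be fed into ZABBIX as an item, so a trigger fires when the spare falls behind, long before it is needed.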

                  So failover isn't automatic, but that doesn't really matter at this point.
                  If the need arises, maybe we will go to 4 servers so there is a spare Zabbix server at each site; then automatic failover could be achieved. (I'm thinking VMware here.)

                  I'll try to keep you posted on how things are doing when I'm finished.
                  Attached Files
                  With kind regards,

                  Raymond
