Zabbix Multi Site HA

  • syndeysider
    Senior Member
    • Oct 2013
    • 115

    #1

    Zabbix Multi Site HA

    Hi

    We are in the process of "moving to the cloud". We are a bit sceptical of our current cloud infrastructure provider and have chosen not to host a set of core services "in the cloud", but rather in makeshift two-rack datacenters at two sites. This includes the monitoring infrastructure.

    I have 2 x Dell R710s with some sexy hardware (PCIe SSDs, etc.):
    2400 NVPS
    1200 Hosts

    The idea is to set up an Active/Passive cluster running Zabbix 2.4.1-2, MySQL 5.6 and Apache.

    ActiveNode = Datacenter 1
    PassiveNode = Datacenter 2

    Both datacenters are part of a 10 Gbps fibre ring network and are about 50 km apart.

    Has anyone managed to successfully setup Master/Slave replication with automatic fail-over across sites with the Zabbix DB?
    What's the performance like across sites?
    Do I set up a read-only SLAVE?

    I've done a fair amount of searching and most of the forum posts point to shared storage. I have no experience with DRBD, so I'm not going that route.
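
    For reference, here is roughly the master/slave setup I have in mind. This is only a sketch; the hostnames, credentials and binlog coordinates below are placeholders, not real values:

      # --- master (Datacenter 1), /etc/my.cnf ---
      # [mysqld]
      # server-id     = 1
      # log_bin       = mysql-bin
      # binlog_format = ROW

      # --- slave (Datacenter 2), /etc/my.cnf ---
      # [mysqld]
      # server-id     = 2
      # relay_log     = relay-bin
      # read_only     = 1

      # Point the slave at the master (coordinates taken from
      # SHOW MASTER STATUS after loading a consistent dump):
      mysql -e "CHANGE MASTER TO
                  MASTER_HOST='dc1-db.example.com',
                  MASTER_USER='repl',
                  MASTER_PASSWORD='********',
                  MASTER_LOG_FILE='mysql-bin.000001',
                  MASTER_LOG_POS=4;
                START SLAVE;"

      # Verify both replication threads are running:
      mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running'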
  • timbo
    Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2013
    • 50

    #2
    I believe this gentleman has done something similar:
    One of the questions for those of us that use Zabbix on a large scale is “Just how much data can Zabbix ingest before it blows up spectacularly?” Some of the work I’ve been doing lately revolves around that question. I have an extremely large environment (around 32000+ devices) that could potentially be monitored entirely […]


    I think the gist was that you cannot have a "slave" service/server running. The "slave" Zabbix service needs to stay stopped, and only be started when the "primary" Zabbix service/server is down (no heartbeat). I think he developed some scripts to automate this.
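
    I don't have his scripts, but the idea is simple enough that a rough sketch might help. Everything here (hostname, thresholds) is made up for illustration; his actual scripts may well differ:

      #!/bin/bash
      # Runs on the standby node: if the primary's Zabbix server stops
      # answering on its trapper port, start the local zabbix-server.
      PRIMARY=zabbix-primary.example.com   # placeholder hostname
      PORT=10051                           # default Zabbix server port
      FAILS=0

      while sleep 30; do
          if timeout 5 bash -c "echo > /dev/tcp/$PRIMARY/$PORT" 2>/dev/null; then
              FAILS=0
          else
              FAILS=$((FAILS + 1))
          fi
          # Require several consecutive misses before failing over,
          # so a brief network blip doesn't start a second server.
          if [ "$FAILS" -ge 3 ] && ! pgrep -x zabbix_server >/dev/null; then
              service zabbix-server start
          fi
      done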

    It has been a while since I've visited the IRC channel, but I'm pretty sure I saw him in there a couple of times (though his IRC handle escapes me at the moment).

    Hope that gives you a little more info to work off of.

    I would love to hear the progress you make on this, please keep us posted.

    -Timbo


    • innovot
      Junior Member
      • Nov 2013
      • 15

      #3
      Very interested to follow this thread, as we have a similar requirement. We have two R420s at disparate locations, with iSCSI storage that is capable of being replicated between sites. If the DB and configuration files were synchronised between the two, would both be able to run? As we run an IPsec tunnel between the two sites, we would want site A monitoring site A and site B monitoring site B, but also site A monitoring key site B systems and vice versa. Then we would know if either site were to fail, but still retain a single set of configuration and data. Is that possible?


      • mmester
        Junior Member
        • Jan 2015
        • 2

        #4
        I am also interested in doing this. We have a very large environment with multiple datacenters. Putting a critical system in production without HA is not an option. We currently partition each monitoring install per datacenter location. We lose the centralization capability there, but also limit our exposure to a failure.

        It would be nice to centralize the system as a whole and use proxies at the remote datacenters. To do that we need HA for the central servers though.
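
        For what it's worth, the proxy side of that design is the easier part. Each remote datacenter would run something like the config below (the server name and DB credentials are placeholders):

          # /etc/zabbix/zabbix_proxy.conf -- one active proxy per remote DC
          Server=zabbix-central.example.com
          Hostname=proxy-dc2
          DBName=zabbix_proxy
          DBUser=zabbix
          DBPassword=********
          # Buffer collected data locally (in hours) so a short outage
          # of the central server loses nothing:
          ProxyOfflineBuffer=24
          # Pull configuration changes from the server every 300 seconds:
          ConfigFrequency=300

        The hard part, as you say, is HA for the central server itself.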


        -Mike


        • syndeysider
          Senior Member
          • Oct 2013
          • 115

          #5
          Hi Guys

          So I've finally completed the setup of a Red Hat 7 (Pacemaker/Corosync) cluster.

          Active
          • mysqld - master
          • zabbix-server - master
          • dbmaster-vip
          • symlinked cron jobs, configs, etc. from a git repo


          Passive
          • mysqld - slave
          • dbreader-vip


          I am about to start the migration later this month and will write up a "how to" blog post on the issues I encountered. It wasn't easy to start off with, as I used pcs instead of crm and there are not as many tutorials out there for the new command structure.
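
          For a flavour of the pcs side, my resource definitions were roughly along these lines (the address and names here are examples rather than my production values):

            # Floating IP the master DB listens on
            pcs resource create dbmaster-vip ocf:heartbeat:IPaddr2 \
                ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s

            # Zabbix server managed through its systemd unit
            pcs resource create zabbix-server systemd:zabbix-server \
                op monitor interval=30s

            # Keep the server with the VIP, and start the VIP first
            pcs constraint colocation add zabbix-server with dbmaster-vip INFINITY
            pcs constraint order dbmaster-vip then zabbix-server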

          I also ran into some database layout issues: I have two different storage partitions, and I wanted the history/trends tables to run off the SSDs and the rest of the DB to run off the SAS disks. I had to create new tables and migrate the data to the new partition structure, and also update how I manage table partitioning.
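
          Roughly, the storage split amounted to DDL like this (the paths and date ranges are examples; it needs innodb_file_per_table=1, and the ALTER rebuilds the table, hence the data migration):

            # Put history partitions on the SSD mount; other tables stay
            # on the SAS volume.
            mysql zabbix -e "
                ALTER TABLE history
                PARTITION BY RANGE (clock) (
                    PARTITION p2015_01
                        VALUES LESS THAN (UNIX_TIMESTAMP('2015-02-01 00:00:00'))
                        DATA DIRECTORY = '/mnt/ssd/mysql',
                    PARTITION p2015_02
                        VALUES LESS THAN (UNIX_TIMESTAMP('2015-03-01 00:00:00'))
                        DATA DIRECTORY = '/mnt/ssd/mysql'
                );"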

          I've tested failover (about a 30-second outage) and a large load of queries, and my master/slave setup handles both fine.

          I'll try to complete this by the end of February and let you know how I go.


          • timbo
            Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2013
            • 50

            #6
            Thanks for updating us on your progress.

            I look forward to seeing a few more details on this project when you have the time (as I may be in the same boat before too long).

            -Timbo


            • wdijkerman
              Junior Member
              • Jan 2015
              • 18

              #7
              I don't know if this still applies, but the setup you mentioned is also covered in a complete chapter of this book: Mastering Zabbix (PacktPub).

              (This post is not meant as spam.) I bought this book a while ago and, from what I remember, it is written very clearly and is easy to understand. Maybe it will help you configure your environment.


              • mushero
                Senior Member
                • May 2010
                • 101

                #8
                We are building this too, but across two locations: Shanghai and Beijing.

                The first step is to get all monitoring onto proxies, since then our field hosts (in hundreds of locations) don't need to be touched: the public IPs they talk to were locked into iptables and the agent config, most of them years ago.

                Second, build a failover site with a full Zabbix stack. The web & app services must not be running, but must be startable, and the DB runs in slave mode replicating from the master.

                Third, we fail over manually for now by starting the DR web/app, breaking replication, and then manually re-pointing all the proxies to the DR app server. Not pretty, but easy to understand and do.
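
                In rough shell terms, the manual failover amounts to this (hostnames and paths are illustrative and details vary by install):

                  # On the DR database: stop replicating, become writable
                  mysql -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = 0;"

                  # On the DR node: bring up the Zabbix server and frontend
                  service zabbix-server start
                  service httpd start

                  # On each proxy: re-point it at the DR server and restart
                  sed -i 's/^Server=.*/Server=zabbix-dr.example.com/' \
                      /etc/zabbix/zabbix_proxy.conf
                  service zabbix-proxy restart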

                Of course, all of this could be automated, but this system has run for 7 years without real issues, so failover is not a common occurrence, and we can do it during the day in an hour once a real issue is confirmed. You could automate it down to a few minutes or less if you wanted.

                The key is not to auto-fail the DB faster than you can confirm you've really lost the master; otherwise you have a 1 TB DB to sync back up over the internet from backups, which is not much fun.

                Steve

