High availability strategy for proxies

  • ryounce
    Junior Member
    • May 2014
    • 1

    #1

    High availability strategy for proxies

    Hi all,

    I am running Zabbix 2.2.3 on my server, proxies, and agents.

    I'm curious what people consider a recommended failover/cutover strategy for a distributed setup with one server and multiple proxies. In our situation, the proxies handle most or all of the monitoring, and the node itself just collects the monitoring results from the proxies.

    If a proxy fails or becomes unresponsive, the hosts monitored by that proxy will no longer be monitored (at least, that's my take on it from experimenting with this in a test setup). I imagine there may be a slight data loss associated with this (any data stored in the proxy's DB will not be sent to the server until the proxy becomes responsive again). Ideally, though, the hosts monitored by the unresponsive proxy would be switched dynamically to a standby proxy that continues monitoring until the first proxy is available again (at which time, perhaps manually, we would move the hosts back over to the original master).

    We are currently taking a passive approach (the proxies poll the various agents, no ServerActive setting is being used) so hosts can be easily assigned or switched from one proxy to another. I imagine it might be doable with a proxy heartbeat trigger that activates a script that does the host-to-proxy reassignment, but I wanted to get people's input on this first.
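
A minimal sketch of the reassignment script described above, assuming it talks to the Zabbix JSON-RPC API rather than the database. The URL, the 300-second staleness threshold, and the function names are hypothetical; `host.get` with `proxyids`, `host.update` with `proxy_hostid`, and the proxy `lastaccess` field are the API names as understood for the 2.2 series and should be checked against the API reference for the version in use. Obtaining the `auth` token (via `user.login`) is left out.

```python
import json
import urllib.request

API_URL = "http://zabbix.example.com/api_jsonrpc.php"  # hypothetical URL


def rpc(method, params, auth, url=API_URL):
    """Send one JSON-RPC 2.0 request to the Zabbix API and return its result."""
    body = json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": params, "auth": auth, "id": 1}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def stale_proxies(proxies, now, max_age=300):
    """Return the proxies whose last contact is older than max_age seconds."""
    return [p for p in proxies if now - int(p["lastaccess"]) > max_age]


def reassignment_calls(hosts, standby_proxyid):
    """Build one host.update parameter set per host, pointing it at the standby."""
    return [{"hostid": h["hostid"], "proxy_hostid": standby_proxyid}
            for h in hosts]


def fail_over(auth, failed_proxyid, standby_proxyid, call=rpc):
    """Move every host on the failed proxy to the standby; return the count."""
    hosts = call("host.get",
                 {"output": ["hostid"], "proxyids": [failed_proxyid]}, auth)
    for params in reassignment_calls(hosts, standby_proxyid):
        call("host.update", params, auth)
    return len(hosts)
```

A trigger action could run something like `fail_over(token, "10105", "10106")` (the IDs here are made up). Moving the hosts back afterwards is the same call with the two proxy IDs swapped.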

    I've seen some details on HA with regard to Zabbix, but nothing indicating HA with regard to proxies (just the server itself and the database).

    Thanks in advance,
    Ryan Younce
  • js1
    Member
    • Apr 2009
    • 66

    #2
    I've managed Linux HA clusters in the past, and that approach can probably do what you want with Zabbix. You can use either DRBD (network RAID1) or MySQL replication, along with anycasting the Zabbix proxy service address. It can get complicated.

    Having said that, for Zabbix, I'd almost rather KISS. Make the proxy as physically redundant as you can afford, and run active checks with agents so results can be queued up locally if the proxy goes down.


    • mushero
      Senior Member
      • May 2010
      • 101

      #3
      We do this with sets of proxies globally - we fully transfer one proxy's hosts to another proxy in another country.

      We use one of the profile/inventory fields on the host to track/store the 'home' proxy. Then we have SQL that updates the hosts to move all hosts on a given proxy to a new one when we are having trouble.

      For example, our Hong Kong proxy often has network issues, so we just move all of the Hong Kong proxy's hosts to the Japan proxy. When it's fixed, we move them back.

      It's done in SQL, and the system picks it up on the next proxy config pull (this creates unreachable hosts and a high queue for a while).

      Not perfect, but allows you to move things around. We use direct SQL but you can create PHP pages to do this.
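
The approach above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual SQL used: it runs against an in-memory SQLite stand-in with two simplified tables instead of a real Zabbix database, and it assumes the 'home' proxy ID is stored in the `tag` inventory field (the post does not say which field is used). The `hosts.proxy_hostid` column is how Zabbix of this era links a host to its proxy, but any direct database write should be tested on a copy first.

```python
import sqlite3


def move_hosts(conn, from_proxy, to_proxy):
    """Point every host currently on from_proxy at to_proxy; return rows changed."""
    cur = conn.execute(
        "UPDATE hosts SET proxy_hostid = ? WHERE proxy_hostid = ?",
        (to_proxy, from_proxy))
    conn.commit()
    return cur.rowcount


def move_home(conn, home_proxy):
    """Send hosts whose recorded home proxy is home_proxy back to it."""
    cur = conn.execute(
        "UPDATE hosts SET proxy_hostid = ? WHERE hostid IN "
        "(SELECT hostid FROM host_inventory WHERE tag = ?)",
        (home_proxy, str(home_proxy)))
    conn.commit()
    return cur.rowcount


# Simplified stand-in for the two tables involved, not the full Zabbix schema.
# Proxy 100 plays the Hong Kong proxy, proxy 200 the Japan proxy (made-up IDs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT, proxy_hostid INTEGER);
    CREATE TABLE host_inventory (hostid INTEGER PRIMARY KEY, tag TEXT);
    INSERT INTO hosts VALUES (1, 'web-hk-1', 100), (2, 'db-hk-1', 100), (3, 'web-jp-1', 200);
    INSERT INTO host_inventory VALUES (1, '100'), (2, '100'), (3, '200');
""")

print(move_hosts(conn, 100, 200))  # 2: both Hong Kong hosts now use proxy 200
print(move_home(conn, 100))        # 2: they return to their recorded home proxy
```

Recording the home proxy separately is what makes the move back safe: after the failover, the hosts that belong to proxy 200 are indistinguishable from the displaced ones by `proxy_hostid` alone.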


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        Originally posted by ryounce
        Hi all,

        I am running Zabbix 2.2.3 on my server, proxies, and agents.

        I'm curious what people consider a recommended failover/cutover strategy for a distributed setup with one server and multiple proxies. In our situation, the proxies handle most or all of the monitoring, and the node itself just collects the monitoring results from the proxies.

        If you are using only active items, monitoring data is collected on the agent side and sent to the proxy/server in batches.
        A temporary interruption in communication between the agent and the proxy is not a problem as long as it does not last longer than the time it takes for the agent's buffer to fill with monitoring data.

        With default agent settings you should be able to shut down the proxy and start it again without losing monitoring data. If the proxy is down for up to about 1 minute, everything should still be OK. The exact time depends on how many items per second are collected by the agent that gathers data at the highest rate.
        Before a regular shutdown, the proxy flushes all unsent monitoring data to the server.

        Given the above, you don't need to care too much about the proxy database content if you need to fail the proxy over to a new host.

        You can use a standard active/standby clustering approach. Such an infrastructure does not need anything like a shared DB or shared storage. Separate proxy DB backends can run on the active and standby Zabbix proxy nodes.

        If you want to shorten the proxy shutdown, you can decrease DataSenderFrequency (I'm using DataSenderFrequency=10, but I have a 10Gb network and the proxy and server on SSDs).

        Just measure how long a restart of the proxy takes and, based on that time, consider changing the agents' settings so that they can hold monitoring data while the proxy is down.
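
The sizing described above can be estimated with a rough back-of-the-envelope calculation. This is a sketch under a simplifying assumption: the agent's memory buffer (the `BufferSize` setting in `zabbix_agentd.conf`, default 100 values) fills at a steady rate and values are lost once it is full. The function names and the 1.5x headroom factor are hypothetical.

```python
import math


def tolerable_outage_seconds(buffer_size, values_per_second):
    """Seconds an agent can buffer active-check values before the buffer is full."""
    if values_per_second <= 0:
        raise ValueError("values_per_second must be positive")
    return buffer_size / values_per_second


def required_buffer_size(outage_seconds, values_per_second, headroom=1.5):
    """Buffer size needed to ride out an outage, padded by a safety factor."""
    if outage_seconds < 0 or values_per_second <= 0:
        raise ValueError("outage must be >= 0 and rate must be positive")
    return math.ceil(outage_seconds * values_per_second * headroom)


# The busiest agent collects 2 values per second (made-up figure).
print(tolerable_outage_seconds(100, 2.0))  # 50.0 seconds with the default buffer
print(required_buffer_size(120, 2.0))      # 360 values to cover a 2-minute restart
```

The estimate should be run for the agent with the highest collection rate, since that agent fills its buffer first.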

        Again: the above works only for Zabbix agent active items. SNMP monitoring and passive items are fully affected by a proxy shutdown.
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates
