**js1** · 30-09-2014, 05:31

I've managed Linux HA clusters in the past, and it can probably do what you want with Zabbix. You can either use DRBD (network RAID1) or MySQL replication along with anycasting the Zabbix proxy service address. It can get complicated.

Having said that, for Zabbix, I'd almost rather KISS. Make the proxy as physically redundant as you can afford. And run active checks with agents so checks can be queued up locally if the proxy goes down.

**mushero** · 03-11-2014, 05:28

We do this with sets of proxies globally - we fully transfer one proxy's hosts to another proxy in another country.

We use one of the profile/inventory fields on the host to track/store the 'home' proxy. Then we have SQL that updates the hosts, moving all hosts on a given proxy to a new one when we are having trouble.

For example, our Hong Kong proxy often has network issues, so we just move all of the Hong Kong proxy's hosts to the Japan proxy. When it's fixed, we move them back.

Done in SQL and the system picks it up on next proxy config pull (creates unreachables, high queue for a while).

Not perfect, but allows you to move things around. We use direct SQL but you can create PHP pages to do this.
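To make the idea concrete, here is a minimal sketch of that kind of move, run against an in-memory SQLite stand-in rather than a real Zabbix database. It assumes the `hosts` table carries a `proxy_hostid` column and that proxies are rows in `hosts` with status 5 or 6; using the inventory `tag` field to hold the 'home' proxy name, the proxy names, and the helper functions are all hypothetical choices for illustration, not necessarily what was used above.

```python
import sqlite3


def demo_db():
    """Build a tiny stand-in for the Zabbix tables touched here (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT,
                            status INTEGER, proxy_hostid INTEGER);
        CREATE TABLE host_inventory (hostid INTEGER PRIMARY KEY, tag TEXT);
        -- status 5/6 = active/passive proxy (assumption), 0 = monitored host
        INSERT INTO hosts VALUES (1, 'proxy-hk', 5, NULL), (2, 'proxy-jp', 5, NULL),
                                 (10, 'web-hk-1', 0, 1), (11, 'web-hk-2', 0, 1),
                                 (12, 'web-jp-1', 0, 2);
        -- hypothetical: the 'tag' inventory field stores each host's home proxy
        INSERT INTO host_inventory VALUES (10, 'proxy-hk'), (11, 'proxy-hk'),
                                          (12, 'proxy-jp');
        """
    )
    return conn


def proxy_id(conn, name):
    """Look up the hostid of a proxy by name."""
    row = conn.execute(
        "SELECT hostid FROM hosts WHERE host = ? AND status IN (5, 6)", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no proxy named {name!r}")
    return row[0]


def move_hosts(conn, from_proxy, to_proxy):
    """Move every host currently on from_proxy to to_proxy; return rows changed."""
    src, dst = proxy_id(conn, from_proxy), proxy_id(conn, to_proxy)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE hosts SET proxy_hostid = ? WHERE proxy_hostid = ?", (dst, src)
        )
    return cur.rowcount


def send_home(conn, home_proxy):
    """Move hosts whose recorded home is home_proxy back onto it."""
    home = proxy_id(conn, home_proxy)
    with conn:
        cur = conn.execute(
            "UPDATE hosts SET proxy_hostid = ? WHERE hostid IN "
            "(SELECT hostid FROM host_inventory WHERE tag = ?)",
            (home, home_proxy),
        )
    return cur.rowcount
```

Calling `move_hosts(conn, "proxy-hk", "proxy-jp")` and later `send_home(conn, "proxy-hk")` mirrors the Hong Kong to Japan move and the move back. Because the home proxy is stored separately, hosts that already belonged to the Japan proxy stay there when the others are sent home.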

**kloczek** · 03-11-2014, 20:26

Originally posted by ryounce

Hi all,

I am running Zabbix 2.2.3 on my server, proxies, and agents.

I was curious as to what people consider to be a recommended strategy for implementing a failover/cutover strategy for a distributed setup with one server and multiple proxies. In our situation, the proxies are handling most/all of the monitoring and the node itself just collects the monitoring results from the proxies.

If you are using only active items, monitoring data is collected on the agent side and sent to the proxy/server in batches.
A temporary interruption in communication between agent and proxy is not a problem as long as it does not last longer than the time it takes for the agent's buffer of monitoring data to fill up.

With default agent settings you should be able to shut down the proxy and start it again without losing monitoring data. If the proxy is down for up to about 1 min, everything should still be OK. The exact time depends on how many items per second are monitored on the Zabbix agent that collects monitoring data the fastest.
Before a regular shutdown, the proxy flushes all unsent monitoring data to the server.
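As a rough illustration of that time budget, the sketch below estimates how long an agent can keep buffering before values start being dropped. It assumes a simple model in which the agent's `BufferSize` (default 100) is the only limit and values arrive at a steady rate; the function name and the example rates are made up for illustration.

```python
def outage_tolerance_seconds(buffer_size, values_per_second):
    """Seconds until an agent buffer of buffer_size values fills at a steady rate."""
    if values_per_second <= 0:
        raise ValueError("values_per_second must be positive")
    return buffer_size / values_per_second


# With the default BufferSize=100, a busy agent collecting 2 values per second
# can ride out a shorter outage than a quiet one collecting 0.5 per second.
busy = outage_tolerance_seconds(100, 2.0)
quiet = outage_tolerance_seconds(100, 0.5)
```

The busiest agent sets the limit, which is why the estimate should be made for the agent with the highest collection rate.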

Given the above, you don't need to care too much about the proxy database content if you need to fail the proxy over to a new host.

You can use a standard active/standby clustering approach. Such an infrastructure does not need anything like a shared DB or shared storage. Separate proxy DB backends can run on both the active and standby Zabbix proxy nodes.

If you want to shorten proxy shutdown, you can decrease DataSenderFrequency (I'm using DataSenderFrequency=10, but I have a 10Gb network and the proxy and server on SSDs).

Just measure how long a proxy restart takes and, based on that time, consider changing the settings on all agents so that they can hold monitoring data while the proxy is down.
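One way to turn a measured restart time into an agent setting is sketched below. It assumes the relevant knob is the agent's `BufferSize` parameter, that its allowed range is 2 to 65535, and that doubling the measured need is a reasonable safety margin; the function name and the safety factor are hypothetical.

```python
import math


def required_buffer_size(restart_seconds, values_per_second, safety_factor=2.0):
    """Suggest a BufferSize that covers a proxy restart with some headroom."""
    needed = math.ceil(restart_seconds * values_per_second * safety_factor)
    # Clamp to the range assumed to be accepted by the agent configuration.
    return min(max(needed, 2), 65535)


# Example: a 45-second restart and an agent collecting 3 values per second.
suggested = required_buffer_size(45, 3)
config_line = f"BufferSize={suggested}"
```

If the suggestion hits the upper clamp, the buffer alone cannot cover the restart and a shorter restart or a lower collection rate would be needed.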

Again: the above works only for Zabbix agent active items. SNMP monitoring and passive items are fully affected by a proxy shutdown.

High availability strategy for proxies
