
2 Proxy load balancing and high availability

Overview

Proxy load balancing allows monitoring hosts by a proxy group with automated distribution of hosts between proxies and high proxy availability.

If a proxy in the group goes offline, its hosts are immediately redistributed among the proxies in the group that have the fewest assigned hosts. Likewise, if a proxy has too many or too few hosts compared to the group average, the group is re-balanced by distributing hosts evenly.

Host redistribution happens only in online proxy groups. A proxy group is "online" if the configured minimum number of its proxies are online (not offline or unknown).

The minimum number of online proxies should be less than the total number of proxies in the group. In a group of 10 proxies, setting the minimum online proxy count to 10 means that the whole group goes offline if a single proxy fails. Requiring 6 online proxies is a better choice: the group then tolerates up to 4 unhealthy proxies.
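
The arithmetic above can be sketched as follows. This is an illustrative Python model only, not Zabbix code; the function names `group_is_online` and `tolerated_failures` are hypothetical.

```python
def group_is_online(proxy_states, min_online):
    """Return True if at least min_online proxies are in the 'online' state."""
    online = sum(1 for state in proxy_states if state == "online")
    return online >= min_online


def tolerated_failures(total_proxies, min_online):
    """Number of proxies that may be offline/unknown while the group stays online."""
    return max(total_proxies - min_online, 0)


# The example from the text: 10 proxies, 6 required to be online.
states = ["online"] * 6 + ["offline"] * 4
print(group_is_online(states, 6))   # True
print(tolerated_failures(10, 6))    # 4
print(tolerated_failures(10, 10))   # 0: a single failure takes the group offline
```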

The proxy state is:

  • online - if there was communication with it for the failover delay period (a passive proxy responded to server requests, or an active proxy sent a request to the server);
  • offline - if there was no communication with it for the failover delay period;
  • unknown - after proxy creation or server start.
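
As a rough illustration of the three states, the sketch below classifies a proxy from the time of its last communication. It is a simplified model under stated assumptions, not the server's actual logic: the function name `proxy_state` is hypothetical, "no recorded communication yet" is used to stand in for "after proxy creation or server start", and a proxy is treated as online when its last communication falls within the failover delay.

```python
def proxy_state(last_contact, now, failover_delay):
    """Classify a proxy as 'online', 'offline', or 'unknown'.

    last_contact:   timestamp (seconds) of the last communication, or None
                    if none has been recorded yet (assumption: this models
                    a newly created proxy or a freshly started server).
    now:            current timestamp in seconds.
    failover_delay: failover delay of the proxy group in seconds.
    """
    if last_contact is None:
        return "unknown"
    if now - last_contact <= failover_delay:
        return "online"
    return "offline"


print(proxy_state(None, 1000, 60))   # unknown
print(proxy_state(970, 1000, 60))    # online
print(proxy_state(900, 1000, 60))    # offline
```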

You can monitor the proxy group state with the zabbix[proxy group,<name>,state] internal item.

Proxy load balancing and high availability are managed by the proxy group manager process. The proxy group manager always knows which proxies are healthy or unhealthy.

Version compatibility

  • Only Zabbix agents 7.0 and later are supported for working with proxy groups in active mode;
  • Zabbix pre-7.0 version proxies and the hosts monitored by these proxies are excluded from re-balancing operations until they are upgraded.

Host reassignment

Zabbix server checks the balance between host assignments to the proxies. The group is considered "out of balance" if there is:

  • host excess - a proxy has many more hosts than the group average;
  • host deficit - a proxy has far fewer hosts than the group average.

The group is considered "out of balance" if the number of hosts assigned to a proxy is above/below the group average by more than 10 hosts and by more than a factor of 2. In this case, the server marks the group for host reassignment after the grace period (10 x failover delay) if the balance is not restored by then.

The following table illustrates with example numbers when host reassignment is (or is not) triggered:

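| Number of hosts on proxy | Group average | Host reassignment |
|--------------------------|---------------|-------------------|
| >100                     | 50            | Yes               |
| 60                       | 50            | No                |
| 40                       | 50            | No                |
| <25                      | 50            | Yes               |
| >15                      | 5             | Yes               |
| 10                       | 5             | No                |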
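
The rule behind the table can be expressed as a small check. This is a sketch of one reading of the rule (both conditions must hold, with strict inequalities, which is what the table rows suggest); the function names `needs_reassignment` and `grace_period` are hypothetical and not part of Zabbix.

```python
def needs_reassignment(proxy_hosts, group_average):
    """Return True if a proxy is 'out of balance' relative to the group average.

    Assumption: the proxy must differ from the average by more than 10 hosts
    AND by more than a factor of 2 (in either direction).
    """
    diff_limit = 10
    factor = 2
    excess = (proxy_hosts - group_average > diff_limit
              and proxy_hosts > group_average * factor)
    deficit = (group_average - proxy_hosts > diff_limit
               and proxy_hosts * factor < group_average)
    return excess or deficit


def grace_period(failover_delay_seconds):
    """Grace period before reassignment: 10 x failover delay."""
    return 10 * failover_delay_seconds


# Rows from the table (using 101, 24 and 16 for ">100", "<25" and ">15").
for hosts, average in [(101, 50), (60, 50), (40, 50), (24, 50), (16, 5), (10, 5)]:
    print(hosts, average, needs_reassignment(hosts, average))

print(grace_period(60))  # 600 seconds for the default 1m failover delay
```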

The proxy group manager will re-distribute hosts in proxy groups in the following way:

  • calculate the average number of hosts per proxy;
  • for proxies with host excess - move excess hosts to unassigned hosts;
  • for proxies with host deficit - calculate the number of hosts needed to balance proxies;
  • remove the missing number of hosts from the proxies with the most hosts;
  • distribute unassigned hosts between the proxies with the fewest hosts.
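
The steps above can be sketched in a few lines of Python. This is an illustrative model only, not the proxy group manager's implementation: the function name `rebalance` is hypothetical, and for simplicity it treats any proxy above the rounded-up average as having excess and any proxy below the rounded-down average as having a deficit, ignoring the out-of-balance thresholds.

```python
import math


def rebalance(assignments):
    """Redistribute hosts between proxies.

    assignments: dict mapping proxy name -> list of host names.
    Returns a new dict with the same hosts spread evenly.
    """
    proxies = {name: list(hosts) for name, hosts in assignments.items()}
    total = sum(len(hosts) for hosts in proxies.values())

    # Step 1: average number of hosts per proxy (rounded down and up).
    low = total // len(proxies)
    high = math.ceil(total / len(proxies))
    unassigned = []

    # Step 2: proxies with host excess give their excess hosts to the pool.
    for hosts in proxies.values():
        while len(hosts) > high:
            unassigned.append(hosts.pop())

    # Step 3: number of hosts needed by proxies with host deficit.
    needed = sum(low - len(hosts) for hosts in proxies.values() if len(hosts) < low)

    # Step 4: if the pool is too small, take hosts from the fullest proxies.
    while len(unassigned) < needed:
        donor = max(proxies.values(), key=len)
        unassigned.append(donor.pop())

    # Step 5: hand out unassigned hosts to the proxies with the fewest hosts.
    while unassigned:
        target = min(proxies.values(), key=len)
        target.append(unassigned.pop())

    return proxies


before = {"proxy1": [f"host{i}" for i in range(9)], "proxy2": [], "proxy3": []}
after = rebalance(before)
print({name: len(hosts) for name, hosts in after.items()})
# {'proxy1': 3, 'proxy2': 3, 'proxy3': 3}
```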

Configuring proxy load balancing

To configure proxy load balancing for monitoring hosts:

  1. Create a proxy group (see "Configuring a proxy group" below).

For passive checks, all proxies of the group must be listed in the Server parameter of agents.

Adding all proxies of the group to the ServerActive agent parameter (separated by a semicolon) of monitored hosts is beneficial, but not mandatory. An active agent can have a single proxy in the ServerActive parameter and proxy load balancing will still work. When the agent service starts, the agent receives the full list of IP addresses of all Zabbix proxies in the group and keeps it in memory. Active checks (and Zabbix sender data requests) will be redirected to the correct online proxy for the host, based on the current proxy-host assignment.

Having only a single proxy in ServerActive field may lead to lost monitoring data if the agent is started/rebooted while that particular proxy is offline.
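
To make the two agent parameters concrete, the sketch below renders the corresponding lines for an agent configuration file. The helper name `agent_proxy_settings` and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration; Server takes a comma-delimited list, while the proxies in ServerActive are separated by semicolons as described above.

```python
def agent_proxy_settings(proxy_addresses):
    """Render Server and ServerActive lines listing every proxy of the group."""
    server = "Server=" + ",".join(proxy_addresses)               # passive checks
    server_active = "ServerActive=" + ";".join(proxy_addresses)  # active checks
    return [server, server_active]


# Hypothetical proxy addresses.
for line in agent_proxy_settings(["192.0.2.11", "192.0.2.12", "192.0.2.13"]):
    print(line)
# Server=192.0.2.11,192.0.2.12,192.0.2.13
# ServerActive=192.0.2.11;192.0.2.12;192.0.2.13
```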

  2. Make sure the proxy group is online.

  3. Configure hosts to be monitored by the proxy group (not by individual proxies). You may use host mass update to move hosts from a proxy to the proxy group.

Hosts that are monitored by a single proxy (even if the proxy is part of a proxy group) are not involved in load balancing/high availability at all.

  4. Wait a few seconds for the configuration update and for host distribution among the proxies in the proxy group. Observe the change by refreshing the host list in Monitoring → Hosts.

When a host is created based on autoregistration/network discovery data from a proxy that belongs to a proxy group, the host is set to be monitored by that proxy group.

Limitations

  • SNMP traps are not supported by proxies in a proxy group.
  • Checks depending on external configuration must have the same configuration on all proxies in the proxy group. That includes:
    • external checks - scripts;
    • database checks - ODBC configuration.
  • When using the "Database monitoring" item, the DB object/server must have extended permissions.
  • When VMware hosts are monitored by a proxy group, they are spread randomly between the proxies in the group. As a result, each proxy caches all VMware data, which puts additional load on vCenter.

Possible firewall issues

Agents must always be allowed to reach all proxies at the firewall level. Consider the following scenarios:

  • In Zabbix agent active checks, on agent startup, the first proxy responds and redirects the agent to another proxy. The other proxy is not reachable because of a firewall problem, and communication stalls while the agent waits for the other proxy to respond. The root cause is that the first proxy knew for certain that the other proxy was healthy. This is not a problem if the first proxy itself fails; the agent will then try the other addresses configured in the "ServerActive" parameter.
  • The HA setup has been stable for multiple months. Host rebalancing never happens; it is not needed. The agent does not need to validate the "backup" channel to any other proxies. In a failover scenario, it might fail because a firewall was modified half a year ago.
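
One way to catch such firewall drift early is to test, from the agent host, that every proxy in the group accepts TCP connections. The sketch below is an assumption-laden example rather than a Zabbix tool: the function name `unreachable_proxies` is hypothetical, and it assumes the proxies listen on the default port 10051. It only verifies that a TCP connection can be opened, not that the Zabbix protocol exchange succeeds.

```python
import socket


def unreachable_proxies(proxies, port=10051, timeout=3.0):
    """Try a TCP connection to each proxy and return those that failed.

    proxies: iterable of host names or IP addresses.
    Returns a list of (proxy, error message) tuples; empty if all are reachable.
    """
    failed = []
    for host in proxies:
        try:
            # Open and immediately close the connection.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            failed.append((host, str(exc)))
    return failed


# Example call with hypothetical proxy addresses:
#   for host, error in unreachable_proxies(["192.0.2.11", "192.0.2.12"]):
#       print(f"{host}: {error}")
```

Running such a check periodically on monitored hosts validates the "backup" channels that are otherwise exercised only during failover.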

Configuring a proxy group

To configure a proxy group in Zabbix frontend:

  • Go to: Administration → Proxy groups
  • Click on Create proxy group

| Parameter | Description |
|-----------|-------------|
| Name | Enter the proxy group name. |
| Failover period | Enter the period in seconds before failover is executed (1m by default; allowed range 10s-15m). Time suffixes are supported (e.g., 30s, 1m). User macros are supported. |
| Minimum number of proxies | Enter the minimum number of online proxies required for the group to be online (1 by default; allowed range 1-1000). User macros are supported. |
| Description | Enter the proxy group description. |
| Proxies | List of proxies in the group. Up to five proxies can be displayed (as links or in plain text, depending on permissions to the proxy). This list is displayed when editing an existing proxy group, if there is at least one proxy in the group. |
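
The allowed ranges in the table can be checked before submitting the form or scripting group creation. The following is a minimal sketch, not part of Zabbix: the names `parse_period` and `validate_proxy_group` are hypothetical, and user macros are not handled (only literal values with an optional s, m, h, d, or w time suffix).

```python
import re

# Multipliers for the supported time suffixes (no suffix means seconds).
SUFFIX_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_period(value):
    """Convert a period such as '30s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", value.strip())
    if not match:
        raise ValueError(f"invalid period: {value!r}")
    return int(match.group(1)) * SUFFIX_SECONDS[match.group(2)]


def validate_proxy_group(name, failover_period="1m", min_proxies=1):
    """Return a list of problems with the given proxy group parameters."""
    errors = []
    if not name:
        errors.append("name must not be empty")
    seconds = parse_period(failover_period)
    if not 10 <= seconds <= 15 * 60:
        errors.append("failover period must be within 10s-15m")
    if not 1 <= min_proxies <= 1000:
        errors.append("minimum number of proxies must be within 1-1000")
    return errors


print(validate_proxy_group("Group A"))             # []
print(validate_proxy_group("Group A", "5s", 0))    # two range errors
```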