
**rivermigue** · 28-01-2025, 21:25

Does anyone have a suggestion? I have the secondary master in standby, but it doesn't seem to help much when the failover happens: it's as if it isn't accepting the metrics from the proxies for some reason.

**cyber** · 29-01-2025, 08:38

If you restart any of the proxies after failover, do they start sending data?
Have you tested the more conventional approach of listing both server addresses as a cluster in the proxy settings, bypassing the F5? i.e. "Server=zbx-master-1;zbx-master-2"
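For reference, Zabbix 6.0 and later let an active proxy list HA cluster nodes in `Server`, separated by semicolons, and as I understand it the proxy tries them until it reaches the active node. The helper below is a hypothetical sketch (not Zabbix's own parser) showing how such a value breaks down into individual nodes, assuming the default server trapper port 10051 when none is given.

```python
# Hypothetical helper, NOT Zabbix code: splits a proxy "Server=" value
# into HA cluster nodes. 10051 is the default Zabbix server trapper port.
DEFAULT_PORT = 10051


def parse_server_setting(value: str) -> list[tuple[str, int]]:
    """Split a semicolon-separated list of cluster nodes into (host, port) pairs."""
    nodes = []
    for raw in value.split(";"):
        entry = raw.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        if entry.startswith("["):
            # Bracketed IPv6 literal, e.g. "[2001:db8::1]:10051"
            host, _, rest = entry[1:].partition("]")
            port = int(rest[1:]) if rest.startswith(":") else DEFAULT_PORT
        elif entry.count(":") == 1:
            # "host:port"
            host, port_text = entry.split(":")
            port = int(port_text)
        else:
            # Plain hostname (or unbracketed IPv6): use the default port
            host, port = entry, DEFAULT_PORT
        nodes.append((host, port))
    return nodes


print(parse_server_setting("zbx-master-1;zbx-master-2"))
# [('zbx-master-1', 10051), ('zbx-master-2', 10051)]
```

Each entry is a separate address, so no load balancer is needed in front of the cluster.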

**markfree** · 30-01-2025, 19:55

I've seen proxies that show up as active and log no errors, but send no data to the server because of a time mismatch.
Check that the clocks on all hosts are synchronized.
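A quick way to spot a drifting host is to collect the current Unix time from each one (for example with `date +%s`) and compare it against a reference. This is a minimal sketch assuming you already have those readings; the host names, readings and the 1 s tolerance are hypothetical.

```python
# Assumed tolerance; pick a value that suits your environment.
MAX_SKEW_SECONDS = 1.0


def find_skewed_hosts(reported: dict[str, float], reference: float,
                      tolerance: float = MAX_SKEW_SECONDS) -> dict[str, float]:
    """Return hosts whose Unix time differs from the reference by more than tolerance."""
    return {
        host: round(ts - reference, 3)
        for host, ts in reported.items()
        if abs(ts - reference) > tolerance
    }


# Hypothetical readings: the proxy's clock runs 95 s fast.
reference = 1_738_000_000.0
readings = {
    "zbx-master-1": 1_738_000_000.0,
    "zbx-master-2": 1_738_000_000.2,
    "proxy-pst": 1_738_000_095.0,
}
print(find_skewed_hosts(readings, reference))  # {'proxy-pst': 95.0}
```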

**rivermigue** · 12-02-2025, 00:49

1) Restarting the proxies after the failover does not make them send data to the new master.
2) I removed the load balancer IP and, since these are active proxies, listed the Zabbix masters in each proxy separated by semicolons. Same result: the proxies are not sending data although they show up as online in the web UI, and the queue per proxy just keeps increasing and never catches up.
3) Both Zabbix masters are in CST and one proxy is in PST; time is in sync.
4) The database is in the same datacenter as zbx-master-1. We do have another DB instance in the same datacenter as zbx-master-2, which acts as our last resort for a failover. The latency between the two DCs is around 45 ms. When I fail over Zabbix, I am not taking the database into consideration. Could a latency of 45 ms cause something like this?
5) SELinux is permissive.
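On point 3: different time zones should not matter by themselves, because Zabbix records values with Unix timestamps (the `clock` column in the history tables), which are UTC-based. A small illustration using fixed standard-time offsets (no DST, so no tz database is needed); the dates are only examples.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (no DST) so the example needs no tz database.
CST = timezone(timedelta(hours=-6), "CST")
PST = timezone(timedelta(hours=-8), "PST")

# The same instant written as two different wall-clock times.
master_clock = datetime(2025, 2, 12, 0, 49, tzinfo=CST)
proxy_clock = datetime(2025, 2, 11, 22, 49, tzinfo=PST)

# Unix timestamps count seconds since the UTC epoch, so the local zone drops out.
print(master_clock.timestamp() == proxy_clock.timestamp())  # True
print(master_clock.astimezone(timezone.utc))  # 2025-02-12 06:49:00+00:00
```

What matters is that the clocks agree on the instant, not which zone they display.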

The strange part to me is that the proxies appear online in the UI and also receive configuration updates from both masters when failing over, as shown in each Zabbix proxy log. The statistics on the Zabbix master seem to show the pollers processing some data, yet the queue never clears and we start receiving a flood of alerts until we fail back to the original master.
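"Processing some data, but the queue never clears" is what you would expect whenever values arrive faster than they can be written: the processes look busy, yet the backlog grows linearly. A toy model with hypothetical constant rates (real rates fluctuate, and this does not model Zabbix internals):

```python
def backlog_over_time(incoming_per_s: float, processed_per_s: float,
                      seconds: int, start_backlog: float = 0.0) -> list[float]:
    """Queue size after each second, given constant arrival and processing rates."""
    backlog = start_backlog
    history = []
    for _ in range(seconds):
        # The queue can shrink to zero but never below it.
        backlog = max(0.0, backlog + incoming_per_s - processed_per_s)
        history.append(backlog)
    return history


# Hypothetical rates: 5000 values/s arriving, only 4000 values/s being written.
print(backlog_over_time(5000, 4000, 5))
# [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
```

The backlog only drains once processing capacity exceeds the arrival rate, which matches things recovering after failing back.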

**rivermigue** · 25-02-2025, 19:36

After much thought, I found the cause of this behavior.
Our PostgreSQL cluster has nodes in both datacenters, as does Zabbix, and the datacenters are ~40 ms apart in latency. If we fail the Zabbix master over to DC2 while the PostgreSQL primary is in DC1, we start seeing all these problems, with items getting queued and never making it to the Zabbix master for some reason. This is resolved by keeping both masters (PostgreSQL and Zabbix) in the same datacenter.
I don't know whether this holds for setups with fewer items, but it does for us, and I am not sure whether it is logged anywhere in the Zabbix logs. Anyway, this is resolved for us.
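A back-of-the-envelope check of why ~40 ms between the Zabbix server and its database can hurt: if each batch of writes must wait for a database round trip before the next one starts, the round-trip time alone caps throughput. The sketch below counts network time only; the connection count and batch size are hypothetical, and it does not claim to describe how Zabbix actually batches writes.

```python
def max_sequential_round_trips(rtt_ms: float) -> float:
    """Upper bound on strictly sequential request/response cycles per second on one connection."""
    return 1000.0 / rtt_ms


def values_per_second(rtt_ms: float, values_per_round_trip: int, connections: int) -> float:
    """Throughput ceiling if every batch waits for one DB round trip (network time only)."""
    return max_sequential_round_trips(rtt_ms) * values_per_round_trip * connections


# Hypothetical workload: 4 writer connections, 100 values committed per round trip.
for rtt in (0.5, 45.0):
    print(f"RTT {rtt:>4} ms -> at most {values_per_second(rtt, 100, 4):,.0f} values/s")
```

Under these assumptions the ceiling drops from 800,000 to roughly 8,900 values/s, a 90x reduction, so a large installation can fall permanently behind even though nothing is logged as an error.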

Monitored items start to fail after zabbix server failover in HA setup