Hi, we use the Zabbix HA cluster functionality (2 servers), and every time we need to do maintenance on the active server, we see a ~5 min gap in the charts. It looks like the HA cluster does not work as expected, even though the standby server correctly switches to Active according to the System information page. We have the failover delay set to 1 min (which is probably too long), but the gap is longer than that, up to 7 min in some cases.
Also, when we bring the first server back, we see all processes running at high utilization, struggling to work through their queues. All our proxies have both servers' IPs set correctly according to the docs.
I wonder if this is expected behavior? Would it help to reduce the failover delay to, say, 10 sec? What is the best practice here?
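For reference, here is a minimal Python sketch of how we would change the delay. It assumes Zabbix 6.0+, where the failover delay can be changed at runtime with `zabbix_server -R ha_set_failover_delay=<delay>` and is documented to accept values from 10 seconds to 15 minutes. The helper names (`parse_delay`, `failover_delay_command`) are hypothetical, and the sketch only validates the value and builds the command line; it does not execute anything.

```python
import re

# Assumed documented range for the Zabbix HA failover delay (10 s to 15 min).
MIN_DELAY_S = 10
MAX_DELAY_S = 15 * 60

# Only the "s" and "m" time suffixes are handled in this sketch.
_UNITS = {"s": 1, "m": 60}


def parse_delay(value: str) -> int:
    """Convert '10s', '1m' or a bare number of seconds into seconds."""
    match = re.fullmatch(r"(\d+)([sm]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported delay format: {value!r}")
    number, unit = match.groups()
    # A bare number (empty suffix) is treated as seconds.
    return int(number) * _UNITS[unit or "s"]


def failover_delay_command(value: str) -> list[str]:
    """Build (but do not run) the runtime control command for the new delay."""
    seconds = parse_delay(value)
    if not MIN_DELAY_S <= seconds <= MAX_DELAY_S:
        raise ValueError(
            f"failover delay {value!r} is outside {MIN_DELAY_S}s..{MAX_DELAY_S}s"
        )
    return ["zabbix_server", "-R", f"ha_set_failover_delay={value.strip()}"]


if __name__ == "__main__":
    print(" ".join(failover_delay_command("10s")))
```

Running it prints `zabbix_server -R ha_set_failover_delay=10s`, while a value like `5s` is rejected because it falls below the assumed 10-second minimum.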