We run Zabbix 6.4 in Docker containers in HA mode.
Yesterday both the active and passive servers crashed, and it went unnoticed. The Zabbix web frontend kept working and hosts were shown green. The only indication that the servers were down was a popup at the bottom of the page that disappeared within a second. The Docker containers were also shown green.
The logs showed cache size errors. We set ZBX_CACHESIZE=512M and the servers were happy again.
But it left us with a question: how do we make sure we are notified immediately if the Zabbix server service is down? We have an HA cluster, and we expected to see alerts immediately if the active server goes down.
What probably happened is that when the active node crashed, control passed to the passive one, which also crashed because of the insufficient cache size. But there should have been time to send an alert, right? So I'm guessing we did not set up alerting correctly.
How do we properly set up HA cluster failure alerts?
Keep the caches up to 40% occupied at most; then they can probably also withstand some value storms after interruptions...
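On the alerting side: if every server node is down, nothing is left inside Zabbix to evaluate triggers, so a check that runs outside the server process is probably needed. Below is a minimal Python sketch of such a check. It assumes you have already fetched the HA node records, for example with the `hanode.get` API method, and that each record carries `name`, `status` and `lastaccess` as strings. The status codes follow my reading of the API docs (0 standby, 1 stopped manually, 2 unavailable, 3 active). The function name `ha_problems` and the 60-second `max_lag` threshold are made up for illustration.

```python
import time

# Status codes as I understand the hanode object in the Zabbix API
# (the API returns them as strings).
STANDBY, STOPPED, UNAVAILABLE, ACTIVE = "0", "1", "2", "3"


def ha_problems(nodes, now=None, max_lag=60):
    """Return a list of problem descriptions for a list of HA node records.

    nodes:   dicts with "name", "status" and "lastaccess" keys (assumed shape).
    now:     current Unix time; defaults to time.time().
    max_lag: hypothetical threshold in seconds for a stale heartbeat.
    """
    now = time.time() if now is None else now
    problems = []

    active = [n for n in nodes if n["status"] == ACTIVE]
    if not active:
        problems.append("no active Zabbix server node")

    # If all nodes crash at once, nobody is left to update the status column,
    # so a node may still be listed as active. Check its heartbeat age too.
    for n in active:
        lag = now - int(n["lastaccess"])
        if lag > max_lag:
            problems.append(f"active node {n['name']} silent for {int(lag)}s")

    for n in nodes:
        if n["status"] == UNAVAILABLE:
            problems.append(f"node {n['name']} is unavailable")

    return problems
```

Run something like this from cron or another monitoring host and send the returned list through a channel that does not depend on the Zabbix server itself. The `lastaccess` check is the part that matters for your case, since with both nodes crashed the stored status alone can look fine.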