Hi all.
I've tried to set up an alert to get early warning when a host is down. To do so, I've used the "Zabbix agent on {HOST.NAME} is unreachable for 5 minutes" trigger and set up an action that notifies me in several escalation steps.
This works fine, but this morning, for some reason, the Zabbix server stopped receiving Zabbix agent pings. That caused a flood of alerts, because every host was marked as down. I had to restart the server and then clear the alerts and escalations tables.
So here's the thing: I want to change my action so that if more than two hosts appear as unreachable, no message is sent. Or, even better, a different message is sent, something like "Zabbix server is not receiving data from several hosts".
I guess this could be done by creating a new trigger that fires when more than two "host unreachable" problems are active at the same time, and then making the "host unreachable" trigger in the template depend on it. But I don't know how to get that count.
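To make the intended behaviour concrete, here is a minimal sketch of the decision logic only, outside Zabbix. It assumes you already have the set of hosts whose "unreachable" trigger is currently in a problem state; `choose_notifications` is a hypothetical helper, not part of Zabbix or its API, and the threshold of 2 is taken from the question above.

```python
def choose_notifications(unreachable_hosts, threshold=2):
    """Hypothetical helper: decide which messages to send.

    Up to `threshold` unreachable hosts each get their own alert;
    above that, one summary message replaces the per-host alerts,
    since many simultaneous failures more likely point at the server.
    """
    hosts = sorted(set(unreachable_hosts))  # de-duplicate, stable order
    if len(hosts) > threshold:
        return [
            "Zabbix server is not receiving data from several hosts "
            f"({len(hosts)} unreachable)"
        ]
    return [f"Zabbix agent on {h} is unreachable for 5 minutes" for h in hosts]


print(choose_notifications(["web1"]))
print(choose_notifications(["web1", "db1", "mail1"]))
```

In Zabbix terms, the summary branch plays the role of the "several hosts" trigger, and the per-host alerts being skipped corresponds to the dependency suppressing them.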
Can someone give me a hand here?
On the other hand, if someone can think of a better way of doing this, please let me know.
EDIT: I've been playing with an aggregate check to get the number of "agent.ping" values received in the last 3 minutes (180 seconds):
grpsum["Linux servers","agent.ping",count,180]
This value will drop sharply if something like this morning's outage happens again.
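As a rough illustration of why this aggregate works as an outage detector, the sketch below approximates what `grpsum["Linux servers","agent.ping",count,180]` adds up: the number of ping values per host inside the window, summed over the group. It assumes a 60-second `agent.ping` interval and a "fire below 50% of expected" rule. Both the interval, the ratio, and the helper names (`grpsum_count`, `pings_look_missing`) are hypothetical, and the exact window-boundary handling in Zabbix may differ.

```python
def grpsum_count(ping_times_by_host, now, window=180):
    """Approximate the aggregate: count ping timestamps in (now - window, now]
    for each host, then sum across the whole group."""
    return sum(
        1
        for times in ping_times_by_host.values()
        for t in times
        if now - window < t <= now
    )


def pings_look_missing(observed, host_count, interval=60, window=180, ratio=0.5):
    """Hypothetical trigger rule: flag when fewer than `ratio` of the
    expected pings (hosts * pings per window) were received."""
    expected = host_count * (window // interval)
    return observed < expected * ratio


# 10 hosts pinging every 60 s from t=0 to t=600.
healthy = {f"host{i}": list(range(0, 601, 60)) for i in range(10)}
# Same hosts, but the server stopped receiving pings after t=300.
outage = {f"host{i}": list(range(0, 301, 60)) for i in range(10)}

print(grpsum_count(healthy, now=600), grpsum_count(outage, now=600))
```

A relative threshold like this scales with the size of the group, so adding hosts to "Linux servers" would not require retuning a fixed number.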
In any case, I'm not sure if this is the best option.
Regards,
Josep