Hello.
This post is pretty much the same as Sjeik's "Prevent Zabbix from hitting triggers after a restart of zabbix_server daemon" from six months ago.
It would be good if there were a "grace period" after a Zabbix server restart. During the grace period Zabbix would ignore all actions. The grace period could be defined, for example, in zabbix_server.conf as "GracePeriod=300" to set it to five minutes.
With a grace period, items with nodata() triggers would have time to receive data, instead of Zabbix going mad about not having any data for those items. Yesterday evening our Zabbix installation went crazy and sent over 400 SMS messages after a maintenance period. That's because we have nodata() triggers linked to agent.ping items. When the Zabbix server (not any agent) is down for over three minutes, all of these triggers automatically change to true when the Zabbix server is started again. A short grace period after Zabbix server start would give Zabbix time to receive data for the items linked to nodata() triggers.
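To make the request concrete, here is a minimal Python sketch of the logic I have in mind. It is not Zabbix code: the names NodataTrigger, nodata_fires and should_alert are made up for illustration, and the 180-second window and 300-second grace period are just the numbers from this post.

```python
from dataclasses import dataclass


@dataclass
class NodataTrigger:
    """Hypothetical stand-in for a nodata() trigger on one item."""
    name: str
    last_value_time: float  # timestamp (seconds) of the last received value
    window: float           # seconds without data before the trigger is true


def nodata_fires(trigger: NodataTrigger, now: float) -> bool:
    """True when no value has arrived within the trigger's window."""
    return now - trigger.last_value_time > trigger.window


def should_alert(trigger: NodataTrigger, now: float,
                 server_start: float, grace_period: float) -> bool:
    """Suppress actions while the server is still inside its grace period."""
    if now - server_start < grace_period:
        return False
    return nodata_fires(trigger, now)


# Server was down from t=0 to t=240; the last agent.ping value arrived at t=0.
ping = NodataTrigger(name="agent.ping", last_value_time=0.0, window=180.0)
server_start = 240.0

# Right after start the trigger is true, but the grace period holds the action back.
fires_at_start = nodata_fires(ping, now=240.0)
alert_at_start = should_alert(ping, now=240.0, server_start=server_start,
                              grace_period=300.0)

# Fresh data arrives during the grace period, so nothing fires afterwards.
ping.last_value_time = 500.0
alert_after_grace = should_alert(ping, now=541.0, server_start=server_start,
                                 grace_period=300.0)
```

Without the grace period check, the trigger would already be true at the moment the server starts, which is exactly what produced our flood of messages.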
An alternative solution would be to add a condition about the Zabbix server's state to the nodata() triggers. That would mean adding another level of complexity to the triggers and items. It would require a separate item monitoring Zabbix's own state. It would also probably affect the performance of the whole trigger and action mechanism. This kind of solution doesn't seem good.
For now we'll probably just work around this "maintenance madness" problem by disabling the SMS send script while bringing Zabbix back into action after maintenance. That way Zabbix will still send hundreds of false email alerts, but at least we don't have to set the on-duty phone to silent.
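One way the workaround could look, as a rough sketch only: the SMS script checks for a flag file that we create before maintenance and delete once Zabbix has settled. The function names, the flag file idea and the send_sms callable are all assumptions for illustration, not anything Zabbix provides.

```python
import os
from typing import Callable


def sms_allowed(flag_path: str) -> bool:
    """False while the (hypothetical) maintenance flag file exists."""
    return not os.path.exists(flag_path)


def send_alert(recipient: str, message: str, flag_path: str,
               send_sms: Callable[[str, str], None]) -> bool:
    """Call send_sms unless the flag is set; return True if a message was sent."""
    if not sms_allowed(flag_path):
        return False  # silently drop the SMS during maintenance
    send_sms(recipient, message)
    return True
```

We would create the flag file before stopping zabbix_server and remove it a few minutes after the restart. Email alerts are not affected by this, so the false emails would still go out.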
