I've discovered a problem with how dependencies operate which degrades their usefulness immensely (in our environment, anyway).
Take a scenario with 3 switches, A, B, and C, where switch A is connected to B, and B is connected to C. Assume Zabbix is connected to switch A. Dependencies are created on the ping triggers so that trigger ping C is dependent on trigger ping B, which in turn is dependent on trigger ping A.
If hosts A, B, and C fail in consecutive order, everything works as expected: when the dependency of C is checked, it notices that B has failed, and when the dependency of B is checked, it knows that A has failed, so notifications are only sent out for switch A.
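To make the expected behaviour concrete, here is a minimal sketch of the chain above. It is only a toy model, not Zabbix code: the names `DEPENDS_ON` and `should_notify` are made up for illustration, and I am assuming a trigger is suppressed only when its direct parent is already known to be in a problem state.

```python
# Hypothetical model of the dependency chain described above; not Zabbix code.
DEPENDS_ON = {"ping A": None, "ping B": "ping A", "ping C": "ping B"}


def should_notify(trigger, problem_triggers):
    """Notify unless the trigger's direct parent is already known to have failed."""
    parent = DEPENDS_ON[trigger]
    return parent is None or parent not in problem_triggers


problem = set()  # triggers the monitor has already seen fail
sent = []        # triggers that produced a notification

# Failures are detected parent-first: A, then B, then C.
for trigger in ["ping A", "ping B", "ping C"]:
    if should_notify(trigger, problem):
        sent.append(trigger)
    problem.add(trigger)

print(sent)  # ['ping A']
```

With parent-first detection, only the trigger for switch A produces a notification, which is the behaviour I want in every case.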
However, if all three hosts fail simultaneously, one can run into a situation where notifications are sent out for all three hosts. This will happen if the 3 hosts fail at the exact moment the ping test is being done on host C, then host B, and then host A. This happens because the state of the dependencies is not being re-evaluated on the fly, contrary to http://www.zabbix.com/forum/showthread.php?t=7991. I validated this by using tcpdump to see whether parent hosts were being pinged after the child triggers failed.
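The race can be reproduced in a small simulation. Again, this is only a sketch under my own assumptions, not how Zabbix is actually implemented: `PARENT`, `DOWN`, `run`, and the `recheck_parents` flag are hypothetical names, and "re-evaluated on the fly" is modelled as probing every ancestor at the moment a child check fails.

```python
# Hypothetical simulation of the polling race; not Zabbix code.
PARENT = {"A": None, "B": "A", "C": "B"}
DOWN = {"A", "B", "C"}  # real network state: all three switches are unreachable


def run(poll_order, recheck_parents):
    """Poll hosts in the given order and return the hosts that get notified."""
    known_down = set()  # hosts the monitor has already seen fail
    notified = []
    for host in poll_order:
        if host not in DOWN:
            continue  # check succeeded, nothing to do
        if recheck_parents:
            # Assumed "on the fly" behaviour: probe each ancestor right now
            # instead of relying on its last stored state.
            ancestor = PARENT[host]
            while ancestor is not None:
                if ancestor in DOWN:
                    known_down.add(ancestor)
                ancestor = PARENT[ancestor]
        parent = PARENT[host]
        if parent is None or parent not in known_down:
            notified.append(host)
        known_down.add(host)
    return notified


print(run(["C", "B", "A"], recheck_parents=False))  # ['C', 'B', 'A']
print(run(["C", "B", "A"], recheck_parents=True))   # ['A']
```

When the children happen to be polled before their parents and the stored parent state is stale, all three hosts notify. If the parents were re-checked at that moment, only A would.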
At this point, this issue is the biggest obstacle to migrating our NMS from Nagios to Zabbix. I have to be concerned with limiting notifications to blocking outages only; we actually had our Verizon pagers disconnected at one point because of too much volume (incorrect parent hosts in Nagios caused this).
Kudos to the Zabbix team for this great product. I hope this issue can be addressed (or someone can show me my error), because I would really like to implement it.