
Parent dependency check before alerting



    Hi Forum,

    We have a 900+ host setup, and are configuring host-dependency monitoring.
    What we want to achieve is that if a switch is down, the hosts behind it should not alert. Only one alert should be sent, reporting that the switch is unreachable.

    We have now set it up, but when we lose connectivity to one site (we just had a maintenance break), we still get alerts from a lot of hosts (not all), even though our switches are unreachable as well. My guess is that only some hosts alert because Zabbix does not check whether the switch is OK before alerting: for those hosts, it has not yet checked the switch through its normal check cycle. In Nagios you could configure this (or was it the default?) so that parent dependencies are checked before alerting when the check fails for a specific host.

    I could set it up so that x failed child checks are needed before an alert is raised, but this would delay my alerts and force me to run a lot of unnecessary checks. I'm really not interested in this, as we will have 4-5 dependencies for some hosts!

    Is there an option in Zabbix to enable this somehow?
    (In the long run I would suggest making this the default...)

    Thanks in advance!
    Last edited by maymann; 27-02-2012, 10:01.

    I have the same problem and we are out of luck here, look:


      Same problem here. I have not found a solution so far...


        Tried adjusting the polling frequency and trigger conditions?

        The assessment by the OP makes sense. If this is the case, though, a workaround is to increase the polling frequency so that the core device is polled more than twice as often as the devices that depend on it, and to make the triggers require 2 failed cycles before they trip. For example:

        • Core router A with a server B behind it
        • 2 failures required to initiate trigger
        • Core router A polled every 29s
        • Server B polled every 60s
        • Trigger for server B failure depends on trigger for core router A
        • Core router A fails, which means the polls of both core router A and server B are going to fail.
        • Even if server B is polled and seen as down at the very moment the event takes place, there will be *at least* 60 seconds before it is polled again and actually triggers.
        • In those 60 seconds, core router A should have been polled twice (mathematically at least; if 'something bad' happens it might not work out that way, which is why the interval is set to 29s instead of 30s).
        • Because the trigger for core router A has seen 2 failures, it trips.
        • By the time the 2nd poll for server B comes around, the value for the dependency (core router A) is marked as PROBLEM and so the trigger for B is suppressed.
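
        The timing argument in the list above can be checked with a small simulation. This is only a sketch of the arithmetic, not of Zabbix itself: it assumes polls run exactly on schedule, that a trigger trips on the Nth consecutive failed poll, and that the two schedules can have any phase relative to each other. The function names are made up for illustration.

```python
import math

def trip_time(outage_start, interval, offset, failures_required):
    """Time at which a trigger trips, assuming polls run at
    offset, offset + interval, ... and every poll fails once the outage starts."""
    # Index of the first poll at or after the start of the outage.
    k = max(0, math.ceil((outage_start - offset) / interval))
    first_failed_poll = offset + k * interval
    # The trigger trips on the Nth consecutive failed poll.
    return first_failed_poll + (failures_required - 1) * interval

def router_trips_first(router_interval, server_interval, failures_required=2):
    """True if the router trigger trips strictly before the server trigger
    for every whole-second phase offset of the two poll schedules."""
    outage_start = 1000  # arbitrary; the offsets below cover every phase
    for r_off in range(router_interval):
        for s_off in range(server_interval):
            router = trip_time(outage_start, router_interval, r_off, failures_required)
            server = trip_time(outage_start, server_interval, s_off, failures_required)
            if router >= server:
                return False
    return True

print(router_trips_first(29, 60))  # True: router A always trips before server B
print(router_trips_first(60, 60))  # False: with equal intervals B can trip first
```

        With 29s and 60s, router A trips at most 57 seconds after the outage starts, while server B needs at least 60 seconds, so the dependency is already in PROBLEM when B's trigger is evaluated.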

        What I would like to see is that, when a dependency exists, a dependency check actually initiates an unscheduled poll of the dependency and uses *that* value, rather than just checking against the last (currently stored) value of the dependency item.
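
        To make the wished-for behaviour concrete, here is a rough sketch of the idea. It is not something Zabbix offers and it does not use any Zabbix API: `parent_of` and `poll_now` are hypothetical stand-ins for the dependency map and for an immediate, unscheduled check.

```python
def find_alert_target(host, parent_of, poll_now):
    """Return the host to alert on, or None if `host` is actually up.

    parent_of: dict mapping each host to its parent (hypothetical dependency map).
    poll_now:  callable returning True if a host answers a fresh check (hypothetical).
    """
    if poll_now(host):
        return None  # host is reachable, nothing to alert on
    culprit = host
    parent = parent_of.get(culprit)
    # Walk up the dependency chain, polling each parent on demand,
    # until we find a parent that is up or run out of parents.
    while parent is not None and not poll_now(parent):
        culprit = parent
        parent = parent_of.get(culprit)
    return culprit  # topmost unreachable device in the chain

# Example: server B sits behind core router A, and both are unreachable.
parent_of = {"server-b": "router-a"}
down = {"router-a", "server-b"}
print(find_alert_target("server-b", parent_of, lambda h: h not in down))  # router-a
```

        Because the parents are polled at the moment the child fails, this would not depend on the relative polling intervals, and it would walk a chain of 4-5 dependencies with at most one extra check per level.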


          Confirmed solved for me (work-around)

          As part of the testing I am doing to decide whether we will switch our business monitoring system to Zabbix, I had to demonstrate in a working Zabbix environment how to use triggers so that they reliably behave as we expect.

          It took me quite a few tests to come up with a working theory on how to calculate polling and trigger values so that the first router always triggers first, but I have come up with something that works for us. I wrote it up as a technical paper if anyone wants to use it or adapt it for their own organisation's network.