ZABBIX Forums  
  #1  
27-02-2012, 08:42
maymann
Junior Member
 
Join Date: Nov 2011
Posts: 8
Parent dependency check before alerting

Hi Forum,

We have a 900+ host setup, and are configuring host-dependency monitoring.
What we want to achieve is that if a switch is down, none of the hosts behind it alert; only one alert should be sent, saying that this switch is unreachable.

We have now set it up, but when we lose connectivity to one site (we just had a maintenance break), we still get alerts from a lot of hosts (not all), even though our switches are unreachable as well. My guess is that only some hosts alert because Zabbix does not know/check whether the switch is OK before alerting: it has not yet checked that switch through its normal check cycle. In Nagios you could configure this (or was it the default?) so that it checks parent dependencies before alerting when a check fails for a specific host.

I could set it up so that an alert requires x failed child checks, but this would delay my alerts and I would have to run a lot of unnecessary checks. I'm really not interested in this, as we will have 4-5 dependencies for some hosts!

Is there an option in Zabbix to enable this somehow?
(In the long run I would suggest making this the default...)


Thanks in advance !
~maymann

Last edited by maymann; 27-02-2012 at 09:01.
  #2  
28-02-2012, 08:26
devnull
Junior Member
 
Join Date: Feb 2012
Posts: 6

I have the same problem, and we are out of luck here. See:

https://support.zabbix.com/browse/ZBX-3163
  #3  
13-03-2012, 08:35
sasskinn12
Junior Member
 
Join Date: Mar 2012
Posts: 5

Same problem here. I have not found a solution so far...
  #4  
24-08-2012, 11:24
sirtech
Junior Member
 
Join Date: Aug 2012
Posts: 23
Tried adjusting the polling frequency and trigger conditions?

The OP's assessment makes sense. If that is the case, the fix is simply to poll the core device more than twice as often as the devices that depend on it, and to make the triggers require 2 failed cycles before they trip.

Example:
  • Core router A with a server B behind it
  • 2 failures required to initiate trigger
  • Core router A polled every 29s
  • Server B polled every 60s
  • Trigger for server B failure depends on trigger for core router A
  • Core router A fails, which means that polls to both core router A and server B are going to fail.
  • Even if server B is polled immediately after the failure and is seen as down, there will be *at least* 60 seconds before it is polled again and its trigger can actually trip.
  • In those 60 seconds, core router A should have been polled twice (mathematically at least; if 'something bad' happens it might not work out that way, which is why the interval is set to 29s instead of 30s).
  • Because the trigger for core router A has seen 2 failures, it trips.
  • By the time the 2nd poll for server B comes around, the dependency (the trigger for core router A) is already in PROBLEM state, so the trigger for B is suppressed.
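
The timing argument in the example above can be checked with a small sketch. It is not Zabbix code; the function names are made up for illustration, and it assumes polls run exactly on schedule, phases are whole seconds, and a poll at the very moment of the outage already fails:

```python
def trip_delay(interval, phase, required=2):
    """Seconds from the outage until a trigger fires.

    `phase` is how long after the outage the first failed poll happens
    (0 <= phase < interval); each further failure arrives one interval later.
    """
    return phase + (required - 1) * interval


def worst_case_margin(router_interval, server_interval, required=2):
    """Earliest possible server trip minus latest possible router trip.

    A positive result means the router trigger always fires first
    (under the idealised assumptions stated above).
    """
    latest_router = max(
        trip_delay(router_interval, p, required) for p in range(router_interval)
    )
    earliest_server = min(
        trip_delay(server_interval, p, required) for p in range(server_interval)
    )
    return earliest_server - latest_router


if __name__ == "__main__":
    for router_interval in (29, 30, 45):
        margin = worst_case_margin(router_interval, 60)
        print(f"router every {router_interval}s, server every 60s: margin {margin}s")
```

With 29s/60s the router trigger wins by at least 3 seconds in this model, with 30s/60s by only 1 second, and with 45s/60s the server can trip first. Real pollers drift, so the extra second or two of slack from 29s is the whole safety margin.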

What I would like to see is that, when a dependency exists, the dependency check initiates an unscheduled poll of the dependency and uses *that* value, rather than just checking the last stored value of the dependency item.
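
To make the wish concrete, here is a rough sketch of what such an on-demand dependency check could look like. This is purely hypothetical and not something Zabbix offers as far as this thread knows; `root_cause`, `parent_of` and `poll` are names invented for the example, and `poll` stands in for whatever fresh reachability check would be used:

```python
def root_cause(host, parent_of, poll):
    """Walk up the dependency chain using fresh polls.

    Returns the topmost unreachable device above (or including) `host`,
    or None if `host` itself answers. Only that device should alert.
    """
    if poll(host):
        return None
    culprit = host
    parent = parent_of.get(host)
    # Re-check each parent right now instead of trusting its last stored value.
    while parent is not None and not poll(parent):
        culprit = parent
        parent = parent_of.get(parent)
    return culprit


if __name__ == "__main__":
    # Hypothetical topology: server-b sits behind switch-1, which sits behind router-a.
    parent_of = {"server-b": "switch-1", "switch-1": "router-a"}
    down = {"server-b", "switch-1", "router-a"}
    print(root_cause("server-b", parent_of, lambda h: h not in down))
```

The cost is one extra poll per dependency level on each failure, which is why the walk stops at the first parent that answers.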
  #5  
30-08-2012, 01:14
sirtech
Junior Member
 
Join Date: Aug 2012
Posts: 23
Confirmed solved for me (work-around)

As part of the testing I am doing to decide whether we will switch our business monitoring system to Zabbix, I had to demonstrate in a working Zabbix environment how to use triggers so that they reliably behave as we expect.

It took me quite a few tests to come up with a working theory on how to calculate polling and trigger values so that the first router always triggers first, but I have come up with something that works for us. I wrote it up as a technical paper if anyone wants to use it / adapt it for their own organisation's network.
All times are GMT +2.