Dependency Woes

  • markyrules
    Junior Member
    • Sep 2008
    • 6

    #1

    Dependency Woes

    I've discovered a problem with how dependencies operate that degrades their usefulness immensely (in our environment, anyway). Take a scenario with three switches, A, B, and C, where switch A is connected to B, and B is connected to C. Assume Zabbix is connected to switch A. Dependencies are created on the ping triggers so that trigger ping C depends on trigger ping B, which in turn depends on trigger ping A.

    If hosts A, B, and C fail in consecutive order, everything works as expected: when the dependency of C is checked, Zabbix notices that B has failed, and when the dependency of B is checked, it knows that A has failed, so notifications are sent out only for switch A.

    However, if all three hosts fail simultaneously, one can run into a situation where notifications are sent out for all three hosts. This will happen if the three hosts fail at the exact moment the ping test is being done on host C, then host B, and then host A. This happens because the state of the dependencies is not re-evaluated on the fly, contrary to http://www.zabbix.com/forum/showthread.php?t=7991. I validated this by using tcpdump to see whether parent hosts were pinged after the child triggers failed.
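
The ordering problem described above can be sketched as a toy model. This is not Zabbix code: the `PARENT` map and the `notifications` function are hypothetical names, and the model assumes a trigger is suppressed only when its parent's stored state is already PROBLEM, with no on-demand re-check of the parent.

```python
from typing import Dict, List, Optional

# Hypothetical topology from the post: Zabbix -> A -> B -> C.
# Each host maps to the host its ping trigger depends on.
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}


def notifications(check_order: List[str]) -> List[str]:
    """Return the hosts that generate a notification.

    Assumes every host is already down and the ping checks run in
    check_order. A notification is suppressed only if the parent's
    last stored state is PROBLEM; the parent is never re-checked.
    """
    problem: Dict[str, bool] = {host: False for host in PARENT}
    notified: List[str] = []
    for host in check_order:
        problem[host] = True  # this host's ping check fails now
        parent = PARENT[host]
        if parent is None or not problem[parent]:
            notified.append(host)
    return notified


print(notifications(["A", "B", "C"]))  # ['A']
print(notifications(["C", "B", "A"]))  # ['C', 'B', 'A']
```

When the parent is checked first, only A notifies. When the checks happen to run child-first, each trigger fires before its parent's state has been updated, so all three notify.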

    At this point, this issue is the biggest obstacle to migrating our NMS from Nagios to Zabbix. I have to be concerned with limiting notifications to blocking outages only; we actually had our Verizon pagers disconnected at one point because of too much volume (incorrect parent hosts in Nagios caused this).

    Kudos to the Zabbix team for this great product. I hope this issue can be addressed (or someone can show me my error), because I would really like to implement it.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    You may configure ZABBIX so that it checks the availability of A more frequently than B, and B more frequently than C. This would greatly decrease the possibility of the described scenario.
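
A rough way to see how much staggered intervals help is to estimate how often a child is checked before its parent right after a simultaneous failure. The sketch below is a simplified model, not a description of how Zabbix schedules checks: it assumes each host's first check after the failure falls at an independent, uniformly random point within its own interval, and `p_child_first` is a hypothetical helper name.

```python
def p_child_first(parent_interval: float, child_interval: float) -> float:
    """Probability that the child's first post-failure check precedes the parent's.

    Assumption: the first check of each host after the failure is uniformly
    distributed over [0, interval) and the two hosts are independent.
    """
    p, c = parent_interval, child_interval
    if p <= c:
        # Parent is checked at least as often as the child.
        return p / (2 * c)
    # Parent is checked less often than the child.
    return 1 - c / (2 * p)


print(p_child_first(60, 60))  # 0.5
print(p_child_first(30, 60))  # 0.25
print(p_child_first(10, 60))
```

Under these assumptions, equal intervals give the child an even chance of being checked first, and halving the parent's interval halves that chance. The risk shrinks as the parent is polled more often but never reaches zero.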

    Is it a reasonable solution?
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • markyrules
      Junior Member
      • Sep 2008
      • 6

      #3
      I had considered this, but it would take considerable effort given the number of pieces of network equipment alone (650), and it might cause other scalability issues as well. I can understand the argument against doing the checks on demand, if you don't want to alter the consistency of results in the database.

      Even if it did reduce the number of notifications, it would make maintenance of the configuration more complex because you are deviating from the templates. Also, since there is currently no true flap detection, we could incur a serious number of pages if we had a link flapping in the core of our network.

      Thanks for the reply!

      -Mark

      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        I am not sure what you mean by "true flap detection". I believe that the flexibility provided by triggers can be used for anything, including flap detection.

        As for the original question, I just do not see how this can be implemented, especially taking into account that it has to be compatible with distributed monitoring and proxies.

        • markyrules
          Junior Member
          • Sep 2008
          • 6

          #5
          Zabbix's incredibly flexible trigger system is one of the reasons I want to migrate to it from Nagios. That being said, I agree there is probably a way to write a trigger that provides "flap detection" (which I agree is subjective). What I meant by my comment was that there is no "checkbox" to enable flap detection for all triggers, unlike in Nagios. The authors of Nagios have a pretty slick algorithm for detecting flapping, and frankly it would be a lot of work for me to put that functionality into triggers.
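
For readers unfamiliar with the idea, flap detection can be sketched as counting state changes over a sliding window of recent check results. The code below is a simplified illustration, not the Nagios implementation (which, among other things, weights recent changes more heavily) and not a Zabbix feature. The `FlapDetector` class, the window size, and both thresholds are assumptions chosen for the example.

```python
from collections import deque
from typing import Deque


class FlapDetector:
    """Flags a check as flapping when its state changed too often recently.

    The window size and thresholds are illustrative assumptions.
    """

    def __init__(self, window: int = 21, high: float = 50.0, low: float = 25.0) -> None:
        self.window = window
        self.high = high  # start flapping at or above this percentage
        self.low = low    # stop flapping below this percentage
        self.history: Deque[bool] = deque(maxlen=window)
        self.flapping = False

    def percent_change(self) -> float:
        """State changes in the stored history as a percentage of the maximum possible."""
        states = list(self.history)
        changes = sum(a != b for a, b in zip(states, states[1:]))
        return 100.0 * changes / (self.window - 1)

    def update(self, is_up: bool) -> bool:
        """Record one check result and return whether the check is flapping."""
        self.history.append(is_up)
        pct = self.percent_change()
        if not self.flapping and pct >= self.high:
            self.flapping = True
        elif self.flapping and pct < self.low:
            self.flapping = False
        return self.flapping


detector = FlapDetector(window=5)
for state in [True, False, True, False]:
    print(state, detector.update(state))
```

Using two thresholds gives hysteresis: a check is marked as flapping once it crosses the high mark and is cleared only after it settles below the low mark, so a single clean outage does not count as a flap.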

          As for the dependency issue, I'm sure there are many complications that make my suggestion extremely difficult, but I appreciate the reply. I wish I had more time to help with that, because I think it's an interesting problem to solve.

          -Mark
