Trigger Dependencies don't work as expected

  • bturnbough
    Member
    • Mar 2011
    • 70

    #1

    Trigger Dependencies don't work as expected

    Hi All,

    I'm hoping someone can offer some insight into this.

    I have a site with 3 switches, and a wan router/firewall.

    The three switches each have an SNMP nodata check and a ping check. The SNMP nodata check depends on the ping check.

    The firewall also has a ping check. Each switch's ping check depends on the firewall ping check.

    So, what you have is this:

    snmp nodata check -----> switch ping -----> firewall ping.

    No data check:
    {SO1TECHS2950-2:uptime.nodata(600)}=1
    Depends on:
    SO1TECHS2950-2 : {HOST.NAME} is not responding to pings.

    Switch ping:
    {SO1TECHS2950-2:icmpping[].max(180)}<1
    Depends on :
    SOFW01 : {HOST.NAME} is not responding to pings.

    Firewall ping:
    {SOFW01:icmpping[].max(180)}<1
    No dependencies.

    When an outage occurs, I receive two messages (one for the switch ping failure and one for the SNMP nodata check), even though the dependencies are set as shown above.

    It appears that Zabbix isn't handling dependencies properly. They only work if the timing of the checks is perfect.

    Am I missing something?
  • bturnbough
    Member
    • Mar 2011
    • 70

    #2
    **bump**

    Is anyone also experiencing this?

    Can someone please post examples of their triggers and their dependencies?

    I have a feeling I may be doing something wrong, but don't know what. I'd appreciate the help.

    Brad


    • bturnbough
      Member
      • Mar 2011
      • 70

      #3
      **bump**

      Anyone? Anyone?

      Please?


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        Originally posted by bturnbough
        Anyone? Anyone?

        Please?
        Dependencies are working like a charm.
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates


        • bturnbough
          Member
          • Mar 2011
          • 70

          #5
          Another example

          Came in this morning and checked my email. Here is another example:

          Severity: High
          Name: Network-Ping_3min-3Fails: {HOST.NAME} is not responding to pings.
          Depends on : NOVPN01 : {HOST.NAME} is not responding to pings.
          Expression: {NO1TECHS2950-1:icmpping[].max(180)}<1
          Status: Enabled

          Severity: High
          Name: Network-Ping_3min-3Fails: {HOST.NAME} is not responding to pings.
          Depends on : NOVPN01-EXT : {HOST.NAME} is not responding to pings.
          Expression: {NOVPN01:icmpping[].max(180)}<1
          Status: Enabled

          Severity: High
          Name: Network-Ping_3min-3Fails: {HOST.NAME} is not responding to pings.
          Expression: {NOVPN01-EXT:icmpping[].max(180)}<1
          Status: Enabled

          PROBLEM Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 04:19:24
          PROBLEM Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 04:19:39
          PROBLEM Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 04:19:51
          RECOVERY Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 06:10:51
          RECOVERY Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 06:11:39
          RECOVERY Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 06:12:24
          PROBLEM Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 06:16:24
          PROBLEM Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 06:16:39
          PROBLEM Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 06:16:51
          RECOVERY Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 06:21:51
          RECOVERY Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 06:22:39
          RECOVERY Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 06:23:24
          PROBLEM Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 07:03:24
          PROBLEM Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:03:39
          PROBLEM Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 07:03:51
          RECOVERY Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 07:04:51
          RECOVERY Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:05:39
          RECOVERY Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 07:06:24
          PROBLEM Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 07:17:24
          PROBLEM Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:17:39
          RECOVERY Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:18:39
          RECOVERY Host: NO1TECHS2950-1 Trigger: NO1TECHS2950-1 is not responding to pings. Event time: 07:19:24
          PROBLEM Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:27:39
          PROBLEM Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 07:27:51
          RECOVERY Host: NOVPN01-EXT Trigger: NOVPN01-EXT is not responding to pings. Event time: 07:30:51
          RECOVERY Host: NOVPN01 Trigger: NOVPN01 is not responding to pings. Event time: 07:31:39



          They are NOT working, as you suggest. Please advise.
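
One way to read the log above is to compare, for a single outage, when each dependent trigger fired relative to the trigger it depends on. The sketch below does that for the first outage (04:19). The dictionaries and the helper name `fired_before_parent` are made up for illustration, and the reading rests on an assumption: a dependency can only suppress a notification if the parent trigger is already in PROBLEM state when the dependent trigger fires.

```python
from datetime import datetime

# Dependency chain from the trigger definitions above: child -> parent.
DEPENDS_ON = {
    "NO1TECHS2950-1": "NOVPN01",
    "NOVPN01": "NOVPN01-EXT",
}

# PROBLEM events of the first outage, copied from the log above.
PROBLEM_EVENTS = {
    "NO1TECHS2950-1": "04:19:24",
    "NOVPN01": "04:19:39",
    "NOVPN01-EXT": "04:19:51",
}

def to_time(text):
    """Parse an HH:MM:SS event time."""
    return datetime.strptime(text, "%H:%M:%S")

def fired_before_parent(events, depends_on):
    """Return {child: seconds by which it fired before its parent}."""
    early = {}
    for child, parent in depends_on.items():
        if child in events and parent in events:
            lead = (to_time(events[parent]) - to_time(events[child])).total_seconds()
            if lead > 0:
                early[child] = int(lead)
    return early

print(fired_before_parent(PROBLEM_EVENTS, DEPENDS_ON))
# {'NO1TECHS2950-1': 15, 'NOVPN01': 12}
```

In this outage each dependent trigger fired 15 and 12 seconds before the trigger it depends on, so under the assumption above there was nothing in PROBLEM state yet to suppress it. The 07:27 entries, where NOVPN01 fired but no PROBLEM for NO1TECHS2950-1 was mailed, would be consistent with the dependency suppressing the switch alert once its parent was already down.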


          • MaXoo49
            Junior Member
            • Apr 2015
            • 4

            #6
            Hi,
            I don't really understand your whole problem but I can try to answer.

            An item's check interval is fine when it is at least two times shorter than the period defined in your trigger.

            For example, these would work well:
            The item : SO1TECHS2950-2:uptime can have a check period of 250s
            and icmpping can have a check period of 80s
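
The rule of thumb above (item interval at most half the trigger period) can be written as a quick check. This is only a sketch of the rule as stated in this post; `interval_fits` is a hypothetical helper, not a Zabbix function.

```python
def interval_fits(item_interval_s, trigger_period_s, factor=2):
    """True if the item is polled at least `factor` times per trigger period."""
    return item_interval_s * factor <= trigger_period_s

# Periods from the triggers in this thread: nodata(600) and max(180).
print(interval_fits(250, 600))  # uptime every 250s against nodata(600)
print(interval_fits(80, 180))   # icmpping every 80s against max(180)
print(interval_fits(100, 180))  # a 100s interval would be too long for max(180)
```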

            But I don't understand why you have a notification for snmp nodata check if the switch ping trigger is activated...
            Are you sure you haven't reversed your dependency configuration, like this:
            snmp nodata check <----- switch ping <----- firewall ping ?

            PS: Sorry for my approximate English.


            • bturnbough
              Member
              • Mar 2011
              • 70

              #7
              Additional Information

              Hi MaXoo49,

              Thanks for your response.

              The item : SO1TECHS2950-2:uptime can have a check period of 250s
              and icmpping can have a check period of 80s
              So are you saying this?

              Device A trigger depends on Device B trigger which depends on Device C trigger.
              i.e.
              Switch ping relies on
              firewall ping relies on
              wan router ping

              ping for switch needs to be 520s
              ping for firewall needs to be 250s
              ping for wan router needs to be 80s ????

              Here is a copy paste of the triggers:


              Name: Network-Switch-Cisco-Uptime: {HOST.NAME} -- AWNMS01 -- No Host SNMP UPTIME data has been received in 10 or more minutes.
              Depends on: SO1AETDS2950-1 : {HOST.NAME} is not responding to pings.
              Expression: {SO1AETDS2950-1:uptime.nodata(600)}=1
              Status: Enabled

              Name: Network-Ping_3min-3Fails: {HOST.NAME} is not responding to pings.
              Depends on: SOFW01 : {HOST.NAME} is not responding to pings.
              Expression: {SO1AETDS2950-1:icmpping[].max(180)}<1
              Status: Enabled

              Name: Network-Ping_3min-3Fails: {HOST.NAME} is not responding to pings.
              Depends on: NOTHING
              Expression: {SOFW01:icmpping[].max(180)}<1
              Status: Enabled

              The switch uptime check is every 30s
              The switch ping check is every 60s
              The firewall ping check is every 60s

              Any further ideas?


              • MaXoo49
                Junior Member
                • Apr 2015
                • 4

                #8
                Okay, your dependency configuration looks fine to me.

                I don't understand why you get one notification for the switch ping failure and one for the SNMP nodata check.

                I could understand getting one notification for the switch ping failure and one for the firewall ping failure, because the time period of the switch ping trigger is not equal to the period of the firewall trigger plus the firewall ping check interval.


                • bturnbough
                  Member
                  • Mar 2011
                  • 70

                  #9
                  I don't understand either. I believe this to be a major flaw in the way Zabbix handles dependencies.

                  The timing of events is a major factor in the activation of triggers. The timing of everything in the dependency tree needs to be exactly right for the admin to receive only one notification, "the wan router is down", instead of three notifications.

                  I'm inclined to file a bug, but I'm afraid it'll be closed with "unable to reproduce" or "won't fix" like many of the other bug reports that I've filed.

                  I've become more and more pessimistic about Zabbix SIA. I've seen fewer and fewer posts from their employees helping the community out. One can assume that's because they're trying to force folks to buy their support.

                  Having to change the frequency of all of the items so that they're at least 2x the length would be absolutely crazy. I don't want a ping check to occur every 480 seconds. I want it to be checked every 30 seconds.
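
The timing sensitivity described in this post can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model of Zabbix internals: checks run at fixed phases within each minute (24s and 39s here, borrowed from the seconds seen in the post #5 log), and a `max(window)<1` trigger fires as soon as every value inside the last `window` seconds is a failure. The wider child window of 240s is the "parent period plus parent check interval" idea from post #8, applied hypothetically instead of changing any item interval.

```python
import math

def fire_time(outage_start, phase, interval, window):
    """Second at which a max(window)<1 trigger would first fire.

    Assumes checks run at phase, phase+interval, ... and that the trigger
    fires once every value inside the last `window` seconds is a failure.
    """
    first_failed = phase + math.ceil((outage_start - phase) / interval) * interval
    checks_needed = math.ceil(window / interval)
    return first_failed + (checks_needed - 1) * interval

def child_fires_first(outage_start, child_window, parent_window=180,
                      child_phase=24, parent_phase=39, interval=60):
    """True if the dependent (child) trigger fires before its parent."""
    child = fire_time(outage_start, child_phase, interval, child_window)
    parent = fire_time(outage_start, parent_phase, interval, parent_window)
    return child < parent

# Try every possible outage start second within one minute.
same_window = [s for s in range(60) if child_fires_first(s, child_window=180)]
wider_window = [s for s in range(60) if child_fires_first(s, child_window=240)]

print(len(same_window), "of 60 start seconds let the child fire first with max(180)")
print(len(wider_window), "of 60 start seconds let the child fire first with max(240)")
```

With identical 180s windows the outcome depends on when the outage starts relative to the check phases. With the wider child window, the child needs one more failed check than the parent and, in this model, never fires first. The trade-off is that a genuine switch-only failure is reported one check interval later.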


                  • bturnbough
                    Member
                    • Mar 2011
                    • 70

                    #10
                    **bump**

                    **BUMP**

                    There has to be someone out there that knows Trigger dependencies better than I.

                    Chime in, folks.


                    • pc99096
                      Senior Member
                      • Oct 2011
                      • 193

                      #11
                      Getting the timing right, and thinking about how multiple triggers overlap, is absolutely critical when talking about trigger dependencies.


                      try using multiple expressions in a single trigger:
                      {SO1AETDS2950-1:uptime.nodata(600)}=1 and {SO1AETDS2950-1:icmpping[].max(180)}<1


                      try playing with different times for different triggers and/or different intervals for the items.

                      if you have 60s checks, try changing max(180)<1 to sum(#3)<1 (although this probably won't help)
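
To see why swapping `max(180)<1` for `sum(#3)<1` may not change much with 60s checks, the two conditions can be compared on sample data. This is a sketch only: the helper names are made up, and the window boundary (values newer than `now - window_s`) is an assumption about how a time-based period is evaluated.

```python
def max_window_below_1(samples, now, window_s):
    """Time-based check, like max(180)<1: every recent value is 0."""
    recent = [v for t, v in samples if now - window_s < t <= now]
    return bool(recent) and max(recent) < 1

def sum_last_below_1(samples, count):
    """Count-based check, like sum(#3)<1: the newest `count` values are all 0."""
    newest = sorted(samples)[-count:]
    return sum(v for _, v in newest) < 1

# (timestamp, icmpping value): 1 = reply, 0 = no reply, polled every 60s.
regular = [(0, 1), (60, 0), (120, 0), (180, 0)]
# Same outage, but the value at t=120 was never stored.
gappy = [(0, 1), (60, 0), (180, 0)]

for name, samples in (("regular", regular), ("gappy", gappy)):
    print(name,
          "| max(180)<1:", max_window_below_1(samples, now=180, window_s=180),
          "| sum(#3)<1:", sum_last_below_1(samples, 3))
```

With regular 60s polling both conditions become true at the same check, so the relative timing between triggers stays the same. They only diverge when a value is missing, where the count-based form reaches back further in time.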


                      • bturnbough
                        Member
                        • Mar 2011
                        • 70

                        #12
                        Originally posted by pc99096
                        Getting the timing right, and thinking about how multiple triggers overlap, is absolutely critical when talking about trigger dependencies.


                        try using multiple expressions in a single trigger:
                        {SO1AETDS2950-1:uptime.nodata(600)}=1 and {SO1AETDS2950-1:icmpping[].max(180)}<1


                        try playing with different times for different triggers and/or different intervals for the items.

                        if you have 60s checks, try changing max(180)<1 to sum(#3)<1 (although this probably won't help)
                        Ok, so I had a chance to play around with this suggestion. It looks like it achieves the desired result of reducing the erroneous alerts, HOWEVER, it creates NEW problems.

                        Alerts are now displayed on the dashboard for the wrong host groups, because of the way the conditionals are configured.

                        Object A Host group memberships: Group A and Group B
                        Object B Host group Memberships: Group C and Group D

                        When an alert is displayed for object B, I now see the alert in groups A and B and also in groups C and D on the dashboard. This is an issue because some of my actions are based on the host group, so essentially the wrong admins would be notified, in addition to the correct ones.

                        I was hopeful that that would work, but it doesn't look like it.

                        Anything else?


                        • bturnbough
                          Member
                          • Mar 2011
                          • 70

                          #13
                          Specific Example

                          Here is the Trigger that causes the alert to show up in multiple other host groups. I somewhat understand why this is happening, but I thought the alerts would only display in the hostgroup(s) that a host was a member of?

                          {PESW00:icmpping[].max(180)}<1 & {PEFW00:icmpping[].min(180)}>0

                          Explanation:

                          {PESW00:icmpping[].max(180)}<1 <--------- PESW00 is a member of hostgroups A and B

                          & {PEFW00:icmpping[].min(180)}>0 <------------ PEFW00 is a member of hostgroups C and D
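
The behaviour described here can be sketched as follows, assuming (this is an assumption, not something confirmed in the thread) that a trigger whose expression references items on several hosts is associated with all of those hosts, and therefore shows up in the union of their host groups. The group names are the placeholders used in this post and the helper names are hypothetical.

```python
import re

EXPRESSION = "{PESW00:icmpping[].max(180)}<1 & {PEFW00:icmpping[].min(180)}>0"

# Group membership as described in the post (placeholder group names).
HOST_GROUPS = {
    "PESW00": {"A", "B"},
    "PEFW00": {"C", "D"},
}

def hosts_in_expression(expression):
    """Host names referenced as {HOST:key.function(...)} in the expression."""
    return sorted(set(re.findall(r"\{([^:{}]+):", expression)))

def groups_for_trigger(expression, host_groups):
    """Union of the groups of every host the expression references."""
    groups = set()
    for host in hosts_in_expression(expression):
        groups |= host_groups.get(host, set())
    return sorted(groups)

print(hosts_in_expression(EXPRESSION))              # ['PEFW00', 'PESW00']
print(groups_for_trigger(EXPRESSION, HOST_GROUPS))  # ['A', 'B', 'C', 'D']
```

Under that assumption, any action filtered by host group C or D would also match this trigger, even though the problem it reports is about PESW00.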


                          • bturnbough
                            Member
                            • Mar 2011
                            • 70

                            #14
                            So what does everyone else do?

                            Can someone please provide me with an actual way other people are doing it?

                            I understand the timing and overlapping of item checks, but that does not scale AT ALL.

                            WAN router
                            ping
                            ssh check
                            uptime

                            Firewall
                            Ping
                            SSH check
                            uptime

                            Switch
                            ping
                            ssh check
                            uptime

                            How would you configure the above checks so that I only get minimal notifications when the WAN circuit is down?


                            • dirckcopeland
                              Member
                              • Oct 2013
                              • 50

                              #15
                              Trigger Dependencies don't work as expected

                              bturnbough,
                              I'm just throwing this out there, but if they all depend on each other and every condition has to be met before an alert is sent, have you tried combining them into one trigger expression using the AND operator, like this:

                              {SO1AETDS2950-1:uptime.nodata(600)}=1&{SO1AETDS2950-1:icmpping[].max(180)}<1&{SOFW01:icmpping[].max(180)}<1
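
To make clear when such a combined expression would fire, here is a small truth-table sketch. The booleans stand in for the results of the individual sub-expressions, which is a simplification, and the second function mirrors the post #13 variant (switch down while the firewall still answers) for comparison. Function and scenario names are made up for illustration.

```python
def all_conditions_and(snmp_nodata, switch_ping_down, firewall_ping_down):
    """The expression suggested in this post: all three must be true."""
    return snmp_nodata and switch_ping_down and firewall_ping_down

def switch_down_firewall_up(switch_ping_down, firewall_ping_down):
    """The post #13 variant: switch unreachable while the firewall answers."""
    return switch_ping_down and not firewall_ping_down

# (description, snmp_nodata, switch_ping_down, firewall_ping_down)
SCENARIOS = [
    ("whole site unreachable", True, True, True),
    ("only the switch is down", True, True, False),
    ("everything is up", False, False, False),
]

for name, nodata, switch_down, firewall_down in SCENARIOS:
    print(name,
          "| all-AND:", all_conditions_and(nodata, switch_down, firewall_down),
          "| switch-only:", switch_down_firewall_up(switch_down, firewall_down))
```

Read this way, the all-AND form yields a single alert when the whole site is unreachable but stays silent when only the switch fails, while the post #13 form does the opposite.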
