Recommendation for handling dependent triggers and alerts?

  • surfrock66
    Member
    • Jul 2018
    • 30

    #1

    Recommendation for handling dependent triggers and alerts?

    We have a single Zabbix server at our central site, running 3.4.12 on Ubuntu 18.04. I have a central office and 5 remote offices; each remote office has a router, behind that a primary switch, and behind that 2 phone systems. I have set up dependencies on my ICMP Ping triggers so that, in theory, the phone system doesn't alert when the router goes down at any given site. For example, the high Ping trigger (150 ms threshold over a 5-minute average) for the phone system depends on the switch AND the router, and the switch's trigger in turn depends on the router's.

    What usually happens, though, is that I get an alert for a problem on a dependent device, like the phone system. A few seconds later I get the problem alert for the router. Immediately after that, I get a resolved notification for the dependent device, the phone system. That's a minimum of 3 e-mails so far. Let's say the high ping occurs during a file transfer lasting 10 minutes. At the end of the 10 minutes I get the resolved alert from the router, then immediately I get alerts for all the dependent devices: phone systems, switch, etc. They resolve quickly.

    What appears to be happening is that there is a delay in the trigger activation for the dependent devices, so their 5-minute average is still above the threshold just after their blocking dependency resolves. This isn't a bug; it seems to be doing exactly what the code defines.
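To make the timing gap concrete, here is a minimal sketch in Python. It is not Zabbix code: the latency figures, the one-sample-per-minute polling, and the one-minute offset between the two devices are all assumptions made for illustration. It computes a rolling 5-sample average for a router and a phone system and lists the moments when the phone is still above the threshold but the router has already recovered, which is when a dependent alert would no longer be held back.

```python
from collections import deque

THRESHOLD_MS = 150.0  # threshold from the post
WINDOW = 5            # 5-minute average, assuming one sample per minute


def problem_states(samples, window=WINDOW, threshold=THRESHOLD_MS):
    """Return one boolean per sample: is the rolling average above threshold?"""
    recent = deque(maxlen=window)
    states = []
    for value in samples:
        recent.append(value)
        states.append(sum(recent) / len(recent) > threshold)
    return states


# Hypothetical latencies in ms. Both devices see the same 10-minute spike,
# but the phone's samples lag the router's by one minute.
router = [20] * 3 + [400] * 10 + [20] * 8
phone = [25] * 4 + [420] * 10 + [25] * 7

router_states = problem_states(router)
phone_states = problem_states(phone)

# Minutes where the phone is in problem while the router is not,
# so the dependency no longer suppresses the phone's alert.
exposed = [
    minute
    for minute, (r, p) in enumerate(zip(router_states, phone_states))
    if p and not r
]
print(exposed)
```

With these assumed numbers the router's average drops below the threshold one sample before the phone's does, leaving a short window in which the dependent trigger is in a problem state with nothing blocking it. The mirror case at the start of the spike would appear if the dependent device were sampled first.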

    Because the period is defined in the template's trigger definition, you can't easily override the time period with a macro (and I'm not sure that you should). I wish there were a dependency cooldown system for alerts, so that a dependent alert can't fire for x time after its dependency resolves.

    How do other people solve this problem?
  • ninjatill
    Junior Member
    • Jan 2021
    • 3

    #2
    Old post, but I'm searching for exactly what the OP is recommending. It would be beneficial to have a dependency cooldown time period.

    A classic example is a branch with some hosts behind its router. The branch internet connection goes down, and because I have dependencies set up between the branch hosts and the branch firewall, I only get one email stating the branch is down... exactly as expected. But when the branch connection comes back up, I get an email stating the firewall came back up and simultaneously I get several emails stating each branch device (that was dependent on the firewall trigger) is down. Then, within seconds, the branch hosts resolve, so I get several more emails stating the branch hosts are back up. If there was just a 60 second delay for the branch hosts... I would just get the 2 notifications for the branch router and no BS notifications for the behind-the-router devices.
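The cooldown idea described above can be sketched as a small filter that sits between trigger events and notifications. This is a hypothetical illustration in Python, not an existing Zabbix feature or API: the class name `CooldownFilter`, its fields, and the event tuples are all invented for the example. A dependent's problem is suppressed while any parent is in a problem state or resolved less than `cooldown` seconds ago, and a recovery is only sent if the matching problem was sent.

```python
from dataclasses import dataclass, field


@dataclass
class CooldownFilter:
    """Hypothetical notification filter; not part of Zabbix."""
    parents: dict                 # trigger name -> list of triggers it depends on
    cooldown: float = 60.0        # seconds to hold dependents after a parent resolves
    in_problem: set = field(default_factory=set)
    resolved_at: dict = field(default_factory=dict)
    notified: set = field(default_factory=set)

    def _blocked(self, trigger, now):
        """True if any parent is in problem or resolved within the cooldown."""
        return any(
            p in self.in_problem
            or now - self.resolved_at.get(p, float("-inf")) < self.cooldown
            for p in self.parents.get(trigger, [])
        )

    def handle(self, trigger, is_problem, now):
        """Record an event; return True if a notification should be sent."""
        if is_problem:
            self.in_problem.add(trigger)
            if self._blocked(trigger, now):
                return False
            self.notified.add(trigger)
            return True
        self.in_problem.discard(trigger)
        self.resolved_at[trigger] = now
        if trigger in self.notified:
            self.notified.remove(trigger)
            return True
        return False  # the problem was never announced, so skip the recovery


# The branch scenario: (trigger, is_problem, time in seconds).
events = [
    ("firewall", True, 0),
    ("host", True, 5),
    ("firewall", False, 600),
    ("host", True, 601),
    ("host", False, 610),
]
f = CooldownFilter(parents={"host": ["firewall"]})
sent = [(name, problem) for name, problem, t in events if f.handle(name, problem, t)]
print(sent)
```

In this run only the two firewall notifications get through. One limitation of the sketch: a suppressed problem is dropped rather than delayed, so a host that is genuinely still down after the cooldown would need to be re-evaluated at that point.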


    • Isaackirk
      Junior Member
      • Sep 2023
      • 3

      #3
      You can set hysteresis on your triggers. With hysteresis, a trigger goes into the problem state at one threshold but returns to the resolved state only after the value passes a second, stricter recovery threshold, which can prevent rapid oscillation of alerts when values hover near the threshold.
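As a rough illustration of hysteresis, the Python sketch below compares a trigger with a single threshold against one with a separate, lower recovery threshold. In Zabbix this would correspond to giving the trigger a problem expression and a separate recovery expression; the function names and the 150 ms / 100 ms values here are assumptions chosen for the example, not settings from the thread.

```python
def trigger_states(values, problem_above=150.0, recover_below=100.0):
    """Return the trigger state (True = problem) after each value."""
    in_problem = False
    states = []
    for v in values:
        if not in_problem and v > problem_above:
            in_problem = True
        elif in_problem and v < recover_below:
            in_problem = False
        states.append(in_problem)
    return states


def count_changes(states):
    """Count state flips, i.e. how many notifications would be generated."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical averaged ping values (ms) hovering around the 150 ms threshold.
hovering = [140, 155, 145, 160, 148, 152, 90]

plain = trigger_states(hovering, recover_below=150.0)  # single threshold
with_hysteresis = trigger_states(hovering)             # recover only below 100 ms

print(count_changes(plain), count_changes(with_hysteresis))
```

Hysteresis of this kind reduces flapping around the threshold. It does not by itself add a time-based cooldown after a dependency resolves.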
