Implementing Anti-Alert Storm in Zabbix for Switch and Access Point Downtime

  • oooMrFrank
    Junior Member
    • Mar 2025
    • 2

    #1

    Good evening everyone, I would like to get some advice from you.

    I have multiple PoE switches connected to numerous Access Points. When a switch goes down, all the connected Access Points go down as well, and I get flooded with email notifications, first when they go down and again when they come back online.

    I initially thought of setting a dependency for the ICMP down trigger of the Access Points on the corresponding switch. However, with this method, once the switch is back up, I still get flooded with emails. This happens because an Access Point takes around 5 minutes to fully restart, while the switch takes about 2 minutes, so for roughly 3 minutes the switch trigger has already recovered while the Access Points are still unreachable.

    I would like to implement an anti-alert storm mechanism as follows:
    1. Switch Down → Disable monitoring for associated APs in a specific host group
    2. Switch Up → Resume AP monitoring after 5 minutes

    I believe this can be implemented, but I’m unsure of the best method. I appreciate any suggestions and am open to alternative approaches. I could also leverage the Zabbix API if necessary.

    My Zabbix version is 7.0.

    Thank you!
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2

    https://www.zabbix.com/forum/zabbix-...ing-python-api


    However, that solution is for situations when you cannot monitor dependencies for one reason or another. In your situation, you absolutely can monitor the dependencies (the switches). You need to make your AP triggers (or at least a "base trigger") depend upon some base connectivity trigger for the switch that's between your Zabbix server or proxy and the AP.

    Do as much as you can using trigger dependencies. Once you start setting up your dependencies, you'll identify other places where you should be using them.
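
For anyone who would rather script the dependencies than click through the frontend, here is a minimal sketch of the JSON-RPC body for the Zabbix API's `trigger.update` method, which accepts a `dependencies` list of trigger IDs. The helper name and the trigger IDs are hypothetical placeholders, and the block only builds the request body; it does not send anything.

```python
import json


def dependency_update_payload(dependent_trigger_id, parent_trigger_ids, request_id=1):
    """Build a trigger.update request that makes one trigger depend on others.

    dependent_trigger_id: ID of the AP trigger (hypothetical value in the demo).
    parent_trigger_ids:   IDs of the switch triggers it should depend on.
    """
    return {
        "jsonrpc": "2.0",
        "method": "trigger.update",
        "params": {
            "triggerid": dependent_trigger_id,
            # Each dependency is passed as an object carrying only "triggerid".
            "dependencies": [{"triggerid": tid} for tid in parent_trigger_ids],
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # Placeholder IDs: "20001" = AP ICMP-down trigger, "10001" = switch ICMP-down trigger.
    print(json.dumps(dependency_update_payload("20001", ["10001"]), indent=2))
```

If the AP trigger comes from a template, the dependency would normally be set on the template's trigger rather than on each host's inherited copy.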


    • oooMrFrank
      Junior Member
      • Mar 2025
      • 2

      #3
      Originally posted by tim.mooney
      https://www.zabbix.com/forum/zabbix-...ing-python-api

      However, that solution is for situations when you cannot monitor dependencies for one reason or another. In your situation, you absolutely can monitor the dependencies (the switches). You need to make your AP triggers (or at least a "base trigger") depend upon some base connectivity trigger for the switch that's between your zabbix server or proxy and the AP.

      Do as much as you can using trigger dependencies. Once you start setting up your dependencies, you'll identify other places where you should be using them.

      Thank you, Tim, I really appreciate your response. I have thought deeply about my issue and I believe I have found a valid solution. I would love to get your feedback on it.

      Scenario:
      Switch A1 with 10 connected APs.
      • On Switch A1, I create an item that collects the device's uptime and a trigger that activates if the uptime is ≤ 6 minutes.
      • In the template for the 10 APs, I set dependencies on both the ICMP check of A1 and the uptime trigger of A1.

      This way:
      • If A1 goes down, I won’t receive alerts for the APs since they depend on A1’s ICMP check.
      • When A1 comes back online, the uptime trigger stays in a problem state until the uptime exceeds 6 minutes, so AP alerts remain suppressed for the time the APs need to come back up.
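
The logic of the two dependencies can be sketched as a small truth function. This is only a model of the scenario described above, not Zabbix code: the class and function names are made up for illustration, and the 6-minute threshold is the one from the post.

```python
from dataclasses import dataclass

UPTIME_THRESHOLD_S = 6 * 60  # uptime trigger is in problem state while uptime <= 6 minutes


@dataclass
class SwitchState:
    icmp_up: bool   # result of the switch's ICMP check
    uptime_s: int   # last collected uptime in seconds (stale while the switch is down)


def ap_alert_suppressed(switch: SwitchState) -> bool:
    """True if either switch trigger the AP trigger depends on is in problem state."""
    icmp_problem = not switch.icmp_up
    # The uptime item only gets fresh values while the switch is reachable.
    uptime_problem = switch.icmp_up and switch.uptime_s <= UPTIME_THRESHOLD_S
    return icmp_problem or uptime_problem


def ap_alert_sent(ap_icmp_up: bool, switch: SwitchState) -> bool:
    """An AP alert goes out only if the AP is down and nothing suppresses it."""
    return (not ap_icmp_up) and not ap_alert_suppressed(switch)


if __name__ == "__main__":
    timeline = [
        ("switch down, AP down", False, SwitchState(icmp_up=False, uptime_s=0)),
        ("switch up 2 min, AP still booting", False, SwitchState(icmp_up=True, uptime_s=120)),
        ("switch up 5 min, AP back", True, SwitchState(icmp_up=True, uptime_s=300)),
        ("switch up 7 min, AP still down", False, SwitchState(icmp_up=True, uptime_s=420)),
    ]
    for label, ap_up, sw in timeline:
        print(f"{label}: alert={ap_alert_sent(ap_up, sw)}")
```

The last row of the timeline is the case worth keeping: an AP that is still down after the 6-minute window does raise an alert, which is the desired behavior.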
      I have also created scripts that are triggered when the switch goes down through actions to disable all hosts in a host group via API. However, I think the first option I proposed is better. What do you think?
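
For comparison, the API-based alternative could look roughly like the sketch below. It is not the poster's actual script: the URL, token, and group ID are hypothetical placeholders. It relies on the `host.get` method (filtered by `groupids`) and `host.massupdate` with `status` set to 1 (unmonitored) or 0 (monitored), and assumes Bearer-token authentication as used by recent Zabbix versions.

```python
import json
import urllib.request

API_URL = "https://zabbix.example.com/api_jsonrpc.php"  # hypothetical URL
API_TOKEN = "replace-with-api-token"                    # hypothetical token


def rpc_payload(method, params, request_id=1):
    """Wrap a method call in the JSON-RPC 2.0 envelope the Zabbix API expects."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def hosts_in_group_payload(group_id):
    """Request the host IDs of every host in one host group."""
    return rpc_payload("host.get", {"output": ["hostid"], "groupids": [group_id]})


def set_monitoring_payload(host_ids, enabled):
    """Enable (status 0) or disable (status 1) monitoring for several hosts at once."""
    return rpc_payload(
        "host.massupdate",
        {"hosts": [{"hostid": hid} for hid in host_ids], "status": 0 if enabled else 1},
    )


def call_api(payload):
    """Send one request and return its result; raises if the API reports an error."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def set_group_monitoring(group_id, enabled):
    """Look up the hosts in a group, then switch monitoring on or off for all of them."""
    hosts = call_api(hosts_in_group_payload(group_id))
    return call_api(set_monitoring_payload([h["hostid"] for h in hosts], enabled))
```

An action on the switch-down trigger would call something like `set_group_monitoring("42", enabled=False)`, with `"42"` standing in for the AP group's ID, and the recovery operation would need a delayed call with `enabled=True`. That extra moving part is one reason the dependency-based option is simpler.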



      • tim.mooney
        tim.mooney commented
        That seems like a good way to start. Try it and adjust as necessary.

        My experience with network topology dependencies is that you want Zabbix to detect problems with the dependencies as soon as possible. That reduces (but does not totally eliminate) any potential timing issues between the leaf nodes and the dependencies. Checking the dependencies more frequently than the leaf nodes helps, but it often comes down to carefully thinking about all of the possible failure scenarios and timings and adjusting your trigger dependencies for them.
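
To make the timing point concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which a trigger fires after a fixed number of consecutive failed checks and ignores check timeouts, proxy delays, and processing time; the function names and the intervals in the demo are illustrative, not taken from the thread.

```python
def worst_case_detection_s(interval_s: int, failed_checks_needed: int) -> int:
    """Failure happens right after a check: wait a full interval for each failed check."""
    return interval_s * failed_checks_needed


def best_case_detection_s(interval_s: int, failed_checks_needed: int) -> int:
    """Failure happens right before a check: the first failed check is immediate."""
    return interval_s * (failed_checks_needed - 1)


def leaf_can_alert_first(switch_interval_s: int, ap_interval_s: int, failed_checks_needed: int) -> bool:
    """True if an AP trigger could fire before the switch trigger is in problem state."""
    switch_worst = worst_case_detection_s(switch_interval_s, failed_checks_needed)
    ap_best = best_case_detection_s(ap_interval_s, failed_checks_needed)
    return ap_best < switch_worst


if __name__ == "__main__":
    # Same 60 s interval on both: the AP trigger can win the race.
    print(leaf_can_alert_first(switch_interval_s=60, ap_interval_s=60, failed_checks_needed=3))
    # Switch checked every 30 s, APs every 60 s: the switch is always detected first.
    print(leaf_can_alert_first(switch_interval_s=30, ap_interval_s=60, failed_checks_needed=3))
```

Under this model, polling the switch at least somewhat faster than the APs closes the window in which AP alerts slip out before the dependency takes effect.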