Alerting and Suppressing of Alerts in a Flat Topology

  • Cejay
    Junior Member
    • Mar 2024
    • 11

    #1

    Alerting and Suppressing of Alerts in a Flat Topology

    Apologies if this has been asked before. I am looking into setting up alerting with Zabbix 6.0 LTS.

    Some background: our deployment will have a very flat topology. The vast majority of our monitored endpoints will be one or two services at a site, across many sites spread over a large geographical area.
    What I am trying to determine is how to do the following.

    If a small number of sites or hosts go down, we would like to get those alerts individually, for example emailed to another system.
    If we start to hit a threshold of, say, 15 or 20 hosts, we would like to suppress the 20 separate alerts and instead get what I'll call a parent alert saying "You have a significant outage".

    I have had a look at triggers and trigger dependencies. These seem to work well for devices at a single geographic location, where you have, say, 50 separate devices and the parent router goes offline: in that case you get one alert instead of 50.
    In our case there is no obvious common parent.

    So in short I am currently looking at:
    • Event Correlation
    • Services
    • Tags and Problem Tags
    • Trigger Dependencies
    to try to figure out how to do this.

    I have had a look at Services and will be investigating that further. But I would like to ask: how have people suppressed alerts or catered for mass outages when the topology being monitored is very flat?
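
To make the desired behaviour concrete, here is a minimal sketch in plain Python. It is not Zabbix configuration or a Zabbix API call; the function name `plan_notifications` and the default threshold of 15 are assumptions chosen for illustration only.

```python
from typing import List


def plan_notifications(down_hosts: List[str], threshold: int = 15) -> List[str]:
    """Return the messages to send for the current set of down hosts.

    Hypothetical helper: below the threshold every host gets its own
    alert; at or above it, a single parent alert replaces them all.
    """
    if len(down_hosts) >= threshold:
        return [f"Significant outage: {len(down_hosts)} hosts down"]
    return [f"Host down: {host}" for host in sorted(down_hosts)]
```

With two hosts down this yields two individual messages; with 20 hosts down it yields one parent message. Whatever Zabbix feature ends up implementing it, this is the decision it has to reproduce.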
  • Cejay
    Junior Member
    • Mar 2024
    • 11

    #2
    So I seem to have answered my own question in some regards: I have determined that Services might be the best way for me to go. I do still have a couple of points, if someone is able to help.

    1. Problem tags are inherited first from the item tags built into the template that the problem comes from, and second from the host tags that were on the host at the time the problem started. While testing, I found my service wasn't working because I was using pre-existing problems to tune it: if I add tags to a host after a problem on that host has begun, the problem does not inherit the new host tags.
    2. Is there a way to suppress individual host alerts while a large service incident is occurring? We can obviously alert on 50% of the devices in a service being down, but we also don't want to deal with 50 separate problems/alerts during a large-scale outage, so it would be useful to suppress them. I understand this can be done with trigger dependencies, but our topology is very flat: we have a small number of services across a large number of sites. So we need to know about individual hosts going down, but we also need to avoid being swamped by an alert storm.
    Thanks

    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4807

      #3
      Tags are added at the moment of event generation. Adding tags anywhere will not affect already triggered events.
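
A small model of that behaviour, as a sketch only: the classes `Host` and `ProblemEvent` and the function `open_problem` are hypothetical names in plain Python, not Zabbix internals. The point is just that the event takes a copy of the tags when it is created.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Host:
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProblemEvent:
    host: str
    tags: Dict[str, str]


def open_problem(host: Host) -> ProblemEvent:
    # Copy the tags so later edits to the host do not reach this event.
    return ProblemEvent(host=host.name, tags=dict(host.tags))


host = Host("site-01", {"service": "wan"})
early = open_problem(host)      # problem starts before the new tag exists
host.tags["region"] = "north"   # tag added to the host afterwards
later = open_problem(host)      # a new problem picks the tag up
```

Here `early.tags` still has only the `service` tag, while `later.tags` also has `region`. For testing a tag-based service, this means the existing problems need to be regenerated after the tags are changed.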

      • Cejay
        Junior Member
        • Mar 2024
        • 11

        #4
        Originally posted by cyber
        Tags are added at the moment of event generation. Adding tags anywhere will not affect already triggered events.
        Thanks for that. Yes, I ended up learning that after spending a reasonable amount of time going through the alerting.

        Is anyone able to help with the below?
        1. Is there a way to suppress individual host alerts while a large service incident is occurring? We can obviously alert on 50% of the devices in a service being down, but we also don't want to deal with 50 separate problems/alerts during a large-scale outage, so it would be useful to suppress them. I understand this can be done with trigger dependencies, but our topology is very flat: we have a small number of services across a large number of sites. So we need to know about individual hosts going down, but we also need to avoid being swamped by an alert storm.
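
As a sketch of the logic being asked for (plain Python, not Zabbix configuration): the names `services_in_incident` and `hosts_to_alert`, the `members` mapping, and the 0.5 ratio are all assumptions for illustration. Individual problems would still be recorded; only the per-host notifications are filtered.

```python
from typing import Dict, List, Set


def services_in_incident(
    members: Dict[str, List[str]], down: Set[str], ratio: float = 0.5
) -> Set[str]:
    """Return services where at least `ratio` of the member hosts are down."""
    incident = set()
    for service, hosts in members.items():
        if not hosts:
            continue  # an empty service cannot be in incident
        down_count = sum(1 for h in hosts if h in down)
        if down_count / len(hosts) >= ratio:
            incident.add(service)
    return incident


def hosts_to_alert(
    members: Dict[str, List[str]], down: Set[str], ratio: float = 0.5
) -> List[str]:
    """Return down hosts that are NOT covered by a service-level incident."""
    incident = services_in_incident(members, down, ratio)
    covered = {h for s in incident for h in members[s]}
    return sorted(h for h in down if h not in covered)
```

For example, with `members = {"wan": ["a", "b", "c", "d"], "voip": ["e", "f", "g", "h"]}` and `down = {"a", "b", "e"}`, the `wan` service reaches the 50% mark and gets a single service alert, while host `e` is still alerted individually.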


        Thanks

        • jacksmithh
          Junior Member
          • May 2024
          • 6

          #5
          In a flat topology deployment like yours, managing alerting efficiently is crucial to avoid drowning in notifications. Your approach of exploring event correlation, service-based monitoring, and trigger dependencies is spot-on. Leveraging tags and problem tags can add granularity to your alerting strategy. Remember to regularly review and refine your setup to ensure it's tailored to your evolving needs. Best of luck with your deployment!
