Is it possible to combine alerts?

  • vic
    Member
    • Jul 2013
    • 58

    #1

    Is it possible to combine alerts?

    Other than using a more reliable Zabbix server network, what can we do to stop getting hundreds of false alerts when the Zabbix server network is having problems?

    Any network issue that makes Zabbix think the monitored servers are down causes this. If Zabbix can't send emails because of the network issue, the email server keeps retrying, and we are flooded with emails once it starts working again. We also get an email when Zabbix decides each server is OK again, so we are then flooded with the "OK" emails. We set up a macro that monitors multiple websites and suppresses alerts when it can't reach them. That has reduced but not eliminated the problem: there are still scenarios where false positives occur even though the monitored websites are accessible.

    We also get SMS messages when server-down problems are detected, and again when the servers come back up, so we get hundreds of false-alert SMS messages as well.

    One way around this would be to delay major (server down) messages by about a minute so all server down messages in that time can be combined. If it's not a feature or easy to implement with Zabbix macros it probably should be.
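
The "delay and combine" idea above can be sketched outside Zabbix as a small batching step. This is only an illustration of the logic, not a Zabbix feature or API: the `Alert` class, `combine_alerts`, `render_digest`, and the 60-second default are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str         # hypothetical host name
    timestamp: float  # seconds since an arbitrary epoch
    message: str


def combine_alerts(alerts, window_seconds=60.0):
    """Group alerts so that each batch spans at most window_seconds.

    A batch is opened by the first alert that does not fit the previous
    batch; every alert within window_seconds of that opener joins it.
    """
    batches = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if batches and alert.timestamp - batches[-1][0].timestamp <= window_seconds:
            batches[-1].append(alert)
        else:
            batches.append([alert])
    return batches


def render_digest(batch):
    """Render one batch as a single notification body."""
    hosts = ", ".join(sorted({a.host for a in batch}))
    return f"{len(batch)} alert(s) in one window: {hosts}"
```

With this shape, a network blip that marks 100 servers as down within the same minute would produce one digest message instead of 100 separate ones.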
    Last edited by vic; 01-07-2016, 17:32.
  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #2
    It's a good question. What I've done is build in a concept of edge devices (marked with a macro; basically each firewall), and then make every device behind an edge device depend on that edge device's reachability.

    In other words, if there are 30 devices in a remote subnet, and firewall X hosts the (only) connection to them, then for the 29 other devices, have a dependency on availability of X. Then if X goes down, you only get the one notice from X, not all 30.

    This is not completely straightforward, because you need to make sure the ping polling of X detects the failure before the other 29 devices do (more precisely, perhaps trigger after two failed polling periods for those 29 and after one for X).

    This is related to the IT Services concept, though I have not explored getting that set up; it may be a better alternative.

    Combine that with ensuring any trigger that relates to connectivity (for example, ping loss) is dependent on availability also, so that all other alerts are also suppressed.
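
The effect of these dependencies can be sketched in a few lines. This is a simplified model under stated assumptions, not how Zabbix stores or evaluates dependencies: `hosts_to_notify` and the `parent_of` mapping (host to the edge device it sits behind) are hypothetical names used only for illustration.

```python
def hosts_to_notify(down_hosts, parent_of):
    """Return the down hosts that have no down device upstream of them.

    parent_of maps a host to the edge device it sits behind (hypothetical
    data; Zabbix expresses this as trigger dependencies, not as a dict).
    """
    down = set(down_hosts)
    notify = []
    for host in sorted(down):
        ancestor = parent_of.get(host)
        seen = {host}
        suppressed = False
        # Walk up the chain; stop at a down ancestor or if a cycle appears.
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                suppressed = True
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            notify.append(host)
    return notify
```

If firewall X and everything behind it are reported down, only X is returned; if a single server behind a healthy X is down, that server is still reported.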

    • vic
      Member
      • Jul 2013
      • 58

      #3
      My scenario does not have all devices going through one monitored device, so I can't do what you are doing. I don't know what the IT Services concept is you are referring to but will look into it.

      I already do the trigger relating to connectivity you mentioned. I need to add more intelligence somehow. I could probably build a macro that says: if x number of devices have all reported loss of ping in the past y minutes, then suppress alerts. I find the macro system in Zabbix hard to work with, though. I don't have to do it very often, and it takes a while to figure things out. It took me quite a while to figure out the macro to monitor multiple websites and make intelligent connectivity decisions based on that.
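
The "x devices lost ping in the past y minutes" rule is easy to state as plain logic, even if expressing it in Zabbix takes more work. A minimal sketch, assuming a hypothetical `ping_loss_times` mapping of host to the time of its most recent ping loss; the function name and the default thresholds are illustrative only.

```python
def looks_like_monitoring_outage(ping_loss_times, now,
                                 window_seconds=300.0, min_hosts=10):
    """Return True when enough hosts lost ping recently to suspect the
    monitoring network itself rather than the hosts.

    ping_loss_times maps host -> timestamp of its most recent ping loss.
    """
    recent = [
        host for host, lost_at in ping_loss_times.items()
        if 0 <= now - lost_at <= window_seconds
    ]
    return len(recent) >= min_hosts
```

When the function returns True, individual host-down alerts could be held back in favour of a single "monitoring connectivity problem" notice.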
      Last edited by vic; 01-07-2016, 19:01.

      • ARA
        Junior Member
        • Aug 2011
        • 15

        #4
        See the "Alerts Escalation" section (https://www.zabbix.com/documentation...on/escalations). It can be tricky to configure everything properly with multiple escalations, but the basic case (mute alerts for N seconds and send them only if the problem has not been resolved by then) is easy; see example 2.
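
The basic case described above (stay silent for N seconds, notify only if the problem is still open) amounts to the following check. This is a sketch of the behaviour only, not of Zabbix's escalation engine; `notifications_due` and the tuple layout are hypothetical.

```python
def notifications_due(problems, now, delay_seconds=60.0):
    """Return the names of problems that outlived the mute period.

    problems is a list of (name, started_at, resolved_at) tuples, where
    resolved_at is None while the problem is still open.
    """
    due = []
    for name, started_at, resolved_at in problems:
        if now - started_at < delay_seconds:
            continue  # still inside the mute period
        if resolved_at is not None and resolved_at - started_at < delay_seconds:
            continue  # recovered during the mute period, so never notify
        due.append(name)
    return due
```

A short-lived false positive that clears within the delay never generates a message, which also removes the matching "OK" message.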

        • registration_is_lame
          Senior Member
          • Nov 2007
          • 148

          #5
          Originally posted by vic
          It took me quite awhile to figure out the macro to monitor multiple websites and make intelligent connectivity decisions based on that.
          I'm not sure what he means by that. Am I missing some Zabbix feature that I'm not aware of? All I use macros for is defining variables, and this guy says he can monitor multiple websites and make intelligent connectivity decisions based on macros! How?
          Last edited by registration_is_lame; 04-12-2017, 14:05.

          • tgrissom
            Junior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Nov 2017
            • 7

            #6
            Dependent Triggers

            In 3.0 at least, you can create trigger dependencies such that if one trigger is in 'Problem' state, all of the dependent triggers are paused until the first trigger clears.

            e.g.

            Trigger_a: Host A is unreachable.
            Trigger_b: web server not responding
            Trigger_c: No data received from DB Size in 5 minutes.

            Configure triggers b and c to be dependent on a.

            Then when host A goes down, trigger a goes into 'Problem' state.
            The web server does not respond, obviously, because the host is down.
            And we have not heard about the DB item for the same reason. Both of these triggers would normally fire, but because they are dependent on trigger a, they are effectively suspended until trigger a clears.

            Further tricks can be implemented so that when the host comes up, trigger a does not clear until the host has been available for at least 5 minutes. By then the web server and DB would have had time to recover.
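
The "does not clear until the host has been available for 5 minutes" trick is a form of hysteresis: the condition for entering the problem state differs from the condition for leaving it. In Zabbix 3.0 this kind of rule is typically built around the {TRIGGER.VALUE} macro inside the trigger expression. The Python below only models the state logic; `next_trigger_state` and `recovery_samples` are hypothetical names, and it assumes one ping result per minute so that 5 samples cover roughly 5 minutes.

```python
def next_trigger_state(in_problem, recent_pings, recovery_samples=5):
    """Return the new state (True = PROBLEM) given recent ping results.

    recent_pings is a list of 1 (reply) or 0 (no reply), newest last.
    """
    if not recent_pings:
        return in_problem  # no data: keep the current state
    if not in_problem:
        # Enter PROBLEM as soon as the latest ping fails.
        return recent_pings[-1] == 0
    # Already in PROBLEM: clear only after recovery_samples straight replies.
    window = recent_pings[-recovery_samples:]
    recovered = len(window) == recovery_samples and all(p == 1 for p in window)
    return not recovered
```

Because trigger a stays in the problem state during the recovery period, the dependent web server and DB triggers remain suppressed until those services have had time to come back.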

              • MarkD
                Junior Member
                • Jun 2018
                • 17

                #8
                Originally posted by tgrissom
                Further tricks can be implemented so that when the host comes up, trigger a does not clear until the host has been available for at least 5 minutes. By then the web server and DB would have had time to recover.
                This is interesting. How can I configure the trigger to not auto clear for 5 minutes?

                Thank you!
