Is there a way to stop alerts flapping? 500 emails in 2 hours from 15 servers.

  • noxis
    Senior Member
    • Aug 2007
    • 145

    #1

    With the features Zabbix currently offers, is there a way (using the logic built into the system) to do some sort of flap detection? I am thinking I might be able to build something with the dependency framework. I got bombarded with around 500 emails today when the platform I am monitoring with my test Zabbix environment had some issues (lots of high load, slow HTTP responses, etc.). An email was sent every time a service on a server flapped, bouncing in and out of the high-load state and so on.

    Thanks for any help
  • michaeltje
    Member
    • Aug 2007
    • 44

    #2
    Use dependencies.


    • noxis
      Senior Member
      • Aug 2007
      • 145

      #3
      Originally posted by michaeltje
      Use dependencies.
      As I understand it, though, say you have just one server. It's a little busy, with the load bouncing between 4 and 5.5 constantly over the course of a few hours. Every single time it hits 5 it will send an email, and each time it drops below 5 it will send a recovery.

      Is there a way to say "it has hit this trigger 5 times in 20 minutes, stop sending" and have it send an email stating that? With a recovery back to normal after it has stopped flapping for 20 minutes.

      I think I am just explaining this badly, or I am not understanding the extent of what you can do with dependencies.
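
The behaviour asked for here ("5 state changes in 20 minutes, then stop sending until it has been quiet for 20 minutes") can be sketched outside Zabbix as a small sliding-window counter. This is only an illustration of the idea, not a Zabbix feature; the class name `FlapDetector` and its parameters are hypothetical.

```python
from collections import deque


class FlapDetector:
    """Suppress alerts once a trigger changes state too often (hypothetical helper)."""

    def __init__(self, max_changes=5, window_seconds=20 * 60):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.changes = deque()  # timestamps of recent state changes
        self.flapping = False

    def _update(self, now):
        # Forget state changes that fell out of the window.
        while self.changes and now - self.changes[0] > self.window_seconds:
            self.changes.popleft()
        # No change for a whole window: the trigger has settled down.
        if not self.changes:
            self.flapping = False

    def record_change(self, timestamp):
        """Record a state change; return True if an alert should be sent."""
        self._update(timestamp)
        self.changes.append(timestamp)
        if len(self.changes) >= self.max_changes:
            self.flapping = True
        return not self.flapping

    def is_flapping(self, now):
        self._update(now)
        return self.flapping


# Eight state changes, two minutes apart (timestamps in seconds).
detector = FlapDetector()
sent = [detector.record_change(t) for t in range(0, 8 * 120, 120)]
print(sent)
```

In this run the first four changes would be mailed and everything from the fifth change on is suppressed; `is_flapping` goes back to `False` once 20 minutes pass without any state change.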


      • michaeltje
        Member
        • Aug 2007
        • 44

        #4
        Dependencies are levels.

        For example:


        Server
        ------
        Switch
        ------
        Router

        If the router goes down, the switch and the server will lose their connection.

        If the dependencies are set correctly, it won't message you about the server and switch being down, only about the router.
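
As a rough model of what dependencies do in the example above (a simplification for illustration, not how Zabbix implements it), a trigger is only reported when none of the triggers it depends on is itself in a problem state. The function name `triggers_to_notify` and the data layout are hypothetical.

```python
def triggers_to_notify(in_problem, depends_on):
    """Return the problem triggers whose dependencies are all OK.

    in_problem: set of trigger names currently in a problem state.
    depends_on: dict mapping a trigger name to the triggers it depends on.
    """
    return {
        trigger
        for trigger in in_problem
        if not any(parent in in_problem for parent in depends_on.get(trigger, ()))
    }


# The server depends on the switch, the switch depends on the router.
depends_on = {"server": ["switch"], "switch": ["router"]}

router_outage = triggers_to_notify({"server", "switch", "router"}, depends_on)
server_only = triggers_to_notify({"server"}, depends_on)
print(router_outage, server_only)
```

With the router down, only the router is reported. This also shows why dependencies alone do not help with a single server whose load keeps crossing the threshold: nothing it depends on is in a problem state, so every change is still reported.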

        What you mean can be set in the trigger itself.
        Last edited by michaeltje; 21-08-2007, 23:15.


        • remi
          Junior Member
          • Sep 2006
          • 11

          #5
          Originally posted by noxis
          Is there a way to say "its hit this trigger 5 times in 20min stop sending" and have it send an email saying this fact? With a recovery back to normal after its stopped flapping for 20min.
          What you can do is make the trigger fire not every time the load hits 5, but, say, when it has been >5 for more than 5 minutes, or when it was >5 for the last 5 checks.

          Examples:
          system.cpu.load[all,avg1].min(300)>5
          system.cpu.load[all,avg1].min(#5)>5

          (The second example assumes a Zabbix version that accepts the #num count form. last(5) on its own would only look at a single value, not at all of the last 5 checks.)

          You can also use avg5, which is already an average of the last 5 minutes.

          This way you will only be notified when the load stays high for a longer time, and not every time it has a small peak.
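
To see why a minimum over a window calms things down, here is a small simulation, assuming one sample per check and made-up load values. The helper `count_notifications` is hypothetical; `window=1` behaves like a plain "last value > 5" trigger, and `window=5` like a "minimum of the last 5 checks > 5" trigger.

```python
from collections import deque


def count_notifications(samples, threshold, window):
    """Count state changes (problem or recovery mails) for a trigger that
    fires only when the minimum of the last `window` samples exceeds `threshold`."""
    recent = deque(maxlen=window)
    in_problem = False
    changes = 0
    for value in samples:
        recent.append(value)
        # The trigger cannot fire until a full window of samples exists.
        now_problem = len(recent) == window and min(recent) > threshold
        if now_problem != in_problem:
            changes += 1
            in_problem = now_problem
    return changes


# Made-up load samples hovering around 5, with one sustained busy period.
load = [4.8, 5.2, 4.9, 5.3, 4.7, 5.4, 5.6, 5.5, 5.8, 5.7, 5.9, 4.6, 5.1, 4.9]

naive = count_notifications(load, threshold=5, window=1)
smoothed = count_notifications(load, threshold=5, window=5)
print(naive, smoothed)  # 8 2
```

On this sample data the plain trigger changes state 8 times, while the windowed one changes state twice: once for the sustained busy period and once for the recovery.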
