Many recovery notifications for a single problem notification

  • zordio
    Junior Member
    • Mar 2009
    • 15

    #1

    Many recovery notifications for a single problem notification

    My Zabbix 1.6.4 setup is still fairly simple. About all I have is that each priority has its own action, mainly so I can rename a couple of the priorities so they match syslog priorities.

    Yesterday, I got a Warning that the CPU load on a machine was a little high. And then I got 70 recovery messages. A few more times after that, I got a warning, each with only a single recovery message.

    What could be causing these excessive recovery messages, and how do I get rid of them? It doesn't do very well for the higher priorities which also get sent to my phone.
  • Calimero
    Senior Member
    • Nov 2006
    • 481

    #2
    Which item and which trigger (copy/paste the full expression) cause the "flood"?

    • zordio
      Junior Member
      • Mar 2009
      • 15

      #3
      This produced 70 recovery messages once, but then acted normally.

      Item:
      system.cpu.load[,avg1]

      Trigger:
      {Template_FreeBSD:system.cpu.load[,avg1].last(0)}>2 * {Template_FreeBSD:system.cpu.num.last(0)}
      depends on: Processor load is too high on Template_FreeBSD (triggers on 4x num cpus)
      severity: Warning


      I also had trouble with the following giving 40 recovery messages for a single problem message:

      Item:
      icmpping

      Trigger:
      {Template_Cust_Internet:icmpping.max(60)}=0


      severity: warning
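For readers unfamiliar with the expression syntax, the two trigger conditions quoted above can be restated in plain Python. This is only an illustrative sketch of what the expressions compare, not Zabbix's own evaluator; the function names and the handling of an empty window are assumptions made for the example.

```python
from typing import Sequence


def cpu_load_warning(load_avg1: float, num_cpus: int) -> bool:
    """Restates: last(0) of system.cpu.load[,avg1] > 2 * last(0) of system.cpu.num."""
    return load_avg1 > 2 * num_cpus


def ping_down(results_in_window: Sequence[int]) -> bool:
    """Restates icmpping.max(60)=0: the largest value in the window is 0,
    i.e. no ping in the window succeeded.

    Assumption for this sketch: an empty window is not treated as "down".
    """
    if not results_in_window:
        return False
    return max(results_in_window) == 0


if __name__ == "__main__":
    print(cpu_load_warning(4.5, 2))   # True: 4.5 > 2 * 2
    print(cpu_load_warning(3.9, 2))   # False: 3.9 is not above 4
    print(ping_down([0, 0, 0]))       # True: every ping failed
    print(ping_down([0, 1, 0]))       # False: one ping succeeded
```

Both conditions are either true or false for a given set of values, so nothing in the expressions themselves suggests more than one recovery per problem.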

      • Calimero
        Senior Member
        • Nov 2006
        • 481

        #4
        For a single event on a single host?

        If you go to the Monitoring > Events screen and click on the event that caused all the notifications, do you have 40 or 70 rows in the "Actions" table?

        • zordio
          Junior Member
          • Mar 2009
          • 15

          #5
          Yes, this happens for a single event on a single host.

          For the events in question, for the problem, there is one email to each recipient under Message Actions. For the recovery, there are a lot of emails. I didn't count them on that screen, but it was enough that it took a couple of seconds to send them all.

          • floriang
            Junior Member
            • Jan 2009
            • 4

            #6
            I have also noticed the problem with different items/triggers in 1.6.4.

            Does anyone have an idea how to debug this further?

            • Calimero
              Senior Member
              • Nov 2006
              • 481

              #7
              Originally posted by zordio
              Yes, this happens for a single event on a single host.

              For the events in question, for the problem, there is one email to each recipient under Message Actions. For the recovery, there are a lot of emails. I didn't count them on that screen, but it was enough that it took a couple of seconds to send them all.
              But do you have a single email per recipient or many emails per recipient?

              • zordio
                Junior Member
                • Mar 2009
                • 15

                #8
                I'm not sure what was unclear about that. I said that for each recipient, there are many emails. If there weren't, I wouldn't have posted here.

                Is it possible to have one recipient for the problem notification, and 70 for the recovery?

                • richlv
                  Senior Member
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Oct 2005
                  • 3112

                  #9
                  please show action configuration, especially conditions and operations. combinations of conditions & escalation are known to cause similar effects.

                  • zordio
                    Junior Member
                    • Mar 2009
                    • 15

                    #10
                    Lately, it has only been for warnings, so I will post that one.

                    Action:
                    Name: warnings to sysadmins
                    Event source: Triggers
                    Enable escalations: no
                    Default subject: Warning: {HOSTNAME} {TRIGGER.NAME}
                    Default message: Warning: {HOSTNAME} {TRIGGER.NAME}
                    {TRIGGER.KEY}: {{HOSTNAME}:{TRIGGER.KEY}.last(0)}
                    Recovery message: yes
                    Recovery subject: OK: {HOSTNAME}: {TRIGGER.NAME}
                    Recovery message: OK: {HOSTNAME}: {TRIGGER.NAME}
                    {TRIGGER.KEY}: {{HOSTNAME}:{TRIGGER.KEY}.last(0)}
                    Status: Enabled

                    Action conditions:
                    Type of calculation: And
                    Conditions: (A) Trigger severity = "Warning"
                    (B) Trigger value = "PROBLEM"

                    Action operations:
                    Send message to Group "Sysadmins"
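As a rough sketch of the configuration above, the two conditions joined with "And" can be written as a simple predicate. The `TriggerEvent` class and the function name are hypothetical, and the sketch deliberately does not model how Zabbix 1.6 decides when to send the recovery message; it only shows which events satisfy conditions (A) and (B).

```python
from dataclasses import dataclass


@dataclass
class TriggerEvent:
    """Hypothetical stand-in for a trigger event; not a Zabbix data structure."""
    severity: str  # e.g. "Warning" or "High"
    value: str     # "PROBLEM" or "OK"


def matches_warning_action(event: TriggerEvent) -> bool:
    cond_a = event.severity == "Warning"  # (A) Trigger severity = "Warning"
    cond_b = event.value == "PROBLEM"     # (B) Trigger value = "PROBLEM"
    return cond_a and cond_b              # Type of calculation: And


if __name__ == "__main__":
    print(matches_warning_action(TriggerEvent("Warning", "PROBLEM")))  # True
    print(matches_warning_action(TriggerEvent("Warning", "OK")))       # False
    print(matches_warning_action(TriggerEvent("High", "PROBLEM")))     # False
```

Under these conditions only the PROBLEM event of a Warning trigger matches, so the conditions alone do not account for dozens of recovery messages.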

                    • zordio
                      Junior Member
                      • Mar 2009
                      • 15

                      #11
                      This is interesting. A "high" alert just sent 1 problem email and 6 recovery emails per recipient, from the problem event. It should have sent no recovery emails. The later recovery event sent 1 email per recipient.

                      The only difference between the action for this and for warnings is the severity level.

                      • richlv
                        Senior Member
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Oct 2005
                        • 3112

                        #12
                        hmm. i don't notice any immediate problems, sorry. if this is more or less reliably reproducible, i'd suggest enabling DebugLevel=4 and looking for any interesting log entries.
                        if you do find something relevant, you could report it as a bug.

                        • zordio
                          Junior Member
                          • Mar 2009
                          • 15

                          #13
                          Thank you, will do.

                          If I wanted to look at the code, where would you suggest I start?

                          • richlv
                            Senior Member
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Oct 2005
                            • 3112

                            #14
                            sorry, i don't code
                            given that such a problem isn't too widespread, maybe some weird issue of retries and e-mail server not responding as expected...

                            • andresherrera
                              Junior Member
                              • Jan 2006
                              • 4

                              #15
                              Similar problem here

                              Hi,

                              I just installed version 1.6.4 to monitor some servers.

                              Today I received a lot of messages for a single event.

                              I queried the database directly and found nothing irregular, only a lot of alerts for the same event within a few seconds.

                              I don't know where else to look. Any help is welcome.


                              Here is the info that I found in the database, along with the settings:



                              All alerts have the following info:
                              actionid=4
                              eventid=7459
                              userid=4
                              mediatypeid=1
                              sendto=MYEMAIL
                              subject=Warning PROBLEM DEV Performance load is > 5 (5min)
                              message=Warning PROBLEM DEV Performance load is > 5 (5min)
                              status=1
                              retries=0
                              error=
                              nextcheck=0
                              esc_step=0
                              alerttype=0
                              clock=1243307676

                              The only difference between the records is the clock, which runs from 1243307676 to 1243307679 (and, obviously, the auto-increment alertid).
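A check like the one described above can be sketched in Python: group the alert rows by event, user, and subject, and report any group that appears more than once. The field names come from the record shown above; the `alertid` values, the shortened sample rows, and the helper names are made up for illustration, and in practice the rows would come from a query against the alerts table.

```python
from collections import Counter
from datetime import datetime, timezone


def duplicate_alerts(rows):
    """Return {(eventid, userid, subject): count} for groups seen more than once."""
    counts = Counter((r["eventid"], r["userid"], r["subject"]) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


def clock_to_utc(clock):
    """Convert a Unix timestamp from the clock column to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(clock, tz=timezone.utc).isoformat()


# Hypothetical sample: four identical alerts for event 7459, one second apart.
SUBJECT = "Warning PROBLEM DEV Performance load is > 5 (5min)"
rows = [
    {"alertid": 101 + i, "eventid": 7459, "userid": 4,
     "subject": SUBJECT, "clock": 1243307676 + i}
    for i in range(4)
]

if __name__ == "__main__":
    for (eventid, userid, subject), n in duplicate_alerts(rows).items():
        print(f"event {eventid}, user {userid}: {n} alerts with the same subject")
    print(clock_to_utc(1243307676))  # 2009-05-26T03:14:36+00:00
```

Decoding the clock values confirms that all of the duplicates were created within a span of about three seconds.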

                              ACTION SETTINGS:
                              Subject and body: {TRIGGER.SEVERITY} {STATUS} {HOSTNAME} {TRIGGER.NAME}
                              Recovery message: checked
                              Conditions: Trigger severity >= "Warning"
                              Action operations: Send message to Group "First level support" (default message)

                              TRIGGER SETTINGS:
                              Name: Performance load is > $1 (5min)
                              Expression: {DEV:system.cpu.load[,avg5].last(0)}>5
                              No dependencies defined
                              Event generation: Normal
                              Severity: Warning
