Many recovery notifications for a single problem notification

  • richlv
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Oct 2005
    • 3112

    #16
    based on the discussion today, i think your problem is the following
    Originally posted by andresherrera
    ACTION SETTINGS:
    ...
    Recovery message: checked
    Conditions: Trigger severity >= "Warning"
    you should also add 'trigger value = problem' (or similar condition)
    Zabbix 3.0 Network Monitoring book


    • zordio
      Junior Member
      • Mar 2009
      • 15

      #17
      I finally saw the problem happen and captured the logs. I had to increase the log size from 1MB to 100MB, and even then, with debuglevel 4 I had to be quick.

      So, I see in the logs that the event that sent 25 recovery messages does have 25 entries in the log. I'm not sure what might be relevant to pinpointing the problem, though. And even just those couple of minutes, the log would be fairly large.

      Suggestions would be greatly appreciated.


      • melpheos
        Member
        • Dec 2008
        • 64

        #18
        Hi everyone, i have exactly the same problem...

        It seems there is no update on this issue so i'm just bumping the thread ^^


        • alixen
          Senior Member
          • Apr 2006
          • 474

          #19
          Since there was no response to richlv's suggestion (add 'trigger value = problem' to Conditions), have you tried it?

          Without it, "action" is repeated indefinitely when trigger value becomes OK.

          Alixen
          http://www.alixen.fr/zabbix.html


          • Saftnase
            Member
            • Jul 2006
            • 30

            #20
            Maybe he didn't try trigger value = problem, but I did...
            and got 75 recovery SMS messages on my mobile phone.


            • melpheos
              Member
              • Dec 2008
              • 64

              #21
              Originally posted by alixen
              Since there was no response to richlv's suggestion (add 'trigger value = problem' to Conditions), have you tried it?

              Without it, "action" is repeated indefinitely when trigger value becomes OK.

              Alixen
              Tried it, no luck. I usually read a thread before posting.
              But this solved my first problem (receiving only the first message and not the recovery).

              Also, it seems to be fairly random: it affects some machines and not others.
              For some I receive one alert and several recoveries, for others one alert and one recovery, or one alert and two recovery messages.

              Also the multiple recovery messages have the same timestamp.
              Last edited by melpheos; 06-08-2009, 08:56.


              • richlv
                Senior Member
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Oct 2005
                • 3112

                #22
                so you previously had escalations enabled without trigger value=problem ?
                vague guess - maybe now database contains some escalations that won't end. you _might_ try cleaning the relevant table[s], but i don't know details, thus it's your responsibility

                oh, back up the database first, and i didn't suggest this =)
                Zabbix 3.0 Network Monitoring book


                • zordio
                  Junior Member
                  • Mar 2009
                  • 15

                  #23
                  One thing I've noticed is that it will usually happen for a given host once. Thereafter, there is only a single recovery notification. I'm not sure, though, if this is just per host, or per host per trigger, per host per action, or other combination.


                  • garumph
                    Junior Member
                    • Jun 2008
                    • 7

                    #24
                    We have had to turn off escalations because zabbix will decide it really needs to tell you about some event even long after it has sent the OK. trigger value = problem doesn't seem to help.

                    Zabbix will go nuts and page as fast as it can even though there are no active events. If you ack all the events (including the OK's) it seems to shut it up.

                    the alert logic seems to be very broken. Which rules actually work in combo with others is kind of a guess and hope. We can't get alerts to filter on template no matter what we do.
                    Last edited by garumph; 10-08-2009, 20:01.


                    • Draal
                      Junior Member
                      • Aug 2009
                      • 2

                      #25
                      The problem begins when you activate escalations in an action without Trigger value = "PROBLEM" and then update the action to add Trigger value = "PROBLEM". That update, or any other update, produces many recovery notifications.

                      Solution:
                      I deleted the action and created an identical new one with Trigger value = "PROBLEM".


                      • zordio
                        Junior Member
                        • Mar 2009
                        • 15

                        #26
                        This will happen even if you have 'trigger value = PROBLEM' as one of the conditions.


                        • Draal
                          Junior Member
                          • Aug 2009
                          • 2

                          #27
                          Maybe.
                          But deleting the action and adding a new one with the condition solved the problem.


                          • richlv
                            Senior Member
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Oct 2005
                            • 3112

                            #28
                            Originally posted by zordio
                            This will happen even if you have 'trigger value = PROBLEM' as one of the conditions.
                            could it be that the condition was added later, when the undesired escalation had already started ?
                            Zabbix 3.0 Network Monitoring book


                            • Brismtedt
                              Junior Member
                              • Jul 2009
                              • 22

                              #29
                              Hello,
                              I'm also experiencing this problem, with 1.6.6.
                              It starts to happen when I ADD the "trigger value = problem" condition (which I do to prevent mails from being sent AFTER the trigger is OK again).

                              Unfortunately, this bug appeared just after i added my boss's sms to the receiver list :-P

                              Here is the action:


                              What I did:
                              1) Created an action with escalation and two steps as shown, but without the condition.
                              2) Made a trigger fire and waited until all steps had happened.
                              The "problem" event:


                              The "ok" event:


                              So far all ok

                              3) I added the "problem" condition

                              4) I made the trigger fire again, but right after Zabbix noticed, I changed the service back to OK.
                              So no "failure" actions were triggered (no actions found).

                              5) The "ok" event sent many recoveries:


                              This time "only" 48 messages (during 3 seconds)

                              When I first noticed this problem, 1412 (!) actions were sent: emails, SMS messages...

                              What else might be useful to know?
                              * In the first case (with 1412 msgs) the users had more than one email address configured, with different working hours
                              * In this last case, only one user and active email address, but with "1-5,00:00-08:00;1-5,20:00-23:59;" as working hours...
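
For readers unfamiliar with the working-hours string quoted above, here is a minimal Python sketch of how a period list in the `d-d,hh:mm-hh:mm;` form can be read. It is not Zabbix's own code. The function names `parse_periods`, `to_minutes` and `is_active` are hypothetical, and two points are assumptions: days are numbered 1 (Monday) to 7 (Sunday), and the end time of each period is treated as exclusive.

```python
from datetime import datetime


def to_minutes(hhmm):
    """Convert 'hh:mm' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_periods(spec):
    """Parse 'd-d,hh:mm-hh:mm;...' into (day_from, day_to, start_min, end_min) tuples."""
    periods = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing ';' seen in the post
        days, times = part.split(",")
        day_from, _, day_to = days.partition("-")
        day_to = day_to or day_from  # a single day such as '3' means '3-3'
        start, end = times.split("-")
        periods.append((int(day_from), int(day_to), to_minutes(start), to_minutes(end)))
    return periods


def is_active(spec, when):
    """Return True if 'when' falls inside any period of 'spec'.

    Assumption: the end of a period is exclusive, and day 1 is Monday.
    """
    day = when.isoweekday()  # 1 = Monday ... 7 = Sunday
    minute = when.hour * 60 + when.minute
    return any(
        day_from <= day <= day_to and start <= minute < end
        for day_from, day_to, start, end in parse_periods(spec)
    )


if __name__ == "__main__":
    spec = "1-5,00:00-08:00;1-5,20:00-23:59;"
    # Monday morning inside the first period, Monday noon outside both.
    print(is_active(spec, datetime(2009, 8, 10, 7, 30)))
    print(is_active(spec, datetime(2009, 8, 10, 12, 0)))
```

Read this way, the string in the last case covers weekday nights and early mornings only, so a recovery arriving during weekday office hours would fall outside the user's active period.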

                              Not sure what more info to provide at this point. I don't have time now, but I could probably do a test next week with logging enabled.


                              • Brismtedt
                                Junior Member
                                • Jul 2009
                                • 22

                                #30
                                i tried today with loglevel = 4 and of course it did not happen :-P

                                It seems a previous poster may have been right: it only happens the first time...

                                /B
