Many recovery notifications for a single problem notification
-
I finally saw the problem happen and captured the logs. I had to increase the log size from 1MB to 100MB, and even then, with debuglevel 4 I had to be quick.
So, I see in the logs that the event that sent 25 recovery messages does have 25 entries in the log. I'm not sure what might be relevant to pinpointing the problem, though. And even for just those couple of minutes, the log is fairly large.
Suggestions would be greatly appreciated.
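One way to cut a large debug log down to the couple of minutes around the event is to filter it by timestamp. The following is only a sketch: it assumes each log line starts with a `pid:YYYYMMDD:HHMMSS` prefix (optionally followed by `.mmm` milliseconds), and the function name `lines_in_window` is made up for illustration. Adjust the pattern if your log lines look different.

```python
import re
from datetime import datetime

# Assumption: each line starts with "<pid>:<YYYYMMDD>:<HHMMSS>", optionally
# followed by ".mmm" milliseconds, then whitespace and the message.
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})(?:\.\d+)?\s")


def lines_in_window(lines, start, end):
    """Yield the log lines whose timestamp lies within [start, end]."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not carry the assumed prefix
        stamp = datetime.strptime(match.group(2) + match.group(3), "%Y%m%d%H%M%S")
        if start <= stamp <= end:
            yield line
```

To use it on a real file, something like `lines_in_window(open(path), start, end)` with `path` pointing at the server log should work, writing the result to a smaller file that is easier to post or read through.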
-
Since there was no response to richlv's suggestion (add 'trigger value = problem' to Conditions), have you tried it?
Without it, "action" is repeated indefinitely when trigger value becomes OK.
Alixen
-
Tried it, no luck. I usually read a thread before posting.
But this solved my first problem (receiving only the first message and not the recovery).
Also, it seems kind of random and affects some machines but not others.
For some I receive only one alert and several recoveries, or one alert and one recovery, or one alert and two recovery messages.
Also, the multiple recovery messages have the same timestamp.
Last edited by melpheos; 06-08-2009, 08:56.
-
so you previously had escalations enabled without trigger value=problem?
vague guess - maybe now database contains some escalations that won't end. you _might_ try cleaning the relevant table[s], but i don't know details, thus it's your responsibility
oh, backup the database and i didn't suggest this =)
-
We have had to turn off escalations because Zabbix will decide it really needs to tell you about some event even long after it has sent the OK. trigger value = problem doesn't seem to help.
Zabbix will go nuts and page as fast as it can even though there are no active events. If you ack all the events (including the OKs) it seems to shut it up.
The alert logic seems to be very broken. Which rules actually work in combination with others is kind of guess and hope. We can't get alerts to filter on template no matter what we do.
Last edited by garumph; 10-08-2009, 20:01.
-
The problem begins when you activate escalations in an action without Trigger value = "PROBLEM" and then update the action with Trigger value = "PROBLEM". This, or any other update to the action, reproduces the many recovery notifications.
How I solved it:
I deleted the action and created the same one anew with Trigger value = "PROBLEM".
-
Hello,
I'm also experiencing this problem, with 1.6.6.
It starts to happen when I ADD the "trigger value = problem" condition (which I do to prevent mails from being sent AFTER the trigger is OK again).
Unfortunately, this bug appeared just after I added my boss's SMS to the receiver list :-P
Here is the action:

What I did:
1) Created an action with escalation and two steps like shown, but without the condition.
2) Made a trigger fire and waited until all steps had happened.
The "problem" event:

The "ok" event:

So far all ok
3) I added the "problem" condition
4) I made the trigger fire again, but right after Zabbix noticed, I changed the service back to OK.
So no "failure" actions were triggered. (no actions found)
5) The "ok" event sent many recoveries:

This time "only" 48 messages (during 3 seconds).
When I first noticed this problem, 1412 (!) actions were sent: emails, SMSes ...
What else might be interesting to know?
* In the first case (with 1412 msgs) the users had more than one email address configured, with different working hours
* In this last case, only one user and one active email address, but with "1-5,00:00-08:00;1-5,20:00-23:59;" as working hours...
Not sure what more info to provide at this point. I don't have time now, but I could probably do a test next week with logs activated.
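To make the working-hours string above easier to reason about, here is a rough Python sketch that parses the `d-d,hh:mm-hh:mm;` format and checks whether a given moment falls inside it. This is not how Zabbix itself evaluates periods. The function names are made up, day 1 is assumed to be Monday, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime


def _minutes(hhmm):
    """Convert 'hh:mm' to minutes since midnight (so '24:00' also works)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_periods(spec):
    """Parse 'd-d,hh:mm-hh:mm;...' into (first_day, last_day, start, end) tuples."""
    periods = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing ';' seen in the post
        days, hours = part.split(",")
        first_day, _, last_day = days.partition("-")
        start, end = hours.split("-")
        periods.append(
            (int(first_day), int(last_day or first_day), _minutes(start), _minutes(end))
        )
    return periods


def is_active(spec, when):
    """Return True if 'when' falls inside any period of 'spec'.

    Assumptions of this sketch: day 1 is Monday, the start minute is
    inclusive and the end minute is exclusive.
    """
    day = when.isoweekday()
    minute = when.hour * 60 + when.minute
    return any(
        first <= day <= last and start <= minute < end
        for first, last, start, end in parse_periods(spec)
    )
```

With the string from the post, `is_active("1-5,00:00-08:00;1-5,20:00-23:59;", some_datetime)` tells you whether a notification time falls on a weekday before 08:00 or from 20:00 onward, which might help when comparing the message timestamps against the configured hours.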