Wrong {STATUS} with escalation ?

  • Calimero
    Senior Member
    • Nov 2006
    • 481

    #1

    Wrong {STATUS} with escalation ?

    Hi,

    I've configured an action with escalation enabled, attached to a host.

    After having some trouble at first (mail flood as another zabbix user reported), escalation is working fine.

    I get a first alert message, then (10 minutes later) escalation starts with an email notification (custom message) every 5 minutes unless the event is acknowledged.

    The only problem is that the recovery message reports trigger status ON instead of OFF.

    I've then configured a simple action without escalation, for the same host.
    I get emails for problems/recovery through that action and status is fine: ON when trigger detects a problem, OFF when problem is recovered.

    Another "interesting" aspect is that with the simple action, although I check "Recovery message" and specify a custom recovery message, the default message is sent for recovery too. (In the provided screenshots I didn't use a custom recovery message for the simple action because I discovered the "custom recovery msg" issue only later.)

    Attached files:
    action_esc_0.JPG : main configuration of action with escalation
    action_esc_1.JPG : first notification step
    action_esc_2.JPG : second notification step

    action_noesc.JPG : simple action (without custom recovery msg)

    event_recovery_msg.JPG : event log that shows {TRIGGER.STATUS} inconsistency

    Has anyone experienced such behavior?
  • Calimero
    Senior Member
    • Nov 2006
    • 481

    #2
    Apart from the "STATUS" issue reported above, a few other strange things happen with escalation.

    I've defined an Action with escalation (see above).

    I simulate web monitoring problems using iptables on the Zabbix server.

    I end up with 'OK' and 'PROBLEM' events:

    2008.Nov.07 11:04:37   web ZZZ Distribution   PROBLEM   High       8m 9s       No   Ok
    2008.Nov.07 11:00:16   web ZZZ Vision         PROBLEM   Disaster   12m 30s     No   Ok
    2008.Nov.07 10:55:36   web ZZZ Vision         OK        Disaster   4m 40s      No   Ok
    2008.Nov.07 10:52:46   web ZZZ Vision         PROBLEM   Disaster   2m 50s      No   Ok
    2008.Nov.07 08:45:06   web ZZZ Vision         OK        Disaster   2h 7m 40s   No   Ok

    The latest events (11:04:37 and 11:00:16) send notifications, with escalation working fine.
    The problem is that the older events from 10:55:36 (recovery!!) and 10:52:46 (crash) send escalation notifications again: recovery notifications (Status=OFF) and problem notifications (Status=ON) respectively.

    Looks like when escalation is enabled, previous events are "brought back from the dead" while they should be left dead. Maybe some conditions missing in a WHERE clause when evaluating events to escalate ?
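
To make the idea of a "missing condition" concrete, here is a minimal sketch of the filter I would expect. It is only an illustration: the `Event` record and the `events_to_escalate` function are hypothetical names made up for this example, not Zabbix's actual code or database schema. The assumption is that only the newest event of each trigger should remain eligible for escalation, and that acknowledged events are skipped.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are illustrative, not Zabbix's schema."""
    trigger: str
    time: datetime
    status: str  # "PROBLEM" or "OK"
    acknowledged: bool = False


def events_to_escalate(events):
    """Return only the newest event of each trigger, skipping acknowledged ones."""
    latest = {}
    for ev in events:
        current = latest.get(ev.trigger)
        if current is None or ev.time > current.time:
            latest[ev.trigger] = ev
    return [ev for ev in latest.values() if not ev.acknowledged]


# The five events from the list above (trigger names shortened).
events = [
    Event("Vision", datetime(2008, 11, 7, 8, 45, 6), "OK"),
    Event("Vision", datetime(2008, 11, 7, 10, 52, 46), "PROBLEM"),
    Event("Vision", datetime(2008, 11, 7, 10, 55, 36), "OK"),
    Event("Vision", datetime(2008, 11, 7, 11, 0, 16), "PROBLEM"),
    Event("Distribution", datetime(2008, 11, 7, 11, 4, 37), "PROBLEM"),
]

selected = events_to_escalate(events)
for ev in selected:
    print(ev.time.strftime("%H:%M:%S"), ev.trigger, ev.status)
```

With this filter, only the 11:00:16 and 11:04:37 events are selected; the 10:52:46 and 10:55:36 events stay "dead".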

    • Calimero
      Senior Member
      • Nov 2006
      • 481

      #3
      I've configured "infinite" escalation, unless acknowledged.

      Looks like when status changes from PROBLEM to OK, escalation notifications automatically stop. Which is what - I guess - most people expect.

      The only problem is when service is "flapping": zabbix seems to mix up events/escalation and things go bad.

      In my case, triggers are back to OK, new "recovery" events have been created and recovery notifications sent.

      The problem is that I've ended up in a situation in which all events (OK, PROBLEM) keep sending emails.
      I've already got tens of notifications (both 'ON' and 'OFF' at the same time)...

      • Calimero
        Senior Member
        • Nov 2006
        • 481

        #4
        Well actually, in my setup, "infinite notifications until acknowledged", notifications are really sent forever, whatever the status... unless you acknowledge.
        It actually has nothing to do with checks "flapping".

        That is, if a trigger goes from PROBLEM back to OK, the escalation associated with the 'PROBLEM' event will keep sending notifications forever, until I acknowledge the 'PROBLEM' event.
        The fact that there's an 'OK' event following does not stop escalation.

        The same goes for 'OK' events that will keep flooding until you acknowledge the event...

        I actually expected a status change to stop escalation, whatever the definition of the action is.

        So in my setup, you have to acknowledge any event if you don't want to fill your inbox/phone with notifications.


        Is it possible to have something like:
        PROBLEM:
        - send immediate notification (Step 1 to 1, default period, send default message to target)
        - after 10 minutes, send a notification every 5 minutes with a custom message, unless acknowledged or the trigger goes back to OK. (I've tried: Step 2-0 (infinity), 300 sec delay, send custom message to target, condition: Event acknowledged = "Not Ack")

        OK:
        - send immediate single recovery notification (no escalation, no need to acknowledge)

        So far I haven't managed to get such a setup.
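
For clarity, the behavior I am after can be sketched as a small simulation. This is only a model of the desired schedule, not of how Zabbix implements escalation: `notification_times` and all of its parameters are hypothetical names for this example, times are in seconds, and `horizon` exists only to bound the simulation.

```python
def notification_times(problem_at, recovered_at=None, acked_at=None,
                       first_delay=600, repeat=300, horizon=3600):
    """Return (time_in_seconds, message) pairs for one PROBLEM event.

    All parameters are hypothetical knobs for this sketch; horizon only
    limits how long the simulation runs.
    """
    # Escalation stops at the earliest of recovery, acknowledgement, or horizon.
    stops = [t for t in (recovered_at, acked_at) if t is not None]
    stop = min(stops + [problem_at + horizon])

    # Step 1: immediate notification with the default message.
    sent = [(problem_at, "default problem message")]

    # Step 2 onwards: custom message every `repeat` seconds after `first_delay`.
    t = problem_at + first_delay
    while t < stop:
        sent.append((t, "custom escalation message"))
        t += repeat

    # OK event: a single recovery notification, no escalation.
    if recovered_at is not None:
        sent.append((recovered_at, "recovery message"))
    return sent


# Problem at t=0, trigger back to OK after 25 minutes, never acknowledged.
for when, message in notification_times(0, recovered_at=1500):
    print(when, message)
```

In this model a problem that recovers after 25 minutes produces notifications at 0, 600, 900 and 1200 seconds, followed by one recovery message at 1500 seconds, and nothing after that.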
        Last edited by Calimero; 07-11-2008, 17:31. Reason: missing condition

        • Calimero
          Senior Member
          • Nov 2006
          • 481

          #5
          I've upgraded to 1.6.1. After messing a bit more with Action settings (maybe to reset some status fields in DB ?) things seem to work much better.

          I'm going to "stress test" my settings a bit more anyway. I don't want zabbix to go crazy one night. My colleagues would probably hate me for that.

          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            You may go straight to pre-1.6.2 available from www.zabbix.com/developers.php. It has the wrong {STATUS} value fixed.
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga
