Zabbix 5.0 Escalations are not working for problems that start during maintenance

This topic has been answered.
  • JohnBelliveau
    Junior Member
    • Mar 2021
    • 5

    #1

    Hi,

    Escalations and action operations are not working for problems that start during a maintenance window and are not resolved after the maintenance period ends.
    Action conditions include "Problem is not suppressed".
    The default operation step duration is 5 min.
    Items are scheduled to be checked every 5 min.
    Operation steps are "1 - 0".
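    To make the "1 - 0" setting concrete, here is a rough sketch of when an operation configured for a range of steps would run, measured in minutes from the start of the problem. It assumes every step uses the action's default step duration (5 min here) and ignores pausing and per-operation duration overrides; the function name is hypothetical and this is a simplified model, not Zabbix's escalator code. A "to" step of 0 means the operation repeats until the problem is resolved.

```python
from itertools import islice
from typing import Iterator


def step_start_minutes(step_from: int, step_to: int, step_duration: int) -> Iterator[int]:
    """Yield minute offsets (from problem start) at which an operation for
    steps step_from..step_to would run, assuming every step uses the
    default step duration. step_to == 0 means "repeat forever"."""
    step = step_from
    while step_to == 0 or step <= step_to:
        # Step 1 starts immediately; each later step starts one duration later.
        yield (step - 1) * step_duration
        step += 1


# Steps "1 - 0" with a 5-minute default step duration, as in the post
# (islice keeps only the first five of the endless schedule):
print(list(islice(step_start_minutes(1, 0, 5), 5)))  # [0, 5, 10, 15, 20]
# A finite "1 - 3" escalation for comparison:
print(list(step_start_minutes(1, 3, 5)))  # [0, 5, 10]
```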

    I am able to see from the Latest Data that the item is checked on schedule as expected.
    After the maintenance window for a host ends, the problem is displayed on the Problems screen as expected.
    However, the 'Actions' column on the Problems screen shows nothing for that problem, and nothing is added to the Action log.

    Any ideas on troubleshooting would be appreciated!
  • Answer selected by JohnBelliveau at 26-08-2022, 15:02.
    JohnBelliveau
    Junior Member
    • Mar 2021
    • 5

    tim.mooney, I appreciate your feedback. It turns out that the action setting "Pause operations for suppressed problems", in combination with the action condition "Problem is not suppressed", was the culprit. It seems that "Pause operations for suppressed problems" does more than its label implies when that condition is also present on the action.
    Even with operation steps set to "1 - 0", the operations did not fire until that setting was unchecked. If the condition is removed from the action instead, the setting seems to work as documented.

    This means that the only way for escalations to work properly is to not use the "Problem is not suppressed" condition. If you keep that condition, then to be sure Zabbix fires an action operation after a host comes out of a maintenance window, you must set "nag" mode (operation steps "1 - 0") AND uncheck "Pause operations for suppressed problems". Without nag mode, if the maintenance window lasts longer than the number of steps multiplied by the step duration, the escalation may expire during the maintenance window and no notification is sent.
    Last edited by JohnBelliveau; 24-08-2022, 22:27.
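    The timing rule of thumb above (with pausing unchecked and no nag mode) can be sketched as a quick check. This is only a back-of-the-envelope model of the reasoning in the post, assuming the problem starts right when maintenance begins and every step uses the default step duration; the function name is hypothetical and it does not reproduce Zabbix's actual escalator logic.

```python
def escalation_may_expire(maintenance_minutes: int, last_step: int, step_duration: int) -> bool:
    """Rule of thumb from the post: with pausing disabled, an escalation that
    ends after `last_step` steps can run out while the host is still in
    maintenance. A last_step of 0 (steps "1 - 0", nag mode) never ends.
    Assumes the problem starts when the maintenance window starts."""
    if last_step == 0:
        # Nag mode keeps escalating until the problem is resolved.
        return False
    return maintenance_minutes > last_step * step_duration


# (maintenance window in minutes, last step, step duration in minutes)
for window, last, duration in [(120, 3, 5), (120, 0, 5), (10, 3, 5)]:
    print(window, last, duration, escalation_may_expire(window, last, duration))
```

    With a 2-hour window and a "1 - 3" escalation at 5 minutes per step, the sketch flags a likely expiry (120 > 15), while the "1 - 0" case never does.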


    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #2
      I had this *exact* same problem.

      Did your original install/setup of Zabbix start with version 3.0 or earlier? If it did, even though you're now at version 5.0, there's a section of the 3.2.0 upgrade notes that you may have missed: https://www.zabbix.com/documentation...rade_notes_320

      Look specifically at the "Escalation changes" section of those upgrade notes and what it recommends, and make sure that your action conditions match those recommendations.


      • JohnBelliveau
        Junior Member
        • Mar 2021
        • 5

        #3
        Thank you, Tim. We started with Zabbix 5.0. Nonetheless, I reviewed the upgrade notes; however, they did not apply to my use case. What makes this tough to troubleshoot is not that escalation stops at a certain time per se; it's that the action doesn't fire at all once the host comes out of maintenance mode.



          • tim.mooney
            Senior Member
            • Dec 2012
            • 1427

            #5
            Good find, and thanks for following up and confirming that it was related to the behavior introduced by the escalation changes in 3.2.0. Your fix was exactly the one I had to make at my work site: remove "Problem is not suppressed" (which isn't what it was originally called) from the conditions of all actions, in favor of the "new" checkbox for "Pause operations for suppressed problems".

            It seems like the "Problem is not suppressed" action condition is now just a trap for the unwary. It's not clear to me when it would ever be a good idea to use it.
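            The fix described above (drop the condition, rely on the checkbox) amounts to a simple audit over your actions. The sketch below uses a hypothetical, simplified `Action` record rather than the real Zabbix API schema, and the example action names and condition strings are made up; it only illustrates which actions would need changing and what they would look like afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Condition label as quoted in this thread.
NOT_SUPPRESSED = "Problem is not suppressed"


@dataclass
class Action:
    """Hypothetical, simplified view of an action; not the Zabbix API schema."""
    name: str
    pause_suppressed: bool  # the "Pause operations for suppressed problems" checkbox
    conditions: list[str] = field(default_factory=list)


def needs_fix(action: Action) -> bool:
    """True if the action still relies on the "Problem is not suppressed" condition."""
    return NOT_SUPPRESSED in action.conditions


def apply_fix(action: Action) -> Action:
    """Return a copy without the condition, relying on the checkbox instead."""
    return Action(
        name=action.name,
        pause_suppressed=True,
        conditions=[c for c in action.conditions if c != NOT_SUPPRESSED],
    )


# Made-up example actions; the other condition strings are placeholders.
actions = [
    Action("Notify admins", False, ["Trigger severity >= High", NOT_SUPPRESSED]),
    Action("Ops pager", True, ["Trigger severity >= Disaster"]),
]

for action in actions:
    if needs_fix(action):
        print(action.name, "->", apply_fix(action))
```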


            • JohnBelliveau
              Junior Member
              • Mar 2021
              • 5

              #6
              Doreen65463, basically, the issue was that actions and escalations (https://www.zabbix.com/documentation...n%2Cescalation) were not working in the following scenario:
              1. The host was in a maintenance window when the problem started.
              2. The maintenance window expired, but the problem was not resolved or acknowledged.
