21 Escalations and repeated notifications

1 Overview

Zabbix provides flexible functionality for escalations and repeated notifications. Depending on configuration, Zabbix automatically escalates unresolved problems (increases the escalation step) and executes the actions assigned to each escalation step.

Zabbix supports the following scenarios for escalations, notifications and remote commands:

  • Immediately inform users about new problems
  • Proactive monitoring: Zabbix executes arbitrary scripts (remote commands)
  • Repeated notifications until the problem is resolved
  • Delayed notifications and remote commands
  • Escalation of problems to other user groups
  • Different escalation paths for acknowledged and unacknowledged problems
  • Execution of actions (both notifications and remote commands) if a problem exists for more than N hours (or seconds, minutes, etc.)
  • Recovery message to all interested parties
  • An unlimited number of escalation steps
2009/08/13 13:03

2 Simple messages

Warning: before enabling recovery messages or escalations, make sure to add the “Trigger value = PROBLEM” condition to the action; otherwise recovery events can be escalated as well.

In order to alert MySQL Administrators about any issues with MySQL applications the following configuration can be used:

Since we are not interested in sending multiple messages or escalating MySQL problems to other user groups, escalations are not enabled.

Zabbix will send a single message to MySQL Administrators and a recovery message when the problem is resolved. If recovery messages are not enabled, Zabbix will send only one message with information about the new problem; no message will be sent when the problem is resolved.

The action condition is defined so that the action is activated by any problem with any of the MySQL applications.
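As a rough illustration of how such a condition set filters events, the following Python sketch checks an event against the two conditions mentioned so far ("Trigger value = PROBLEM" and a MySQL application). The Event class, its field names and the substring match on the application name are assumptions made for this example; they do not describe Zabbix's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Event:
    application: str    # hypothetical field: application the item belongs to
    trigger_value: str  # "PROBLEM" or "OK"


def action_matches(event: Event) -> bool:
    """Return True if the event should activate the MySQL action."""
    is_problem = event.trigger_value == "PROBLEM"
    is_mysql = "MySQL" in event.application
    return is_problem and is_mysql


# A MySQL problem activates the action; its recovery (OK) event does not.
print(action_matches(Event("MySQL server", "PROBLEM")))
print(action_matches(Event("MySQL server", "OK")))
```

Without the trigger value check, the second event would match as well, which is the situation the warning at the start of this section describes.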

Note also the use of macros in the messages. Zabbix supports a wide range of macros; the complete list is available on the macros page.

Actions are defined as:

A message will be sent to all members of the group MySQL Administrators.

2009/11/22 18:24

3 Remote commands

Remote commands are a powerful mechanism for smart proactive monitoring. Zabbix can execute a command on a monitored host when pre-defined conditions are met.

Here is the list of some of the most obvious uses of the feature:

  • Automatically restart an application (web server, middleware, CRM) if it does not respond
  • Reboot a remote server using the IPMI 'reboot' command if it does not answer requests
  • Try to automatically free disk space (remove older files, clean /tmp) if a host is running out of disk space
  • Migrate a VM from one physical box to another depending on CPU load
  • Add new nodes to a cloud environment if CPU (disk, memory, or other) resources are insufficient

Configuring an action for remote commands is similar to configuring one for messaging; the only difference is that Zabbix executes a command instead of sending a message.

The action condition is defined so that the action is activated by any problem of severity "Disaster" with one of the Apache applications.

As a reaction to the disaster problem, Zabbix will try to restart the Apache process:

Note the use of the macro {HOSTNAME} here.
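The configured command itself is not reproduced here, so the sketch below only illustrates the idea of macro substitution: a macro such as {HOSTNAME} is replaced with the name of the affected host before the command is run. The expand_macros helper, the command template and the host name web01 are hypothetical and chosen for illustration only.

```python
def expand_macros(template: str, values: dict) -> str:
    """Replace each {MACRO} in the template with its value.

    Macros that have no entry in `values` are left untouched.
    """
    result = template
    for name, value in values.items():
        result = result.replace("{" + name + "}", value)
    return result


# Hypothetical command template; the host name would come from the event.
template = "{HOSTNAME}:sudo /etc/init.d/apache restart"
command = expand_macros(template, {"HOSTNAME": "web01"})
print(command)
```

Because the host name is filled in per event, a single operation can serve every host that matches the action condition.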

The user 'zabbix' must have sufficient permissions to execute this script. In addition, a Zabbix agent must be running on the remote host and must accept incoming connections. Remote commands are disabled by default and can be enabled in the Zabbix agent daemon configuration file on Unix-like or Windows systems.
Remote commands do not work with active Zabbix agents.
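To check whether an agent configuration enables remote commands, a small parser along the following lines could be used. This is a minimal sketch: it assumes the setting is the EnableRemoteCommands parameter (0 = disabled, 1 = enabled) found in agent configuration files of this era, so verify the parameter name against the agent version in use. The function name is hypothetical.

```python
def remote_commands_enabled(config_text: str) -> bool:
    """Return True if the agent configuration text enables remote commands.

    Assumes the parameter is named EnableRemoteCommands and that a missing
    parameter means "disabled", matching the default described above.
    """
    enabled = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if separator and key.strip() == "EnableRemoteCommands":
            enabled = value.strip() == "1"
    return enabled


sample = """
# Remote commands are off unless enabled explicitly
EnableRemoteCommands=1
"""
print(remote_commands_enabled(sample))
```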

See remote command tutorial for more information.

2009/11/22 19:10

4 Repeated notifications

Repeated notifications are probably one of the most common uses of Zabbix escalations.

Make sure that escalations are enabled in the action details:

The period defines how frequently Zabbix increases the escalation step. By default, it moves to the next step every hour, i.e. every 3600 seconds.

As soon as escalations are enabled, action operations get additional options: Step(s), Period and Conditions.

Suppose we would like to send 5 messages, one every hour, so we define the operation to be active from escalation step 1 to step 5. The escalation period is taken from the action definition unless it is overridden for an individual operation.

As soon as a problem occurs, Zabbix is at step 1, so all operations assigned to that step are executed. After one hour, the escalation step is increased automatically (provided the problem still exists), so all operations of step 2 are executed, and so on.
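The stepping behaviour described above can be sketched in Python as follows. This is a simplified model, assuming a single action-wide escalation period and a problem that stays unresolved; the Operation class and the function names are hypothetical, not part of Zabbix.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    step_from: int  # first escalation step at which the operation runs
    step_to: int    # last escalation step at which the operation runs


def operations_for_step(operations, step):
    """Names of the operations that are active at the given step."""
    return [op.name for op in operations if op.step_from <= step <= op.step_to]


def schedule(operations, period_seconds, max_step):
    """List (elapsed_seconds, step, operation_names) for steps 1..max_step."""
    timeline = []
    for step in range(1, max_step + 1):
        elapsed = (step - 1) * period_seconds  # step 1 runs immediately
        timeline.append((elapsed, step, operations_for_step(operations, step)))
    return timeline


# One message per hour for steps 1 to 5; nothing is assigned to step 6.
ops = [Operation("Message to MySQL Administrators", 1, 5)]
for elapsed, step, names in schedule(ops, 3600, 6):
    print(f"t={elapsed:>5}s  step {step}: {names}")
```

In this model the fifth and last message goes out four hours after the problem started, and step 6 has no operations to execute.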

A recovery message will be sent only to those people who have already received at least one message within the scope of the escalation.

If the trigger that generated an active escalation is disabled, Zabbix sends a message about this to the people who have already received notifications.
2009/11/22 19:27

5 Delayed notifications

Zabbix escalations support sending delayed notifications.

Suppose we would like to be notified about long-standing MySQL problems only. Note that the escalation period was changed to 10 hours and we use a custom default message:

The operation is assigned only to step 2. It means it will be executed once after one escalation period, i.e. 10 hours:

Therefore, user 'Alexei' will get a message only if a problem exists for more than 10 hours. The notification delay is controlled by the escalation period.
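The relationship between the step number, the escalation period and the resulting delay can be written out as a small calculation. The sketch below assumes every step uses the same period; the function name is hypothetical.

```python
def delay_before_step(step: int, period_seconds: int) -> int:
    """Seconds between the start of a problem and the given escalation step.

    Step 1 runs immediately; each later step adds one escalation period.
    """
    if step < 1:
        raise ValueError("escalation steps start at 1")
    return (step - 1) * period_seconds


TEN_HOURS = 10 * 3600
# An operation assigned only to step 2 fires after one full period.
print(delay_before_step(2, TEN_HOURS) // 3600, "hours")
```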

2009/11/22 20:09

6 Escalate to Boss

Zabbix escalations can be used to escalate a problem to other users and user groups. The problem is not being fixed by the MySQL admins? Escalate to their BOSS!

Here we have configured periodic sending of messages to the MySQL administrators. The administrators will get four messages before the problem is escalated to the Database manager. Note that the manager will get a message only if the problem has not been acknowledged yet, which suggests that no one is working on it.

Note the use of the {ESC.HISTORY} macro in the message. The macro contains information about all previously executed steps, so the manager will see all emails sent and all actions executed before. MySQL administrators, beware!
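The following sketch models this escalation path in Python. The step numbers (administrators on steps 1 to 4, manager on step 5) are an assumption inferred from the description, since the original configuration is not reproduced here; the function and constant names are hypothetical.

```python
ADMIN_STEPS = range(1, 5)  # assumed: steps 1-4 notify the administrators
MANAGER_STEP = 5           # assumed: step 5 notifies the manager


def run_escalation(last_step, acknowledged_from_step=None):
    """Return the operations executed up to and including last_step.

    acknowledged_from_step is the step at which the problem is already
    acknowledged, or None if nobody acknowledges it.
    """
    history = []
    for step in range(1, last_step + 1):
        acknowledged = (acknowledged_from_step is not None
                        and step >= acknowledged_from_step)
        if step in ADMIN_STEPS:
            history.append(f"step {step}: message to MySQL Administrators")
        elif step == MANAGER_STEP and not acknowledged:
            # The manager's message carries the earlier steps, which is
            # the role {ESC.HISTORY} plays in the message text.
            earlier = "; ".join(history)
            history.append(f"step {step}: message to Database manager [{earlier}]")
    return history


for entry in run_escalation(5):
    print(entry)
```

Calling run_escalation(5, acknowledged_from_step=3) returns only the four administrator messages, because the condition on the manager's operation is no longer met.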

2009/11/22 20:48

7 Complex scenario

Look at this set of actions. After multiple messages to MySQL administrators and escalation to the manager, Zabbix will try to restart the MySQL database. This happens if the problem has existed for 2 hours 30 minutes and has not been acknowledged.

If the problem still exists, after another 30 minutes Zabbix will send a message to all users in Japan.

If this does not help, after another hour Zabbix will reboot the server hosting the MySQL database (second remote command) using IPMI commands.
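The timing of these stages can be added up as a quick check. The sketch below only uses the intervals stated in the text; the stage labels and the function name are illustrative, and the per-step periods of the underlying configuration are not modelled.

```python
from datetime import timedelta

# (stage, waiting time since the previous stage), as given in the text
stages = [
    ("restart MySQL (remote command)", timedelta(hours=2, minutes=30)),
    ("message to all users in Japan", timedelta(minutes=30)),
    ("reboot server via IPMI (remote command)", timedelta(hours=1)),
]


def cumulative_timeline(stages):
    """Return (stage, time elapsed since the problem started) pairs."""
    elapsed = timedelta()
    timeline = []
    for name, wait in stages:
        elapsed += wait
        timeline.append((name, elapsed))
    return timeline


for name, elapsed in cumulative_timeline(stages):
    print(f"{elapsed} after problem start: {name}")
```

Under these figures the message to Japan goes out 3 hours after the problem started and the IPMI reboot follows at the 4 hour mark.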

2009/11/22 21:10