**Aly** · 30-12-2008, 14:03

I think you need to add a condition like: Trigger value = "PROBLEM"
Also, which version are you using for tests?

**geno** · 06-01-2009, 14:00

Another img

Here's an image that explains the timeline of events as described in the "Recovery Messages" section of my previous post...

Ideally, all the recovery messages (there could be just one) would be sent at the instant the fix was detected, i.e. 16:00.

**geno** · 06-01-2009, 14:14

Using "Trigger Value = OK"

At the end of the 1st post I mentioned another way of doing the escalations; Aly suggested using “Trigger value = PROBLEM”. I will try to explain that here and point out the problems with that configuration.

With this setup, you configure the same actions as before except that you DO NOT tick ‘enable escalations’ and you DO NOT tick ‘recovery message’. You will then end up with something that looks like this:

1.0 - Instant Configuration

Img 1.0

In Img 1.0 - This action will trigger at the instant the problem is detected, ie “Trigger Value = PROBLEM”. It has a few other conditions to define what to trigger on, and it will send a message to 1st standby email and cell and it will email 2nd standby.

2.0 – Escalation 30 min Configuration

Img 2.0

This is set up with ‘enable escalations’ ticked and the period set to 1800sec (30 min). The step to notify on is 2, which means not at the instant (step 1) but at step 2, which is 30 min from the instant of the problem; that is also why trigger value is set to “PROBLEM”.

3.0– Escalation 60 min Configuration

Img 3.0

This is set up with ‘enable escalations’ ticked and the period set to 3600sec (60 min). The step to notify on is 2, which means not at the instant (step 1) but at step 2, which is 60 min from the instant of the problem; that is also why trigger value is set to “PROBLEM”.
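The step arithmetic used in configurations 2.0 and 3.0 can be sketched as below. This is only an illustration of the timing described in this post (step 1 at the instant of the problem, each later step one period after the previous one); `step_time` is a made-up helper name and the date is arbitrary, neither is anything Zabbix provides.

```python
from datetime import datetime, timedelta


def step_time(problem_start: datetime, period_seconds: int, step: int) -> datetime:
    """Return when an escalation step fires; step 1 is the instant of the problem."""
    if step < 1:
        raise ValueError("steps are numbered from 1")
    return problem_start + timedelta(seconds=period_seconds * (step - 1))


# Hypothetical problem detected at 15:00.
start = datetime(2009, 1, 6, 15, 0)

print(step_time(start, 1800, 2).strftime("%H:%M"))  # 30 min action, step 2
print(step_time(start, 3600, 2).strftime("%H:%M"))  # 60 min action, step 2
```

With a problem at 15:00, the 30 min action notifies at 15:30 and the 60 min action at 16:00.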

4.0 – Recovery messages

Img 4.0

At the instant it is detected that the problem was fixed (therefore ‘enable escalations’ is not ticked) and “Trigger Value = OK”, send a message to all relevant people.

--
This works fine. Except for the following:

Let’s say the problem was detected and the notification was sent at that instant (let’s say 15:00). 1st standby then fixes the problem within 5 minutes, i.e. the 30 min and 60 min escalation notifications don’t occur. Zabbix then detects that the problem was fixed but, because of this setup, sends recovery messages to all three groups.

That is because you cannot determine who received messages and who didn’t.

A POSSIBLE SOLUTION:
If you knew how long the problem had existed (in minutes), you could change the conditions; for example:

If [trigger value = OK] -> send recovery msg to 1st standby (always notified)
If [trigger value = OK] and [event.age >= 30] -> also send recovery msg to 2nd standby
If [trigger value = OK] and [event.age >= 60] -> also send recovery msg to 1 HR standby (manager)

You can do this because you know that if the problem existed for less than 30 minutes, ONLY 1st standby got the error notification and only he needs to receive the recovery notification (you don’t want to waste the 2nd standby's time or get the manager worried about something that was fixed quickly).

The same goes if the problem lasted less than 60 minutes: only 1st and 2nd standby were notified, so only they need the recovery message.
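The idea above can be sketched in a few lines. Note that this is purely hypothetical: the whole point of the post is that the problem's age is not available as a condition, so `recovery_recipients` and its group labels are illustrative names only, and the 30 and 60 minute thresholds are the escalation delays used in this setup.

```python
from typing import List


def recovery_recipients(problem_age_minutes: float) -> List[str]:
    """Pick who should get the recovery message, based on how long the problem lasted.

    Anyone whose escalation step had already fired received the problem
    notification, so they are the ones who need the recovery message.
    """
    recipients = ["1st standby"]  # notified at the instant of the problem
    if problem_age_minutes >= 30:
        recipients.append("2nd standby")  # 30 min escalation had fired
    if problem_age_minutes >= 60:
        recipients.append("manager")  # 60 min escalation had fired
    return recipients


print(recovery_recipients(5))   # fixed quickly
print(recovery_recipients(45))  # fixed after the 30 min escalation
print(recovery_recipients(90))  # fixed after the 60 min escalation
```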

--
Am I missing the plot completely? Am I making this too complicated?

Thanks

**geno** · 06-01-2009, 17:37

Eureka!!

Aly, you have said what many have said before. And maybe I wasn't listening properly...

I’m not sure whether I’ve just been blind or stupid, or whether my intuition lacks some deeper insight, but I’ve found the solution. Looking at it, it seems almost simple, except for the “Trigger value = PROBLEM” part, which just doesn’t seem to fit completely. It also makes me ask: what’s the purpose of allowing the user to tick “Recovery message” if he doesn’t add “Trigger value = PROBLEM”? I ask because, if you remove “Trigger value = PROBLEM” from the scenario below, you can still tick “Recovery message” but it doesn’t work, i.e. it doesn't send a recovery message. (I'll check again.)

Solution

1) Enable Escalations
2) Set the period between escalations (300sec in this example)
3) Set default subject and message
4) Enable ‘Recovery Message’
5) Set recovery subject and message. I like to put ‘RECOVERY’ in the subject then it’s nice and clear.
6) Set the conditions and then add “Trigger value = PROBLEM”. [YOU MUST ADD THIS ONE]
7) Set the operations for each step. In this case a different person will be emailed every 300sec until ‘jthut’ has been mailed; after that nothing more happens. [NOTE: You can modify the default delay between escalations, see the first post by me in this thread.]

What this will cause I will explain with examples:

Example 1

15:00 – Zabbix trigger picks up problem condition on host ‘farfaraway’. It will immediately email ‘lskywalker’

For this example, let’s say Mr ‘lskywalker’ fixes the problem within a minute (our hero!)
Assume this item is checked every 60 seconds, then at around:

+- 15:03 – Zabbix trigger picks up the problem was fixed on host ‘farfaraway’. It will immediately email ‘lskywalker’. (IMPORTANT: It has only notified ‘lskywalker’ of the problem, thus will only notify ‘lskywalker’ that the problem was fixed.)

Example 2
(adaptation of Example 1)

15:00 – Zabbix trigger picks up problem condition on host ‘farfaraway’. It will immediately email ‘lskywalker’

For this example, let’s say Mr ‘lskywalker’ was playing with his light-saber and doesn’t notice the email. So:

15:05 – Zabbix escalates (300sec later) and sends for step 2 a message to Mr ‘yoda’

Mr ‘yoda’ is of course working hard and fixes the problem immediately (fixes the problem he can!) so at around:

+- 15:07 - Zabbix trigger picks up that the problem was fixed on host ‘farfaraway’. It will immediately email ‘lskywalker’ and ‘yoda’. (IMPORTANT: They were the only ones notified of the problem, so they are, and should be, the only ones notified of the recovery.)
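The behaviour in both examples can be modelled with a small sketch. It only mimics what is described in this post (a 300sec period, one person per step, recovery sent to whoever was already notified); it is not how Zabbix implements it. The step order beyond ‘lskywalker’ and ‘yoda’ is an assumption (the post only says ‘jthut’ is mailed last), `notified_by` is a made-up name, and treating a recovery that lands exactly on a step boundary as "already notified" is also an assumption.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(seconds=300)          # period between escalation steps
STEPS = ["lskywalker", "yoda", "jthut"]  # step 1, step 2, step 3 (order of step 3 assumed)


def notified_by(problem_start: datetime, recovery_time: datetime) -> list:
    """Return the users whose escalation step fired before the problem recovered.

    These are the only users who should receive the recovery message.
    """
    elapsed = recovery_time - problem_start
    notified = []
    for index, user in enumerate(STEPS):
        # Step 1 fires at offset 0, step 2 one period later, and so on.
        if index * PERIOD <= elapsed:
            notified.append(user)
    return notified


start = datetime(2009, 1, 6, 15, 0)
print(notified_by(start, datetime(2009, 1, 6, 15, 3)))  # Example 1
print(notified_by(start, datetime(2009, 1, 6, 15, 7)))  # Example 2
```

Example 1 yields only ‘lskywalker’; Example 2 yields ‘lskywalker’ and ‘yoda’, matching the timelines above.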

--
The above example in a timeline

--

You get the idea. This is important because, as you’ve seen in my previous posts, you don’t want to send out unnecessary notifications (I was having problems with recovery messages being sent to people who didn’t even need to know the problem occurred, such as the 3rd escalation person, who might just be your manager/boss). Especially if it’s an SMS/cellphone text message. ESPECIALLY if you sometimes have problems at 3am. ESPECIALLY if the morning is xmas morning and ‘dvader’ (your boss) is sleeping…

--

Any comments would be appreciated

**geno** · 06-01-2009, 18:47

Double checked

I double checked

Okay, so for some reason having Trigger Value = PROBLEM is what makes the difference between these two scenarios:

With 'Trigger Value = PROBLEM'

--

WITHOUT 'Trigger Value = PROBLEM'

--
I think that my confusion is justified; it doesn't really make sense. These options don't come intuitively, so by playing around with the Actions configuration you won't necessarily just 'get it'.

At least I've got it figured out now, and hopefully this post will clear it up for other people too.

**Aly** · 10-01-2009, 11:24

I'm glad you have figured it out. Let the force be with U

**hml** · 28-01-2009, 02:56

Zabbix 1.6.2 Period value in steps

Hi All,

I have just installed 1.6.2 in a test environment. I have managed to find out how the escalations are working and I have to say it is a big improvement from 1.4.6 actions.

However, I have a question about the period value in the steps screen. I have tried putting different values in each step but cannot see any impact on the timeline of escalations/notifications for the problem or recovery.

Is the period value in a step configuration supposed to change the timeline of escalations?

Regards,

hml

**geno** · 29-01-2009, 00:55

I have not upgraded to 1.6.2, so in terms of that version I cannot give you a proper answer. However, if you look at example 3 in the first post of this thread, you will see what the period value does. I'm not sure how to explain it any more clearly than that.

**hml** · 03-02-2009, 02:44

Escalations in 1.6.2

Hi Geno,

I have run some more tests in which the default period differs enough from the periods in the steps, and I also used SMS for more accurate timing.

This confirmed that escalations work as expected. My misunderstanding was about what period in the step configuration was.

It would have been a bit easier to understand if, for example, "Period [0-Default]" said "Run next step in: ....[0=Default]" instead, and if the period on the action screen were called "Default Period".

The other factor that contributes to the confusion is that the values in the Delay column are calculated using only the default value and are not adjusted when it is overridden in the step configuration.
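A rough sketch of the difference between the two calculations, based only on my reading of the tests described in this post: a step's period is taken as the wait before the next step, with 0 meaning "use the default". The function name `step_offsets` and the sample values are made up for illustration and are not taken from Zabbix itself.

```python
from itertools import accumulate


def step_offsets(default_period: int, step_periods: list) -> list:
    """Return the offset in seconds from the problem at which each step fires.

    step_periods holds the period configured on each step; 0 means
    "use the default period". Step 1 always fires at offset 0.
    """
    if not step_periods:
        return []
    durations = [p if p > 0 else default_period for p in step_periods]
    # Each step starts once all the steps before it have waited their period.
    return [0] + list(accumulate(durations[:-1]))


# Default period 300sec; step 2 overrides it with 600sec.
print(step_offsets(300, [0, 600, 0, 0]))  # timing with the override applied
print(step_offsets(300, [0, 0, 0, 0]))    # timing computed from the default only
```

The second line corresponds to what a Delay column computed from the default alone would show; the first is what the override would actually produce under the assumption above.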

To summarise, escalations work OK.

hml

**geno** · 03-02-2009, 15:18

You are correct. I was just reporting on how I found the escalations to work, not what is most understandable

You could suggest your improved interface to the developers

**consultorpc** · 27-03-2009, 03:40

Hello,

Can you please help me configure an action that needs to be repeated every 30 minutes? Can this be done with a single action with one operation and escalation? I have already posted a thread: http://www.zabbix.com/forum/showthread.php?t=12045 , which gives more details about the question.

Thanks

consultorpc

**Justin Freeman** · 18-04-2009, 07:20

Please add these examples to the Zabbix Manual

Please add these examples to the Zabbix Manual. They are excellent and helped me understand how to use actions more effectively.

Thanks to everyone in the thread for their efforts

**consultorpc** · 20-04-2009, 03:24

Justin,

Since you now have a better understanding of how to configure this, please tell me how I can make an action that is repeated every 30 minutes.

Thanks

consultorpc

**bernard** · 30-06-2009, 08:31

Best post about escalation

Hi Geno,

This is the best post about escalation!!! It should always be on top or go into the manual.

Thank you,
bernard

Escalations Explained (RFC)
