**Pada** · 10-10-2013, 22:03

Hi,

You have quite a few problems with your trigger expression.

- .last(600) evaluates a single data point 10 minutes ago.
- proc.num[] isn't CPU load, although it may be related.
- Remove the first {TRIGGER.VALUE}=0, because that would cause the flapping.
- Swap your single CPU_LOAD macro for CPU_LOAD_HIGH and CPU_LOAD_NORMAL macros, so that you don't immediately release the trigger.

For example our trigger for high CPU usage is something like the following:

Code:

{host:system.cpu.loadpercentagecpu.avg5.min(3m)}>95|({TRIGGER.VALUE}=1 & {host:system.cpu.loadpercentagecpu.avg5.max(5m)}>60)

This means the trigger only fires when the CPU load per core was higher than 95% at every data point in the last 3 minutes.
The trigger only returns to the OK state once the CPU load per core has been 60% or lower at every single data point in the last 5 minutes. Please note that the "system.cpu.loadpercentagecpu.avg5" item is a calculated one that we created from:

Code:

last("system.cpu.load[,avg5]")/last("system.cpu.num")

I hope this helps!

**miguel sauaia** · 14-10-2013, 12:55

Thank you Pada!

I will have to change this, but doesn't "avg5.max" evaluate the maximum of the 5-minute averages? What I want is for the trigger to fire only if the host exceeds the threshold for the full 5 minutes. Could something like "system.cpu.load.last(#5)" be used?

**Pada** · 14-10-2013, 16:10

Look, there are a whole bunch of ways that you can obtain the average values, and even more ways that you can calculate the maximum of those averages.

For instance, Linux exposes 3 average values of the Processor Load: over 1 minute, over 5 minutes and over 15 minutes:

Code:

system.cpu.load[,avg1]
system.cpu.load[,avg5]
system.cpu.load[,avg15]

The average over 1 minute (avg1) will go much more up and down than the average over 15 minutes (avg15).
In our use case, we're not interested in seeing/triggering on momentary spikes on CPU usage, like what avg1 gives you.
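
For readers who want to check these numbers outside Zabbix, here is a small Python sketch of the same per-core calculation. It relies on the standard library calls os.getloadavg() (Unix only) and os.cpu_count(); the function names are hypothetical, and the multiplication by 100 is an assumption made here to express the result as a percentage, since the calculated item formula quoted earlier in the thread only shows the division.

```python
import os


def per_core_load_percent(load_average, cpu_count):
    """Convert a load average into a per-core percentage.

    The factor of 100 is an assumption made for this sketch.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return 100.0 * load_average / cpu_count


def current_per_core_load_percent():
    """Per-core percentage based on the 5-minute load average (Unix only)."""
    load1, load5, load15 = os.getloadavg()  # 1, 5 and 15 minute averages
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return per_core_load_percent(load5, cores)
```

With these definitions, a 5-minute load average of 2.0 on a 4-core host corresponds to 50% per core.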

In Zabbix you can then fetch those average values at whatever interval you'd like.
For example, it would be pointless to monitor [,avg5] once every 5 minutes and then take [,avg5].max(5m), because the result would be the same as [,avg5].last(0).
It would be more sensible to monitor [,avg5] every 1 minute and then take [,avg5].max(5m).
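
The effect of the polling interval on max(5m) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: samples are (timestamp in seconds, value) pairs, the window is taken as half-open (older than now minus 5 minutes is excluded, now is included), and the function name and data are invented for the example. How Zabbix treats the exact window boundary is not something this sketch tries to reproduce.

```python
def max_over_window(samples, now, window_seconds):
    """Return the maximum value among samples with now - window < t <= now.

    samples is a list of (timestamp_seconds, value) pairs.
    Returns None when no sample falls inside the window.
    """
    inside = [value for t, value in samples if now - window_seconds < t <= now]
    return max(inside) if inside else None


# Hypothetical avg5 readings, polled every 60 seconds and every 300 seconds.
every_minute = [(0, 1.0), (60, 2.5), (120, 4.0), (180, 3.0), (240, 2.0), (300, 1.5)]
every_5_minutes = [(0, 1.0), (300, 1.5)]

if __name__ == "__main__":
    # Only one sample is inside the window, so max(5m) equals the last value.
    print(max_over_window(every_5_minutes, now=300, window_seconds=300))
    # Five samples are inside the window, so max(5m) can see the earlier peak.
    print(max_over_window(every_minute, now=300, window_seconds=300))
```

With the 5-minute polling interval the window only ever holds the latest reading, while the 1-minute interval gives the function five readings to compare.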

Unfortunately I cannot give you more info than this, because setting up the monitoring and triggers all depends on what you want to see and what you want to get notified about.
I suppose a good starting point would be to monitor [,avg5] every 60 seconds and then trigger on [,avg5].last(0) > {YOUR THRESHOLD}

**miguel sauaia** · 15-10-2013, 12:35

Thank you Pada!

I set up the trigger following your recommendation and it is working well.

**troffasky** · 06-05-2016, 18:32

I have been through the same process as the OP and it seems we've both followed this:

No more flapping. Define triggers the smart way. - Zabbix Blog

http://blog.zabbix.com/no-more-flapping-define-triggers-the-smart-way/1488/

When I used the blog's example for "CPU load is too high", it flapped at every poll interval. I tweaked it to use Pada's example and now it's fine.

Zabbix triggers are flapping, hysteresis doesn't work.
