Looking for suggestions on making triggers a little smarter.

  • jhboricua
    Senior Member
    • Dec 2021
    • 113

    #1


Trying to understand how to tackle different scenarios and reduce trigger noise. Take this example:

    This is a database server that gets some heavy usage on its disk at 10 AM every day. Let's say this is a backup activity or some batch job doing lots of reads. This triggers the associated disk response time trigger daily during this period. Eventually people will simply ignore it. Modifying the triggers with time exclusions or their evaluation periods doesn't scale too well. On this server the offending activity is at 10 AM and takes 30-40 minutes. On another server, it could happen at a different time and take more or less time.

    But I do have the collected history, so how can I leverage it so that Zabbix says, 'hey, I'm seeing the disk is really busy but that's normal for this time period so I'm not going to alert'?
  • ISiroshtan
    Senior Member
    • Nov 2019
    • 324

    #2
I personally would go with:
- slap a unique tag on the triggers you're fighting with
- create maintenance window(s) (one per host, or in groups based on when you expect the jobs to run)
- with data collection
- active for 1-2 years
- recurring period (daily/weekly/monthly, based on need)
- time matching the known jobs, plus 10 min on top of that
- tag match so only the specific triggers are under maintenance

Should cover your need, I think.
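For what it's worth, those steps map onto Zabbix roughly like this (a sketch only; the tag name, dates and times are placeholders for your own jobs):

Code:
Trigger tag:        maint: batch-job
Maintenance:        "Daily batch jobs", with data collection, active for ~2 years
Period:             recurring daily, start 10:00, duration 50m
Problem tag match:  maint Equals batch-job

With a "with data collection" maintenance the items keep collecting and the problems still happen; matching problems are just suppressed during the window.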


    • markfree
      Senior Member
      • Apr 2019
      • 868

      #3
      You may be looking for a less sensitive trigger function.
      Trend and baseline functions can help you with that.

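For instance, a less sensitive disk trigger could compare the last full hour's trend against the same hour a week before, something along these lines (a sketch only; /host/key and the factor of 2 are placeholders to tune):

Code:
trendavg(/host/key,1h:now/h) > 2 * trendavg(/host/key,1h:now/h-1w)

Since trends are written hourly this only reacts once the hour has closed, but it is far less noisy than raw-value thresholds.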
      Last edited by markfree; 02-01-2025, 03:39.


      • jhboricua
        Senior Member
        • Dec 2021
        • 113

        #4
        Originally posted by ISiroshtan
I personally would go with:
- slap a unique tag on the triggers you're fighting with
- create maintenance window(s) (one per host, or in groups based on when you expect the jobs to run)
- with data collection
- active for 1-2 years
- recurring period (daily/weekly/monthly, based on need)
- time matching the known jobs, plus 10 min on top of that
- tag match so only the specific triggers are under maintenance

Should cover your need, I think.
        This becomes unmanageable very quickly and doesn't scale when there are hundreds of servers.


markfree I've been looking at the trend and baseline functions but have yet to see how they can be used effectively for this. Take the baseline functions, for example, from the documentation:

baselinedev(/host/key,data period:time shift,season unit,num seasons):
baselinedev(/host/key,1h:now/h,"d",10) #calculating the number of standard deviations (population) between the previous hour and the same hours over the period of ten days before yesterday

baselinewma(/host/key,data period:time shift,season unit,num seasons):
baselinewma(/host/key,1h:now/h,"d",3) #calculating the baseline based on the last full hour within a 3-day period that ended yesterday. If "now" is Monday 13:30, the data for 12:00-12:59 on Friday, Saturday, and Sunday will be analyzed

What I gather from this is that the baseline and trend functions are always looking at previous data to perform their calculations. Which makes sense, because trends data is written hourly, and trends for the current hour of activity are unavailable for the functions to use. Hence, if my activity spike always happens between 10 and 11 PM, the function above is going to look at the period between 9 and 10 PM. And that's a totally different activity profile. So I'm not sure how I can utilize these functions for what I'm trying to achieve, which is for Zabbix to not alert if the activity spike is normal for that time period, given the data for the same time period during the last week(s) or month. It doesn't seem these functions are aimed at that. Even watching some of the videos from Zabbix on the subject you could tell they were struggling a little bit to explain them, lol.

Or maybe I'm just misunderstanding the use of the baseline/trend functions, which is why I'm asking here for smarter people to point me in the right direction.


        • Brambo
          Senior Member
          • Jul 2023
          • 245

          #5
I think the trendavg function combined with a duration in one trigger should cover most of your needs.
Create the trigger with macros for the trend time and duration, so that the template defaults match the expected scenarios, but then you can make it a host macro when a specific host needs its own "special setting".
In other words (example; macro names are placeholders):
trendavg(/host/key,1h:now/h) > trendavg(/host/key,6h:now/h-6h) + {$EXPECTED_INCREASE}
and
last(/host/key) > {$MIN_VALUE}
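Folded into a single expression, that idea would read roughly like this (still a sketch; /host/key and the macro names are placeholders you would define on the template or host):

Code:
last(/host/key) > {$MIN_VALUE} and trendavg(/host/key,1h:now/h) > trendavg(/host/key,6h:now/h-6h) + {$EXPECTED_INCREASE}

The last() guard keeps the trigger quiet when the absolute value is too small to matter, however far above its trend it is.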


          • jhboricua
            Senior Member
            • Dec 2021
            • 113

            #6
Looking at baselinewma again and it just hit me: is the solution as simple as using a time shift for the 'current hour period' instead of the last hour? Unlike baselinedev, it seems that baselinewma is not using that last-hour value in the baseline calculation/analysis. If I'm reading it correctly, it is simply using the time shift as a reference for which values to use in the calculation, based on the season unit and num seasons parameters. So if I set up the baselinewma function as:

"baselinewma(/host/key,1h:now/h+1h,"d",7)" or "calculate the baseline based on the current hour within a 7-day period that ended yesterday. If "now" is Monday 13:30, the data for 13:00-13:59 for the previous 7 days will be analyzed".

            I could then use that in a trigger to compare the current value of say, cpu utilization, against the calculated baseline for that same 1h time period on the last X amount of days and define my threshold in which to trigger at. For example:

            Code:
min(/host/system.cpu.util,15m)>90 and baselinewma(/host/system.cpu.util,1h:now/h+1h,"d",7)*2 < avg(/host/system.cpu.util,15m)
            Trigger if the min value of CPU utilization exceeds 90% over a 15-minute period and the 15m average CPU utilization is 2 times higher than the calculated baseline for the current 1h period of the last 7 days.

            Thoughts?​


            • markfree
              Senior Member
              • Apr 2019
              • 868

              #7
You know... I see very few people actually interested in improving their metrics. Cheers mate.

              Your scenario is actually very common, but can lead to some misinterpretations.
              Keep in mind that when you use "baselinewma" like that you are actually evaluating the current hour, not the last 60m. So, the beginning of each hour will have less data to compare with previous seasons and this can lead to false triggers.

When combining functions, it is best to approximate the data/time scales in them. Something like "avg(1h) > baselinewma(1h) * 2".

              You could try other ways to infer abnormal behavior for your data. There are other more statistical ways to measure the data behavior, but baseline seems more straightforward to me.

              Still, you might want to try something a little simpler.​ How about this?
              Code:
avg(//key,1h) / avg(//key,1h:now-1d) > 1.5
              Last edited by markfree; 18-01-2025, 04:10.


              • jhboricua
                Senior Member
                • Dec 2021
                • 113

                #8
                Originally posted by markfree
                Keep in mind that when you use "baselinewma" like that you are actually evaluating the current hour, not the last 60m. So, the beginning of each hour will have less data to compare with previous seasons and this can lead to false triggers.
                Code:
avg(//key,1h) / avg(//key,1h:now-1d) > 1.5
                Is it actually evaluating the current hour values (or last hour if I were to use 1h:now/h) as part of the baselinewma calculation though? The language and examples in the documentation on baselinewma state:

                Code:
baselinewma(/host/key,1h:now/h,"d",3) #calculating the baseline based on the last full hour within a 3-day period that ended yesterday. If "now" is Monday 13:30, the data for 12:00-12:59 on Friday, Saturday, and Sunday will be analyzed
It says "calculating the baseline based on the last full hour within a 3-day period that ended yesterday". It doesn't seem to me that it is actually using the last-hour values in that calculation; it is simply using the time shift to know which past periods/seasons to evaluate.

                This is different from the language for baselinedev:

                Code:
baselinedev(/host/key,1h:now/h,"d",10) #calculating the number of standard deviations (population) between the previous hour and the same hours over the period of ten days before yesterday
where it clearly states that it is calculating the number of std deviations between the previous hour and the same hours...



                • markfree
                  markfree commented
                  Editing a comment
                  Sorry about that. I meant a time shift like "1h:now/h+1h", which is the current hour.
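Putting #6 and #7 together, the final shape of the trigger would then be roughly (a sketch; the item key and the factor of 2 are placeholders to tune):

Code:
avg(/host/system.cpu.util,1h) > baselinewma(/host/system.cpu.util,1h:now/h+1h,"d",7) * 2

i.e. compare a full hour of current data against twice the weighted baseline of the same hour over the previous 7 days, keeping the data/time scales matched as suggested in #7.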