How to make alerts more actionable and useful

  • bdobson
    Junior Member
    • Feb 2024
    • 19

    #1

    How to make alerts more actionable and useful

    I am running Zabbix 6.4.13 and getting complaints from the team that some alerts are not actionable, such as a CPU spike or something similar that quickly resolves itself. We switched over from Nagios, which had a way in the GUI to say something like "Check X times over X period of time" before sending out an alert. In my time with Zabbix so far, I have not been able to find this option.

    For example, let's say I wanted to change this to only alert if the check fails twice:

    HTML Code:
    min(/Windows by Zabbix agent/perf_counter_en["\Processor Information(_total)\% Privileged Time"],5m)>{$CPU.PRIV.CRIT.MAX}
    Or

    Check to see if the time is out of sync over 5 minutes and check twice:

    HTML Code:
    fuzzytime(/Linux by Zabbix agent/system.localtime,{$SYSTEM.FUZZYTIME.MAX})=0
    I can see examples of how to make it check over a period of time, but what I'm looking for is how to get checks like these to not send alerts right away and instead check again X number of times. An example would be appreciated, if at all possible.

    Thank you
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2
    Sorry but..
    Code:
    min(/Windows by Zabbix agent/perf_counter_en["\Processor Information(_total)\% Privileged Time"],5m)>{$CPU.PRIV.CRIT.MAX}
    This is already delayed: it says that during the last 5 minutes the lowest value is over the threshold (lowest, meaning all the others were over the limit as well). If you measure every minute, that already means 5 measurements over the threshold... You can replace the time period with a number of checks: 5m => #5, so now it becomes "during the last 5 checks the lowest value is over the threshold (and all the others are too)."
    If you want that 2 times in a row... increase the timeframe to 10 minutes or 10 checks..
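
To see why a count-based window suppresses short spikes, here is a minimal Python sketch of the idea behind min(/host/key,#N)>threshold. It is not Zabbix code: the function name `fires`, the sample readings, and the threshold of 90 are all hypothetical, and returning False until N samples exist is a simplification of this sketch rather than a statement about how Zabbix behaves.

```python
def fires(samples, n, threshold):
    """True if the lowest of the last n samples is above threshold.

    Illustrates the idea of min(/host/key,#n)>threshold. Returning False
    while fewer than n samples exist is an assumption of this sketch.
    """
    if len(samples) < n:
        return False
    return min(samples[-n:]) > threshold


cpu = [12, 95, 14, 20, 93, 92]  # hypothetical one-minute CPU readings
THRESHOLD = 90                  # stands in for {$CPU.PRIV.CRIT.MAX}

# Evaluate the condition after each new reading arrives.
states = [fires(cpu[: i + 1], 2, THRESHOLD) for i in range(len(cpu))]
print(states)
```

With N=2, the isolated spike to 95 never fires; the condition only becomes true once two consecutive readings exceed the threshold.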

    • bdobson
      Junior Member
      • Feb 2024
      • 19

      #3
      Right, sorry that is correct. Silly example :-(

      Do you have any ideas about the second example? How do I properly add the same type of delay to that check?

      I tried this, but it didn't work:

      HTML Code:
      fuzzytime(/Linux by Zabbix agent/system.localtime,{$SYSTEM.FUZZYTIME.MAX},5m)=0
      I am also having trouble making the default service check run more than once before it alerts:

      HTML Code:
      last(/OS processes by Zabbix agent/custom.proc.num[{#NAME}])=0
      Last edited by bdobson; 10-04-2024, 18:19. Reason: Added examples

      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        You probably just need to spend some time looking at the trigger function examples in the documentation. Many of them have explanations for what the trigger is doing: https://www.zabbix.com/documentation...ers/expression

        There's also an expression builder when creating a new trigger, that may help you.

        Many of the functions (but not all...) support an argument that can be either a period of time (e.g. 5m, 30m, etc.) OR a "number of checks". For example, check Example 2, Example 4, and Example 7 from the trigger expression documentation link, above.

        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4806

          #5
          last(/host/item)=0 (the last value is 0) should probably be replaced with max(/host/item,#X)=0, meaning the max value during the last X checks is 0, which means all of those checks effectively failed (if 0 means failure)... In some cases min() should be used instead...
          Fuzzytime unfortunately does not support time or count parameters... By nature it compares 2 timestamps, one from the host and one from the server; if those are off, it fires... If you do not have time synchronization in place, I am pretty sure it will not fix itself between 2 checks...
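
The max(/host/item,#X)=0 suggestion can be illustrated with a small Python sketch. This is only a model of the logic, not Zabbix code: the function name `all_checks_zero`, the process counts, and X=3 are hypothetical, and treating a short history as "no alert" is an assumption made for this sketch.

```python
def all_checks_zero(counts, x):
    """True if the highest of the last x values is 0, i.e. all x were 0.

    Illustrates the idea of max(/host/item,#x)=0. Returning False while
    fewer than x values exist is an assumption of this sketch.
    """
    if len(counts) < x:
        return False
    return max(counts[-x:]) == 0


proc_counts = [1, 0, 1, 0, 0, 0]  # hypothetical process counts per check

# Evaluate the condition after each new check result arrives.
alerts = [all_checks_zero(proc_counts[: i + 1], 3) for i in range(len(proc_counts))]
print(alerts)
```

A single 0 between two healthy checks does not alert here; only three consecutive zeros do, which is the "check more than once before alerting" behaviour asked about earlier in the thread.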
