Zabbix triggers are flapping, hysteresis doesn't work.

  • miguel sauaia
    Junior Member
    • Sep 2011
    • 13

    #1

    Hello!

    Can somebody help me? My trigger hysteresis is not working. I have a trigger as shown below. I am using macros in the triggers.

    Code:
    (({TRIGGER.VALUE}=0) & ({Template OS Linux:proc.num[].last(600)} > {$CPU_LOAD})) |  (({TRIGGER.VALUE}=1) & ({Template OS Linux:proc.num[].last(600)} < {$CPU_LOAD}))
    This is the resolved trigger expression shown in the Events tab:

    Code:
    (({TRIGGER.VALUE}=0) & ({SERVER2:proc.num[].last(600)} > 400)) | (({TRIGGER.VALUE}=1) & ({SERVER2:proc.num[].last(600)} < 400))
    Events:
    Last edited by tchjts1; 09-10-2013, 19:54.
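
The flapping can be reproduced outside Zabbix. The following is a minimal Python sketch, not Zabbix code: it assumes the expression is re-evaluated once per new value, that `{TRIGGER.VALUE}` holds the previous result, and that the reading stays at a hypothetical 450 against the threshold of 400. The function names are made up for illustration.

```python
def original_expression(trigger_value, last, threshold):
    """Mirror of the posted expression; `|` and `&` become `or` and `and`."""
    return int((trigger_value == 0 and last > threshold) or
               (trigger_value == 1 and last < threshold))


def simulate(samples, threshold=400):
    """Feed each sample to the expression, carrying the trigger state forward."""
    state = 0  # 0 = OK, 1 = PROBLEM
    history = []
    for value in samples:
        state = original_expression(state, value, threshold)
        history.append(state)
    return history


# A reading that stays above the threshold the whole time (hypothetical data).
states = simulate([450] * 6)
print(states)
```

Once the trigger is in PROBLEM state, neither branch can be true while the value is still above the threshold, so the state alternates on every evaluation.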
  • Pada
    Senior Member
    • Apr 2012
    • 236

    #2
    Hi,

    You have quite a few problems with your trigger expression.
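    1. .last(600) does not cover the last 10 minutes: last() ignores a seconds parameter and returns only the most recent value, so a single data point decides the trigger.
    2. proc.num[] counts processes; it isn't CPU load, although the two may be related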
    3. remove the first {TRIGGER.VALUE}=0, because that would cause the flapping
    4. swap your single CPU_LOAD macro for CPU_LOAD_HIGH and CPU_LOAD_NORMAL macros, so that you don't immediately release the trigger


    For example our trigger for high CPU usage is something like the following:
    Code:
    {host:system.cpu.loadpercentagecpu.avg5.min(3m)}>95|({TRIGGER.VALUE}=1 & {host:system.cpu.loadpercentagecpu.avg5.max(5m)}>60)
    Which means that it will only trigger when the CPU load per core was higher than 95% at every data point for the last 3 minutes.
    The trigger would only go into an OK state once the CPU load per core is 60% or lower for every single data point for the last 5 minutes. Please take note that the "system.cpu.loadpercentagecpu.avg5" item is a calculated one that we created from:
    Code:
    last("system.cpu.load[,avg5]")/last("system.cpu.num")
    I hope this helps!
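
The two-threshold behaviour described above can be sketched in Python. This is only an illustration under stated assumptions: windows are counted in samples rather than minutes (as if one value arrived per minute), the trigger does not fire until a full 3-sample window exists, and `hysteresis_states` is a hypothetical helper, not part of Zabbix.

```python
from collections import deque


def hysteresis_states(samples, high=95, low=60, fire_window=3, clear_window=5):
    """Evaluate: min(fire_window) > high | (state = 1 & max(clear_window) > low).

    Windows are measured in samples, assuming one sample per minute.
    """
    recent = deque(maxlen=max(fire_window, clear_window))
    state = 0  # 0 = OK, 1 = PROBLEM
    history = []
    for value in samples:
        recent.append(value)
        window = list(recent)
        # Fire only when every sample in the short window is above `high`.
        fire = len(window) >= fire_window and min(window[-fire_window:]) > high
        # Stay in PROBLEM while any sample in the long window is above `low`.
        hold = state == 1 and max(window[-clear_window:]) > low
        state = int(fire or hold)
        history.append(state)
    return history


# Hypothetical per-core load percentages, one per minute.
load = [50, 96, 97, 98, 80, 70, 59, 58, 57, 56, 55, 50]
states = hysteresis_states(load)
print(states)
```

The trigger fires after three consecutive readings above 95 and is released only after five consecutive readings at or below 60, so a brief dip does not clear it.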


    • miguel sauaia
      Junior Member
      • Sep 2011
      • 13

      #3
      Thank you, Pada!

      I will change this, but doesn't "avg.max" evaluate the maximum of the 5-minute averages? What I want is for the trigger to fire only if the host stays above the threshold for the full 5 minutes. Could something like "system.cpu.load.last(#5)" do that?


      • Pada
        Senior Member
        • Apr 2012
        • 236

        #4
        Look, there are a whole bunch of ways that you can obtain the average values, and even more ways that you can calculate the maximum of those averages.

        For instance, Linux exposes 3 average values of the Processor Load: over 1 minute, over 5 minutes and over 15 minutes:
        Code:
        system.cpu.load[,avg1]
        system.cpu.load[,avg5]
        system.cpu.load[,avg15]
        The average over 1 minute (avg1) will go much more up and down than the average over 15 minutes (avg15).
        In our use case, we're not interested in seeing/triggering on momentary spikes on CPU usage, like what avg1 gives you.
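
For reference, the same three averages can be read directly on a Unix host. This is a minimal Python sketch, assuming a Unix-like system where `os.getloadavg()` is available; dividing by the core count mirrors the per-core calculated item mentioned earlier in the thread, and `load_per_core` is a hypothetical helper name.

```python
import os


def load_per_core():
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    avg1, avg5, avg15 = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {"avg1": avg1 / cores, "avg5": avg5 / cores, "avg15": avg15 / cores}


per_core = load_per_core()
print(per_core)
```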

        In Zabbix you can then fetch those average values at whatever interval you'd like.
        For example, it would be pointless to monitor [,avg5] once every 5 minutes and then take [,avg5].max(5m), because it would be the same as [,avg5].last(0).
        It would be more sensible to monitor [,avg5] every 1 minute and then take [,avg5].max(5m).

        Unfortunately I cannot give you more info than this, because setting up the monitoring and triggers all depends on what you want to see and what you want to get notified about.
        I suppose a good starting point would be to monitor [,avg5] every 60 seconds and then trigger on [,avg5].last(0) > {YOUR THRESHOLD}
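
The point about polling interval versus window size can be illustrated with a short Python sketch. The readings are invented, and the window is assumed to include the newest sample and exclude the one exactly 5 minutes old; Zabbix's own boundary handling may differ.

```python
def window_max(samples, now, window_seconds):
    """Max of (timestamp, value) samples with now - window_seconds < timestamp <= now."""
    in_window = [v for t, v in samples if now - window_seconds < t <= now]
    return max(in_window)


def poll(values_per_minute, interval_seconds):
    """Keep only the readings a poller with the given interval would collect."""
    return [(t, v) for t, v in values_per_minute if t % interval_seconds == 0]


# Hypothetical avg5 readings, one per minute, as (timestamp in seconds, value).
values = [1.0, 1.2, 3.5, 2.8, 1.1, 0.9, 1.0, 1.3, 4.2, 1.5, 1.0]
readings = [(60 * i, v) for i, v in enumerate(values)]
now = 600

every_5_min = poll(readings, 300)
every_1_min = poll(readings, 60)

# With a 5-minute interval the window holds one sample, so max equals last.
print(window_max(every_5_min, now, 300), every_5_min[-1][1])
# With a 1-minute interval the window holds five samples and catches the spike.
print(window_max(every_1_min, now, 300))
```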


        • miguel sauaia
          Junior Member
          • Sep 2011
          • 13

          #5
          Thank you Pada!


          I set up the trigger following your recommendation and it is working well.


          • troffasky
            Senior Member
            • Jul 2008
            • 567

            #6
            I have been through the same process as the OP and it seems we've both followed this:

            Zabbix trigger expressions provide an incredibly flexible way of defining problem conditions. If you can express your problem using plain English or any other human language, there is a great chance it could be represented using triggers. I’ve noticed that even experienced Zabbix users are not always aware of the true power of triggers. The […]


            When I used the example for "CPU load is too high" it flaps at every poll interval. Tweaked it to use Pada's example and it's fine.
