Can I achieve an exponentially-weighted moving average in a recovery expression?

  • troffasky
    Senior Member
    • Jul 2008
    • 565

    #1

    Can I achieve an exponentially-weighted moving average in a recovery expression?

    Let's get this out of the way first: I'm not a mathematician.
    I see a variety of functions allowed in recovery expressions. For ICMP availability I am using:

    avg(/ICMP Ping/icmpping,60m)>0.9

    This is so that when something is flapping, the trigger stays in PROBLEM state until it really has recovered. Unfortunately, it also means that when something has been hard-down for more than an hour and then eventually comes back, the trigger takes 90% of one hour (54 minutes) to clear. I think an exponentially-weighted moving average function would help here: more recent pings in the window would be weighted higher than less recent ones, so a trigger would clear more quickly after a hard-down event if all is OK, but still wouldn't flap if the host is going up and down.

    So, how can I achieve this with existing trigger functions?
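
To make the difference concrete, here is a minimal sketch outside Zabbix that compares a plain 60-minute average with an exponentially weighted one after a long hard-down period. It assumes one ping sample per minute; the smoothing factor `alpha=0.1` and the helper names are hypothetical choices for illustration, not anything Zabbix provides.

```python
def simple_average(samples, window):
    """Plain mean of the last `window` samples (what a 60m average
    amounts to when there is exactly one sample per minute)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def ewma(samples, alpha):
    """Exponentially weighted moving average: newer samples weigh more.
    `alpha` (0 < alpha <= 1) is an assumed smoothing factor."""
    value = samples[0]
    for sample in samples[1:]:
        value = alpha * sample + (1 - alpha) * value
    return value


def minutes_to_recover(metric, threshold, down_minutes=120, max_up=120):
    """Minutes of steady uptime, after `down_minutes` of hard-down,
    before `metric` first exceeds `threshold`."""
    for up in range(1, max_up + 1):
        samples = [0] * down_minutes + [1] * up
        if metric(samples) > threshold:
            return up
    return None


plain = minutes_to_recover(lambda s: simple_average(s, 60), 0.9)
weighted = minutes_to_recover(lambda s: ewma(s, 0.1), 0.9)
print(plain, weighted)
```

With these assumed parameters the plain average needs 55 samples of uptime to get strictly above 0.9, while the weighted one needs 22. A larger `alpha` recovers faster but also reacts more to flapping.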
  • troffasky
    Senior Member
    • Jul 2008
    • 565

    #2
    I think I can achieve something close with this expression:

    avg(/ICMP Ping/icmpping,60m)>0.5 and
    avg(/ICMP Ping/icmpping,30m)>0.75 and
    avg(/ICMP Ping/icmpping,10m)>0.9

    Can anyone sanity-check this? I think that the more intervals I break it into, the more it looks like a curve, but three is probably enough.
    I wish there was a simulator to run triggers on, instead of having to wait for something flappy.
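
In the absence of a built-in simulator, the expression can be approximated offline. The sketch below mirrors the three clauses over a synthetic per-minute history; it assumes exactly one sample per minute and does not model how Zabbix actually schedules trigger evaluation. The flapping pattern and function names are hypothetical.

```python
def window_avg(samples, minutes):
    """Mean of the last `minutes` samples, assuming one sample per minute."""
    recent = samples[-minutes:]
    return sum(recent) / len(recent)


def recovered(samples):
    """Mirror of the three-clause recovery expression from this post."""
    return (
        window_avg(samples, 60) > 0.5
        and window_avg(samples, 30) > 0.75
        and window_avg(samples, 10) > 0.9
    )


def first_recovery(history, start=60):
    """Index of the first minute at or after `start` where recovered() holds.

    `start` defaults to 60 so that every window is fully populated.
    """
    for minute in range(start, len(history)):
        if recovered(history[: minute + 1]):
            return minute
    return None


# Hypothetical flapping pattern: 5 minutes up, 5 minutes down, for 2 hours,
# followed by 1 hour of steady uptime starting at minute 120.
flapping = ([1] * 5 + [0] * 5) * 12 + [1] * 60

print(first_recovery(flapping))
```

If the expression behaves as hoped, the printed minute should fall after minute 120, since the 10-minute clause sits at 0.5 for the whole flapping period.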

    • kyus
      Senior Member
      • Feb 2024
      • 171

      #3
      Since you are using and, the trigger will still take 30 minutes to resolve, but I guess the idea in doing it that way is to be sure that the host is currently up. If that's the point, I believe it is fine as it is.

      To test it, you could create a trapper item and send 0 and 1 to it as you wish. It may be boring and a bit time-consuming, but it's a way to test it.

      • troffasky
        Senior Member
        • Jul 2008
        • 565

        #4
        Originally posted by kyus
        Since you are using and, the trigger will still take 30 minutes to resolve,
        Compared with my first post, the less complex version was 90% over 60 minutes, not 50% over 60 minutes. In a "recovering from a hard-down" scenario, my hope is that the more complex recovery expression recovers after 30 minutes, not 54 minutes as the simpler expression would.
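
The 30-versus-54 figure can be checked with a few lines of arithmetic. This sketch assumes the host was down for longer than the largest window, so each window holds only zeros followed by ones; the helper name is hypothetical.

```python
def uptime_needed(window_minutes, threshold):
    """Minutes of continuous uptime at which the window average equals
    `threshold`; the clause becomes true just after this point."""
    return window_minutes * threshold


# Simple expression: avg over 60m > 0.9
simple = uptime_needed(60, 0.9)

# Combined expression: all three clauses must hold, so the slowest one decides.
combined = max(
    uptime_needed(60, 0.5),
    uptime_needed(30, 0.75),
    uptime_needed(10, 0.9),
)

print(simple, combined)
```

The 60-minute clause at 0.5 is the slowest of the three (30 minutes, against 22.5 and 9), which is why the combined expression clears just after 30 minutes of uptime.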

        Good tip about the trapper item. I could easily create a bash script that sends the same synthetic up/down events every time I run it, in order to fine-tune the recovery expression.
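
A replay script along those lines could look like the sketch below, written in Python rather than bash so that the pattern is easy to edit. It shells out to `zabbix_sender` with its `-z`, `-s`, `-k` and `-o` options; the server address, host name and item key shown in the usage comment are placeholders, and `dry_run=True` only builds the commands without sending anything.

```python
import subprocess
import time


def build_schedule(pattern):
    """Expand [(value, minutes), ...] into a flat list of per-minute values."""
    values = []
    for value, minutes in pattern:
        values.extend([value] * minutes)
    return values


def sender_command(server, host, key, value):
    """Argument list for one zabbix_sender call to a trapper item."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]


def replay(pattern, server, host, key, interval_seconds=60, dry_run=True):
    """Send the same synthetic up/down sequence on every run.

    With dry_run=True the commands are only collected and returned.
    """
    commands = []
    for value in build_schedule(pattern):
        command = sender_command(server, host, key, value)
        commands.append(command)
        if not dry_run:
            subprocess.run(command, check=True)
            time.sleep(interval_seconds)
    return commands


# Example (placeholder names): 3 minutes down, then 2 minutes up.
# replay([(0, 3), (1, 2)], "zabbix.example.com", "test-host", "test.ping",
#        dry_run=False)
```

Running it with `dry_run=True` first makes it possible to inspect the exact commands before anything is sent to the server.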

        • kyus
          Senior Member
          • Feb 2024
          • 171

          #5
          Originally posted by troffasky
          avg(/ICMP Ping/icmpping,60m)>0.5 and
          avg(/ICMP Ping/icmpping,30m)>0.75 and
          avg(/ICMP Ping/icmpping,10m)>0.9
          I believe that this expression will work as you expect, recovering after 30 minutes.
