Let's get this out of the way first: I'm not a mathematician.
I see a variety of functions allowed in recovery expressions. For ICMP availability I am using:
avg(/ICMP Ping/icmpping,60m)>0.9
This is so that when a host is flapping, the trigger stays in PROBLEM state until the host really has recovered. Unfortunately, it also means that when a host has been hard-down for more than an hour and then comes back, the trigger takes 90% of an hour (about 54 minutes) to clear. I think an exponentially weighted moving average function would help here: more recent pings in the window would be weighted more heavily than older ones, so the trigger would clear more quickly after a hard-down event if all is OK, but still wouldn't flap while the host is going up and down.
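To illustrate what I mean, here is a rough Python sketch (not Zabbix syntax, just a simulation of the idea). It assumes one ping per minute, a 60-sample window, and a smoothing factor of 0.1. The function names and the smoothing factor are my own choices for illustration, not anything Zabbix provides.

```python
def simple_average(samples):
    """Plain mean of the window, like avg(...,60m) with one sample per minute."""
    return sum(samples) / len(samples)


def ewma(samples, alpha=0.1):
    """Exponentially weighted moving average; samples are ordered oldest first.

    alpha is the weight given to each newer sample (an assumed value here).
    """
    value = samples[0]
    for sample in samples[1:]:
        value = alpha * sample + (1 - alpha) * value
    return value


def minutes_to_clear(score, window=60, threshold=0.9):
    """Host is down for a full window, then stays up.

    Returns how many 'up' samples are needed before score(window) > threshold,
    or None if that does not happen within ten windows.
    """
    history = [0] * window
    for minute in range(1, 10 * window):
        history.append(1)
        if score(history[-window:]) > threshold:
            return minute
    return None


if __name__ == "__main__":
    print("simple average clears after", minutes_to_clear(simple_average), "minutes")
    print("EWMA clears after", minutes_to_clear(ewma), "minutes")
    # A host alternating down/up every minute should not clear with either score.
    flapping = [0, 1] * 30
    print("flapping, simple average:", simple_average(flapping))
    print("flapping, EWMA:", ewma(flapping))
```

Under these assumptions the simple average needs 55 good samples to clear after a hard-down hour, while the EWMA needs 22, and a host alternating up and down every minute stays well below 0.9 with both. A larger alpha would clear faster but filter flapping less.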
So, how can I achieve this with existing trigger functions?
