How would I add a time constraint to the MSSQL service availability alert? We have backups that run nightly and cause the service to become unavailable for anywhere between 30 seconds and 5 minutes and we end up getting alerted nightly for something that recovers on its own. So I'd like to only be alerted if the service is unavailable for 10 minutes.
Answer selected by techmattr at 12-10-2022, 20:39.
Instead of "last" try "sum".
Expression - Description
sum(/host/key,10m) - Sum of values in the last 10 minutes.
sum(/host/key,#10) - Sum of the last ten values.
So something like this: sum(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m) =0
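This expression will trigger only when your DB has been down (not running or not accepting TCP connections) for 10 consecutive minutes. net.tcp.service returns 1 when the service is up and 0 when it is down, so the sum over 10 minutes is 0 only if every check in that period failed.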
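To see why the sum only reaches 0 after a sustained outage, here is a rough Python sketch of the idea. It is a simplified model, not how Zabbix evaluates triggers internally: the one-check-per-minute interval, the sample data, the exact window boundaries, and the names `window_sum` and `trigger_fires` are all assumptions made for illustration.

```python
from typing import List, Tuple

# (timestamp in seconds, value) where 1 = service up, 0 = service down
Sample = Tuple[int, int]


def window_sum(samples: List[Sample], now: int, period_s: int) -> int:
    """Sum the values whose timestamp falls in the last period_s seconds."""
    return sum(v for t, v in samples if now - period_s < t <= now)


def trigger_fires(samples: List[Sample], now: int, period_s: int = 600) -> bool:
    """Model of `sum(...,10m)=0`: fires only if every value in the window is 0.

    Note: this model does not say anything about what Zabbix does when the
    window holds no values at all.
    """
    return window_sum(samples, now, period_s) == 0


# Hypothetical data: one check per minute for 15 minutes (t = 60 .. 900).
# A 5-minute backup outage (minutes 8-12) versus an outage from minute 4 on.
short_outage = [(60 * i, 0 if 8 <= i <= 12 else 1) for i in range(1, 16)]
long_outage = [(60 * i, 0 if i >= 4 else 1) for i in range(1, 16)]

print(trigger_fires(short_outage, now=900))  # False: some checks succeeded
print(trigger_fires(long_outage, now=600))   # False: only 7 minutes down so far
print(trigger_fires(long_outage, now=900))   # True: 10 minutes of zeros
```

A single successful check inside the window makes the sum non-zero, which is why the nightly backup blip no longer alerts.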
The count function counts the number of values within the defined evaluation period: count(/host/key,(sec|#num)<:time shift>,<operator>,<pattern>)
Supported operators:
eq - equal (default)
ne - not equal
gt - greater
ge - greater or equal
lt - less
le - less or equal
like - matches if contains pattern (case-sensitive)
bitand - bitwise AND
regexp - case-sensitive match of the regular expression given in pattern
iregexp - case-insensitive match of the regular expression given in pattern
pattern (optional) - required pattern (string arguments must be double-quoted)
count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"gt",0) =0
This would trigger when the number of values greater than 0 in the last 10 minutes equals 0.
count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"eq",1) =0
This would trigger when the number of values equal to 1 in the last 10 minutes is 0.
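If the operator/pattern part is hard to picture, this small Python sketch may help. It only models the numeric comparison operators from the list above (no like, bitand or regexp), it takes an already-selected window of values, and the `count` helper and sample data are hypothetical stand-ins, not Zabbix code.

```python
import operator
from typing import Callable, Dict, Iterable

# Only the numeric comparison operators are modelled here.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def count(values: Iterable[float], op: str, pattern: float) -> int:
    """Count how many values in the window satisfy `value <op> pattern`."""
    compare = OPERATORS[op]
    return sum(1 for v in values if compare(v, pattern))


# Hypothetical 10-minute windows, one check per minute (1 = up, 0 = down).
all_down = [0] * 10
backup_blip = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

print(count(all_down, "gt", 0))     # 0 -> `count(...,"gt",0)=0` is true
print(count(all_down, "eq", 1))     # 0 -> `count(...,"eq",1)=0` is true
print(count(backup_blip, "gt", 0))  # 7 -> trigger condition is false
print(count(backup_blip, "eq", 1))  # 7 -> trigger condition is false
```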
Parameters with a hashtag have a different meaning with the function last - they denote the Nth previous value, so given the values 3, 7, 2, 6, 5 (from the most recent to the least recent):
last(/host/key,#2) would return '7'
last(/host/key,#5) would return '5'
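The difference between #N in last and #N in sum can be sketched in a few lines of Python. This is just a toy model using the five values from the example above, with the list ordered newest first; `last` and `sum_last` are illustrative helper names, not Zabbix functions.

```python
from typing import Sequence


def last(values_newest_first: Sequence[int], n: int = 1) -> int:
    """Model of last(/host/key,#N): pick the Nth most recent value."""
    return values_newest_first[n - 1]


def sum_last(values_newest_first: Sequence[int], n: int) -> int:
    """Model of sum(/host/key,#N): add up the N most recent values."""
    return sum(values_newest_first[:n])


values = [3, 7, 2, 6, 5]  # from the most recent to the least recent

print(last(values, 2))      # 7  (a single value, the 2nd most recent)
print(last(values, 5))      # 5  (a single value, the 5th most recent)
print(sum_last(values, 2))  # 10 (3 + 7, an aggregate over two values)
print(sum_last(values, 5))  # 23 (all five values added together)
```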
Hamardaban, thanks. The first expression you posted worked exactly how I wanted. Could you explain how the two expressions you posted would behave differently? It isn't immediately obvious to me how the second expression you posted would work.
just sticking my nose in here..
"count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"gt",0) =0"
Let's count all values greater than 0 from the last ten minutes, and if the result is 0, we fire the trigger. This means that only 0s were found for 10 minutes, so your service was down for that whole period. (net.tcp.service -> 0 - service is down, 1 - service is running)
"count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"eq",1) =0"
Let's count all values equal to 1 from the last ten minutes, and if the result is 0 (none found), we fire the trigger. This means that no 1s were found for 10 minutes, so your service was down for that whole period.
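As a sanity check on the "do they behave differently?" question, this Python sketch brute-forces every possible non-empty window of 0/1 values and compares the three conditions from this thread. It is a toy model under the assumption that the item only ever returns 0 or 1 and that the window always contains data; the function names are made up for the example.

```python
from itertools import product
from typing import Sequence


def sum_is_zero(window: Sequence[int]) -> bool:
    """Model of sum(...,10m)=0."""
    return sum(window) == 0


def count_gt0_is_zero(window: Sequence[int]) -> bool:
    """Model of count(...,10m,"gt",0)=0."""
    return sum(1 for v in window if v > 0) == 0


def count_eq1_is_zero(window: Sequence[int]) -> bool:
    """Model of count(...,10m,"eq",1)=0."""
    return sum(1 for v in window if v == 1) == 0


def conditions_agree(length: int) -> bool:
    """Check all 2**length windows of 0/1 values of the given length."""
    return all(
        sum_is_zero(w) == count_gt0_is_zero(w) == count_eq1_is_zero(w)
        for w in product((0, 1), repeat=length)
    )


print(conditions_agree(10))  # True: identical results for 0/1 data

# They would only diverge for a value other than 0 or 1 (hypothetical here,
# since net.tcp.service reports 0 or 1):
odd_window = (0, 2)
print(count_gt0_is_zero(odd_window))  # False: 2 is greater than 0
print(count_eq1_is_zero(odd_window))  # True: no value equals 1
```

So for this item the choice between the expressions is mostly a matter of which one reads more clearly to you.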