If MSSQL Service is Down for 10 Minutes

This topic has been answered.
  • techmattr
    Member
    • Sep 2022
    • 39

    #1

    If MSSQL Service is Down for 10 Minutes

    Code:
    last(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0
    How would I add a time constraint to the MSSQL service availability alert? We have backups that run nightly and cause the service to become unavailable for anywhere between 30 seconds and 5 minutes and we end up getting alerted nightly for something that recovers on its own. So I'd like to only be alerted if the service is unavailable for 10 minutes.
  • Answer selected by techmattr at 12-10-2022, 20:39.
    Hamardaban
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • May 2019
    • 2713

    count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"gt",0) =0

    count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"eq",1) =0


      • techmattr
        Member
        • Sep 2022
        • 39

        #3
        Hamardaban, thanks, the first expression you posted worked exactly how I wanted. Could you explain how the two expressions you posted would behave differently? It isn't immediately obvious to me how the second one would work.


        • Bartosz Mickiewicz
          Junior Member
          • Oct 2022
          • 27

          #4
          Instead of "last" try "sum".
          Expression            Description
          sum(/host/key,10m)    Sum of values in the last 10 minutes.
          sum(/host/key,#10)    Sum of the last ten values.

          So something like this:
          sum(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m) =0
          This expression will trigger only when your DB has been down (not accepting TCP connections) for 10 consecutive minutes, because the sum is 0 only if every check in that window returned 0.


          The count function counts the number of values within the defined evaluation period.
          count(/host/key,(sec|#num)<:time shift>,<operator>,<pattern>)
          Supported operators:
          eq - equal (default)
          ne - not equal
          gt - greater
          ge - greater or equal
          lt - less
          le - less or equal
          like - matches if contains pattern (case-sensitive)
          bitand - bitwise AND
          regexp - case-sensitive match of the regular expression given in pattern
          iregexp - case-insensitive match of the regular expression given in pattern

          pattern (optional) - required pattern (string arguments must be double-quoted)

          count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"gt",0) =0
          This triggers when, over the last 10 minutes, the number of values greater than 0 is 0.

          count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"eq",1) =0
          This triggers when, over the last 10 minutes, the number of values equal to 1 is 0.
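
If it helps to see why the sum and the two count expressions end up equivalent for this item, here is a rough Python sketch. It is only a simulation, not anything Zabbix runs: the `count` helper and `trigger_states` are hypothetical names made up for illustration, and it assumes the item only ever returns 0 or 1 and that the 10m window holds ten values.

```python
import operator

# Hypothetical lookup mirroring the numeric operator names listed above.
# This is an illustration only, not a Zabbix API.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def count(values, op="eq", pattern=0):
    """Count the values in the window that satisfy `value <op> pattern`."""
    compare = OPS[op]
    return sum(1 for v in values if compare(v, pattern))


def trigger_states(values):
    """Evaluate the three expressions from this thread on one window of values."""
    return {
        "sum": sum(values) == 0,
        "count_gt_0": count(values, "gt", 0) == 0,
        "count_eq_1": count(values, "eq", 1) == 0,
    }


# Assumed windows: ten checks, all failed, versus one success among them.
all_down = [0] * 10
one_up = [0] * 9 + [1]

print(trigger_states(all_down))
print(trigger_states(one_up))
```

All three conditions are true for the all-zero window and all three become false as soon as a single 1 appears, which is why any of them can be used here.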


          Parameters with a hashtag have a different meaning with the function last - they denote the Nth previous value, so given the values 3, 7, 2, 6, 5 (from the most recent to the least recent):

          last(/host/key,#2) would return '7'
          last(/host/key,#5) would return '5'
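
The same example as a small Python sketch, in case the indexing is easier to read that way. The function below is a hypothetical stand-in for the behaviour described above, not Zabbix code, and it assumes the history list is ordered from most recent to least recent.

```python
def last(values_newest_first, n=1):
    """Return the Nth most recent value (n=1 is the latest one)."""
    return values_newest_first[n - 1]


# Values from the example above, most recent first.
history = [3, 7, 2, 6, 5]

print(last(history, 2))  # 7
print(last(history, 5))  # 5
```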


          • cyber
            Senior Member
            Zabbix Certified SpecialistZabbix Certified Professional
            • Dec 2006
            • 4807

            #5
            Originally posted by techmattr
            Hamardaban, thanks the first expression you posted worked exactly how I wanted. Could you explain how the two expressions you posted would behave differently? It isn't immediately obvious to me how the second expression you posted would work.
            just sticking my nose in here..

            "count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"gt",0) =0"
            Let's count all values greater than 0 from the last ten minutes, and if the result is 0, we fire the trigger. This means that only 0s were found for 10 minutes, so your service was down for that whole period. (net.tcp.service: 0 - service is down, 1 - service is running)

            "count(/SERVERNAME/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}],10m,"eq",1) =0"
            Let's count all values equal to 1 from the last ten minutes, and if the result is 0 (none found), we fire the trigger. This means that no 1s were found for 10 minutes, so your service was down for that whole period.
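
To tie this back to the original question, here is a rough Python simulation of a short backup outage versus a long one. It is a sketch under assumptions, not Zabbix itself: `simulate` and `WINDOW` are hypothetical names, the item is assumed to be checked once per minute so that a 10m window holds ten values, and the service is assumed to be up before and after the outage.

```python
WINDOW = 10  # values in a 10m window, assuming one check per minute


def simulate(outage_minutes):
    """Return (last_fired, count_fired) for an outage of the given length."""
    # Up for a full window, down for the outage, then up again.
    samples = [1] * WINDOW + [0] * outage_minutes + [1] * WINDOW
    last_fired = False
    count_fired = False
    for t in range(WINDOW, len(samples) + 1):
        window = samples[t - WINDOW:t]
        # Original trigger: last(...)=0
        if window[-1] == 0:
            last_fired = True
        # Suggested trigger: count(...,10m,"gt",0)=0
        if sum(1 for v in window if v > 0) == 0:
            count_fired = True
    return last_fired, count_fired


print(simulate(5))   # (True, False)
print(simulate(12))  # (True, True)
```

Under these assumptions the 5-minute backup outage still trips the original last() trigger but never the count() one, while a 12-minute outage trips both.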
