false positive 'net.tcp.service [ssh]' when the CPU is busy

  • flako
    Member
    • Sep 2011
    • 40

    #1

    Hello
    I'm trying to remove the false positives of a trigger that indicates whether the sshd service is online.

    My setup:
    Zabbix 1.8.7 on SLES 11 SP1 (kernel 2.6.32.12-0.7-xen)
    mysql-5.0.67-13.20.1


    The problem seems to be the item net.tcp.service[ssh]: it returns 0 when the CPU is heavily loaded (CPU idle = 0), and the same happens with net.tcp.service.perf[ssh].
    From the tests I've done, the ssh service does respond (just slowly).
    I tried adding more conditions to the trigger to filter out the false positives, but that has not solved it.
    These links are related: https://support.zabbix.com/browse/ZBX-5398 and https://www.zabbix.com/forum/showthr...light=ssh+load
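    To illustrate how a service check can report 0 for an sshd that is merely slow, here is a rough Python approximation. It is a sketch under stated assumptions, not Zabbix's implementation: it assumes the check connects and waits for a banner starting with "SSH-" within a fixed timeout (3 seconds here, an arbitrary value), so a server too busy to send its banner in time is reported as down. The function name tcp_service_check is hypothetical.

```python
import socket


def tcp_service_check(host: str, port: int = 22, timeout: float = 3.0) -> int:
    """Rough approximation of an ssh service check (NOT Zabbix's actual code).

    Assumption: the check succeeds only if the server sends a banner that
    starts with b"SSH-" before `timeout` seconds elapse.
    Returns 1 (up) or 0 (down), like net.tcp.service[ssh].
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # create_connection() already applied `timeout` to this socket,
            # so recv() below also gives up after `timeout` seconds.
            banner = sock.recv(255)
            return 1 if banner.startswith(b"SSH-") else 0
    except OSError:
        # Connection refused, host unreachable, or the banner did not arrive
        # in time (socket.timeout is a subclass of OSError). A heavily loaded
        # but otherwise healthy sshd also ends up here: the false positive.
        return 0
```

    Under this model, raising the timeout trades slower detection of a real outage for fewer false positives on a busy host.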

    Next I'm going to try using the item 'system.cpu.util' in the trigger...
    But I wonder whether this is a bug that has been fixed in a newer version, or whether there is an elegant way to solve it.

    Thanks for reading, and sorry for relying on a translator.
  • heaje
    Senior Member
    Zabbix Certified Specialist
    • Sep 2009
    • 325

    #2
    I'd create a trigger that looks something like this for this particular issue:

    {Server:net.tcp.service[ssh].max(600)}<1

    Assuming that you're OK with a slight delay in being alerted to an actual problem, this trigger would only go off if the maximum value of the "net.tcp.service[ssh]" item over the last 10 minutes is less than 1, i.e. every check in that window returned 0. The benefit is that if your item's check interval is under 5 minutes (say, around 3-4 minutes), the service has to fail at least 2 times in a row before the alert goes off.
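    To make the window arithmetic concrete, here is a small Python model of how a max(600) trigger like the one above evaluates. It is a sketch, not Zabbix code: it assumes the window covers samples with timestamps in (now - 600, now], and the sample data and 4-minute interval are made up for illustration.

```python
from typing import List, Tuple

# (unix_timestamp_seconds, value); value 1 = ssh up, 0 = ssh down
Sample = Tuple[int, int]


def max_over_seconds(samples: List[Sample], now: int, seconds: int) -> int:
    """Model of max(<seconds>): largest value with now - seconds < t <= now.

    Assumption: Zabbix's exact boundary handling may differ from this.
    """
    in_window = [value for t, value in samples if now - seconds < t <= now]
    if not in_window:
        raise ValueError("no data collected in the window")
    return max(in_window)


def ssh_trigger_max600(samples: List[Sample], now: int) -> bool:
    """Models {Server:net.tcp.service[ssh].max(600)}<1."""
    return max_over_seconds(samples, now, 600) < 1


# Item polled every 240 s (4 minutes), so a 600 s window holds 2 or 3 samples.
one_failure = [(0, 1), (240, 1), (480, 0)]
print(ssh_trigger_max600(one_failure, now=480))  # False: a single 0 is ignored

sustained = [(0, 1), (240, 0), (480, 0), (720, 0)]
print(ssh_trigger_max600(sustained, now=720))    # True: every sample in (120, 720] is 0
```

    With a 4-minute interval, the model needs 2 or 3 consecutive failures depending on where the window falls relative to the samples.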

    • heaje
      Senior Member
      Zabbix Certified Specialist
      • Sep 2009
      • 325

      #3
      I just thought of a better way that doesn't rely on whatever the check interval is:

      {Server:net.tcp.service[ssh].max(#3)}<1

      This would only set off an alert if the highest of the last 3 collected values is less than 1, i.e. the last 3 checks all returned 0.
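      The count-based variant can be modeled the same way. This Python sketch (an illustration, not Zabbix's code) treats max(#3) as the maximum of the 3 most recent values, so its behavior does not depend on the polling interval. Refusing to evaluate when fewer than 3 values exist is an assumption of the sketch.

```python
from typing import Sequence


def max_last_n(values: Sequence[int], n: int) -> int:
    """Model of max(#n): the largest of the n most recent values.

    `values` is ordered oldest -> newest. Refusing to evaluate with fewer
    than n values is an assumption of this sketch.
    """
    if n < 1 or len(values) < n:
        raise ValueError("need at least n collected values")
    return max(values[-n:])


def ssh_trigger_max3(values: Sequence[int]) -> bool:
    """Models {Server:net.tcp.service[ssh].max(#3)}<1."""
    return max_last_n(values, 3) < 1


print(ssh_trigger_max3([1, 0, 0]))     # False: one of the last 3 checks succeeded
print(ssh_trigger_max3([1, 0, 0, 0]))  # True: the last 3 checks all returned 0
```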

      • flako
        Member
        • Sep 2011
        • 40

        #4
        Hello heaje
        Using the max function does not seem to be sufficient. If service[ssh] returns '1 1 0 0 1 0 0 0 0 0 1' (0: down, 1: up) and the five consecutive 0s span 600 seconds (assuming the CPU is at full load for those 600 seconds), the trigger fires. The same happens with max(#3).
        I tried the same idea with the 'count' function and unfortunately found that it does not work either.
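        Replaying that sequence through a simple max(#3) model shows why it still fires: any run of three or more consecutive 0s satisfies it. For a 0/1 item, "the maximum of the last N values is below 1" and "all of the last N values are 0" are the same condition, which is consistent with count not helping either. This is an illustrative Python sketch; the replay helpers are hypothetical.

```python
from typing import List, Sequence


def replay_max_n(values: Sequence[int], n: int = 3) -> List[int]:
    """Return the 0-based positions at which max(#n)<1 would be true,
    evaluating after each new value. Illustrative model, not Zabbix code."""
    fired = []
    for i in range(n - 1, len(values)):
        if max(values[i - n + 1 : i + 1]) < 1:
            fired.append(i)
    return fired


def replay_count_n(values: Sequence[int], n: int = 3) -> List[int]:
    """Same replay using 'all of the last n values are 0' (a count-style check)."""
    fired = []
    for i in range(n - 1, len(values)):
        if list(values[i - n + 1 : i + 1]).count(0) == n:
            fired.append(i)
    return fired


seq = [int(x) for x in "1 1 0 0 1 0 0 0 0 0 1".split()]
print(replay_max_n(seq))    # [7, 8, 9]
print(replay_count_n(seq))  # [7, 8, 9]
```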
        Now I'm testing the expression 'net.tcp.service[ssh].last(0)=0 & system.cpu.util[,idle,avg1].last(0)>20'.
        It seems that service[ssh] returns 0 when CPU idle is below 15%.
        The 'cpu.idle > 20' condition seems to work; I still have to put it into production and see how it behaves.
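        The combined expression can be modeled as a plain boolean function. In this illustrative Python sketch, `&` in the Zabbix 1.8 trigger is treated as logical AND, and the 20% idle threshold comes from the post; the function name is hypothetical. Note the trade-off built into the logic: while CPU idle is 20% or lower, the trigger cannot fire at all, even if sshd really is down.

```python
def ssh_down_alert(ssh_last: int, cpu_idle_last: float, idle_threshold: float = 20.0) -> bool:
    """Models net.tcp.service[ssh].last(0)=0 & system.cpu.util[,idle,avg1].last(0)>20.

    ssh_last:      latest value of net.tcp.service[ssh] (1 = up, 0 = down)
    cpu_idle_last: latest value of system.cpu.util[,idle,avg1], in percent
    """
    return ssh_last == 0 and cpu_idle_last > idle_threshold


print(ssh_down_alert(0, 5.0))   # False: CPU is busy, so the 0 is treated as a probable false positive
print(ssh_down_alert(0, 60.0))  # True: ssh check failed while the CPU had headroom
print(ssh_down_alert(1, 60.0))  # False: ssh is up
```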

        What do you think? Would this work in all cases?

        • heaje
          Senior Member
          Zabbix Certified Specialist
          • Sep 2009
          • 325

          #5
          If what you put in works for you, I'd call it good.
