False positives

  • ulises
    Junior Member
    • Oct 2008
    • 13

    #1

    False positives

    Hi

    I'm monitoring a lot of servers running many services, and every day I'm getting false positives. They come from random services: ssh is down, web is down, or even the whole server is down. A minute later a recovery alert comes in.

    any way to delay the check time? or any way to fix this problem?

    thx in advance
  • Calimero
    Senior Member
    • Nov 2006
    • 481

    #2
    Originally posted by ulises
    Hi

    I'm monitoring a lot of servers running many services, and every day I'm getting false positives. They come from random services: ssh is down, web is down, or even the whole server is down. A minute later a recovery alert comes in.

    any way to delay the check time? or any way to fix this problem?

    thx in advance
    Use aggregate functions like min/max/avg/count instead of last().

    For example (ssh simple check: 0 down, 1 OK, 2 unreachable):

    {my_host:ssh.count(#3,1)}=0

    ==> Counts how many of the last three checks returned "1". If the count is 0, the trigger goes ON.

    The #X syntax is new in Zabbix 1.6 and was not available in 1.4. In 1.4 only the time-based form such as .count(600,1) works, with 600 being the number of seconds.

    With checks that only return 0 or 1 (rather than 1 for OK and 0 or 2 for errors) you can use min/max instead of count().
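A minimal Python sketch of the count-based idea above. This is not Zabbix code; `count_last` and `ssh_trigger_fires` are hypothetical names, and the list simply stands in for the item's stored check results, oldest first.

```python
def count_last(values, n, pattern):
    """Count how many of the last n values equal pattern (mimics count(#n,pattern))."""
    return sum(1 for v in values[-n:] if v == pattern)


def ssh_trigger_fires(values, n=3):
    """Fire only when none of the last n checks returned 1 (OK).

    0 (down) and 2 (unreachable) both count as failures here, which is
    why count() is used rather than min()/max().
    """
    return count_last(values, n, 1) == 0


# One failed check among the last three: no alert yet.
print(ssh_trigger_fires([1, 1, 0]))
# Three failures in a row (mix of 0 and 2): alert.
print(ssh_trigger_fires([1, 0, 2, 0]))
```

A single failed check followed by a recovery never brings the count of "1" results down to zero, so the one-minute blips stop generating alerts.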


    • ulises
      Junior Member
      • Oct 2008
      • 13

      #3
      thx for the reply and the idea

      what I did was

      create a new trigger using this function

      {a.server.com:net.tcp.service[http].min(120)}=0

      then I disabled the regular trigger (the one using last()). The new trigger sent the alert, but it is not waiting for the 120 secs

      I tested it by stopping apache for 60 secs and the alert was sent at 55 secs

      any ideas about this?


      • Calimero
        Senior Member
        • Nov 2006
        • 481

        #4
        You should use .max() instead of min().

        11:45:00 / Value=1
        11:46:00 / Value=0
        11:47:00 / Value=0

        At 11:46:05
        - min(120) = 0
        - max(120) = 1
        At that time, apache failed to respond once. Using max, you will not alert yet but using min() will already cause the trigger to go ON.

        At 11:47:05
        - min(120) = 0
        - max(120) = 0
        At that time, apache has failed twice. Now even with max(120) your trigger will go ON.

        Another case is this (small interruption but you don't want to be bothered).
        11:45:00 / Value=1
        11:46:00 / Value=0
        11:47:00 / Value=1
        Using max(120) or max(#2) it will return 1 in spite of the single error.

        This works for items returning 1 or 0 (as is the case with net.tcp.service). Be careful with min/max on simple checks as some of them return 0 or 2 for errors and 1 for OK.
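The two timelines above can be replayed with a small Python sketch. It is an illustration only: `window` and `hms` are hypothetical helpers, and the exact boundary rule (a sample is included when it is less than 120 seconds old) is an assumption, not taken from the Zabbix source.

```python
def hms(h, m, s=0):
    """Convert a clock time to seconds since midnight."""
    return h * 3600 + m * 60 + s


def window(samples, now, seconds):
    """Values of (timestamp, value) samples taken within `seconds` before `now`."""
    return [v for t, v in samples if now - seconds < t <= now]


# Apache stays down: 1, 0, 0
outage = [(hms(11, 45), 1), (hms(11, 46), 0), (hms(11, 47), 0)]
# Single blip: 1, 0, 1
blip = [(hms(11, 45), 1), (hms(11, 46), 0), (hms(11, 47), 1)]

first = window(outage, hms(11, 46, 5), 120)   # after one failed check
second = window(outage, hms(11, 47, 5), 120)  # after two failed checks
short = window(blip, hms(11, 47, 5), 120)     # failure followed by recovery

print(min(first), max(first))    # min already 0, max still 1
print(min(second), max(second))  # both 0, so max(120)=0 is now true
print(max(short))                # still 1, so no alert for the blip
```

With min(120)=0 a single failed check is enough to fire, which matches the alert arriving after about 55 seconds; max(120)=0 only fires once every value in the window is 0.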


        • ulises
          Junior Member
          • Oct 2008
          • 13

          #5
          I think that did it... thank you so much for your help and explanation


          • vinny
            Senior Member
            • Jan 2008
            • 145

            #6
            Since 1.6.2, last(#X) is supported.
            So the simplest way is: aaaa.last(#2)=0 for the last 2 checks...

            vinny
            -------
            Zabbix 1.8.3, 1200+ Hosts, 40 000+ Items...zabbix's everywhere


            • Calimero
              Senior Member
              • Nov 2006
              • 481

              #7
              From what I understand from the manual (I haven't moved to 1.6.2 yet), last(#N) allows you to evaluate the Nth previous value.

              15:41:00 Value=42
              15:42:00 Value=56
              15:43:00 Value=45
              15:44:00 Value=31

              If the trigger is evaluated at 15:44:30, then - from what I understand - you'll have the following:
              .last(0) or .last(#1) evaluates to 31
              .last(#2) evaluates to 45
              .last(#3) evaluates to 56

              So this is different from "last N values == X". But maybe the manual isn't clear enough and I'm mistaken.
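A short Python sketch of the distinction, under the reading of the manual described above (last(#N) returns the Nth most recent value). `last_nth` and `last_n_all_equal` are hypothetical names, not Zabbix functions, and the lists hold values oldest first.

```python
def last_nth(values, n):
    """Nth most recent value: n=1 is the latest, n=2 the one before it."""
    return values[-n]


def last_n_all_equal(values, n, x):
    """True only if each of the last n values equals x."""
    return len(values) >= n and all(v == x for v in values[-n:])


values = [42, 56, 45, 31]
print(last_nth(values, 1), last_nth(values, 2), last_nth(values, 3))

# A check that failed once and then recovered: 1, 0, 1
checks = [1, 0, 1]
print(last_nth(checks, 2) == 0)         # the previous value was 0
print(last_n_all_equal(checks, 2, 0))   # but the last two were not both 0
```

If that reading is right, an expression like last(#2)=0 would test only the previous check, so it would not express "down for the last 2 checks"; max(#2)=0 would.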
