How to (virtually) eliminate false positives

  • illumin8
    Member
    • Jun 2005
    • 36

    #1

    How to (virtually) eliminate false positives

    I've been using Zabbix to monitor our remote websites across an internet connection, and I've noticed that you can get a lot of false positives, especially if you check often. Think about it: most internet connections have about 2-3% packet loss. If you're checking once a minute, you're pretty much bound to get a false positive every few hours or so.

    This is the solution I implemented, using the wonderful trigger language that Zabbix supports:

    Make your triggers check not only the current value but also the previous stored value. Here is a sample trigger:

    ({website.com:http,80.last(0)}<1)&({website.com:http,80.prev(0)}<1)

    The only side effect is that the Zabbix server now has to see two failed checks of the service instead of just one, so if you're checking every minute, you won't know it's down until 2 minutes later. This is a small price to pay for an uninterrupted night's sleep...
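To put rough numbers on this, here is a minimal Python sketch. It assumes every check fails independently with probability p equal to the packet loss rate, which is a simplification for illustration only (real losses often come in bursts, and a TCP check can survive a lost packet through retransmission). The function names are made up for this example.

```python
# Rough model (an assumption, not a measurement): each check fails
# independently with probability p.

def mean_checks_until_single_failure(p: float) -> float:
    """Expected number of checks until the first failed check."""
    return 1.0 / p


def mean_checks_until_two_in_a_row(p: float) -> float:
    """Expected number of checks until two consecutive failed checks."""
    return (1.0 + p) / (p * p)


if __name__ == "__main__":
    interval_minutes = 1  # one check per minute, as in the post
    for p in (0.02, 0.03):
        single = mean_checks_until_single_failure(p) * interval_minutes
        double = mean_checks_until_two_in_a_row(p) * interval_minutes
        print(
            f"p={p:.2f}: one-check trigger fires about every {single:.0f} min, "
            f"two-check trigger about every {double / 60:.1f} h"
        )
```

Under this model, requiring two failures in a row stretches the gap between spurious alerts from tens of minutes to many hours, which is the whole point of adding prev(0) to the trigger.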

    Let me know if this works for you, or if you have any other cool triggers that you'd like to share.
  • dantheman
    Senior Member
    • May 2006
    • 209

    #2
    I've actually already set that up to monitor remote routers, and it's helped quite a bit. Every now and then I still get a false positive, but far fewer than before.


    • illumin8
      Member
      • Jun 2005
      • 36

      #3
      Originally posted by dantheman
      I've actually already set that up to monitor remote routers, and it's helped quite a bit. Every now and then I still get a false positive, but far fewer than before.
      I would think that if it's failed two checks, two minutes in a row, it's no longer a "false" positive...


      • sauron
        Senior Member
        • Jan 2005
        • 215

        #4
        Originally posted by illumin8
        ({website.com:http,80.last(0)}<1)&({website.com:http,80.prev(0)}<1)
        There's a simpler way. For five minutes:

        {website.com:http,80.max(300)}<1


        • illumin8
          Member
          • Jun 2005
          • 36

          #5
          Originally posted by sauron
          There's a simpler way. For five minutes:

          {website.com:http,80.max(300)}<1
          I used to have things implemented that way at my site, until one Saturday we had a particularly annoying web application that would time out for 2 minutes out of every 10 minutes, and it never triggered an alert. After that I got a little more paranoid and didn't want it to have to fail for 5 minutes in a row before getting an alert.
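The difference between the two triggers can be replayed offline. The sketch below is a plain Python simulation, not Zabbix code: it assumes one check per minute, treats max(300) as "the newest 5 samples", and uses made-up helper names. The outage pattern (down 2 minutes out of every 10) is the one described above.

```python
from typing import Callable, List


def two_in_a_row(history: List[int]) -> bool:
    """Stand-in for last(0)<1 & prev(0)<1: the two newest samples are both 0."""
    return len(history) >= 2 and history[-1] < 1 and history[-2] < 1


def max_window(history: List[int], samples: int = 5) -> bool:
    """Stand-in for max(300)<1 at one check per minute: newest 5 samples all 0."""
    window = history[-samples:]
    return len(window) == samples and max(window) < 1


def count_alert_minutes(values: List[int], trigger: Callable[[List[int]], bool]) -> int:
    """Feed samples in order and count the minutes on which the trigger is true."""
    history: List[int] = []
    fired = 0
    for value in values:
        history.append(value)
        if trigger(history):
            fired += 1
    return fired


# One hour of per-minute checks: up (1) for 8 minutes, down (0) for 2, repeated.
hour = ([1] * 8 + [0] * 2) * 6

if __name__ == "__main__":
    print("two-in-a-row trigger:", count_alert_minutes(hour, two_in_a_row))
    print("five-minute max trigger:", count_alert_minutes(hour, max_window))
```

In this replay the two-check trigger fires during every outage, while the five-minute max trigger stays silent for the whole hour because no window of five samples is ever all zeros.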
