
False alarms every day

  • fernandomm
    Junior Member
    • Dec 2009
    • 7

    #1

    False alarms every day

    Hello,

    I'm having issues with Zabbix giving false alarms every day. Truth be told, in the few months that I've used it, it has never worked without false alarms.

    Scenario 1:

    At random times I get a web monitoring alert. Checking the Zabbix item shows that, for some reason, it has status code 0, while my trigger fires whenever the status is anything other than 200.

    If I telnet to port 80 of the monitored server at the exact moment I receive the notification, it works fine. I'm pretty sure that the server is not down.

    Also, I use the Pingdom service for this same server and have never gotten a single false alarm from it.

    Note that the Zabbix server is on the same network as the monitored server, so there is no reason why it would report the server as down while Pingdom (from another network) can access it normally.

    Scenario 2:

    Same as Scenario 1, but with the "server is unreachable" trigger. I went as far as keeping an SSH session open on the Zabbix server and, as soon as I got an email, telnetting to the Zabbix agent port on the problematic server (and it was working!).

    Do you also have this kind of problem? How can I stop the false alarms?

    Thanks!
  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #2
    There are many reasons you could be getting false positives. Here are some worth investigating.

    1. Your Timeout setting in zabbix_server.conf or zabbix_agentd.conf is too low. If a check takes longer than the configured timeout, it is reported back as a failure.

    2. You are triggering too aggressively. You could make your triggers require 2 or more consecutive failed values before firing, for example, or use a threshold-type condition where the trigger fires only if there are more than n failures in time period t. If you trigger on every single failure of a frequently polled item, you will get a higher number of false positives.

    3. Web tests are notoriously hard to measure consistently. Sometimes under load a server will burp and the connection will drop or fail, resulting in a 0. See point 2.
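
    To make point 2 concrete, here is a rough sketch of the two ideas in plain Python. It is not Zabbix trigger syntax, just the logic you would want the trigger to express. The function names, the default of 2 consecutive failures, and the (timestamp, status code) sample layout are all made up for illustration.

```python
def consecutive_failures(codes, needed=2, ok_code=200):
    """True only if the last `needed` samples are all failures.

    `codes` is a list of status codes, oldest first (hypothetical layout).
    """
    recent = list(codes)[-needed:]
    # Not enough history yet: do not alert.
    return len(recent) == needed and all(c != ok_code for c in recent)


def failures_in_window(samples, now, window_seconds, max_failures, ok_code=200):
    """True if more than `max_failures` failures fall inside the window.

    `samples` is an iterable of (timestamp_seconds, status_code) pairs.
    """
    failures = sum(
        1
        for ts, code in samples
        if now - ts <= window_seconds and code != ok_code
    )
    return failures > max_failures


# A single status code 0 between good values does not alert...
print(consecutive_failures([200, 0, 200, 200]))
# ...but two failures in a row do.
print(consecutive_failures([200, 200, 0, 0]))
```

    With either rule, one isolated 0 from a busy web server is ignored, and you only get an email when the failure persists or keeps recurring.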
