false positives and handling...

  • toddblake
    Junior Member
    Zabbix Certified Specialist
    • Jul 2011
    • 27

    #1

    To reduce false positives caused by blips and spikes, I'm trying to work out the best way to implement triple checks on most of our triggers and notifications.

    I've thought of two ways to do this and am looking for feedback on how others handle the same thing. Keep in mind that I've done monitoring before but am new to Zabbix, so while I understand the concepts, I may not know the right terminology or the best way to do things in Zabbix.

    a) Have triggers logged and displayed on the console, but not send e-mail notifications unless we get three consecutive trigger failures. The trigger log would then show us when things became a problem without spamming our e-mail. Admittedly, I'm not sure how to do this. I initially thought of escalations whose timing lines up with the item checks, but I'm not sure that's wise, since there's no way to guarantee the timings stay in sync if someone changes the item interval or the escalation timing.
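    To make option (a) concrete, here is a rough Python simulation of the behaviour I'm after, not of how Zabbix implements escalations. Every failed check is recorded (what the console would show), but an e-mail is only "sent" once the failure streak reaches three. NotificationGate, REQUIRED_FAILURES and the sample results are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical simulation of option (a); these names are illustrative, not Zabbix APIs.
REQUIRED_FAILURES = 3


@dataclass
class NotificationGate:
    required: int = REQUIRED_FAILURES
    streak: int = 0          # consecutive failed checks so far
    notified: bool = False   # already e-mailed for the current problem?
    event_log: list = field(default_factory=list)  # every failure (console view)
    emails: list = field(default_factory=list)     # checks that triggered an e-mail

    def record(self, check_no: int, failed: bool) -> None:
        if failed:
            self.streak += 1
            self.event_log.append(check_no)  # always visible on the console
            if self.streak >= self.required and not self.notified:
                self.emails.append(check_no)  # notify once per problem
                self.notified = True
        else:
            # A good check ends the streak and re-arms the notification.
            self.streak = 0
            self.notified = False


gate = NotificationGate()
results = [False, True, False, True, True, True, True, False]
for i, failed in enumerate(results):
    gate.record(i, failed)

print(gate.event_log)  # [1, 3, 4, 5, 6]
print(gate.emails)     # [5]
```

    Because the gate counts consecutive checks rather than elapsed time, it cannot drift out of sync if the item interval changes, which is exactly the concern with lining up escalation timing by hand.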

    OR

    b) Not have anything triggered on the console or logged as a failure unless we've failed three times in a row. We would then have to write a report against the individual items to look for small blips or false positives along the way. I'm thinking of a complex trigger expression that checks whether the item has shown the same "issue" in each of the last three recorded values, using .last(0), .last(1) and .last(2).
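    For option (b), "three bad values in a row" is the same as "the smallest of the last three values is still bad", so three separate comparisons can collapse into one minimum over a count-based window. If I read the trigger functions correctly, Zabbix's count form, e.g. min(#3)>5, expresses this directly, but treat that as an assumption to check against the docs for your version. The sketch below only simulates the evaluation in Python, with a hypothetical threshold of 5:

```python
from collections import deque

THRESHOLD = 5.0  # hypothetical threshold


def evaluate(values, threshold=THRESHOLD, count=3):
    """Return the problem state after each new value.

    A problem exists only once `count` values have arrived and the `count`
    most recent values all exceed the threshold, i.e. min(window) > threshold.
    """
    window = deque(maxlen=count)  # keeps only the latest `count` values
    states = []
    for v in values:
        window.append(v)
        states.append(len(window) == count and min(window) > threshold)
    return states


print(evaluate([6, 7, 2, 6, 8, 9, 4]))
# [False, False, False, False, False, True, False]
```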
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    There are a number of effective ways to reduce false positives:

    1. Use less sensitive triggers, e.g. cpuload.min(600)>5 instead of cpuload.last()>5
    2. Use hysteresis, i.e. different conditions for problem and recovery
    3. Use delayed notifications
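    A quick Python simulation (all data hypothetical, one sample every 60 s) shows why (1) helps: a single spike trips a last()-style check, while a minimum over the last 600 seconds only fires once the load has stayed high for the whole window. The window boundary (now-600, now] is an assumption for illustration, not a statement of Zabbix's exact semantics.

```python
THRESHOLD = 5.0  # same threshold as the cpuload example above


def last(history):
    """Most recent value, like a last()-style check."""
    return history[-1][1]


def window_min(history, seconds):
    """Smallest value received in the window (now - seconds, now]."""
    now = history[-1][0]
    return min(v for t, v in history if t > now - seconds)


def alert_times(samples, check):
    """Timestamps at which check(history) > THRESHOLD would be true."""
    return [samples[i][0] for i in range(len(samples))
            if check(samples[: i + 1]) > THRESHOLD]


# Hypothetical CPU load sampled every 60 s: a one-sample spike at t=300,
# then a sustained overload starting at t=660.
values = [1.0] * 5 + [9.0] + [1.0] * 5 + [9.0] * 12
samples = [(i * 60, v) for i, v in enumerate(values)]

sensitive = alert_times(samples, last)
smoothed = alert_times(samples, lambda h: window_min(h, 600))
print(sensitive[:2])  # [300, 660]
print(smoothed)       # [1200, 1260, 1320]
```

    Note the cost: the sustained overload that started at t=660 is only flagged at t=1200, the same kind of latency trade-off that delayed notifications carry.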

    Start with (1), then try (2), and use (3) only if absolutely required. You don't want to be notified an hour after a disaster has happened, do you?
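    Hysteresis (2) can be sketched the same way. The thresholds below (problem above 5, recovery below 2) and the load values are hypothetical; the point is that a value hovering around a single threshold flaps between PROBLEM and OK, whereas separate problem and recovery conditions keep it in one problem state.

```python
PROBLEM_ABOVE = 5.0  # hypothetical: enter PROBLEM when load rises above 5
RECOVER_BELOW = 2.0  # hypothetical: return to OK only once load drops below 2


def hysteresis_states(values, high=PROBLEM_ABOVE, low=RECOVER_BELOW):
    """Problem state after each value, using separate problem/recovery conditions."""
    in_problem = False
    states = []
    for v in values:
        if not in_problem and v > high:
            in_problem = True
        elif in_problem and v < low:
            in_problem = False
        states.append(in_problem)
    return states


def count_problem_events(states):
    """Number of OK -> PROBLEM transitions, i.e. how many notifications would fire."""
    return sum(1 for prev, cur in zip([False] + states, states) if cur and not prev)


load = [4.0, 5.5, 4.8, 5.2, 4.9, 5.1, 1.5, 1.0]
single = [v > PROBLEM_ABOVE for v in load]  # one threshold for both directions

print(count_problem_events(single))                   # 3
print(count_problem_events(hysteresis_states(load)))  # 1
```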
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga