Avoiding false positives - what is the correct and working expression for that?

  • mainframe
    Junior Member
    • Dec 2007
    • 7

    #1

    I have read a lot of documentation and forum posts on this, and I still have no simple, understandable, working solution. Zabbix often makes complicated tasks easy, but some really basic tasks are complicated to implement (as ping host status monitoring was in earlier releases) - this seems to apply here too.

    Currently we use the following expression to avoid host status false positives. It is meant to require multiple 'host unavailable' reports in a row before firing the 'host unavailable' trigger:

    {Template_linagent_base:status.count(#3,0)} < 1

    As I understand the documentation and forum posts, this expression should mean:
    IF the 'host status OK' value (which is 0) appears less than 1 time within the last 3 values, then we have a problem (as host unavailable = 2), and therefore we trigger an event and possibly an action.

    But testing shows that this expression does not work as intended: when the 'host unavailable' status change arrives, the trigger activates right away, but with an OK status instead of PROBLEM. Could anybody explain what I'm doing wrong here? I'm kind of stuck, and I'm starting to think it might be a bug in the software.

    What I want to achieve is a trigger expression where the status value has to fail repeatedly (say 3 times in a row) before it fires an event - how do I write such a solid expression?
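
For readers following along, here is a minimal Python sketch of what the count-based expression above is meant to do, assuming a new status value really is stored at every poll. The function names and the sample history are hypothetical illustrations, not Zabbix code, and the sketch does not model what Zabbix does when fewer than 3 values exist.

```python
def count_last(values, n, match):
    """Count how many of the last n collected values equal match."""
    return sum(1 for v in values[-n:] if v == match)


def is_problem(values, n=3, ok_value=0):
    """Mirror count(#n, ok_value) < 1: no OK value among the last n values."""
    return count_last(values, n, ok_value) < 1


# Assumed history: one value per poll, 0 = OK, 2 = unavailable.
history = [0, 0, 0, 2, 2, 2]

# Evaluate the condition after each new value arrives.
states = [is_problem(history[:i]) for i in range(1, len(history) + 1)]
print(states)
```

Under that assumption the condition only becomes true once three unavailable values have arrived in a row, which is the intended behaviour.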
  • mainframe
    Junior Member
    • Dec 2007
    • 7

    #2
    We also tried this expression, suggested in another forum post by Alex:

    {Template_linagent_base:status.min(180)} = 2

    We saw it activate the trigger right away after the host status changed once. After the status recovered it pretty much stopped working: when we shut down the Zabbix agent again on the test host, no triggering occurred at all. Really weird.

    • mainframe
      Junior Member
      • Dec 2007
      • 7

      #3
      We also tried the following expression:

      {Template_linagent_base:status.avg(#2)} = 2

      Still no luck - it just won't trigger.

      Looking at the 'Latest data' screen reveals that when the host status changes to 'Unavailable', Zabbix stops updating the host status until the host is available again, I guess. That makes these expressions invalid: we expect to count multiple failures, but that never happens, because Zabbix stops logging the 'unavailable' host status after the first failure. Yet the host status item says it should update every 60 seconds.

      So how can we write an expression that avoids host status false positives if Zabbix stops updating the status value after the first failure, rendering the expressions above useless?

      • mainframe
        Junior Member
        • Dec 2007
        • 7

        #4
        Ok - we tried another approach with the host status expression: trigger when the host status value has been 2 (host unavailable) for 5 minutes.

        {Template_linagent_base:status.avg(300)} = 2

        Doing the test:
        1) shut zabbix_agentd down on the test host at 16:37:xx (host status item update interval is set to 60s)
        2) latest data shows: Host status 20 May 16:38:43 Unreachable (2) +2
        3) time is now 16:49 - and still no trigger event??? It's well beyond 5 minutes, so it should have triggered. Is it some kind of Zabbix bug? Zabbix version is 1.6.2.

        I really don't get it anymore.

        • Calimero
          Senior Member
          • Nov 2006
          • 481

          #5
          host.status is a special "virtual" item. It's more of an indicator that zabbix_server has been unable to poll data (ping, snmp, agent).

          It's not updated every X seconds, so the .count() function won't work well.

          Personally I prefer to use .min()/.max() on agent.ping if you use the agent, or maybe on some icmpping item...
          Or .nodata() on some core SNMP items if the device is SNMP-only.
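
To illustrate the point about the status item not being updated on a schedule, here is a small Python sketch. It assumes, based only on what is reported in this thread, that a status value is stored just when it changes; the helper names and the sample data are hypothetical, not Zabbix internals.

```python
def store_on_change(samples):
    """Keep a sample only when it differs from the previously stored one."""
    stored = []
    for v in samples:
        if not stored or stored[-1] != v:
            stored.append(v)
    return stored


def problem_by_count(values, n=3, ok_value=0):
    """Mirror count(#n, ok_value) < 1: no OK value among the last n values."""
    return sum(1 for v in values[-n:] if v == ok_value) < 1


# What a regular 60 s poll would have produced (0 = OK, 2 = unavailable).
polled = [0, 0, 0, 2, 2, 2, 2]
# What reaches the item history if only changes are stored.
stored = store_on_change(polled)

print(stored)
print(problem_by_count(polled))
print(problem_by_count(stored))
```

With change-only storage the last three stored values still contain the old OK value, so a count over the last 3 values never sees three failures in a row. An item that is polled regularly, such as agent.ping or icmpping, does not have this problem.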

          • mainframe
            Junior Member
            • Dec 2007
            • 7

            #6
            Ok - found a working solution to avoid host.status false positives.
            Here is the expression that seems to do the job: if the host goes down and is not back up within 5 minutes, the trigger fires.

            ({Template_linagent_base:status.last(0)} = 2) & ({Template_linagent_base:status.nodata(300)} = 1)

            For other things like ping etc. I think we could use the examples above with count and so on...
            Last edited by mainframe; 20-05-2009, 16:20. Reason: Added a comment
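
The logic of the expression above can be sketched in Python as follows. This is only a model under stated assumptions: history is a hypothetical list of (timestamp in seconds, value) pairs, and the exact boundary handling of nodata() in Zabbix is not reproduced here.

```python
def last(history):
    """Most recent stored value."""
    return history[-1][1]


def nodata(history, now, seconds):
    """1 if no value was stored during the last `seconds`, else 0."""
    return 1 if now - history[-1][0] > seconds else 0


def host_down(history, now, grace=300):
    """last(0) = 2 AND nodata(grace) = 1."""
    return last(history) == 2 and nodata(history, now, grace) == 1


# Assumed data: status OK at t=0, unavailable (2) stored at t=60.
history = [(0, 0), (60, 2)]

print(host_down(history, now=120))   # still inside the 5 minute grace period
print(host_down(history, now=400))   # status stayed 2 with no newer value

history.append((420, 0))             # host came back, OK value stored
print(host_down(history, now=800))
```

Because the status item gets no new value while the host stays down, the silence itself serves as the timer: the trigger fires only when the last value is 2 and nothing newer has arrived for 5 minutes.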

            • ahahum
              Member
              • Jan 2009
              • 79

              #7
              Recovery Messages

              Originally posted by mainframe
              {Template_linagent_base:status.count(#3,0)} < 1
              I am using the expression above on my icmpping checks for network devices. It works perfectly for avoiding false positives, but I get a recovery message as soon as 1 ping replies. Is there a way to make it trigger after 3 failed pings and not send a recovery message until I get 3 replies?

              Thanks in advance!
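
The behaviour asked for here (3 failures to raise the problem, 3 replies to clear it) is a form of hysteresis. The Python sketch below only models that desired behaviour; the class name and sample data are hypothetical, it assumes icmpping-style values where 1 means a reply and 0 means no reply, and it makes no claim about how to express this in Zabbix trigger syntax.

```python
class DebouncedTrigger:
    """Flip state only after `threshold` consecutive samples contradict it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.problem = False   # current trigger state
        self.streak = 0        # consecutive samples disagreeing with the state

    def update(self, ping_ok):
        sample_is_problem = not ping_ok
        if sample_is_problem != self.problem:
            self.streak += 1
        else:
            self.streak = 0    # an agreeing sample resets the run
        if self.streak >= self.threshold:
            self.problem = not self.problem
            self.streak = 0
        return self.problem


trigger = DebouncedTrigger(threshold=3)
pings = [1, 0, 0, 0, 1, 0, 1, 1, 1]
states = [trigger.update(bool(p)) for p in pings]
print(states)
```

In this model a single reply in the middle of an outage does not clear the problem, and a single lost ping during normal operation does not raise it.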
