ICMP PING ISSUE (false positive): Strange Behavior

  • francescoRo
    Junior Member
    • Mar 2014
    • 25

    #1

    Hi All,

    We are seeing strange behavior with some of our monitored hosts. Zabbix reports these hosts as unreachable, but when we ping them from the frontend they are all available. The hosts are also reachable from the command line.



    What makes it even stranger is that we applied this ICMP Ping template to 50 hosts, but only these hosts are always reported as unreachable (false positive).

    This is the load on our Zabbix server:


    Do you have any idea about it?

    Regards.

    PS. I added some screenshots to the post, but it seems that they are not visible.
    Last edited by francescoRo; 27-06-2014, 16:49.
  • pc99096
    Senior Member
    • Oct 2011
    • 193

    #2
    check your queue:


    • francescoRo
      Junior Member
      • Mar 2014
      • 25

      #3
      Thanks for the answer, but maybe you did not understand our needs. We know perfectly well where to check the queues.
      We need suggestions for solving the problem.

      You can delete your post

      Best Regards


      • pc99096
        Senior Member
        • Oct 2011
        • 193

        #4
        ok and what does the queue say? is it 0?
        we had a similar problem, increasing StartPingers in the config did the trick.
        or check your latest data for those problematic hosts, is the ping value 0? what are the real intervals between checks in latest data?
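
As a quick way to confirm which StartPingers value the server config actually contains, a sketch along these lines may help. The config path is an assumption (a common package default, adjust it for your installation), and the function name `effective_start_pingers` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the StartPingers value found in a Zabbix server config file.
# ASSUMPTION: the default path below; pass your own path as the first argument.

effective_start_pingers() {
    local conf=${1:-/etc/zabbix/zabbix_server.conf}
    local value
    # Collect uncommented "StartPingers=<number>" lines; if there are several,
    # this sketch simply reports the last one.
    value=$(sed -n 's/^[[:space:]]*StartPingers[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    # When the parameter is not set, the server default of 1 applies.
    echo "${value:-1}"
}
```

Calling `effective_start_pingers /path/to/zabbix_server.conf` prints the configured number. A changed value only takes effect after the server has been restarted.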


        • francescoRo
          Junior Member
          • Mar 2014
          • 25

          #5
          Hi, thank you very much for the support.

          We would like to attach a screenshot of Administration -> Queue, but that is not possible. Anyway...
          This is our current queue for simple checks:
                         (5s)  (10s)  (30s)
          Simple check     17    141    257    0    0    0

          We also tried different values (between 10 and 100) for StartPingers without solving the issue. The current value is:
          StartPingers=60

          We monitor about 300 hosts, of which about 50 use this template.

          The false positives always come from the same 8 hosts, and we do not understand why.
          Could the cause be our architecture, which uses a virtual IP? We have a Zabbix cluster with node1, node2 and a VIP. We do not know what to think anymore...

          After many tests and a lot of lost time, we are starting to think that Zabbix is not a good monitoring platform. It isn't stable and reliable... but unfortunately it is now too late to change platform.

          Best Regards.


          • pc99096
            Senior Member
            • Oct 2011
            • 193

            #6
            again,
            check the real interval between checks on those servers which have problems:
            https://www.zabbix.com/documentation...ng/latest_data --> "500 latest values"
            if the interval between pings is bigger than the defined interval in the template - problem.

            really try to check your queue screen - increase values in rows which are not 0.


            • francescoRo
              Junior Member
              • Mar 2014
              • 25

              #7
              Hi,

              We are now looking at one of the hosts that has this problem. Here are the last 15 values from "500 latest values":

              Timestamp Value
              2014.Jun.30 12:00:43 Down (0)
              2014.Jun.30 11:59:44 Down (0)
              2014.Jun.30 11:58:43 Down (0)
              2014.Jun.30 11:57:44 Down (0)
              2014.Jun.30 11:56:43 Down (0)
              2014.Jun.30 11:54:43 Down (0)
              2014.Jun.30 11:53:44 Down (0)
              2014.Jun.30 11:52:43 Down (0)
              2014.Jun.30 11:51:43 Down (0)
              2014.Jun.30 11:50:43 Down (0)
              2014.Jun.30 11:49:43 Down (0)
              2014.Jun.30 11:48:43 Down (0)
              2014.Jun.30 11:47:44 Down (0)
              2014.Jun.30 11:46:43 Down (0)
              2014.Jun.30 11:45:43 Down (0)

              If we are reading these values correctly, the checks run about every 60 seconds. Right?
              How can we increase the values in the rows that are not 0 (on the queue screen)?
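
A rough sketch of how the real interval between checks could be computed from the "500 latest values" list. It assumes the lines are newest first, all from the same day, and that the second field is HH:MM:SS. The function name `check_gaps` is only illustrative.

```shell
#!/bin/bash
# Sketch: print the gap in seconds between consecutive checks.
# ASSUMPTION: input lines look like "2014.Jun.30 12:00:43 Down (0)",
# newest first, and do not cross midnight.

check_gaps() {
    awk '{
        split($2, t, ":")                          # t[1]=HH, t[2]=MM, t[3]=SS
        secs = t[1] * 3600 + t[2] * 60 + t[3]      # seconds since midnight
        if (NR > 1) print prev - secs              # newer value minus older value
        prev = secs
    }'
}

# The six most recent values from the list above.
check_gaps <<'EOF'
2014.Jun.30 12:00:43 Down (0)
2014.Jun.30 11:59:44 Down (0)
2014.Jun.30 11:58:43 Down (0)
2014.Jun.30 11:57:44 Down (0)
2014.Jun.30 11:56:43 Down (0)
2014.Jun.30 11:54:43 Down (0)
EOF
```

For these six values the gaps are 59, 61, 59, 61 and 120 seconds. The item is polled roughly once a minute, and the 120 second gap corresponds to the value that is missing at 11:55.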

              Best Regards.


              • tchjts1
                Senior Member
                • May 2008
                • 1605

                #8
                Originally posted by francescoRo
                Thanks for the answer, but maybe you did not understand our needs. We know perfectly well where to check the queues.
                We need suggestions for solving the problem.

                You can delete your post

                Best Regards
                You know, it's not often that you see someone new come into these forums and ask for help, and when someone starts going through trouble-shooting steps with them, the poster gives snarky replies.

                pc99096 - You have the patience of a saint. Good luck.


                • francescoRo
                  Junior Member
                  • Mar 2014
                  • 25

                  #9
                  Oh yes, thou art right. Thanks for the helpful reply.

                  Anyway...
                  We solved it with an external script that uses fping. We then created an item and a trigger for it.

                  If anyone wants the script and the template, we will be happy to share them.

                  Best regards.


                  • jan.garaj
                    Senior Member
                    Zabbix Certified Specialist
                    • Jan 2010
                    • 506

                    #10
                    Originally posted by francescoRo
                    After many tests and a lot of lost time, we are starting to think that Zabbix is not a good monitoring platform. It isn't stable and reliable...
                    Could you be more specific?
                    Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
                    My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


                    • terrydlm
                      Junior Member
                      • Feb 2014
                      • 11

                      #11
                      Me too!

                      I am also getting this problem. My Zabbix proxy has been monitoring the same servers for months without issue. Now, after an upgrade to the latest version, ALL servers attached to one particular proxy are down according to the icmpping test. However, if I ping from the command line, the ping responds fine. There are no errors in the error log.

                      I assume this is a bug with the latest version of Zabbix?


                      • francescoRo
                        Junior Member
                        • Mar 2014
                        • 25

                        #12
                        Zabbix is unstable and unreliable because of this problem.
                        Why do some hosts (always the same ones) give false positives on ICMP Ping?
                        Anyway, I solved this issue with an external check.
                        If it can be useful to someone, I can share it...

                        Thanks to all for the support.


                        • joergi
                          Member
                          • Jul 2013
                          • 32

                          #13
                          Originally posted by francescoRo
                          If it can be useful to someone, I can share it...
                          Hello,

                          could you share your script pls?

                          Regards,
                          Jörg


                          • francescoRo
                            Junior Member
                            • Mar 2014
                            • 25

                            #14
                            Hi, this is my script:

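                            #!/bin/bash

                            # Wrapper around fping for a Zabbix external check.
                            # Usage: ping.sh <count> <host>
                            # Prints 1 if the host answered, 0 otherwise.
                            #
                            # fping exit statuses are:
                            # 0 if all the hosts are reachable
                            # 1 if some hosts were unreachable
                            # 2 if any IP addresses were not found
                            # 3 for invalid command line arguments
                            # 4 for a system call failure.

                            howManyTimes=$1
                            hostIP=$2

                            # Any non-zero fping exit status is reported as "down" (0).
                            if fping -c "$howManyTimes" "$hostIP" &> /dev/null; then
                                echo 1
                            else
                                echo 0
                            fi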

                            Zabbix item: ping.sh[2, {HOST.CONN}]

                            Zabbix Trigger: {External Check:ping.sh[2, {HOST.CONN}].avg(#5)}=0
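
To illustrate what the avg(#5)=0 condition means for an item that only returns 0 or 1, here is a small sketch. It is not how Zabbix evaluates triggers internally; the function name `trigger_state` and the handling of fewer than five values are assumptions made for the example.

```shell
#!/bin/bash
# Sketch of the condition avg(#5)=0 for a series of 0/1 values.
# Arguments: collected values, oldest first. Prints PROBLEM or OK.

trigger_state() {
    local sum=0 v
    # ASSUMPTION: this sketch refuses to decide with fewer than 5 values.
    [ "$#" -ge 5 ] || { echo "need at least 5 values" >&2; return 2; }
    # Add up the five most recent values.
    for v in "${@: -5}"; do
        sum=$((sum + v))
    done
    # With values limited to 0 and 1, the average is 0 only when the sum is 0.
    if [ "$sum" -eq 0 ]; then
        echo PROBLEM
    else
        echo OK
    fi
}

trigger_state 1 0 0 0 0 0   # the five most recent checks all failed
trigger_state 0 0 0 1 0 0   # one success among the last five checks
```

The first call prints PROBLEM and the second prints OK, so a single successful check among the last five is enough to keep the trigger from firing.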

                            I hope this can help you.

                            Bye.
