Lots and Lots of false agent pings

  • jeffusher
    Junior Member
    • May 2012
    • 2

    #1

    Lots and Lots of false agent pings

    Hi guys,

    Hope you can help.

    We are getting a load of false positive agent pings using the trigger:-

    {SERVERNAME:agent.ping.nodata(2m)}=1

    Everything seems OK, but we get PROBLEM: email messages and then shortly afterwards a RESOLVED: email message, usually within a few seconds of the problem message appearing.
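A minimal sketch of how a PROBLEM followed almost immediately by a RESOLVED can arise from a nodata-style check, assuming a single value arrives late. The function names, the arrival times, and the 30-second evaluation step are illustrative assumptions, not taken from the Zabbix source.

```python
from typing import List, Tuple


def nodata(value_times: List[float], now: float, window: float) -> int:
    """Return 1 if no value arrived in the interval (now - window, now], else 0."""
    return 0 if any(now - window < t <= now for t in value_times) else 1


def evaluate(value_times: List[float], window: float, start: float,
             stop: float, step: float = 30.0) -> List[Tuple[float, str]]:
    """Re-evaluate nodata() every `step` seconds and record state changes."""
    changes = []
    previous = 0
    t = float(start)
    while t <= stop:
        state = nodata(value_times, t, window)
        if state != previous:
            changes.append((t, "PROBLEM" if state else "RESOLVED"))
            previous = state
        t += step
    return changes


# Hypothetical arrival times in seconds: one value per minute, except that the
# value due at t=180 is lost and the next one only arrives at t=250.
arrivals = [0, 60, 120, 250, 310, 370]

print(evaluate(arrivals, window=120, start=120, stop=400))
print(evaluate(arrivals, window=600, start=120, stop=400))
```

With the 120-second window the first call reports a PROBLEM at t=240 and a RESOLVED at t=270, while the 600-second window never fires for the same data. A longer window hides short gaps at the cost of slower detection of real outages.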

    I have attached a couple of screenshots to show the setup, we're on version 2.0.3:-

    Here's the item:



    Here's the trigger:-



    And here's a sample of the emails we are getting constantly:-



    The Zabbix queue is clear, I have even increased the number of agents on most of the machines in question to 5 but to no avail.

    Any assistance would be appreciated.

    Regards

    Jeff
    Last edited by jeffusher; 15-07-2013, 17:02.
  • duckdiver
    Junior Member
    • Aug 2010
    • 7

    #2
    We have the same issue here. Some hosts are unreachable over the proxy while others are reachable, but all of them are pingable the whole time.

    Are there any solutions or workarounds available?


    • vic
      Member
      • Jul 2013
      • 58

      #3
      I also have this exact problem at the default 5m interval. Since I use this to escalate alarms that a server is down, I ended up creating a 10m trigger, which seems to overcome any false alarms. The problem is that it now takes 10 minutes before I am notified instead of 5.

      So there must be some problem that sometimes takes more than 5 but less than 10 minutes to time out or reset or whatever. Perhaps the timeout is set too short or can be made user configurable?

      I've tried increasing the Timeout setting in zabbix_agentd.conf to 10 seconds instead of the default 3. Will see if that helps.

      zabbix v2.0.6
      Last edited by vic; 24-07-2013, 00:00.


      • tchjts1
        Senior Member
        • May 2008
        • 1605

        #4
        What are you seeing in your zabbix_server.log?
        A bunch of entries like this:
        on host [xxxxxxxxxx] failed: first network error, wait for 15 seconds

        And then have you looked at the performance of your Zabbix internal processes as mentioned in this sticky post?
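One way to check this is to count how often each host shows up in such entries. The sketch below is a rough illustration: the regular expression is built only from the log fragment quoted above, the host names in `sample` are hypothetical, and real log lines will carry additional prefix text that the pattern simply skips over.

```python
import re
from collections import Counter

# Pattern derived from the fragment quoted in the post; anything before
# "on host" (timestamps, item names) is deliberately not matched.
PATTERN = re.compile(r"on host \[(?P<host>[^\]]+)\] failed: first network error")


def count_network_errors(lines):
    """Count 'first network error' entries per host name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# Hypothetical sample lines; only the quoted fragment is reproduced.
sample = [
    "on host [web01] failed: first network error, wait for 15 seconds",
    "on host [db01] failed: first network error, wait for 15 seconds",
    "on host [web01] failed: first network error, wait for 15 seconds",
    "some unrelated log line",
]

for host, hits in count_network_errors(sample).most_common():
    print(host, hits)
```

To run it against a real log, pass an open file object instead of `sample`. Hosts with the highest counts are the ones to compare against the times of the false alerts.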


        • vic
          Member
          • Jul 2013
          • 58

          #5
          Originally posted by tchjts1
          What are you seeing in your zabbix_server.log?
          A bunch of entries like this:
          on host [xxxxxxxxxx] failed: first network error, wait for 15 seconds

          And then have you looked at the performance of your Zabbix internal processes as mentioned in this sticky post?

          https://www.zabbix.com/forum/showthread.php?t=41219
          Yes I do see those entries at the time the problem occurred. Internal processes graph looks fine during that time.
          Attached Files


          • tchjts1
            Senior Member
            • May 2008
            • 1605

            #6
            Originally posted by vic
            Yes I do see those entries at the time the problem occurred. Internal processes graph looks fine during that time.
            I had that issue a while back. One of the things I did to alleviate it was to increase Timeout= in zabbix_server.conf on the Zabbix server from the default of 3 to a value of 10. I am not saying that is the magic bullet, but it may help.

            If you change anything in that file, you have to restart your Zabbix server process.
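Before and after editing, it can help to confirm which Timeout value is actually active, since a commented-out line has no effect. This is a small sketch under stated assumptions: `read_timeout` is a hypothetical helper, it works on the file's text rather than a fixed path, and it assumes the last uncommented `Timeout=` line wins.

```python
def read_timeout(conf_text: str, default: int = 3) -> int:
    """Return the Timeout value from zabbix_server.conf-style text.

    Commented lines are ignored. If no active Timeout line is present,
    the default of 3 mentioned in the post is returned. Assumption: when
    the key appears more than once, the last occurrence is used.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "Timeout":
            value = int(val.strip())
    return value


# Hypothetical config excerpts for illustration.
stock = "# Timeout=3\nLogFileSize=1\n"
edited = "# Timeout=3\nTimeout=10\n"

print(read_timeout(stock))
print(read_timeout(edited))
```

The first call returns the default because the only Timeout line is commented out; the second returns 10. For a real check, read the contents of zabbix_server.conf from wherever it is installed and pass the text in.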


            • vic
              Member
              • Jul 2013
              • 58

              #7
              Originally posted by tchjts1
              I had that issue a while back. One of the things I did to alleviate it was to increase Timeout= in zabbix_server.conf on the Zabbix server from the default of 3 to a value of 10. I am not saying that is the magic bullet, but it may help.

              If you change anything in that file, you have to restart your Zabbix server process.
              Just to update. I haven't had a false alert since making this change 2 days ago.


              • tchjts1
                Senior Member
                • May 2008
                • 1605

                #8
                That's good to hear! I think it is one of the values that the Zabbix devs should increase the default for in new releases. A lot of folks seem to have issues when it is set to 3.


                • HarryKalahan
                  Member
                  • Jan 2014
                  • 40

                  #9
                  Hi all,

                  The same thing is happening with version 3.2.1.

                  This is the configuration of the trigger:
                  Code:
                  {HOST:agent.ping.nodata(300)}=1
                  The timeout is set to 5; I also tried 10 and nothing changed.

                  I see this behavior with Windows agents. With Linux agents the behavior is correct, and with version 2.2 it always worked correctly for both.

                  Thanks in advance!


                  • HarryKalahan
                    Member
                    • Jan 2014
                    • 40

                    #10
                    Hi again,

                    I don't know why, but I tried linking another template with the same items and triggers and it works correctly.

                    If you run into this problem again, try creating a new template or defining the trigger directly on the host.

                    Best regards.
