Zabbix agent is unreachable for 5 minutes, constantly

  • manowar
    Member
    • Apr 2008
    • 37

    #1

    I've seen posts along these lines, but they all seem to relate to proxies going offline, so apologies if this has already been asked.

    Setup: main server in the UK plus 3 proxies (2 in the UK, 1 in the US), each at a different physical site. 35 NVPS on the master.

    Throughout the day I intermittently see the trigger "zabbix agent on x is unreachable for 5 mins". It fires, clears to OK, then goes into problem again. This happens on multiple hosts behind each of the proxies. At 02:20 am I get a huge burst of these lasting 30 minutes, then everything is OK. This does not happen with hosts monitored directly by the Zabbix master.

    I have created a "template zabbix agent - site X" for each proxy, with a dependency on that proxy's "last seen" time, and linked all hosts at each site to their local site template. This is mainly so that if we lose a proxy or its comms link, Zabbix complains only about the lost proxy rather than about every host behind it.

    (The above was set up after reading related posts.)
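    As a rough illustration of what that dependency buys, here is a simplified Python model of the suppression logic. It is not Zabbix's own code: the `Host` class, the `alerts()` function, the proxy names and the thresholds are all hypothetical. Host alerts are suppressed only while the proxy itself counts as stale, so if the proxy keeps checking in but host data is delayed, the host alerts still fire.

```python
from dataclasses import dataclass

# Assumed windows; the thread uses 5 minutes for the host trigger.
HOST_TIMEOUT_S = 5 * 60
PROXY_TIMEOUT_S = 5 * 60  # assumption: same window for the proxy "last seen" check


@dataclass
class Host:
    name: str
    proxy: str              # proxy that monitors this host
    last_agent_data: float  # epoch seconds of the last value received from the agent


def alerts(now, hosts, proxy_last_seen):
    """Raise one alert per stale proxy and suppress host alerts behind it."""
    stale = {p for p, seen in proxy_last_seen.items() if now - seen > PROXY_TIMEOUT_S}
    out = [f"proxy {p} not seen for > {PROXY_TIMEOUT_S // 60} min" for p in sorted(stale)]
    for h in hosts:
        if h.proxy in stale:
            continue  # dependency: the proxy alert already covers this host
        if now - h.last_agent_data > HOST_TIMEOUT_S:
            out.append(f"agent on {h.name} unreachable for > {HOST_TIMEOUT_S // 60} min")
    return out


# The proxy was seen 10 s ago, but web1's last value is 400 s old.
print(alerts(1000.0, [Host("web1", "px-uk1", 600.0)], {"px-uk1": 990.0}))
# ['agent on web1 unreachable for > 5 min']
```

    Because the proxy still counts as "seen", the dependency does not kick in and web1's alert goes out, which is the pattern of a proxy that stays up while its host data arrives late.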

    Why is this happening? Is it an issue with the comms links back to the master? Is it slow proxies?
    I figure that increasing the timeout from 5 to 7 minutes would get rid of most of the noise, but that's not really an option: 5 minutes is already a long time to wait before learning that a host is down.
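    To see why lengthening the window only hides part of the problem, here is a simplified model of a nodata()-style trigger. It assumes the trigger goes into PROBLEM once no value has arrived for `window` seconds and recovers on the next value; it ignores how often Zabbix actually re-evaluates such triggers, and `problem_periods()` is a hypothetical helper, not a Zabbix API.

```python
def problem_periods(arrivals, window, until):
    """Simplified nodata()-style model (not Zabbix's actual implementation):
    the trigger goes into PROBLEM `window` seconds after the last value arrived
    and recovers when the next value arrives (or is still open at `until`)."""
    periods = []
    times = sorted(arrivals) + [until]
    for prev, nxt in zip(times, times[1:]):
        fire_at = prev + window
        if nxt > fire_at:
            periods.append((fire_at, nxt))
    return periods


# Values every 60 s, then a 420 s (7 min) gap in delivery, then values again.
arrivals = list(range(0, 601, 60)) + list(range(1020, 1201, 60))
print(problem_periods(arrivals, 5 * 60, until=1200))  # [(900, 1020)]
print(problem_periods(arrivals, 7 * 60, until=1200))  # []
```

    A delivery gap of exactly 7 minutes fires the 5-minute trigger but not a 7-minute one, while any longer gap would still fire both, so raising the timeout masks the symptom rather than removing its cause.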

    Is there anything internal I should be monitoring to help root this out?
    Last edited by manowar; 21-10-2015, 17:28.
  • manowar
    Member
    • Apr 2008
    • 37

    #2
    Most of the boxes are VMs, if that's of any significance.

    • manowar
      Member
      • Apr 2008
      • 37

      #3
      I had assumed this *isn't* a problem with the master server, since we never see unreachable hosts among those it monitors directly. The links are pretty good, although we do see some backup traffic around 2 am, which fits with when these problems are at their worst. I'm also assuming it's not a problem with an individual proxy, since the flaps come from all three.

      This is a real blocker for us moving (back) to Zabbix. At the moment the signal-to-noise ratio is just way too low.

      Is there anything in the debug logs I should be looking for?

      • manowar
        Member
        • Apr 2008
        • 37

        #4
        Another barrage of unreachable hosts at 02:20-02:40 last night, with no problems at all in the proxy "last seen" time during that period.
        What else could be happening?

        • manowar
          Member
          • Apr 2008
          • 37

          #5
          Seriously? No ideas at all?

          • manowar
            Member
            • Apr 2008
            • 37

            #6
            This looks pretty damning. It's from one of the proxies, and to me it suggests either a comms problem or an overloaded master server. The timestamps match precisely.

            Code:
             10487:20151022:021532.937 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151022:021632.937 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151022:021732.937 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151022:021832.938 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151023:021535.768 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151023:021635.768 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151023:021735.769 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151023:021835.769 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151024:021540.712 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151024:021640.712 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151024:021740.712 sending heartbeat message to server failed: error:"no response: network error", info:""
             10487:20151024:021840.712 sending heartbeat message to server failed: error:"no response: network error", info:""
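            To check that the failures really cluster in the same window every night, a short Python sketch can group the heartbeat failures by day. It assumes only the `PID:YYYYMMDD:HHMMSS.mmm message` layout visible in the log lines above; `heartbeat_failures_by_day()` is a hypothetical helper, and the sample reuses three of those lines.

```python
import re
from collections import defaultdict
from datetime import datetime

# Log lines look like "PID:YYYYMMDD:HHMMSS.mmm message" (PID may be space-padded).
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")
NEEDLE = "sending heartbeat message to server failed"


def heartbeat_failures_by_day(lines):
    """Return {date: (first_failure, last_failure, count)} for heartbeat failures."""
    by_day = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or NEEDLE not in m.group(5):
            continue  # not a log line, or not a heartbeat failure
        ts = datetime.strptime(m.group(2) + m.group(3), "%Y%m%d%H%M%S")
        by_day[ts.date()].append(ts)
    return {day: (min(stamps), max(stamps), len(stamps)) for day, stamps in sorted(by_day.items())}


sample = [
    ' 10487:20151022:021532.937 sending heartbeat message to server failed: error:"no response: network error", info:""',
    ' 10487:20151022:021832.938 sending heartbeat message to server failed: error:"no response: network error", info:""',
    ' 10487:20151023:021535.768 sending heartbeat message to server failed: error:"no response: network error", info:""',
]
for day, (first, last, n) in heartbeat_failures_by_day(sample).items():
    print(day, first.time(), "->", last.time(), n)
# 2015-10-22 02:15:32 -> 02:18:32 2
# 2015-10-23 02:15:35 -> 02:15:35 1
```

            Fed the whole proxy log, this gives one line per night, which makes a recurring window like 02:15-02:19 easy to spot.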

            • manowar
              Member
              • Apr 2008
              • 37

              #7
              *ahem*

              Code:
              14 2 * * * /usr/local/sbin/zabbix-server-backup.sh > /tmp/zabbix-server-backup.log 2>&1
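              That cron entry starts the backup at 02:14 every day. A small Python sketch can measure how long after that start the first heartbeat failure of each night in the proxy log appears. `cron_start_on()` is a hypothetical helper that only understands plain numeric minute and hour fields; the failure times are the first entries for each night in the log above.

```python
from datetime import datetime


def cron_start_on(cron_line: str, day: datetime) -> datetime:
    """When a crontab entry with plain numeric minute/hour fields starts on `day`.

    Only fixed numbers such as "14 2 * * *" are handled; ranges, lists, steps
    and the day/month fields are ignored in this sketch."""
    minute, hour = cron_line.split()[:2]
    return day.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)


cron = "14 2 * * * /usr/local/sbin/zabbix-server-backup.sh > /tmp/zabbix-server-backup.log 2>&1"
failures = [datetime(2015, 10, 22, 2, 15, 32), datetime(2015, 10, 23, 2, 15, 35), datetime(2015, 10, 24, 2, 15, 40)]
for f in failures:
    lag = f - cron_start_on(cron, f)
    print(f.date(), "first heartbeat failure", lag, "after the backup started")
# 2015-10-22 first heartbeat failure 0:01:32 after the backup started
# 2015-10-23 first heartbeat failure 0:01:35 after the backup started
# 2015-10-24 first heartbeat failure 0:01:40 after the backup started
```

              Each night the failures begin less than two minutes after the backup kicks off, which is consistent with the backup loading the master or the link at that time.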

              • kDas
                Junior Member
                • Nov 2015
                • 1

                #8
                Exactly the same problem here with 2.4.6.

                The Zabbix proxy in active mode works well for a while, then the proxy's sender fork gets stuck in "sending data", and then I get the same error as you.

                • ingus.vilnis
                  Senior Member
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Mar 2014
                  • 908

                  #9
                  Hi,

                  This looks like a performance issue. Please check the internal graphs on your Zabbix server and all proxies to see whether any internal processes are overloaded or whether you have run out of cache.


                  Also make sure you log slow queries so you can spot any problems with database performance.

                  Best Regards,
                  Ingus
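                  The check described above amounts to comparing each internal process's busy percentage and each cache's free percentage against a limit. Here is a minimal Python sketch, assuming you have read those values off the internal graphs (for example from items such as `zabbix[process,data sender,avg,busy]` and `zabbix[wcache,history,pfree]`). The 75 % and 20 % limits and the sample readings are made up for illustration, and `find_bottlenecks()` is a hypothetical helper.

```python
BUSY_LIMIT = 75.0      # assumed threshold: flag processes busier than this (%)
CACHE_FREE_MIN = 20.0  # assumed threshold: flag caches with less free space than this (%)


def find_bottlenecks(busy_pct, cache_free_pct):
    """Return readable warnings for overloaded processes and nearly full caches."""
    warnings = [f"{name} busy {pct:.0f}%"
                for name, pct in sorted(busy_pct.items()) if pct > BUSY_LIMIT]
    warnings += [f"{name} cache only {pct:.0f}% free"
                 for name, pct in sorted(cache_free_pct.items()) if pct < CACHE_FREE_MIN]
    return warnings


# Made-up sample readings, for illustration only.
print(find_bottlenecks({"data sender": 98.0, "poller": 30.0}, {"history": 12.0, "text history": 90.0}))
# ['data sender busy 98%', 'history cache only 12% free']
```

                  Running the same comparison for the server and for each proxy, around the 02:14-02:40 window, would show whether one process type or cache saturates while the backup runs.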