Server repeatedly reports its agent as unreachable, then resolves a few minutes later

  • darrensnc
    Junior Member
    • May 2018
    • 5

    #1

    Hi All

    I have around 200 server clients on my Zabbix 3.4 system, but I'm having an issue with only one of them.
    Although the server is up, Zabbix keeps reporting that its "agent is Unavailable for 5 Min", and then the problem resolves itself. After about 3 or 4 minutes the same error happens again.

    The Event Viewer (eventvwr) on the client server shows the following two entries:
    [3424]: active check configuration update from [10.*.*.*:10051] started to fail (ZBX_TCP_READ() failed: [0x0000274C] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)
    [3424]: active check configuration update from [10.*.*.*:10051] is working again

    What I would like to know is what could be causing this issue, as the alerts are filling up my email.

    The things I have checked are as follows:

    The IP address in the host config on the Zabbix server matches the one in zabbix_agentd.conf on the server.

    The host name in the host config on the Zabbix server matches the one in zabbix_agentd.conf on the server.

    I can use telnet on ports 10050 and 10051, and ping, to get the Zabbix server and the client server to talk to each other.

    I have stopped and started the Zabbix service, and uninstalled and then reinstalled the service on the client server.

    There is no firewall blocking traffic and no proxy server between the Zabbix server and the client server.
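
A single telnet test only shows that the port was reachable at that moment, which may not catch a fault that comes and goes every few minutes. Below is a minimal Python sketch of a repeated TCP probe that could be left running from either side. The helper names `probe` and `watch` are made up for illustration, and the host and port you pass in are your own (10050 for the agent, 10051 for the server, as mentioned above).

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return (ok, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, time.monotonic() - start


def watch(host, port, attempts=10, interval=5.0):
    """Probe repeatedly and return the list of (ok, seconds_taken) results."""
    results = []
    for i in range(attempts):
        results.append(probe(host, port))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, `watch("agent-host", 10050, attempts=120, interval=5.0)` would probe a (hypothetical) agent address for about ten minutes. Any `False` entries, or connect times close to the timeout, would suggest looking at the network path or the agent's listener rather than at the Zabbix configuration.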


    I could not find anything else on the forum that points to this as a recurring issue.

    Has anyone else got any ideas what could be causing this?





    Last edited by darrensnc; 05-02-2019, 18:46.
  • sph919
    Member
    • Jan 2019
    • 38

    #2
    What version agent are you using on the server? Are you using encryption?

    • darrensnc
      Junior Member
      • May 2018
      • 5

      #3
      Originally posted by sph919
      What version agent are you using on the server? Are you using encryption?
      The zabbix_agentd.exe file version is 2.4.0.48940. This is the same agent I'm using on all 205 servers, and none of the others have this problem.

      And no, there is no encryption.

      • sph919
        Member
        • Jan 2019
        • 38

        #4
        Originally posted by darrensnc

        zabbix_agentd.exe file version is 2.4.0.48940 this is the same agent I'm using for all 205 servers and none have the same problem

        and no there is no encryption
        OK, so if it is the same agent on all 205 servers, I would start looking at the server. What OS are you running?

        • StephenG
          Junior Member
          • Feb 2019
          • 6

          #5
          Hi

          I have the same issue. After upgrading Zabbix server and Zabbix agent from 4.0.3 to 4.0.4,
          the Zabbix server periodically loses its connection to random Zabbix agents.

          • dimir
            Zabbix developer
            • Apr 2011
            • 1080

            #6
            Take a look in the server log file: is there any error about a trapper not being able to handle a request from an active agent?

            • StephenG
              Junior Member
              • Feb 2019
              • 6

              #7
              The server log has messages like this:
              Code:
              5784:20190207:024120.253 Zabbix agent item "agent.hostname" on host "snamzvs001" failed: first network error, wait for 15 seconds
              5847:20190207:024215.264 resuming Zabbix agent checks on host "snamzvs001": connection restored
              5777:20190207:024337.196 Zabbix agent item "system.cpu.util" on host "snamzvs001" failed: first network error, wait for 15 seconds
              5867:20190207:024432.404 resuming Zabbix agent checks on host "snamzvs001": connection restored
              5815:20190207:024930.142 Zabbix agent item "vfs.fs.size[D:,free]" on host "snamzvs001" failed: first network error, wait for 15 seconds
              5864:20190207:025025.575 resuming Zabbix agent checks on host "snamzvs001": connection restored
              5817:20190207:025320.349 Zabbix agent item "agent.hostname" on host "snamzvs001" failed: first network error, wait for 15 seconds
              5861:20190207:025415.051 resuming Zabbix agent checks on host "snamzvs001": connection restored
              5797:20190207:025634.201 Zabbix agent item "vfs.fs.size[F:,pfree]" on host "snamzvs001" failed: first network error, wait for 15 seconds
              5868:20190207:025729.090 resuming Zabbix agent checks on host "snamzvs001": connection restored
              5817:20190207:030127.436 SNMP agent item "IMM2PCIRiser2Temp" on host "snamzvs001" failed: first network error, wait for 15 seconds
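
To see how long each of these outages lasts, the "first network error" entries can be paired with the following "connection restored" entry for the same host. The Python sketch below does that, assuming the log lines keep the `PID:YYYYMMDD:HHMMSS.mmm message` layout shown in the excerpt. The function name `outages` and the regular expressions are illustrative only, not part of Zabbix.

```python
import re
from datetime import datetime

# Timestamp layout as seen in the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r'(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)')
FAIL_RE = re.compile(
    r'item "(?P<item>.+?)" on host "(?P<host>[^"]+)" failed: first network error'
)
OK_RE = re.compile(
    r'resuming .+ checks on host "(?P<host>[^"]+)": connection restored'
)


def outages(lines):
    """Return (host, failing_item, seconds) for each error/restored pair."""
    pending = {}  # host -> (time of first error, item key that failed)
    result = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S.%f")
        fail = FAIL_RE.search(m["msg"])
        if fail:
            # Keep the earliest error if several arrive before a restore.
            pending.setdefault(fail["host"], (ts, fail["item"]))
            continue
        restored = OK_RE.search(m["msg"])
        if restored and restored["host"] in pending:
            start, item = pending.pop(restored["host"])
            result.append((restored["host"], item, (ts - start).total_seconds()))
    return result
```

Usage would be something like `outages(open("zabbix_server.log", encoding="utf-8", errors="replace"))`, with the path adjusted to your installation. On the excerpt above, every pair comes out at roughly 55 seconds while the failing item differs each time, which points away from any one item key being the trigger.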

              • dimir
                Zabbix developer
                • Apr 2011
                • 1080

                #8
                These are passive agent checks that are failing. I have no idea why; perhaps you have too much data to collect and not enough StartAgents on the agent. But that is not related to active checks. Look for entries like "cannot send list of active checks".

                • darrensnc
                  Junior Member
                  • May 2018
                  • 5

                  #9
                  Originally posted by dimir
                  Take a look in the server log file, is there any error about trapper not being able to handle request from active agent?
                  Here is a selection of the log entries that relate to the server in question:

                  2029:20190201:150215.250 Zabbix agent item "net.if.in[Microsoft Failover Cluster Virtual Adapter-WFP LightWeight Filter-0000]" on host "E****2" failed: first network error, wait for 15 seconds
                  2033:20190201:150310.506 resuming Zabbix agent checks on host "E****2": connection restored
                  2029:20190202:011643.477 Zabbix agent item "net.if.in[Microsoft ISATAP Adapter #4]" on host "E****2" failed: first network error, wait for 15 seconds
                  2033:20190202:011742.769 resuming Zabbix agent checks on host "E****2": connection restored

                  The only entries in between these are slow query logs like the ones below:
                  "slow query: 4.890043 sec, "commit;""
                  and "slow query: 9.816782 sec, "delete from history_uint where itemid=31803 and clock<1548451721""
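
To check whether the slow queries line up with the network errors, it may help to list them by duration first. This is a rough Python sketch, assuming the entries appear in the server log as `slow query: <seconds> sec, "<sql>"` like the two quoted above. `slow_queries` is a made-up helper name.

```python
import re

# Matches entries of the form: slow query: 4.890043 sec, "commit;"
SLOW_RE = re.compile(r'slow query: (?P<secs>\d+(?:\.\d+)?) sec, "(?P<sql>.*)"')


def slow_queries(lines, threshold=0.0):
    """Return (seconds, sql) pairs at or above threshold, slowest first."""
    found = []
    for line in lines:
        m = SLOW_RE.search(line)
        if m and float(m["secs"]) >= threshold:
            found.append((float(m["secs"]), m["sql"]))
    return sorted(found, reverse=True)
```

If the slowest entries turn out to be mostly housekeeping deletes like the `history_uint` one above, that would be a hint to compare their timestamps against the outage times before blaming the network.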

                  • dimir
                    Zabbix developer
                    • Apr 2011
                    • 1080

                    #10
                    Check busy trapper processes on the server.

                    • StephenG
                      Junior Member
                      • Feb 2019
                      • 6

                      #11
                      I don't have much data, and the trapper processes are not too busy...
                      I also noticed that the problem only affects Windows servers. Any ideas?
                      [Attached image: z1.png]
                      [Attached image: z2.png]
                      Last edited by StephenG; 08-02-2019, 04:59.

                      • dimir
                        Zabbix developer
                        • Apr 2011
                        • 1080

                        #12
                        Do you use system.localtime on Windows agents? There was one bug that was fixed in 4.0.4: https://support.zabbix.com/browse/ZBX-15301

                        • StephenG
                          Junior Member
                          • Feb 2019
                          • 6

                          #13
                          No, I don't use system.localtime.
                          The server and all agents have been updated to version 4.0.4.
                          As I said before, these issues started after updating the server to version 4.0.4.
                          I tried updating all agents from 4.0.3 to 4.0.4, but it made no difference...

                          • dimir
                            Zabbix developer
                            • Apr 2011
                            • 1080

                            #14
                            What does this give you?
                            Code:
                            grep 'cannot send list of active checks' /var/log/zabbix/zabbix_server.log{,.old}

                            • StephenG
                              Junior Member
                              • Feb 2019
                              • 6

                              #15
                              Sorry for the delay.
                              There are many log entries like these:
                              Code:
                              5862:20190208:130910.668 cannot send list of active checks to "**.**.**.**": host [m***er] not monitored
                              5891:20190207:023454.170 cannot send list of active checks to "**.**.**.**": host [v****1] not found
                              The second host is one I deleted from the Zabbix configuration section.
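
With many entries like these, counting them per host and reason makes it easier to see which agents are still asking for active checks. Below is a small Python sketch, assuming the message layout shown in the two lines above. `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

ACTIVE_RE = re.compile(
    r'cannot send list of active checks to "(?P<addr>[^"]*)": '
    r'host \[(?P<host>[^\]]*)\] (?P<reason>.+)'
)


def summarize(lines):
    """Count 'cannot send list of active checks' entries per (host, reason)."""
    counts = Counter()
    for line in lines:
        m = ACTIVE_RE.search(line)
        if m:
            counts[(m["host"], m["reason"].strip())] += 1
    return counts
```

As far as I understand it, "not found" means the host name the agent sends does not match any host on the server (for example one that was deleted), and "not monitored" means the host exists but is disabled. Both come from active-check requests, so they are probably separate from the passive check failures discussed earlier in the thread.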
                              Last edited by StephenG; 11-02-2019, 04:30.
