Server xxx is unreachable - problem - CentOS 5.2 x64bit

  • tommyboy
    Junior Member
    • Mar 2010
    • 11

    #1

    Server xxx is unreachable - problem - CentOS 5.2 x64bit

    Hi

    I'm running Zabbix 1.8.1 on CentOS 5.2 and 5.4

    Intermittently, a few of my CentOS 5.2 x64 servers (about 5 of 20) trigger the "Server xxx is unreachable" problem; the rest are fine.

    When I telnet to the Zabbix agent port on an unreachable server, the zabbix_agent service does not respond to an "agent.version" query; it just times out.

    Restarting zabbix_agent on the unreachable server temporarily clears the "Server xxx is unreachable" problem.

    Note: I also see a high number of TCP connections in "TIME_WAIT", about 200 on average.

    I have enabled debug logging on some of the affected servers. The log shows the agent running its checks, but the agent still does not respond to the "agent.version" telnet query.

    Thanks in advance

    Steven
  • richlv
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Oct 2005
    • 3112

    #2
    if you allow connecting from localhost on those servers (in agentd config file), do queries from the same host work ?
    Zabbix 3.0 Network Monitoring book


    • tommyboy
      Junior Member
      • Mar 2010
      • 11

      #3
      Hi

      I have now allowed connections from localhost in the zabbix_agent.conf file, so (after restarting the Zabbix agent) I can telnet to the agent port locally and run the "agent.version" query.

      Normal behaviour of the agent when telnetting to zabbix_agent port 10050 locally on the same box: the connection terminates within 1 second. (This capture is from when local connections were disabled and only remote server connections were allowed.) See below:

      [root@apdv1 tmp]# time telnet 127.0.0.1 10050
      Trying 127.0.0.1...
      Connected to localhost.localdomain (127.0.0.1).
      Escape character is '^]'.
      Connection closed by foreign host.

      real 0m0.010s
      user 0m0.003s
      sys 0m0.007s
      [root@apdv1 tmp]#

      Abnormal behaviour of the agent when connecting to zabbix_agent port 10050 locally on the same box: the connection hangs. If you tail the Zabbix log with debug enabled, you can see the agent still ticking over with its routine checks, but it does not respond to queries like "agent.version". The network service does not die; the agent continues to listen on port 10050.

      [root@wbtr1 ~]# time nc 127.0.0.1 10050

      real 3m52.554s
      user 0m0.001s
      sys 0m0.007s
      [root@wbtr1 ~]#

      Just a little bit more about our set-up here.

      We have 2 Zabbix servers, one for our intranet network and one for our DMZ network, which work independently of each other. The problem agents are in the DMZ network and are configured to report only to our DMZ Zabbix server. The OS of the zabbix_agents is CentOS 5.2 x64 and the Zabbix server OS is CentOS 5.4 x64. The Zabbix server was compiled with TCP IPv6 support enabled.

      The server is a native (physical) machine, whereas some of the problem agents are running on VMware ESXi 4 hosts.

      Thanks for your interest

      Steven

      PS: As a temporary workaround for my problem hosts, I will run a cron job that times a localhost connection to the agent port; if it takes longer than 1 second, the job restarts zabbix_agent.

      [root@wbtr1 ~]# /usr/bin/time -o /tmp/zabbix_agent_telnet_time_test.out -f %e nc -w 10 127.0.0.1 10050
      [root@wbtr1 ~]#
      [root@wbtr1 ~]# more /tmp/zabbix_agent_telnet_time_test.out
      10.00
      [root@wbtr1 ~]#
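
The cron workaround above could be sketched roughly as follows. It sends an "agent.version" request through nc, measures the elapsed time with date +%s instead of /usr/bin/time (so it only has whole-second resolution), and restarts the agent when the probe takes longer than the limit. The init script path, the logger tag and the --run switch are assumptions for illustration, not details from this setup.

```shell
#!/bin/sh
# Hypothetical watchdog for the cron workaround described in this post.
# Assumed, not taken from the thread: INIT_SCRIPT path, logger tag, --run switch.

AGENT_HOST=127.0.0.1
AGENT_PORT=10050
MAX_SECONDS=1                            # restart if the probe takes longer than this
INIT_SCRIPT=/etc/init.d/zabbix_agentd    # assumed path, adjust to your install

# Print how many whole seconds one probe took.
# A healthy agent answers and closes the connection almost at once;
# a hung one leaves nc waiting until its 10 second timeout.
probe_seconds() {
    start=$(date +%s)
    echo agent.version | nc -w 10 "$AGENT_HOST" "$AGENT_PORT" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Succeed (exit status 0) when elapsed seconds ($1) exceed the limit ($2).
is_hung() {
    [ "$1" -gt "$2" ]
}

# Only probe and restart when explicitly asked, so the functions can be
# sourced and tried out without touching the agent.
if [ "${1:-}" = "--run" ]; then
    elapsed=$(probe_seconds)
    if is_hung "$elapsed" "$MAX_SECONDS"; then
        logger -t zabbix_watchdog "agent probe took ${elapsed}s, restarting"
        "$INIT_SCRIPT" restart
    fi
fi
```

A crontab entry such as `* * * * * /usr/local/bin/zabbix_agent_watchdog.sh --run` (hypothetical install path) would run the check once a minute.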


      • richlv
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Oct 2005
        • 3112

        #4
        have you modified agent config in some way, like changing StartAgents param ?
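
One quick way to answer that is to print only the settings that are actually in effect. This is a minimal sketch; the function name and the config path in the usage line are assumptions.

```shell
# Print only the settings that are in effect, skipping comments and
# blank lines. Reads the agent config file on stdin.
active_settings() {
    grep -v '^[[:space:]]*#' | grep -v '^[[:space:]]*$'
}
```

Usage, assuming the config lives at /etc/zabbix/zabbix_agentd.conf: `active_settings < /etc/zabbix/zabbix_agentd.conf`. A StartAgents line in the output means the value has been set explicitly.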
        Zabbix 3.0 Network Monitoring book


        • tommyboy
          Junior Member
          • Mar 2010
          • 11

          #5
          Hi

          I have attached the zabbix_agent config script that we are using. It is the same script I got from the following site:




          I have noticed the following interesting behaviour when the Zabbix agent is not responding: three network connections sit in "SYN_RECV".
          [root@pydbdv1 ~]# netstat -tnap
          Active Internet connections (servers and established)
          Proto Recv-Q Send-Q Local Address       Foreign Address     State     PID/Program name
          tcp        0      0 0.0.0.0:10050       0.0.0.0:*           LISTEN    3733/zabbix_agentd
          tcp        0      0 172.16.20.39:10050  172.20.24.70:34619  SYN_RECV  -
          tcp        0      0 172.16.20.39:10050  172.20.24.70:57628  SYN_RECV  -
          tcp        0      0 172.16.20.39:10050  172.20.24.70:52443  SYN_RECV  -
          tcp        0      0 0.0.0.0:5432        0.0.0.0:*           LISTEN    9632/postmaster
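
To track this over time rather than eyeballing netstat, the connections on the agent port can be counted per state. A minimal sketch, assuming netstat output in the column layout shown above; the function name is made up for illustration. The same count also covers the "TIME_WAIT" connections mentioned in the first post.

```shell
# Count TCP sockets on a local port, grouped by state.
# Reads `netstat -tna` style output on stdin; the port defaults to 10050.
count_states() {
    port=${1:-10050}
    awk -v port="$port" '
        # $4 is the local address, $6 the connection state
        $1 == "tcp" && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Usage: `netstat -tna | count_states 10050` prints one line per state followed by its count.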


          Some sample log output:
          3739:20100317:125531.255 refresh_active_checks('172.20.24.70',10051)
          3739:20100317:125531.257 Sending [{
          "request":"active checks",
          "host":"PYDBDV1"}]
          3739:20100317:125531.257 Before read
          3739:20100317:125531.262 Got [{
          "response":"success",
          "data":[]}]
          3739:20100317:125531.262 In parse_list_of_checks()
          3739:20100317:125531.262 In disable_all_metrics()
          3739:20100317:125542.276 Sleeping for 1 seconds
          3739:20100317:125543.276 In send_buffer('172.20.24.70','10051')
          3739:20100317:125543.276 Values in the buffer 0 Max 100
          3739:20100317:125543.277 Sleeping for 1 seconds
          3733:20100317:125544.117 One child process died (PID:3739). Exiting ...
          3733:20100317:125544.119 zbx_on_exit() called.
          3733:20100317:125546.122 Zabbix Agent stopped. Zabbix 1.8.1 (revision 9702).
          (END)
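
The "One child process died" line is easy to miss in a debug log. A small sketch for pulling those events out of the log; the function name and the log path in the usage line are assumptions.

```shell
# Print the PID of every child process the agent reported as dead.
# Reads the agent log on stdin.
dead_children() {
    sed -n 's/.*One child process died (PID:\([0-9][0-9]*\)).*/\1/p'
}
```

Usage, assuming the log is at /var/log/zabbix/zabbix_agentd.log: `dead_children < /var/log/zabbix/zabbix_agentd.log`.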

          # Post zabbix restart
          19680:20100317:130902.261 Requested [vfs.fs.size[/tmp,pused]]
          19680:20100317:130902.261 Sending back [16.647683]
          19681:20100317:130902.263 Processing request.
          19681:20100317:130902.264 Requested [vfs.fs.inode[/,pfree]]
          19681:20100317:130902.264 Sending back [98.000153]
          19679:20100317:130902.266 Processing request.
          19679:20100317:130902.266 Requested [proc.num[]]
          19679:20100317:130902.269 Sending back [192]
          19682:20100317:130902.399 In send_buffer('172.20.24.70','10051')
          19682:20100317:130902.399 Values in the buffer 0 Max 100

          Thanks

          Steven

          • richlv
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Oct 2005
            • 3112

            #6
            i'll admit that i'm pretty much out of ideas why this is happening.
            one of the remaining things to check would be whether agent is trying to check something network dependent - maybe an nfs mount ? other than that i can't help here much
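
A quick way to check for nfs mounts on an affected host, as a minimal sketch (the function name is made up; it reads lines in the /proc/mounts layout of device, mount point, filesystem type, options):

```shell
# List mount points whose filesystem type is nfs or nfs4.
# Reads /proc/mounts style lines on stdin.
nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }'
}
```

Usage: `nfs_mounts < /proc/mounts`. Any path it prints can then be compared against the vfs.fs.* items configured for that host.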
            Zabbix 3.0 Network Monitoring book


            • nima0102
              Senior Member
              • May 2010
              • 106

              #7
              bug in zabbix_agentd

              Hi
              I have the same problem: some of our servers are not being monitored, and on the agents the connections are in the "SYN_RECV" state!
              In this state zabbix_agentd still responds to some item keys, but not all of them.
              I think this is a bug in zabbix_agentd.

              Thanks for more help or guidance
              Last edited by nima0102; 19-06-2010, 08:52.
