Error connecting to agent

  • nightwish
    Junior Member
    • Dec 2010
    • 25

    #1

    Error connecting to agent

    Hello


    I have Zabbix version 1.8.4 running on CentOS 5.5 (4 GB RAM, 2 CPUs at 3.3 GHz). Current status: 376 hosts and 8690 items.

    The MySQL database is on another CentOS 5.5 server.


    From time to time (not always), I get connection errors to the clients, and sometimes those connection problems disable my client.

    In the log I have:

    3444:20110719:040306.913 Item [clienthostname:net.tcp.service[ssh]] error: Got empty string from [iphostname]. Assuming that agent dropped connection because of access permissions
    3450:20110719:040306.914 Item [clienthostname:agent.ping] error: Got empty string from [iphostname]. Assuming that agent dropped connection because of access permissions
    3448:20110719:040306.918 Item [clienthostname:clock-uptime] error: Got empty string from [iphostname]. Assuming that agent dropped connection because of access permissions
    3454:20110719:040306.918 Item [clienthostname:perf_counter[\System\File Write Bytes/sec]] error: Got empty string from [iphostname]. Assuming that agent dropped connection because of access permissions


    About 2 minutes after these errors from a number of clients, I get an error like this:

    3485:20110719:040321.192 [Z3005] Query failed: [2006] MySQL server has gone away [begin;]


    I have checked the server load, network errors, the firewall, and backups, and I have run out of ideas.

    Can anyone point me in the right direction?

    Thanks in advance.
  • EnigmA-X
    Senior Member
    Zabbix Certified Specialist
    • Oct 2010
    • 116

    #2
    It looks to me that you have networking issues and that this has nothing to do with Zabbix itself.

    Please check your network, routing, dns, etc.


    • dima_dm
      Senior Member
      • Dec 2009
      • 2697

      #3
      Could it be a timeout?

      /etc/zabbix/zabbix_agentd.conf
      Timeout=30
      /etc/zabbix/zabbix_server.conf
      Code:
      ### Option: Timeout
      #       Specifies how long we wait for agent, SNMP device or external check (in seconds).
      #
      # Mandatory: no
      # Range: 1-30
      # Default:
      Timeout=30
      Restart zabbix_agent and zabbix_server.
      Last edited by dima_dm; 20-07-2011, 11:24.


      • nightwish
        Junior Member
        • Dec 2010
        • 25

        #4
        Hi Dima


        I don't think so. My timeouts are set to high values:

        [root@server ~]# cat /etc/zabbix/zabbix_server.conf | grep -i timeout
        ### Option: Timeout
        Timeout=20
        ### Option: TrapperTimeout
        # TrapperTimeout=300
        [root@server~]# cat /etc/zabbix/zabbix_agentd.conf | grep -i timeout
        ### Option: Timeout
        # Spend no more than Timeout seconds on processing
        # Timeout=3
        Timeout=10
        [root@server ~]#


        When this happens I can see a high load average, but I cannot see what it is related to or why the load rises. This happens 2 or 3 times a day.


        I have already changed the housekeeping period, and it is not related to that.

        When the load rises, my clients start to show network errors.

        Thanks for your reply.

        Any more ideas?

        Regards.


        • dima_dm
          Senior Member
          • Dec 2009
          • 2697

          #5
          Network problem.
          Run tcpdump on the Zabbix server and watch the network traffic between the Zabbix agent and the Zabbix server.
          Examples:
          Active Zabbix agent: zabbix_agent -> zabbix_server:10051
          /usr/sbin/tcpdump -i eth0 -nn -s 0 -X "host IP_Agent and tcp port 10051"

          Passive Zabbix agent: zabbix_agent:10050 <- zabbix_server
          /usr/sbin/tcpdump -i eth0 -nn -s 0 -X "host IP_Agent and tcp port 10050"
          Last edited by dima_dm; 21-07-2011, 10:54.


          • nightwish
            Junior Member
            • Dec 2010
            • 25

            #6
            Hi Dima

            Thanks for the reply.

            I got the errors again today:

            Item [windows_agent1:system.swap.size[,free]] error: Got empty string from [windows_ip]. Assuming that agent dropped connection because of access permissions

            3462:20110721:022357.868 Zabbix Host [windows1_agent]: first network error, wait for 15 seconds
            3459:20110721:095718.549 Zabbix Host [windows2_agent]: first network error, wait for 15 seconds
            3440:20110721:095718.573 Zabbix Host [windows3_agent]: first network error, wait for 15 seconds
            3475:20110721:095718.585 Zabbix Host [windows4_agent]: first network error, wait for 15 seconds
            3467:20110721:095718.597 Zabbix Host [windows5_agent]: first network error, wait for 15 seconds
            3476:20110721:095718.609 Zabbix Host [windows6_agent]: first network error, wait for 15 seconds
            3449:20110721:095718.621 Zabbix Host [windows7_agent]: first network error, wait for 15 seconds
            3456:20110721:095718.633 Zabbix Host [unix1_agent]: first network error, wait for 15 seconds


            I cannot see any errors or load on the disk, network, or I/O:

            sar -n ALL
            12:00:01 AM     IFACE   rxpck/s   txpck/s     rxbyt/s     txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
            09:50:01 AM      sit0      0.00      0.00        0.00        0.00      0.00      0.00      0.00
            10:00:01 AM        lo      1.86      1.86      107.82      107.82      0.00      0.00      0.00
            10:00:01 AM      eth0    494.92    484.07   171131.24   107058.96      0.00      0.00      0.00


            sar -b
            09:10:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
            09:30:01 AM     12.95      0.00     12.95      0.00    221.12
            09:40:01 AM     13.37      0.00     13.37      0.00    228.20
            09:50:01 AM     12.26      0.00     12.26      0.00    211.16
            10:00:01 AM     13.73      0.07     13.66      0.65    238.24


            sar -u
            09:10:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
            09:20:01 AM       all      5.11      0.41      2.08      0.02      0.00     92.38
            09:30:01 AM       all      5.05      0.40      1.99      0.02      0.00     92.55
            09:40:01 AM       all      4.67      0.50      2.57      0.02      0.00     92.23
            09:50:01 AM       all      4.06      0.41      2.01      0.02      0.00     93.50
            10:00:01 AM       all     10.74      0.42      2.39      0.02      0.00     86.42
            10:10:01 AM       all      4.93      0.42      2.16      0.02      0.00     92.47


            sar -q
            09:10:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
            09:20:01 AM         3       218      0.10      0.11      0.05
            09:30:01 AM        10       226      0.17      0.07      0.04
            09:40:01 AM         1       211      0.00      0.02      0.01
            09:50:01 AM         2       210      0.17      0.08      0.01
            10:00:01 AM         2       215      2.47      2.30      0.99
            10:10:01 AM         5       214      0.02      0.37      0.55

            I have no idea why this happens. Any more ideas?

            Regards.


            • dima_dm
              Senior Member
              • Dec 2009
              • 2697

              #7
              Check for errors in the system log of the Zabbix server (/var/log/messages?) around 2011-07-21 09:57:18.
              Also check the network equipment (router, switch, etc.) for errors at that time.
              Last edited by dima_dm; 21-07-2011, 16:05.
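
To line up the Zabbix server log with other logs around a given moment, the log lines can be filtered by timestamp. The following is only a rough sketch: it assumes the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the log excerpts in this thread, and the function names `parse_line` and `lines_near` are made up for illustration, not part of Zabbix.

```python
import re
from datetime import datetime, timedelta

# Prefix as seen in the log excerpts in this thread: PID:YYYYMMDD:HHMMSS.mmm
LOG_PREFIX = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if the prefix is missing."""
    m = LOG_PREFIX.match(line)
    if m is None:
        return None
    _pid, date, clock, millis, message = m.groups()
    ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    return ts + timedelta(milliseconds=int(millis)), message


def lines_near(lines, center, window_seconds=60):
    """Yield (timestamp, message) pairs logged within window_seconds of center."""
    window = timedelta(seconds=window_seconds)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and abs(parsed[0] - center) <= window:
            yield parsed
```

For example, passing the open server log file and `datetime(2011, 7, 21, 9, 57, 18)` to `lines_near` would list everything logged within a minute of the burst of "first network error" messages. The log file path depends on the `LogFile` setting in zabbix_server.conf.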


              • nightwish
                Junior Member
                • Dec 2010
                • 25

                #8
                Nothing there:

                [root@zabbix-server]# cat /var/log/messages
                Jul 17 04:02:35 zabbix-server syslogd 1.4.1: restart.
                Jul 19 14:01:33 zabbix-server yum: Installed: 1:net-snmp-utils-5.3.2.2-9.el5_5.1.x86_64



                Could it be related to MySQL performance?

                The values seem fine:

                Zabbix - Queue -> 149
                Zabbix - wcache - history -> 99.97 %


                This one is difficult to track down. All ideas are welcome.

                Regards


                • nightwish
                  Junior Member
                  • Dec 2010
                  • 25

                  #9
                  Hi Enigma


                  I cannot see any network problems on my server:

                  [root@zabbix-server]# netstat -in
                  Kernel Interface table
                  Iface    MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
                  eth0    1500   0 352947757      0      0      0 337613474      0      0      0 BMRU
                  lo     16436   0   1631644      0      0      0   1631644      0      0      0 LRU

                  There are no DNS or routing problems at all. I think that if the problem were network connectivity, the error would be "no route to host" or a timeout?


                  Regards.


                  • dima_dm
                    Senior Member
                    • Dec 2009
                    • 2697

                    #10
                    What does tcpdump show at the moment of the problem?


                    • nightwish
                      Junior Member
                      • Dec 2010
                      • 25

                      #11
                      Hi Dima


                      It is difficult to capture this with tcpdump, because it happens randomly and lasts only about 3 or 10 seconds.


                      Sometimes the errors last for 60 seconds and my clients appear in Zabbix as unreachable (which is not true and causes serious trouble with our helpdesk).


                      Regards.


                      • dima_dm
                        Senior Member
                        • Dec 2009
                        • 2697

                        #12
                        You can collect packets-per-second (pps) and bits-per-second (bps) statistics at a 5-second interval on the Zabbix server and on the switch and router ports along the packet path. Look for sharp spikes or drops in traffic at the moment of the problem.
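
On the Linux side, one possible way to get such 5-second samples is to read the interface counters from `/proc/net/dev` and print the difference between consecutive readings. This is a minimal sketch, not a tested monitoring tool: the helper names are invented for illustration, `eth0` is taken from the outputs posted earlier in this thread, and counter wrap-around is not handled.

```python
import time


def read_counters(text, iface):
    """Extract (rx_bytes, rx_packets, tx_bytes, tx_packets) for iface
    from the contents of /proc/net/dev."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Receive columns come first; transmit bytes/packets are columns 9 and 10.
            return int(fields[0]), int(fields[1]), int(fields[8]), int(fields[9])
    raise KeyError(iface)


def rates(prev, curr, seconds):
    """Return (rx_bps, rx_pps, tx_bps, tx_pps) between two counter samples."""
    rx_b, rx_p, tx_b, tx_p = (c - p for p, c in zip(prev, curr))
    return rx_b * 8 / seconds, rx_p / seconds, tx_b * 8 / seconds, tx_p / seconds


def sample_forever(iface="eth0", interval=5):
    """Print one timestamped line of rates every `interval` seconds until interrupted."""
    with open("/proc/net/dev") as fh:
        prev = read_counters(fh.read(), iface)
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as fh:
            curr = read_counters(fh.read(), iface)
        rx_bps, rx_pps, tx_bps, tx_pps = rates(prev, curr, interval)
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              "rx %.0f bps %.1f pps, tx %.0f bps %.1f pps"
              % (rx_bps, rx_pps, tx_bps, tx_pps))
        prev = curr
```

Calling `sample_forever("eth0", 5)` and redirecting the output to a file gives a timeline that can be compared with the timestamps of the "first network error" messages in the Zabbix server log. The switch and router ports would still need their own counters, for example via SNMP.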
