Client hosts not talking to Server

  • droche7
    Junior Member
    • Sep 2006
    • 13

    #1

    I am having issues with specific hosts talking to my Zabbix server. I have verified that the configuration settings on the non-working hosts match those on the working hosts. The only difference I can find is that the problematic hosts are located in Frankfurt and take up to 100 ms to ping the Zabbix server (in Chicago). However, I have hosts in the UK with a 90 ms ping time to the same server, and they report without issues. Here is the client log file with full debug mode:

    017367:20060920:215441 Before read
    017367:20060920:215441 In delete_all_metrics()
    017367:20060920:215441 Parsed [ZBX_EOF]
    017367:20060920:215441 Sleeping for 60 seconds
    017367:20060920:215541 In refresh_metrics()
    017367:20060920:215541 get_active_checks: host[10.33.93.118] port[10051]
    017367:20060920:215541 Sending [ZBX_GET_ACTIVE_CHECKS
    lx-deeuopt1a

    I have no idea whether the timeout in the config file ("Timeout=20") is a problem. I set it to 30 and still got nothing. I also checked the server logs, and there is nothing in there regarding the hosts that are not connecting.

    I ran /sbin/zabbix_agent -p and it is collecting data, but it is just not reporting it. Example:

    web.page.perf[www.zabbix.com,,80] [d|0.150182]
    web.page.regexp[www.zabbix.com,,80] [m|ZBX_NOTSUPPORTED]
    cpu[idle1] [m|ZBX_NOTSUPPORTED]
    io[disk_io] [d|817505.000000]
    kern[maxfiles] [u|1190498]
    memory[buffers] [u|87060480]
    system[uname] [t|Linux linux-servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux]
    sensor[temp1] [m|ZBX_NOTSUPPORTED]
    swap[total] [u|8587018240]
    version[zabbix_agent] [s|1.1]
    agent.ping [u|1]
    agent.version [s|1.1]
    kernel.maxfiles [u|1190498]
    kernel.maxproc [m|ZBX_NOTSUPPORTED]
    vfs.file.cksum[/etc/services] [u|3007857096]
    vfs.file.md5sum[/etc/services] [s|d0db6751e69c725ed5267f165919bad1]
    system.cpu.switches [m|ZBX_NOTSUPPORTED]
    system.cpu.intr [u|498943808]
    net.tcp.dns[127.0.0.1,localhost] [u|0]
    net.tcp.listen[80] [m|ZBX_NOTSUPPORTED]
    net.tcp.port[,80] [u|0]
    net.tcp.service[ssh,127.0.0.1,22] [u|1]
    net.tcp.service.perf[ssh,127.0.0.1,22] [d|0.010489]
    net.if.in[lo,bytes] [u|4294967295]
    net.if.out[lo,bytes] [u|4294967295]
    net.if.total[lo,bytes] [u|4294967294]
    net.if.collisions[lo] [u|0]
    vfs.fs.size[/,free] [u|5857424]
    vfs.fs.inode[/,free] [u|1196879]
    vfs.dev.read[sda,operations] [m|ZBX_NOTSUPPORTED]
    vfs.dev.write[sda,sectors] [m|ZBX_NOTSUPPORTED]
    vm.memory.size[total] [u|12601737216]
    proc.num[inetd,,,] [u|0]
    proc.mem[inetd,,] [u|0]
    system.cpu.util[all,user,avg1] [u|9]
    system.cpu.load[all,avg1] [d|1.190000]
    system.swap.size[all,free] [u|8586854400]
    system.swap.in[all] [m|ZBX_NOTSUPPORTED]
    system.swap.out[all,count] [m|ZBX_NOTSUPPORTED]
    system.hostname [t|linux-servername]
    system.uname [t|Linux linux-servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux]
    system.uptime [u|395672]
    system.users.num [d|8.000000]
    sar.idle [m|ZBX_NOTSUPPORTED]
    (standard_in) 2: parse error
    sar.busy [t|]
    sar.system [t|85.43]
    sar.nice [t|3.45]
    sar.user [t|0.92]
    ntp.offset [t|0.165]
    cron.orphan [t|0]
    hardware.model [t|PowerEdge 1855]
    mem.free [t|21024]
    swap.free [t|8385600]
    hardware.serial [t|3DF0C2J]
    Usage: grep [OPTION]... PATTERN [FILE]...
    Try `grep --help' for more information.
    ERROR: User name does not exist.
    ********* simple selection ********* ********* selection by list *********
    -A all processes -C by command name
    -N negate selection -G by real group ID (supports names)
    -a all w/ tty except session leaders -U by real user ID (supports names)
    -d all except session leaders -g by session OR by effective group name
    -e all processes -p by process ID
    T all processes on this terminal -s processes in the sessions given
    a all w/ tty, including other users -t by tty
    g OBSOLETE -- DO NOT USE -u by effective user ID (supports names)
    r only running processes U processes for specified users
    x processes w/o controlling ttys t by tty
    *********** output format ********** *********** long options ***********
    -o,o user-defined -f full --Group --User --pid --cols --ppid
    -j,j job control s signal --group --user --sid --rows --info
    -O,O preloaded -o v virtual memory --cumulative --format --deselect
    -l,l long u user-oriented --sort --tty --forest --version
    -F extra full X registers --heading --no-heading --context
    ********* misc options *********
    -V,V show version L list format codes f ASCII art forest
    -m,m,-L,-T,H threads S children in sum -y change -l format
    -M,Z security data c true command name -c scheduling class
    -w,w wide output n numeric WCHAN,UID -H process hierarchy
    ps.mem[/bin/ps -u -o pid,args | /bin/grep -i | /bin/grep -v grep | /bin/awk '{print $ 1}' | /usr/bin/xargs ps -o rss --noheaders] [t|1632
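
Output like the listing above is long enough that the unsupported items are easy to miss. A minimal sketch in Python for picking them out, assuming each line has the `key [type|value]` shape seen in the paste; the names `LINE_RE` and `unsupported_keys` are illustrative and not part of Zabbix:

```python
import re

# One line of `zabbix_agent -p` output as pasted above: key, then [type|value].
LINE_RE = re.compile(r"^(?P<key>.+?)\s+\[(?P<type>[a-z])\|(?P<value>.*)\]$")


def unsupported_keys(output):
    """Return the keys whose value is reported as ZBX_NOTSUPPORTED."""
    keys = []
    for raw in output.splitlines():
        match = LINE_RE.match(raw.strip())
        if match and match.group("value") == "ZBX_NOTSUPPORTED":
            keys.append(match.group("key"))
    return keys
```

Lines that do not fit the pattern, such as the stray `grep` and `ps` usage text, are simply skipped.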


    LASTLY:
    I am running RHEL 4 Update 4 on the problematic hosts; RHEL 4 Update 3 is on all the other hosts. Has anyone seen any problems with the new RHEL update?

    Thank you,

    Dennis
  • James Wells
    Senior Member
    • Jun 2005
    • 664

    #2
    Greetings,

    Right off the bat, I would suggest trying to telnet to port 10051 on the Zabbix server from your systems in Germany. If that works, I would then turn up the logging level on zabbix_agentd and see whether anything else in the logs indicates timeouts, etc. If, on the other hand, it doesn't work, I would suggest using tcpdump / wireshark on the Zabbix server to see whether the packets from the agents in Germany are even getting there.
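
The telnet test can also be scripted so it is easy to repeat from each remote host and to time. A minimal sketch in Python using only the standard library; the function name `check_tcp_port` is illustrative and not part of Zabbix:

```python
import socket
import time


def check_tcp_port(host, port, timeout=5.0):
    """Try a TCP connect; return (connected, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False, time.monotonic() - start
```

Calling `check_tcp_port("10.33.93.118", 10051)` from a Frankfurt host would test the server address shown in the agent log. A successful connect only shows that the port is reachable; it says nothing about whether the server accepts the host's data.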
    Unofficial Zabbix Developer

    • droche7
      Junior Member
      • Sep 2006
      • 13

      #3
      OK, actually I didn't include everything the first time:
      1. Telnet to port 10051 from the problematic host to the Zabbix server works with no problem.
      2. The logging level is at 4 on the host, and that's all I get. :-(
      3. I ran tcpdump on the server and do not see anything coming from any of these hosts. I also built another 64-bit RHEL 4 Update 4 host in my office, and it is not yet talking to the Zabbix server even though it is polling data.

      4. Also, the host is not only RHEL 4 Update 4 but 64-bit too. Any known issues?

      5. Is there any way to force an update to the Zabbix server so I can run tcpdump and see whether the traffic shows up?

      Thanks,
      Dennis

      • James Wells
        Senior Member
        • Jun 2005
        • 664

        #4
        Originally posted by droche7
        5. Is there anyway to force an update to the zabbix server so i can run tcpdump and see if i can see the traffic?
        Yes and no. I am not aware of any mechanism to force zabbix_agent(d) to send data to the server. However, you can use the zabbix_sender command, which shares much of the same code base, to send the data manually. This may give you a better idea of where to look. To use zabbix_sender, change the type of one of the host's items to Zabbix Trapper and then use something like the following:
        Code:
        zabbix_sender <Zabbix Server IP> <Zabbix Server Port> <Faulty Host Name> <Item Key> <Value You Want To Send>
        This will give you something to sniff for with tcpdump, and since you will know what the key and value are, you will be able to decode the packet payload to determine whether the packet is actually getting there correctly.
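
For repeated tests it can help to wrap the command in a small script. A hedged sketch in Python, assuming the positional argument order given above (server, port, host name, key, value); the binary name `zabbix_sender` and both function names are assumptions, so adjust `binary` to match your build:

```python
import subprocess


def build_sender_command(server, port, host, key, value, binary="zabbix_sender"):
    """Return the argv list for the positional usage described above."""
    return [binary, server, str(port), host, key, str(value)]


def send_value(server, port, host, key, value, binary="zabbix_sender"):
    """Run the sender and return its exit status."""
    cmd = build_sender_command(server, port, host, key, value, binary)
    # Passing a list (no shell) keeps keys with brackets or commas intact.
    return subprocess.run(cmd, check=False).returncode
```

Later Zabbix releases use flag-style options instead of positional arguments, so check the usage output of the installed binary before relying on this order.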

        I am not aware of any issues with 64bit RHEL that would block network connectivity.
        Unofficial Zabbix Developer

        • droche7
          Junior Member
          • Sep 2006
          • 13

          #5
          OK, so the Zabbix server is seeing some traffic from the host in question:

          tcpdump -v | grep LINUX_CLIENT_SERVER
          ----------------------------------------
          tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
          08:22:44.047296 IP (tos 0x0, ttl 56, id 36955, offset 0, flags [DF], proto 6, length: 60) LINUX_CLIENT_SERVER.47822 > LNX_ZABBIX_SERVER.tib10051: S [tcp sum ok] 2520870527:2520870527(0) win 5840 <mss 1460,sackOK,timestamp 457834020 0,nop,wscale 9>
          08:22:44.047309 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6, length: 60) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47822: S [tcp sum ok] 1058555986:1058555986(0) ack 2520870528 win 5792 <mss 1460,sackOK,timestamp 431778753 457834020,nop,wscale 2>
          08:22:44.157732 IP (tos 0x0, ttl 56, id 36957, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47822 > LNX_ZABBIX_SERVER.tib10051: . [tcp sum ok] ack 1 win 12 <nop,nop,timestamp 457834131 431778753>
          08:22:44.157752 IP (tos 0x0, ttl 56, id 36959, offset 0, flags [DF], proto 6, length: 87) LINUX_CLIENT_SERVER.47822 > LNX_ZABBIX_SERVER.tib10051: P 1:36(35) ack 1 win 12 <nop,nop,timestamp 457834131 431778753>
          08:22:44.157764 IP (tos 0x0, ttl 64, id 9315, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47822: . [tcp sum ok] ack 36 win 1448 <nop,nop,timestamp 431778864 457834131>
          08:22:44.158320 IP (tos 0x0, ttl 64, id 9317, offset 0, flags [DF], proto 6, length: 60) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47822: P [tcp sum ok] 1:9(8) ack 36 win 1448 <nop,nop,timestamp 431778864 457834131>
          08:22:44.158351 IP (tos 0x0, ttl 64, id 9319, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47822: F [tcp sum ok] 9:9(0) ack 36 win 1448 <nop,nop,timestamp 431778864 457834131>
          08:22:44.270418 IP (tos 0x0, ttl 56, id 36961, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47822 > LNX_ZABBIX_SERVER.tib10051: . [tcp sum ok] ack 9 win 12 <nop,nop,timestamp 457834242 431778864>
          08:22:44.270431 IP (tos 0x0, ttl 56, id 36963, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47822 > LNX_ZABBIX_SERVER.tib10051: F [tcp sum ok] 36:36(0) ack 10 win 12 <nop,nop,timestamp 457834242 431778864>
          08:22:44.270441 IP (tos 0x0, ttl 64, id 9321, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47822: . [tcp sum ok] ack 37 win 1448 <nop,nop,timestamp 431778976 457834242>
          08:23:44.278828 IP (tos 0x0, ttl 56, id 56983, offset 0, flags [DF], proto 6, length: 60) LINUX_CLIENT_SERVER.47884 > LNX_ZABBIX_SERVER.tib10051: S [tcp sum ok] 2579707898:2579707898(0) win 5840 <mss 1460,sackOK,timestamp 457894253 0,nop,wscale 9>
          08:23:44.278841 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6, length: 60) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47884: S [tcp sum ok] 1110197006:1110197006(0) ack 2579707899 win 5792 <mss 1460,sackOK,timestamp 431838983 457894253,nop,wscale 2>
          08:23:44.389012 IP (tos 0x0, ttl 56, id 56985, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47884 > LNX_ZABBIX_SERVER.tib10051: . [tcp sum ok] ack 1 win 12 <nop,nop,timestamp 457894363 431838983>
          08:23:44.389029 IP (tos 0x0, ttl 56, id 56987, offset 0, flags [DF], proto 6, length: 87) LINUX_CLIENT_SERVER.47884 > LNX_ZABBIX_SERVER.tib10051: P 1:36(35) ack 1 win 12 <nop,nop,timestamp 457894363 431838983>
          08:23:44.389036 IP (tos 0x0, ttl 64, id 5242, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47884: . [tcp sum ok] ack 36 win 1448 <nop,nop,timestamp 431839093 457894363>
          08:23:44.389643 IP (tos 0x0, ttl 64, id 5244, offset 0, flags [DF], proto 6, length: 60) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47884: P [tcp sum ok] 1:9(8) ack 36 win 1448 <nop,nop,timestamp 431839094 457894363>
          08:23:44.389671 IP (tos 0x0, ttl 64, id 5246, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47884: F [tcp sum ok] 9:9(0) ack 36 win 1448 <nop,nop,timestamp 431839094 457894363>
          08:23:44.499573 IP (tos 0x0, ttl 56, id 56989, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47884 > LNX_ZABBIX_SERVER.tib10051: . [tcp sum ok] ack 9 win 12 <nop,nop,timestamp 457894474 431839094>
          08:23:44.499587 IP (tos 0x0, ttl 56, id 56991, offset 0, flags [DF], proto 6, length: 52) LINUX_CLIENT_SERVER.47884 > LNX_ZABBIX_SERVER.tib10051: F [tcp sum ok] 36:36(0) ack 10 win 12 <nop,nop,timestamp 457894474 431839094>
          08:23:44.499595 IP (tos 0x0, ttl 64, id 5248, offset 0, flags [DF], proto 6, length: 52) LNX_ZABBIX_SERVER.tib10051 > LINUX_CLIENT_SERVER.47884: . [tcp sum ok] ack 37 win 1448 <nop,nop,timestamp 431839204 457894474>
          96665 packets captured
          96971 packets received by filter
          217 packets dropped by kernel
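
The payload sizes in this capture can be cross-checked against the agent debug log from the first post. A rough sketch in Python, assuming the request is the two logged lines (`ZBX_GET_ACTIVE_CHECKS` and the host name `lx-deeuopt1a`), each terminated by a newline, and that the reply is `ZBX_EOF` plus a newline. The newline framing is an assumption, not something the thread confirms:

```python
# Request as shown in the agent log: command line, then host name.
request = b"ZBX_GET_ACTIVE_CHECKS\n" + b"lx-deeuopt1a\n"
# Reply the agent reported parsing: an end-of-list marker with no items.
reply = b"ZBX_EOF\n"

# The capture shows the client pushing bytes 1:36 (35 bytes)
# and the server pushing bytes 1:9 (8 bytes).
print(len(request), len(reply))  # prints: 35 8
```

If those assumptions hold, the sizes match the capture, which would mean the server answered the active-check request with an empty list rather than dropping the connection. That would point at how the host is registered on the server rather than at the network, though the capture alone does not prove it.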

          • James Wells
            Senior Member
            • Jun 2005
            • 664

            #6
            Was that via zabbix_sender, or was that the actual zabbix_agent(d)?
            Unofficial Zabbix Developer

            • droche7
              Junior Member
              • Sep 2006
              • 13

              #7
              This is just from running tcpdump -v on the Zabbix server and grepping for the hostname I am looking for.

              Where is zabbix_sender located, or is it a parameter of another command? It's not in /sbin.

              Dennis

              • James Wells
                Senior Member
                • Jun 2005
                • 664

                #8
                It may not have been installed on your system; if not, you will need to compile it. It's part of the standard source package.
                Unofficial Zabbix Developer

                • droche7
                  Junior Member
                  • Sep 2006
                  • 13

                  #9
                  Is there any published documentation on the usage of zabbix_sender other than the built-in usage line: ./zabbix_sender [<Zabbix server> <port> <server> <key> <value>]?

                  Thanks,

                  • James Wells
                    Senior Member
                    • Jun 2005
                    • 664

                    #10
                    Unfortunately, no. The good news, though, is that your hosts are attempting to talk to the server, as your most recent tcpdump shows. The fact that the data is getting to the server tells us the network is configured correctly. The next thing to check is that the hostnames are configured correctly at both ends. Check the zabbix_agent(d).conf file and make sure that the name listed there for the client matches exactly what you have configured in Zabbix. The entry to look at in the zabbix_agent(d).conf file is "Hostname".
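
The hostname comparison is easy to get wrong by eye, since a trailing space or a difference in case is hard to spot. A minimal sketch in Python that reads the `Hostname` entry from the agent config text and compares it exactly; the function names are illustrative, and taking the last uncommented occurrence is an arbitrary choice here, not documented agent behavior:

```python
def read_agent_hostname(conf_text):
    """Return the value of the last uncommented Hostname= line, or None."""
    hostname = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Hostname":
            hostname = value.strip()
    return hostname


def hostname_matches(conf_text, name_in_zabbix):
    """Exact, case-sensitive comparison against the name configured in Zabbix."""
    return read_agent_hostname(conf_text) == name_in_zabbix
```

For example, pass the contents of /etc/zabbix/zabbix_agentd.conf as `conf_text` and the host name exactly as it is typed in the Zabbix front end as `name_in_zabbix`.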
                    Unofficial Zabbix Developer

                    • droche7
                      Junior Member
                      • Sep 2006
                      • 13

                      #11
                      /etc/zabbix/zabbix_agentd.conf is set up properly on both hosts.

                      • droche7
                        Junior Member
                        • Sep 2006
                        • 13

                        #12
                        /etc/zabbix/zabbix_agentd.conf is set up properly on all hosts.

                        Thanks,

                        • droche7
                          Junior Member
                          • Sep 2006
                          • 13

                          #13
                          OK, so on the problematic host I ran tcpdump port 10051 and noticed that traffic is really slow. Could timeouts be a problem here? When I run the same tcpdump command on a host here in our main office, I see lots of traffic.

                          Could this be the issue?

                          Thanks,

                          Dennis

                          • droche7
                            Junior Member
                            • Sep 2006
                            • 13

                            #14
                            All, I resolved this. Basically, I removed the agent from the host, removed the host from the Zabbix server, re-added the host to the Zabbix server, and reinstalled the agent on the host. Not sure why, but it worked.

                            Thanks,

                            Dennis Roche
