I am having issues with specific hosts talking to my Zabbix server. I have verified that the non-working hosts have the same configuration settings as the working hosts. The only difference I can find is that the problematic hosts are located in Frankfurt and take up to 100 ms to ping the Zabbix server (in Chicago). However, I have hosts in the UK with a 90 ms ping time to the same server, and they have no issues reporting. Here is the client log file with full debug mode:
017367:20060920:215441 Before read
017367:20060920:215441 In delete_all_metrics()
017367:20060920:215441 Parsed [ZBX_EOF]
017367:20060920:215441 Sleeping for 60 seconds
017367:20060920:215541 In refresh_metrics()
017367:20060920:215541 get_active_checks: host[10.33.93.118] port[10051]
017367:20060920:215541 Sending [ZBX_GET_ACTIVE_CHECKS
lx-deeuopt1a
I have no idea whether the timeout in the config file ("Timeout=20") is a problem. I set it to 30 and still nothing. I also checked the server logs, and there is nothing in there regarding the hosts that are not connecting.
I ran /sbin/zabbix_agent -p, and it is collecting data but just not reporting it (example output below):
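The log stops right after the agent sends ZBX_GET_ACTIVE_CHECKS and the host name to port 10051, so one way to narrow this down would be to repeat that exchange by hand from an affected host and see whether anything comes back. Below is a minimal sketch in Python, not the agent's actual code. It assumes the request is the plain-text command followed by the host name, each on its own line, and that the reply ends with ZBX_EOF as the "Parsed [ZBX_EOF]" log line suggests. The function name request_active_checks is made up for illustration.

```python
import socket


def request_active_checks(server, port, hostname, timeout=20.0):
    """Send the request seen in the agent log and return the raw reply.

    Assumption: the wire format is "ZBX_GET_ACTIVE_CHECKS\\n<hostname>\\n"
    and the server terminates its answer with ZBX_EOF.
    """
    payload = ("ZBX_GET_ACTIVE_CHECKS\n%s\n" % hostname).encode("ascii")
    chunks = []
    # create_connection raises an exception if the TCP connect itself
    # fails or exceeds the timeout; the same timeout applies to recv().
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(payload)
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # nothing (more) arrived within the timeout
            if not data:
                break  # server closed the connection
            chunks.append(data)
            if b"ZBX_EOF" in b"".join(chunks):
                break  # end-of-list marker received
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling request_active_checks("10.33.93.118", 10051, "lx-deeuopt1a") from a Frankfurt host and from a UK host would show whether the connect fails, the reply never arrives, or the reply arrives without ZBX_EOF. An empty string back after the full timeout would point at the network path or the server rather than the agent configuration.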
web.page.perf[www.zabbix.com,,80] [d|0.150182]
web.page.regexp[www.zabbix.com,,80] [m|ZBX_NOTSUPPORTED]
cpu[idle1] [m|ZBX_NOTSUPPORTED]
io[disk_io] [d|817505.000000]
kern[maxfiles] [u|1190498]
memory[buffers] [u|87060480]
system[uname] [t|Linux linux-servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux]
sensor[temp1] [m|ZBX_NOTSUPPORTED]
swap[total] [u|8587018240]
version[zabbix_agent] [s|1.1]
agent.ping [u|1]
agent.version [s|1.1]
kernel.maxfiles [u|1190498]
kernel.maxproc [m|ZBX_NOTSUPPORTED]
vfs.file.cksum[/etc/services] [u|3007857096]
vfs.file.md5sum[/etc/services] [s|d0db6751e69c725ed5267f165919bad1]
system.cpu.switches [m|ZBX_NOTSUPPORTED]
system.cpu.intr [u|498943808]
net.tcp.dns[127.0.0.1,localhost] [u|0]
net.tcp.listen[80] [m|ZBX_NOTSUPPORTED]
net.tcp.port[,80] [u|0]
net.tcp.service[ssh,127.0.0.1,22] [u|1]
net.tcp.service.perf[ssh,127.0.0.1,22] [d|0.010489]
net.if.in[lo,bytes] [u|4294967295]
net.if.out[lo,bytes] [u|4294967295]
net.if.total[lo,bytes] [u|4294967294]
net.if.collisions[lo] [u|0]
vfs.fs.size[/,free] [u|5857424]
vfs.fs.inode[/,free] [u|1196879]
vfs.dev.read[sda,operations] [m|ZBX_NOTSUPPORTED]
vfs.dev.write[sda,sectors] [m|ZBX_NOTSUPPORTED]
vm.memory.size[total] [u|12601737216]
proc.num[inetd,,,] [u|0]
proc.mem[inetd,,] [u|0]
system.cpu.util[all,user,avg1] [u|9]
system.cpu.load[all,avg1] [d|1.190000]
system.swap.size[all,free] [u|8586854400]
system.swap.in[all] [m|ZBX_NOTSUPPORTED]
system.swap.out[all,count] [m|ZBX_NOTSUPPORTED]
system.hostname [t|linux-servername]
system.uname [t|Linux linux-servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux]
system.uptime [u|395672]
system.users.num [d|8.000000]
sar.idle [m|ZBX_NOTSUPPORTED]
(standard_in) 2: parse error
sar.busy [t|]
sar.system [t|85.43]
sar.nice [t|3.45]
sar.user [t|0.92]
ntp.offset [t|0.165]
cron.orphan [t|0]
hardware.model [t|PowerEdge 1855]
mem.free [t|21024]
swap.free [t|8385600]
hardware.serial [t|3DF0C2J]
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
ERROR: User name does not exist.
********* simple selection ********* ********* selection by list *********
-A all processes -C by command name
-N negate selection -G by real group ID (supports names)
-a all w/ tty except session leaders -U by real user ID (supports names)
-d all except session leaders -g by session OR by effective group name
-e all processes -p by process ID
T all processes on this terminal -s processes in the sessions given
a all w/ tty, including other users -t by tty
g OBSOLETE -- DO NOT USE -u by effective user ID (supports names)
r only running processes U processes for specified users
x processes w/o controlling ttys t by tty
*********** output format ********** *********** long options ***********
-o,o user-defined -f full --Group --User --pid --cols --ppid
-j,j job control s signal --group --user --sid --rows --info
-O,O preloaded -o v virtual memory --cumulative --format --deselect
-l,l long u user-oriented --sort --tty --forest --version
-F extra full X registers --heading --no-heading --context
********* misc options *********
-V,V show version L list format codes f ASCII art forest
-m,m,-L,-T,H threads S children in sum -y change -l format
-M,Z security data c true command name -c scheduling class
-w,w wide output n numeric WCHAN,UID -H process hierarchy
ps.mem[/bin/ps -u -o pid,args | /bin/grep -i | /bin/grep -v grep | /bin/awk '{print $ 1}' | /usr/bin/xargs ps -o rss --noheaders] [t|1632
LASTLY:
I am running RHEL 4 Update 4 on the problematic hosts. RHEL 4 Update 3 is on all the other hosts. Has anyone seen any problems with the new RHEL update?
Thank you,
Dennis