agent active check losing


Dmitriy Kirhlarov
06-02-2008, 10:51
I use Zabbix agent active checks to get disk space information (by parsing 'df' output).
The problem is that one volume periodically goes into the "Unsupported" state.

zabbix_agentd.conf:
UserParameter=vfs.fs.free ,df | sed -nE 's@^([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)$1$@\7@p'

zabbix_agent.log:
66479:20080206:091646 For key [vfs.fs.size[/data,free]] received value [295453184]
66479:20080206:091646 XML before sending [<req><host>aW5mcmEubWdtdC52ZWdhLnJ1</host><key>dmZzLmZzLnNpemVbL2RhdGEsZnJlZV0=</key><data>Mjk1NDUzMTg0</data></req>]
66479:20080206:091646 OK
66479:20080206:091646 In get_min_nextcheck()
66479:20080206:091646 Sleeping for 57 seconds
66479:20080206:091743 In refresh_metrics('10.25.0.254',10051)
66479:20080206:091743 get_active_checks('10.25.0.254',10051)
66479:20080206:091743 Sending [ZBX_GET_ACTIVE_CHECKS
infra.mgmt.vega.ru
]
66479:20080206:091743 Before read
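
The host, key and data fields in the "XML before sending" line are base64-encoded. A minimal Python sketch for decoding them when reading such logs (the function name `decode_request` is just an illustration, not part of Zabbix):

```python
import base64
import re

# The request copied from the agent log above.
REQUEST = (
    "<req><host>aW5mcmEubWdtdC52ZWdhLnJ1</host>"
    "<key>dmZzLmZzLnNpemVbL2RhdGEsZnJlZV0=</key>"
    "<data>Mjk1NDUzMTg0</data></req>"
)


def decode_request(xml: str) -> dict:
    """Return {tag: decoded text} for each base64 field in the request."""
    fields = re.findall(r"<(host|key|data)>([^<]*)</\1>", xml)
    return {tag: base64.b64decode(value).decode("utf-8") for tag, value in fields}


if __name__ == "__main__":
    for tag, text in decode_request(REQUEST).items():
        print(f"{tag}: {text}")
```

Decoded, the request carries the same host name, key and value as the surrounding plain-text log lines.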

zabbix_server.log:
73203:20080206:091653 Active parameter [vfs.fs.size[/data,free]] is not supported by agent on host [infra.mgmt.vega.ru]
73205:20080206:091653 Active parameter [vfs.fs.size[/data,used]] is not supported by agent on host [infra.mgmt.vega.ru]

Last value of "vfs.fs.size[/data,free]" on the server (the servers use the UTC timezone, my workstation uses MSK):
2008-02-06 12:16:46 1202289406 302544060416

That is, zabbix_server gets the correct value from the agent, adds it to the database and, after that, marks the item as "unsupported".
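
The stored row can be cross-checked against the agent log with a few lines of Python. This is only a sanity-check sketch: it assumes MSK was UTC+3 in February 2008, and the factor of 1024 between the two values would be consistent with a multiplier configured on the item, which the thread does not confirm.

```python
from datetime import datetime, timedelta, timezone

AGENT_VALUE = 295453184      # value in the agent log
SERVER_VALUE = 302544060416  # value stored by the server
SERVER_CLOCK = 1202289406    # Unix timestamp stored by the server

# Assumption: MSK was UTC+3 in February 2008.
MSK = timezone(timedelta(hours=3))


def describe() -> dict:
    """Compare the stored row with the value and time seen in the agent log."""
    utc = datetime.fromtimestamp(SERVER_CLOCK, tz=timezone.utc)
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "ratio": SERVER_VALUE // AGENT_VALUE,
        "exact": SERVER_VALUE % AGENT_VALUE == 0,
        "utc": utc.strftime(fmt),
        "msk": utc.astimezone(MSK).strftime(fmt),
    }


if __name__ == "__main__":
    print(describe())
```

The UTC time matches the 09:16:46 agent log entry and the MSK time matches the row shown in the frontend, so the stored value does come from that agent report.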

My system:
infra# uname -rs; pkg_info -Ix zabbix; pkg_info -Ix postgres
FreeBSD 7.0-20071107-SNAP
zabbix-1.4.4,1 Application and network monitoring solution
zabbix-agent-1.4.4,1 Application and network monitoring solution
postgresql-client-8.2.5_1 PostgreSQL database (client)
postgresql-server-8.2.5_2 The most advanced open-source database available anywhere

I need Zabbix working and can provide any additional information.

Alexei
06-02-2008, 11:40
Perhaps value type of the item is not compatible with the received values?

Dmitriy Kirhlarov
06-02-2008, 12:28
Numeric (integer 64bit).
It works correctly for the other volumes, and for the '/data' volume when the item type is 'ZABBIX agent' (not 'ZABBIX agent (active)').

Dmitriy Kirhlarov
06-02-2008, 14:26
Same behaviour with an attached log item.

zabbix_agentd.log:
75136:20080206:130943 In process log (/var/log/rsnapshot.log,96)
75136:20080206:130943 In get_min_nextcheck()
75136:20080206:130943 Sleeping for 9 seconds
75136:20080206:130952 In refresh_metrics('10.25.0.254',10051)
75136:20080206:130952 get_active_checks('10.25.0.254',10051)
75136:20080206:130952 Sending [ZBX_GET_ACTIVE_CHECKS
infra.mgmt.vega.ru
]
75136:20080206:130952 Before read
75136:20080206:130952 In parse_list_of_checks() [vfs.dev.gmirror:600:0
log[/var/log/maillog,SYSERR]:10:0
vfs.dev.zpool:600:0
system.uptime:60:0
ZBX_EOF
]
75136:20080206:130952 In disable_all_metrics()

zabbix_server.log:
73205:20080206:130818 Active parameter [log[/var/log/rsnapshot.log,ERROR]] is not supported by agent on host [infra.mgmt.vega.ru]
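
The check list that the agent receives in parse_list_of_checks() can be compared with the key the server complains about. A small Python sketch, assuming each line has the form key:delay:lastlogsize (the field names are an assumption read off the log, and `parse_checks` is an illustrative helper, not a Zabbix function):

```python
# The list copied from the agent log above.
CHECK_LIST = """vfs.dev.gmirror:600:0
log[/var/log/maillog,SYSERR]:10:0
vfs.dev.zpool:600:0
system.uptime:60:0
ZBX_EOF
"""


def parse_checks(text: str) -> list:
    """Parse 'key:delay:lastlogsize' lines until the ZBX_EOF marker."""
    checks = []
    for line in text.splitlines():
        line = line.strip()
        if line == "ZBX_EOF":
            break
        if not line:
            continue
        # Split from the right so colons inside the key are left alone.
        key, delay, lastlogsize = line.rsplit(":", 2)
        checks.append((key, int(delay), int(lastlogsize)))
    return checks


if __name__ == "__main__":
    keys = [key for key, _, _ in parse_checks(CHECK_LIST)]
    print("log[/var/log/rsnapshot.log,ERROR]" in keys)
```

The list refreshed at 13:09:52 no longer contains the rsnapshot log item, which fits the server having marked it as not supported at 13:08:18.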

Alexei
06-02-2008, 14:32
> The problem is that one volume periodically goes into the "Unsupported" state.
Well, this means that the user parameter periodically fails. It could be because of timeouts.

Dmitriy Kirhlarov
06-02-2008, 14:41
That's strange, because zabbix_server and zabbix_agent run on the same host, and CPU idle is about 85%.

Alexei
06-02-2008, 14:44
I see nothing strange here. The df command can be slow!

Dmitriy Kirhlarov
06-02-2008, 15:05
I added Timeout=10 to zabbix_agentd.conf.
It didn't help.

Any other hints?

Dmitriy Kirhlarov
07-02-2008, 13:13
After enabling log monitoring, I twice hit a situation where zabbix_server used 100% CPU. After restarting zabbix_server, CPU idle returned to the normal ~85%.

What I did:
1. Looked at top
2. ps -auxww | grep $PID
zabbix 86125 86.9 0.1 28028 1420 ?? RN 7:21PM 827:46.87 zabbix_server: processing data (zabbix_server)

Dmitriy Kirhlarov
11-02-2008, 19:44
> I added Timeout=10 to zabbix_agentd.conf.
> It didn't help.
>
> Any other hints?

The issue is fixed now.
It was a stupid error: several hosts had an identical hostname in zabbix_agentd.conf.
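
A plausible reading is that another machine reporting under the same name answered the same active checks without having the /data volume or the rsnapshot log, so the server kept marking those items as not supported. A minimal Python sketch for spotting such duplicates, assuming the text of each machine's zabbix_agentd.conf has already been collected (the machine labels "box-a" to "box-c" and the second host name are hypothetical):

```python
import re
from collections import defaultdict

# Matches an uncommented "Hostname=..." line in zabbix_agentd.conf.
HOSTNAME_RE = re.compile(r"^\s*Hostname\s*=\s*(\S+)\s*$", re.MULTILINE)


def find_duplicate_hostnames(configs: dict) -> dict:
    """Map each Hostname used by more than one machine to those machines.

    `configs` maps a machine label to the text of its zabbix_agentd.conf.
    """
    seen = defaultdict(list)
    for machine, text in configs.items():
        match = HOSTNAME_RE.search(text)
        if match:
            seen[match.group(1)].append(machine)
    return {name: sorted(machines) for name, machines in seen.items() if len(machines) > 1}


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = {
        "box-a": "Server=10.25.0.254\nHostname=infra.mgmt.vega.ru\n",
        "box-b": "Server=10.25.0.254\nHostname=infra.mgmt.vega.ru\n",
        "box-c": "# Hostname=infra.mgmt.vega.ru\nHostname=other.example\n",
    }
    print(find_duplicate_hostnames(sample))
```

Commented-out Hostname lines are ignored, so only names that agents actually report under are compared.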

Alexei
11-02-2008, 20:35
Thanks for the follow up!