100% CPU utilization by zabbix_agentd
Hi,
Has anybody spotted it before ?
Linux kernel 2.6.17 SMP (FC5).
Agent Version 1.1.6 (same with 1.1.5).
I have 4 identical machines (same hardware spec), each with 2 dual-core CPUs.
Only on one of them zabbix_agentd behaves like that.
Any ideas ?
And a general question:
How big can the maximum impact of running the zabbix agent be?
I have a couple of Postgres DB systems that are already heavily overloaded at times, and I am afraid to put the zabbix agent on them.
I have never even seen (or heard of) this before! The ZABBIX agent uses native system calls of the OS, so it requires an absolute minimum of CPU/memory resources. Normally, depending on the number and frequency of checks, it requires much less than 1% of CPU.
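
To check the "much less than 1%" figure on a given host, the agent's CPU time can be sampled directly. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names `cpu_ticks` and `cpu_percent` are made up for illustration and are not part of ZABBIX.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split after the LAST ')' rather than on plain whitespace.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(pid: int, interval: float = 5.0) -> float:
    """Sample a process twice; return its CPU use as a percentage of one core."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return 100.0 * (after - before) / ticks_per_second / interval
```

Calling `cpu_percent(pid)` with the PID of a zabbix_agentd process gives a figure comparable to the %CPU column in top, averaged over the chosen interval.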
Well, just took a closer look at that.
Is it a bug in top, or am I misunderstanding something?
And one more correction - kernel 2.6.16.
But my senior admin already killed one of my agents when he spotted 25% CPU
utilization by the agent itself and a load average above 6 (typically it is never higher than 1.5).
And that was on 2.6.17.
I fell in love with zabbix, and nobody likes watching his/her lover being killed ;-)
Shall I attach strace to the process ?
Any other tests ?
top - 11:19:07 up 138 days, 15:48, 4 users, load average: 1.03, 1.21, 1.24
Tasks: 146 total, 2 running, 144 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.7% us, 1.7% sy, 0.0% ni, 96.7% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
Cpu1 : 0.0% us, 48.8% sy, 41.2% ni, 0.0% id, 0.0% wa, 0.0% hi, 10.0% si, 0.0% st
Cpu2 : 1.3% us, 0.7% sy, 0.0% ni, 97.7% id, 0.0% wa, 0.3% hi, 0.0% si, 0.0% st
Cpu3 : 1.0% us, 0.3% sy, 0.0% ni, 94.7% id, 4.0% wa, 0.0% hi, 0.0% si, 0.0% st
Mem: 2070536k total, 1948276k used, 122260k free, 243896k buffers
Swap: 8418052k total, 68k used, 8417984k free, 733964k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30063 zabbix 30 5 2464 512 364 R 100 0.0 6826:02 zabbix_agentd
21918 nagios 16 0 28828 3684 1144 S 1 0.2 1:14.85 nagios
10376 cactiuse 16 0 21808 11m 4828 S 1 0.6 0:01.19 poller.php
12697 root 16 0 6344 2084 1704 S 1 0.1 0:00.02 sshd
12731 root 16 0 6320 1912 1568 S 1 0.1 0:00.02 sshd
Yes, it would be very nice if you could post an strace of the ZABBIX agent eating 100% CPU. I'm looking forward to seeing what it is doing :)
I have a funny feeling that it is somehow related to nagios having been running for a long
time.
Since I restarted nagios to reread config changes, I have not been able to reproduce the problem.
On my previous screenshot nagios was the second most CPU time consuming process.
I'll let you know as soon as I spot it again and dump some strace output.
By the way, can I use a 1.3.2 agent with a 1.1.6 server?
Will 1.4 be released this month ?
Regards
Peter
I have had the same problem since I tried to restart zabbix_agentd (1.4) after small changes in the config file today. Now the agent eats up all the CPU resources, while not sending data to the server (which is running on the same machine) and not writing anything to the logfile.
strace -p 29514 -s 100:
Process 29514 attached - interrupt to quit
write(2, "zabbix_agentd [29514]: ", 23) = -1 EPIPE (Broken pipe)
write(2, "Warning: Got SIGPIPE. Where it came from???", 43) = -1 EPIPE (Broken pipe)
write(2, "\n", 1) = -1 EPIPE (Broken pipe)
rt_sigreturn(0x51ab50) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
Any ideas?
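
One possible reading of the trace above, not confirmed by anyone in this thread: the warning about SIGPIPE is itself written to fd 2, and fd 2 is a pipe with no reader, so every write fails with EPIPE again. The sketch below only reproduces that underlying failure (a write to a pipe whose read end is closed); the function name `write_to_closed_pipe` is made up for illustration and this is not the agent's code.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe that has no reader and return the resulting errno."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # nobody is left to read: the pipe is now "broken"
    try:
        # CPython ignores SIGPIPE at startup, so the failed write surfaces
        # as BrokenPipeError (errno EPIPE) instead of killing the process.
        os.write(write_end, b"Warning: Got SIGPIPE\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_end)
    return 0
```

A program that installs its own SIGPIPE handler and writes to the same broken descriptor from inside that handler would see this error on every attempt, which would match the repeating pattern in the strace output.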
We found a theoretical, and thus possible in real life, case where the ZABBIX agent can eat 100% CPU, AFTER STARTUP ONLY. This is related to the handling of IPC resources. Fixed in pre 1.4.2. I would be very interested in your experience after 1.4.2 is released.
Problem solved.
Somehow the zabbix_agentd logfile got write-protected, which resulted in 100% CPU use by zabbix_agentd. It seems it kept trying to write to this file all the time with no success...
Maybe you can change this behaviour in the next versions.
cheers
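
As an illustration of the kind of behaviour change asked for here, a logger can stop retrying an unwritable file in a tight loop and instead wait for a cool-down period before trying again. This is a minimal sketch under that assumption, not how the agent is implemented; the class name `BackoffLogger` and its parameters are hypothetical.

```python
import time


class BackoffLogger:
    """Append to a logfile; after a failure, skip the file for a cool-down period."""

    def __init__(self, path, cooldown=60.0, clock=time.monotonic):
        self.path = path
        self.cooldown = cooldown          # seconds to wait after a failed write
        self.clock = clock                # injectable so the logic can be tested
        self.retry_at = float("-inf")     # earliest time the file may be tried again
        self.attempts = 0                 # how many times open() was actually tried
        self.last_error = None            # most recent OSError, if any

    def log(self, message):
        """Return True if the message reached the logfile, False otherwise."""
        if self.clock() < self.retry_at:
            return False                  # still cooling down: do not touch the file
        self.attempts += 1
        try:
            with open(self.path, "a") as f:
                f.write(message + "\n")
            return True
        except OSError as exc:
            # Remember the error and back off instead of retrying immediately.
            self.last_error = exc
            self.retry_at = self.clock() + self.cooldown
            return False
```

With this shape, a write-protected logfile costs one failed open() per cool-down period rather than a continuous stream of failing system calls.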
Thanks for reporting this problem. It is fixed in pre 1.4.2 available from http://www.zabbix.com/developers.php.