We have found that one of the zabbix_server processes will go from around 4% CPU utilization to 99%. This concerned us, but we ignored it until we noticed that the system stops collecting data for some of our host items. Most items continue to be collected, but some just stop. This is most evident when we look at a last-week graph of something like temperature.
I have pasted information about the zabbix_server version, top, strace, and zabbix_server.log below. Any help is greatly appreciated!
Here is the output from zabbix_server --version:
administrator@MONITORING1:~$ zabbix_server --version
ZABBIX Server (daemon) v1.4.5 (25 March 2008)
Compilation time: Apr 21 2008 15:49:43
Here is the output from the Linux top command:
top - 06:00:32 up 2 days, 8:38, 1 user, load average: 2.24, 2.03, 2.05
Tasks: 98 total, 4 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.8%us, 41.8%sy, 16.5%ni, 17.2%id, 18.5%wa, 1.5%hi, 0.7%si, 0.0%st
Mem: 2075908k total, 2021856k used, 54052k free, 126684k buffers
Swap: 2947888k total, 96k used, 2947792k free, 1686916k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2068 zabbix 30 5 10576 2968 1852 R 98 0.1 461:00.16 zabbix_server
3970 mysql 18 0 135m 47m 5328 S 19 2.4 694:52.93 mysqld
2052 zabbix 20 5 12568 5176 1984 S 1 0.2 3:03.50 zabbix_server
2053 zabbix 20 5 12496 5108 1984 S 1 0.2 2:48.90 zabbix_server
2050 zabbix 22 5 12496 5120 1980 R 1 0.2 2:55.31 zabbix_server
2054 zabbix 20 5 9904 1560 852 S 1 0.1 1:00.41 zabbix_server
2049 zabbix 20 5 12568 5196 1984 S 0 0.3 3:13.47 zabbix_server
2051 zabbix 20 5 12640 5256 1984 S 0 0.3 3:16.95 zabbix_server
2056 zabbix 20 5 9904 1560 852 S 0 0.1 1:01.65 zabbix_server
2058 zabbix 20 5 9904 1560 852 S 0 0.1 1:01.30 zabbix_server
2076 zabbix 20 5 10008 2612 1820 S 0 0.1 0:08.03 zabbix_server
4198 zabbix 20 5 4392 872 604 S 0 0.0 3:50.86 zabbix_agentd
1 root 18 0 2948 1852 532 S 0 0.1 0:04.31 init
Here is a sample of strace -p 2068 (the PID listed above):
gettimeofday({1209899083, 753014}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
gettimeofday({1209899083, 753969}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
gettimeofday({1209899083, 754993}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
gettimeofday({1209899083, 755837}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
gettimeofday({1209899083, 756681}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
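To put a rough number on how tight this loop is, here is a small Python sketch that tallies the syscalls and errors in strace output like the above and averages the gap between consecutive gettimeofday timestamps. It assumes the plain `strace -p` line format shown here; the function name `summarize_strace` and the embedded two-iteration sample are only for illustration.

```python
import re
from collections import Counter

# Two iterations copied from the strace sample above.
SAMPLE = """\
gettimeofday({1209899083, 753014}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
gettimeofday({1209899083, 753969}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
recvmsg(0, 0xbfe58bc8, 0) = -1 ENOTSOCK (Socket operation on non-socket)
"""

SYSCALL_RE = re.compile(r"^(\w+)\(")                      # syscall name at line start
ERRNO_RE = re.compile(r"= -1 (E[A-Z]+)")                  # errno symbol on failed calls
TIME_RE = re.compile(r"^gettimeofday\(\{(\d+), (\d+)\}")  # seconds, microseconds


def summarize_strace(text):
    """Return (syscall counts, errno counts, mean gap in microseconds)."""
    calls = Counter()
    errors = Counter()
    stamps = []
    for line in text.splitlines():
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # skip anything that is not a syscall line
        calls[m.group(1)] += 1
        e = ERRNO_RE.search(line)
        if e:
            errors[e.group(1)] += 1
        t = TIME_RE.match(line)
        if t:
            stamps.append(int(t.group(1)) * 1_000_000 + int(t.group(2)))
    # Gap between consecutive gettimeofday calls = one pass through the loop.
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    mean_gap_us = sum(gaps) / len(gaps) if gaps else None
    return calls, errors, mean_gap_us


if __name__ == "__main__":
    calls, errors, mean_gap_us = summarize_strace(SAMPLE)
    print(dict(calls), dict(errors), mean_gap_us)
```

Run over the full five-iteration sample above, the mean gap works out to about 917 microseconds, so the process is making roughly 1,100 failed recvmsg calls per second on file descriptor 0 even while being traced. Every select returns immediately with a zero timeout, which would explain why the process never sleeps.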
Here is a sample of zabbix_server.log:
administrator@MONITORING1:~$ tail /tmp/zabbix_server.log
2050:20080504:060636 Expression [{18698}>0] cannot be evaluated [Unable to get value for functionid [18698]]
2050:20080504:060636 Expression [{15579}>0] cannot be evaluated [Unable to get value for functionid [15579]]
2050:20080504:060636 Expression [{12566}>0] cannot be evaluated [Unable to get value for functionid [12566]]
2051:20080504:060637 Expression [{12387}>0] cannot be evaluated [Unable to get value for functionid [12387]]
2051:20080504:060637 Expression [{15400}>100] cannot be evaluated [Unable to get value for functionid [15400]]
2051:20080504:060637 Expression [{12567}>150000] cannot be evaluated [Unable to get value for functionid [12567]]
2051:20080504:060637 Expression [{15580}>0] cannot be evaluated [Unable to get value for functionid [15580]]
2051:20080504:060637 Expression [{13227}>0] cannot be evaluated [Unable to get value for functionid [13227]]
2051:20080504:060638 Expression [{18642}>100] cannot be evaluated [Unable to get value for functionid [18642]]
2051:20080504:060638 Expression [{13407}>150000] cannot be evaluated [Unable to get value for functionid [13407]]
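In case it helps anyone correlate these messages with the items that stopped updating, here is a rough Python sketch that pulls the function IDs out of log lines like the ones above, grouped by the PID that logged them. The line format is assumed from this sample only, and the name `failing_functionids` is made up for illustration.

```python
import re
from collections import defaultdict

# Three lines copied from the zabbix_server.log sample above.
SAMPLE = """\
2050:20080504:060636 Expression [{18698}>0] cannot be evaluated [Unable to get value for functionid [18698]]
2050:20080504:060636 Expression [{15579}>0] cannot be evaluated [Unable to get value for functionid [15579]]
2051:20080504:060637 Expression [{12387}>0] cannot be evaluated [Unable to get value for functionid [12387]]
"""

# pid:YYYYMMDD:HHMMSS Expression [...] cannot be evaluated [... functionid [N]]
LOG_RE = re.compile(
    r"^(\d+):(\d{8}):(\d{6}) Expression \[(.*)\] cannot be evaluated "
    r"\[Unable to get value for functionid \[(\d+)\]\]$"
)


def failing_functionids(text):
    """Map each logging PID to the sorted function IDs it failed to evaluate."""
    by_pid = defaultdict(set)
    for line in text.splitlines():
        m = LOG_RE.match(line.strip())
        if m:
            pid, _date, _time, _expr, functionid = m.groups()
            by_pid[pid].add(int(functionid))
    return {pid: sorted(ids) for pid, ids in by_pid.items()}


if __name__ == "__main__":
    for pid, ids in failing_functionids(SAMPLE).items():
        print(pid, ids)
```

In the ten lines above, all of the messages come from PIDs 2050 and 2051, not from the spinning PID 2068.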