ugly softirq spikes


dkovach
05-10-2005, 00:43
every time i start zabbix_agentd on any of my web servers, the system's softirq rises sharply (from 0.8-2.8 to 78-99), dragging the system load along with it. normally the systems run at a load of about 1.5; lately, when this happens, the load rises to 9 or 10

i notice this *doesn't* happen on our mysqld servers, where the agent is monitoring a different set of data

anyone else seen this behavior and have any idea where i should start looking?

-dave
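
One place to start looking is the softirq share itself, measured the same way before and after the agent starts. The following is a minimal sketch, not something from the thread: it assumes the aggregate "cpu" line of /proc/stat follows the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal), which may not hold on every kernel. The function names are made up for illustration.

```python
import time

# Assumed field order of the aggregate "cpu" line, per proc(5):
# user nice system idle iowait irq softirq steal [guest guest_nice]
SOFTIRQ_INDEX = 6
COUNTED_FIELDS = 8  # stop at steal; guest time is already included in user


def parse_cpu_line(line):
    """Return the counters of an aggregate 'cpu' line as integers."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def softirq_percent(before, after):
    """Percentage of CPU time spent in softirq between two samples."""
    first = parse_cpu_line(before)[:COUNTED_FIELDS]
    second = parse_cpu_line(after)[:COUNTED_FIELDS]
    if len(first) <= SOFTIRQ_INDEX or len(second) <= SOFTIRQ_INDEX:
        raise ValueError("this kernel does not report a softirq column")
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[SOFTIRQ_INDEX] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as handle:
        return handle.readline()


def sample(interval=5.0):
    """Take two samples 'interval' seconds apart and return the softirq share."""
    before = read_cpu_line()
    time.sleep(interval)
    return softirq_percent(before, read_cpu_line())
```

Calling sample() once with zabbix_agentd stopped and once right after starting it gives two comparable figures, which makes it easier to tell whether the spike tracks the agent or something else on the box.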

Nate Bell
05-10-2005, 21:11
What OS are you using, and what version of Zabbix are you running?

If I'm reading your post right, it sounds like you are saying the servers hosting web pages are getting bogged down when you start zabbix_agentd, but this is not happening on servers that aren't running a web server? If this is the case, then it sounds like there is a conflict between your webserver (apache I presume?) and your copy of Zabbix.

Aside from stating the obvious, I don't have any ideas. I just wanted to get things started by clarifying your question.

Nate

dkovach
05-10-2005, 22:30
sorry for the lack of info

[root@web2 root]# cat /etc/redhat-release
Red Hat Enterprise Linux ES release 3 (Taroon Update 4)

[root@web2 root]# /webserver/bin/httpd -v
Server version: Apache/1.3.33 (Unix)
Server built: Apr 1 2005 12:07:21

the zabbix server is currently 1.1alpha7, however i tried updating the zabbix_agent installs to 1.1beta1 which didn't seem to change things

the odd thing is that we've been running this instance of zabbix for some time without a problem, nothing significant appears to have changed on either our server or our agent boxes.

my next step is probably going to be upgrading the server to 1.1beta1 and, if that doesn't help, downgrading the whole operation to 1.0

the following is the last bit of the /tmp/zabbix_agentd.log showing everything for a complete session from agent start to kill -9 because my system load is instantly at 10:

002667:20051005:161549 zabbix_agentd started. ZABBIX 1.1beta1.
002668:20051005:161549 zabbix_agentd 2668 started
002669:20051005:161549 zabbix_agentd 2669 started
002670:20051005:161549 zabbix_agentd 2670 started
002671:20051005:161549 zabbix_agentd 2671 started
002672:20051005:161549 zabbix_agentd 2672 started
002672:20051005:161549 Cannot connect to [10.235.37.143:10051] [Connection refused]
002672:20051005:161549 Getting list of active checks failed. Will retry after 60 seconds
002668:20051005:161606 Timeout while answering request
002670:20051005:161608 Timeout while answering request
002668:20051005:161611 Error writing to socket [Connection reset by peer]
002670:20051005:161614 Error writing to socket [Connection reset by peer]
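
To see which agent process logs what, and how long after startup, the log above can be grouped by process. This is only a sketch under an assumption: each line is taken to be PID:YYYYMMDD:HHMMSS followed by the message, which is inferred from the excerpt above and not from any Zabbix documentation. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout, inferred from the excerpt: PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6})\s+(.*)$")

SAMPLE = """\
002667:20051005:161549 zabbix_agentd started. ZABBIX 1.1beta1.
002668:20051005:161549 zabbix_agentd 2668 started
002672:20051005:161549 Cannot connect to [10.235.37.143:10051] [Connection refused]
002668:20051005:161606 Timeout while answering request
002668:20051005:161611 Error writing to socket [Connection reset by peer]
"""


def parse_line(line):
    """Split one log line into (pid, timestamp, message), or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    pid, date, clock, message = match.groups()
    stamp = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    return int(pid), stamp, message


def group_by_pid(lines):
    """Collect (timestamp, message) pairs per process id, in log order."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            pid, stamp, message = parsed
            grouped[pid].append((stamp, message))
    return dict(grouped)


def seconds_since_first(entries):
    """Offset of each entry from the first entry of the same process, in seconds."""
    start = entries[0][0]
    return [((stamp - start).total_seconds(), message) for stamp, message in entries]


if __name__ == "__main__":
    for pid, entries in sorted(group_by_pid(SAMPLE.splitlines()).items()):
        for offset, message in seconds_since_first(entries):
            print(pid, offset, message)
```

Grouped this way, the excerpt shows the connection failure coming from a different process (2672) than the two that later report timeouts and socket errors (2668 and 2670).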

dkovach
06-10-2005, 00:31
looks like upgrading everything to 1.1beta1 fixed things.

dkovach
19-10-2005, 00:19
so...i upgraded apache on my web servers from 1.3.33 to 1.3.34 and now my friend the softirq problem is back. see above for other relevant info, nothing has changed outside of the apache version.

any suggestions?