View Full Version : Zabbix b5 crashed


bbrendon
05-01-2006, 19:28
Zabbix b5 went haywire on me this morning. Between 8:37 AM and 8:44 AM, Zabbix fired all nodata() triggers and everyone's pager went off. CPU utilization dropped at the same time. It appears Zabbix stopped accepting connections.

It appears a bunch of zabbix_server processes are missing. After restarting the server, I have 11 zabbix_server processes; in the ps output below there are only 4.

I also saved the log files if anyone is interested. I don't understand the date format of them though.

-bb

# telnet localhost 10051
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
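
The telnet test above can also be scripted so it can run from cron or another monitoring box. This is only a minimal sketch: the function name and the default timeout are made up for illustration, and the only value taken from this thread is port 10051 (the ListenPort in the config below).

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_accepts_connections("localhost", 10051) should return False in the state shown above, where telnet reports "Connection refused".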


root 10969 0.0 0.2 2388 1064 ? Ss Jan02 0:01 /bin/sh /usr/bin/zabbix_check_server 3 120 root
zabbix 20690 0.1 0.1 2516 1012 ? SN Jan02 4:23 /usr/bin/zabbix_agentd
zabbix 20691 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
zabbix 20692 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
zabbix 20693 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
zabbix 20694 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
zabbix 20695 0.0 0.1 2516 940 ? SN Jan02 0:23 /usr/bin/zabbix_agentd
zabbix 26455 0.0 0.3 5324 1676 ? S 08:30 0:00 /usr/bin/zabbix_server
zabbix 26457 0.1 0.3 5324 1780 ? S 08:30 0:10 /usr/bin/zabbix_server
zabbix 26459 0.0 0.3 5324 1752 ? S 08:30 0:05 /usr/bin/zabbix_server
zabbix 26461 0.0 0.3 5324 1608 ? S 08:30 0:00 /usr/bin/zabbix_server
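
To compare the process count before and after a restart without counting by eye, something like the following could be used. It is a rough sketch that assumes the 11-column `ps aux` layout shown above, with a START column that contains no spaces; the function name is hypothetical.

```python
from collections import Counter


def count_commands(ps_output: str) -> Counter:
    """Count processes per executable in `ps aux`-style output."""
    counts = Counter()
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip blank lines and the header row
        executable = fields[10].split()[0]
        counts[executable] += 1
    return counts
```

Fed the listing above, counts["/usr/bin/zabbix_server"] comes out as 4, against the 11 seen after a clean start.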


##zabbix_server.conf:
Server=1
StartSuckers=6
StartTrappers=5

ListenPort=10051
HousekeepingFrequency=1
SenderFrequency=30

DebugLevel=4
Timeout=5
PidFile=/var/run/zabbix/server.pid
LogFile=/var/log/zabbix/server.log
AlertScriptsPath=/home/zabbix/bin/
FpingLocation=/usr/sbin/fping

bbrendon
12-01-2006, 18:41
This happened again this morning. Has this happened to anyone else?

# ps aux|grep zab
root 1634 0.0 0.2 2392 1068 ? Ss Jan05 0:02 /bin/sh /usr/bin/zabbix_check_server 3 120 root
zabbix 13251 0.1 0.1 2516 1004 ? SN Jan09 4:01 /usr/bin/zabbix_agentd
zabbix 13252 0.0 0.1 2516 712 ? SN Jan09 0:00 /usr/bin/zabbix_agentd
zabbix 13253 0.0 0.1 2516 712 ? SN Jan09 0:00 /usr/bin/zabbix_agentd
zabbix 13254 0.0 0.1 2516 712 ? SN Jan09 0:00 /usr/bin/zabbix_agentd
zabbix 13255 0.0 0.1 2516 712 ? SN Jan09 0:00 /usr/bin/zabbix_agentd
zabbix 13256 0.0 0.1 2516 940 ? SN Jan09 0:25 /usr/bin/zabbix_agentd
zabbix 3635 0.0 0.3 5328 1680 ? S 06:26 0:00 /usr/bin/zabbix_server
zabbix 3637 0.1 0.3 5328 1780 ? S 06:26 0:19 /usr/bin/zabbix_server
zabbix 3639 0.1 0.3 5328 1756 ? S 06:26 0:11 /usr/bin/zabbix_server
zabbix 3641 0.0 0.3 5328 1620 ? S 06:26 0:01 /usr/bin/zabbix_server
root 16083 0.0 0.1 1640 532 pts/10 S+ 09:38 0:00 grep zab

Alexei
12-01-2006, 18:46
Anything in log file?

bbrendon
13-01-2006, 09:46
I saved the log file the first time it happened. Here is tail -100 from before I restarted the server.

What is the date format? Is it PID:date:time?

026457:20060105:101916 In evaluate_expression({11553})
026457:20060105:101916 Before deleting spaces:{11553}
026457:20060105:101916 After deleting spaces:{11553}
026457:20060105:101916 BEGIN substitute_functions ({11553})
026457:20060105:101916 Before find_char:{11553}[{]
026457:20060105:101916 Before find_char:{11553}[{]
026457:20060105:101916 Before find_char:{11553}[}]
026457:20060105:101916 Executing query:select 0,lastvalue from functions where functionid=11553
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [1]
026457:20060105:101916 Expression1:[{11553}]
026457:20060105:101916 Expression2:[%lf553}]
026457:20060105:101916 Expression3:[%lf ]
026457:20060105:101916 Before deleting spaces:1.000000
026457:20060105:101916 After deleting spaces:1.000000
026457:20060105:101916 Expression4:[1.000000]
026457:20060105:101916 Before find_char:1.000000[{]
026457:20060105:101916 Expression:[1.000000]
026457:20060105:101916 END substitute_functions
026457:20060105:101916 In evaluate([1.000000])
026457:20060105:101916 Before find_char:1.000000[)]
026457:20060105:101916 Evaluating simple expression [1.000000]
026457:20060105:101916 Evaluate end:[1.000000]
026457:20060105:101916 exp_value trigger.value trigger.prevvalue [1] [1] [134673824]
026457:20060105:101916 In update_trigger_value[12521,1,1136485156]
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [1]
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [12]
026457:20060105:101916 In update_functions(19119)
026457:20060105:101916 Executing query:select function,parameter,itemid,lastvalue from functions where itemid=19119 group by 1,2,3 order by 1,2,3
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [2]
026457:20060105:101916 ItemId:19119 Evaluating last(134732137)

026457:20060105:101916 In evaluate_FUNCTION() Function [last] flag [0]
026457:20060105:101916 In evaluate_FUNCTION() 1
026457:20060105:101916 In evaluate_FUNCTION() 2 value [0.016667]
026457:20060105:101916 In evaluate_FUNCTION() pre-7
026457:20060105:101916 In evaluate_FUNCTION() 7 Formula [0]
026457:20060105:101916 In evaluate_FUNCTION() 7 Value [0.016667]
026457:20060105:101916 In evaluate_FUNCTION() 7 Units []
026457:20060105:101916 In evaluate_FUNCTION() 7 Value [0.016667] Units [] Formula [0]
026457:20060105:101916 End of evaluate_FUNCTION. Result [0.016667]
026457:20060105:101916 Result of evaluate_FUNCTION [0.016667]

026457:20060105:101916 Do not update functions, same value
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [2]
026457:20060105:101916 ItemId:19119 Evaluating nodata(134732195)

026457:20060105:101916 In evaluate_FUNCTION() Function [nodata] flag [0]
026457:20060105:101916 In evaluate_FUNCTION() pre-7
026457:20060105:101916 In evaluate_FUNCTION() 7 Formula [0]
026457:20060105:101916 In evaluate_FUNCTION() 7 Value [1]
026457:20060105:101916 In evaluate_FUNCTION() 7 Units []
026457:20060105:101916 In evaluate_FUNCTION() 7 Value [1] Units [] Formula [0]
026457:20060105:101916 End of evaluate_FUNCTION. Result [1]
026457:20060105:101916 Result of evaluate_FUNCTION [1]

026457:20060105:101916 Do not update functions, same value
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [2]
026457:20060105:101916 In update_triggers [19119]
026457:20060105:101916 Executing query:select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=19119
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [1]
026457:20060105:101916 In evaluate_expression({11568})
026457:20060105:101916 Before deleting spaces:{11568}
026457:20060105:101916 After deleting spaces:{11568}
026457:20060105:101916 BEGIN substitute_functions ({11568})
026457:20060105:101916 Before find_char:{11568}[{]
026457:20060105:101916 Before find_char:{11568}[{]
026457:20060105:101916 Before find_char:{11568}[}]
026457:20060105:101916 Executing query:select 0,lastvalue from functions where functionid=11568
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [1]
026457:20060105:101916 Expression1:[{11568}]
026457:20060105:101916 Expression2:[%lf568}]
026457:20060105:101916 Expression3:[%lf ]
026457:20060105:101916 Before deleting spaces:1.000000
026457:20060105:101916 After deleting spaces:1.000000
026457:20060105:101916 Expression4:[1.000000]
026457:20060105:101916 Before find_char:1.000000[{]
026457:20060105:101916 Expression:[1.000000]
026457:20060105:101916 END substitute_functions
026457:20060105:101916 In evaluate([1.000000])
026457:20060105:101916 Before find_char:1.000000[)]
026457:20060105:101916 Evaluating simple expression [1.000000]
026457:20060105:101916 Evaluate end:[1.000000]
026457:20060105:101916 exp_value trigger.value trigger.prevvalue [1] [1] [134673824]
026457:20060105:101916 In update_trigger_value[12535,1,1136485156]
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [1]
026457:20060105:101916 In DBnum_rows
026457:20060105:101916 Result of DBnum_rows [12]
026455:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
026457:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
026459:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
026461:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
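
On the format question: the leading numbers (026455, 026457, 026459, 026461) match the zabbix_server PIDs in the ps listing from the first post, so reading each line as PID:YYYYMMDD:HHMMSS followed by the message looks consistent. Also, if the last argument in update_trigger_value[12521,1,1136485156] is a Unix timestamp, it works out to 2006-01-05 18:19:16 UTC, which would match 10:19:16 local time in a UTC-8 zone. A small parser under that assumption (the names below are made up for illustration and are not part of Zabbix):

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout: zero-padded PID, date as YYYYMMDD, time as HHMMSS, then the message.
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")


class LogEntry(NamedTuple):
    pid: int
    when: datetime  # naive local time, exactly as written in the log
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for blank or non-matching lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pid, date, time, message = match.groups()
    return LogEntry(int(pid), datetime.strptime(date + time, "%Y%m%d%H%M%S"), message)


def exit_events(lines: Iterable[str]) -> List[LogEntry]:
    """Return the entries where a process reports it is exiting on a signal."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and "signal. Exiting" in e.message]
```

Run over the tail above, exit_events picks out the last four lines, one per remaining zabbix_server process, all at 10:19:17.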

Alexei
13-01-2006, 12:11
May I ask you to send the full log file to alex@zabbix.com? Many thanks.

bbrendon
02-02-2006, 22:30
Did this ever come to anything? Was a bug found? Am I the only one experiencing this?

Shiva
03-02-2006, 09:40
Hi
I had a similar problem. This bug is probably fixed in CVS already, so I guess we will have to wait for beta6...

azilber
03-02-2006, 17:41
I've been experiencing random crashes as well. The service just dies with nothing in the logs. If anyone else is experiencing this, I'm using monit in the interim to handle the problem:

http://www.tildeslash.com/monit/download/

My monit.conf additions for zabbix_server (for RedHat AS 4):

check process zabbix_server with pidfile /var/run/zabbix/zabbix_server.pid
    start program = "/etc/init.d/zabbix_server start"
    stop program = "/etc/init.d/zabbix_server stop"
    group server
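
For anyone without monit, the core of a pidfile check can be sketched in a few lines. This is not how monit itself is implemented, just an illustration of the idea on a Unix system; the function names are hypothetical, and the restart step (calling the init script) is left out.

```python
import os
from typing import Optional


def pid_from_file(pidfile: str) -> Optional[int]:
    """Read a PID from `pidfile`; return None if it is missing or malformed."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_is_running(pidfile: str) -> bool:
    """Return True if the PID recorded in `pidfile` belongs to a live process."""
    pid = pid_from_file(pidfile)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True
```

A cron job could call process_is_running("/var/run/zabbix/zabbix_server.pid") and run the init script's start action when it returns False. Note that a stale pidfile whose PID has been reused by another process would still report True.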

bbrendon
11-02-2006, 00:55
I'm still experiencing crashes here with beta 6. Now when doing "ps aux" after a crash, there are NO zabbix processes left.

elkor
11-02-2006, 03:59
I have not experienced this bb,

I'm due to upgrade to the latest code (current agent v1.1b1).
Is there ANYTHING else you can give us to track this down? It looks like the parent process is killing all the others.

OS/version/checks/frequency

You really can't be too verbose here ;)

bbrendon
11-02-2006, 08:16
As for our environment, we're using Debian MySQL 4.1.11-3.

All agents use active checks because the servers are behind firewalls. There are 3 Unix systems and about 15 Windows servers. There are a few hosts with simple checks for availability and latency. We're using the Debian package of Zabbix. That's pretty much it.

Let me know if I can provide more detail or what you're looking for specifically.

Alexei
11-02-2006, 10:17
Yes, please provide me with a detailed log file from the ZABBIX server.

bbrendon
11-02-2006, 18:14
I emailed it to Alexei. It won't fit in this posting. It's a few KB over the limit.