
**AdamLundrigan** · 30-01-2008, 17:01

truss'd

OK. I revised my strategy slightly, and turned to truss for better stack tracing.

I disabled the lone host in my test system, restarted the server, waited to make sure it would keep running (it did). Then I attached an instance of truss to each zabbix_server process, and dumped the output to files. When I enabled the lone monitored host again, the server crapped out - as per my expectations.
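
For anyone wanting to repeat the "one truss per process" step, here is a minimal sketch of how the command lines could be generated. It assumes the Solaris `truss` options `-o` (write the trace to a file) and `-p` (attach to a running PID). The output directory and file naming are made up for illustration, and the PID list has to come from somewhere else (for example `pgrep zabbix_server`).

```python
def build_truss_commands(pids, outdir="/tmp/truss"):
    """Return one truss argv list per PID, each tracing into its own file.

    outdir and the file name pattern are hypothetical; adjust to taste.
    """
    commands = []
    for pid in pids:
        outfile = f"{outdir}/zabbix_server.{pid}.truss"
        # -o: write the trace to outfile, -p: attach to an existing process
        commands.append(["truss", "-o", outfile, "-p", str(pid)])
    return commands


if __name__ == "__main__":
    # PIDs taken from the log in this post, purely as an example.
    for argv in build_truss_commands([28681, 28682]):
        print(" ".join(argv))
```

Each list can be handed to `subprocess.Popen` so that all traces run in parallel while the host is re-enabled.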

Here is the truncated zabbix_server.log:

Code:

 28701:20080130:110554 End process_httptests()
 28701:20080130:110554 Spent 0 seconds while processing HTTP tests
 28701:20080130:110554 Query [select count(*),min(nextcheck) from httptest t where t.status=0 and mod(t.httptestid,5)=3 and  t.httptestid>=100000000000000*0 and t.httptestid<=(100000000000000*0+99999999999999) ]
 28701:20080130:110554 No httptests to process in get_minnextcheck.
 28701:20080130:110554 Nextcheck:-1 Time:1201703754
 28701:20080130:110554 Sleeping for 5 seconds
 28686:20080130:110554 End update_functions()
 28682:20080130:110554 In evaluate_expression({12088}=0)
 28686:20080130:110554 In update_triggers [itemid:19824]
 28682:20080130:110554 In substitute_simple_macros()
 28686:20080130:110554 Query [select distinct t.triggerid,t.expression,t.description,t.url,t.comments,t.status,t.value,t.priority from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=19824]
 28682:20080130:110554 In substitute_simple_macros (data:{12088}=0)
 28681:20080130:110554 One child process died. Exiting ...
 28683:20080130:110554 Got signal. Exiting ...
    ....
 28694:20080130:110554 Got signal. Exiting ...
 28681:20080130:110557 ZABBIX Server stopped
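
Since every log line starts with `PID:YYYYMMDD:HHMMSS`, a rough way to narrow down the culprit is to look at the last thing each process logged and list the PIDs that never wrote an exit notice. This is only a sketch: the function names are my own, and on a truncated log like the one above it yields candidates rather than proof.

```python
import re

# Matches lines such as " 28682:20080130:110554 In substitute_simple_macros()"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def last_message_per_pid(lines):
    """Map each PID to the last message it wrote to the log."""
    last = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            pid, _date, _time, msg = m.groups()
            last[int(pid)] = msg
    return last


def silent_pids(lines):
    """PIDs whose final message is not an exit notice (possible crash victims)."""
    exit_markers = ("Exiting", "stopped")
    return sorted(
        pid
        for pid, msg in last_message_per_pid(lines).items()
        if not any(marker in msg for marker in exit_markers)
    )
```

Run over the full zabbix_server.log, a PID that shows up here and also appears in a SIGCLD line of the parent's truss output is the one to look at.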

A snippet of the interesting bit of truss output from the parent process:

Code:

    Received signal #18, SIGCLD, in nanosleep() [caught]
      siginfo: SIGCLD CLD_KILLED pid=28682 status=0x000B
nanosleep(0xFFBFF318, 0xFFBFF310)               Err#4 EINTR
sigprocmask(SIG_SETMASK, 0xFFBFEF14, 0x00000000) = 0
open("/tmp/zabbix_server.log", O_RDWR|O_APPEND|O_CREAT, 0666) = 7
time()                                          = 1201703754
getpid()                                        = 28681 [1]
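
The `siginfo` line above already carries the interesting facts. A small sketch of pulling them out, assuming (as with POSIX `si_status`) that for `CLD_KILLED` and `CLD_DUMPED` the status field is the number of the signal that terminated the child. The helper name is made up.

```python
import re
import signal

SIGCLD_INFO = re.compile(
    r"siginfo: SIGCLD (\w+) pid=(\d+) status=(0x[0-9A-Fa-f]+)"
)


def decode_sigcld(line):
    """Extract code, child PID and status from a truss SIGCLD siginfo line."""
    m = SIGCLD_INFO.search(line)
    if m is None:
        return None
    code, pid, status = m.group(1), int(m.group(2)), int(m.group(3), 16)
    signal_name = None
    if code in ("CLD_KILLED", "CLD_DUMPED"):
        # For these codes the status is the terminating signal number.
        try:
            signal_name = signal.Signals(status).name
        except ValueError:
            signal_name = "unknown"
    return {"code": code, "pid": pid, "status": status, "signal_name": signal_name}


if __name__ == "__main__":
    print(decode_sigcld("siginfo: SIGCLD CLD_KILLED pid=28682 status=0x000B"))
```

Status 0x000B is 11 in decimal, which is SIGSEGV on Solaris and Linux, so the parent is reporting that child 28682 died from a segmentation fault.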

It says PID 28682 triggered the signal by being killed (status=0x000B, i.e. signal 11). Here is the snippet of truss output from PID 28682:

Code:

close(7)                                        = 0
stat("/tmp/zabbix_server.log", 0xFFBED4A0)      = 0
open("/tmp/zabbix_server.log", O_RDWR|O_APPEND|O_CREAT, 0666) = 7
time()                                          = 1201703754
getpid()                                        = 28682 [28681]
fstat64(7, 0xFFBEBEA0)                          = 0
fstat64(7, 0xFFBEBD48)                          = 0
ioctl(7, TCGETA, 0xFFBEBE2C)                    Err#25 ENOTTY
    Incurred fault #6, FLTBOUNDS  %pc = 0xFEE34694
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000

According to this article on Sun.com, SEGV_MAPERR generally denotes a stack overflow, although a fault address of 0x00000000 would more commonly point to a NULL pointer dereference.

Oh dear...
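
A quick way to read the fault lines in the truss output is to pull out the fault name, program counter and faulting address. The sketch below uses my own helper name and an arbitrary 4 KB threshold for "near zero". As far as I know, SEGV_MAPERR only says the address was not mapped. An address at or near 0 usually suggests a NULL pointer dereference, while a stack overflow would normally fault at an address close to the stack.

```python
import re

FAULT = re.compile(r"Incurred fault #(\d+), (\w+)\s+%pc = (0x[0-9A-Fa-f]+)")
SIGINFO = re.compile(r"siginfo: (\w+) (\w+) addr=(0x[0-9A-Fa-f]+)")


def summarize_fault(truss_lines, null_page=0x1000):
    """Collect fault name, pc, signal, code and address from truss lines.

    null_page is an assumed threshold: addresses below it are treated as
    "looks like a NULL pointer access".
    """
    summary = {}
    for line in truss_lines:
        m = FAULT.search(line)
        if m:
            summary["fault"] = m.group(2)
            summary["pc"] = int(m.group(3), 16)
        m = SIGINFO.search(line)
        if m:
            summary["signal"] = m.group(1)
            summary["code"] = m.group(2)
            summary["addr"] = int(m.group(3), 16)
    if "addr" in summary:
        summary["looks_like_null"] = summary["addr"] < null_page
    return summary
```

For the trace above this reports FLTBOUNDS at pc 0xFEE34694 with addr 0, so the pc value is probably the more useful thing to map back to a library or function.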

**Alexei** · 30-01-2008, 18:06

It looks very much like a problem we fixed recently in pre-1.4.5 code. It was related to incorrect processing of zero-length strings for the trigger functions str(), regexp() and iregexp().

May I ask you to try the latest code? Get it by executing:

svn checkout svn://svn.zabbix.com/branches/1.4 1.4

Thank you.

**AdamLundrigan** · 31-01-2008, 20:25

No joy with the svn version. Same truss output as before:

Code:

open("/tmp/zabbix_server.log", O_RDWR|O_APPEND|O_CREAT, 0666) = 6
time()                                          = 1201803212
getpid()                                        = 11466 [11465]
fstat64(6, 0xFFBEBE90)                          = 0
fstat64(6, 0xFFBEBD38)                          = 0
ioctl(6, TCGETA, 0xFFBEBE1C)                    Err#25 ENOTTY
    Incurred fault #6, FLTBOUNDS  %pc = 0xFEE34694
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000

I wiped out the Zabbix 1.4.4 installation on both the client and server, built the version checked out from SVN, and used that to install the client and server.

I have a spare machine of modest capabilities (AMD64 3500+) kicking around and an Ubuntu Server 7.10 CD. I will combine the two, install Zabbix, and report back how that goes.

Here are the logs (zabbix_server and truss): ftp://ocgftp.nfl.dfo-mpo.gc.ca/outgo...abbix_logs.tgz

**twydyn** · 27-02-2008, 20:43

I am having the exact same problem mentioned above. I have tried the latest version as well with no luck. What are our options at this point?

**marcelein** · 19-06-2008, 12:58

Same errors:

Code:

 17024:20080614:000033 Requested [vfs.fs.size[/var,free]]
 17024:20080614:000033 Sending back [130952196]
 17022:20080614:000033 Processing request.
 17022:20080614:000033 In check_security()
 17022:20080614:000033 Requested [net.if.out[eth1,bytes]]
 17022:20080614:000033 Sending back [16636352054]
 17023:20080614:000033 Processing request.
 17023:20080614:000033 In check_security()
 17023:20080614:000033 Requested [system.cpu.load[,avg5]]
 17023:20080614:000033 Sending back [1.090000]
 17021:20080614:000037 Got signal. Exiting ...
 17023:20080614:000037 Got signal. Exiting ...
 17024:20080614:000037 Got signal. Exiting ...
 17025:20080614:000037 Got signal. Exiting ...
 17022:20080614:000037 Got signal. Exiting ...
 17020:20080614:000037 Got signal. Exiting ...
 17020:20080614:000037 zbx_on_exit() called.
 17020:20080614:000039 ZABBIX Agent stopped

That's what I get every day at the same time, 00:00:39. It's a vserver.
The other servers running zabbix_agentd don't get this error and are running stably.

I'm really looking forward to a solution or a new stable version of Zabbix.
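
To confirm that the agent really stops at the same time every day, the log can be scanned for the stop message and the hits counted per time of day. A minimal sketch, with a made-up function name and the marker text taken from the log above:

```python
import re
from collections import Counter

# Matches lines such as " 17020:20080614:000039 ZABBIX Agent stopped"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def shutdown_times(lines, marker="ZABBIX Agent stopped"):
    """Count how often the marker message appears at each HH:MM:SS."""
    times = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and marker in m.group(4):
            hhmmss = m.group(3)
            times[f"{hhmmss[:2]}:{hhmmss[2:4]}:{hhmmss[4:]}"] += 1
    return times
```

If one timestamp dominates, that would hint at something scheduled on the host sending the signal, rather than a crash inside the agent. Note that the log shows "Got signal. Exiting ..." and not a segfault, so this may well be a different problem from the server crash earlier in the thread.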


zabbix_server crashes on host addition

