**tronite** · 08-06-2007, 18:14

Originally posted by agehring

Are there any patches for the "node watcher" thread for server?

I'm getting segmentation violations (SIGSEGV SIG_DFL) on zabbix server running on OS X, and it always happens with the node watcher thread.

Is there a way to turn off the distributed feature at run time and/or compile time?

Thanks,
Andrew

P.S. Working on debugging it "deeper"...

Have you checked what they say here? http://sciss.de/jcollider/doc/api/de...deWatcher.html

**agehring** · 08-06-2007, 18:19

What does a Java class have to do with zabbix?

**agehring** · 08-06-2007, 19:14

More Info...

I'm getting "deeper".

The node watcher server SIGSEGVs at the 103942nd call of

In send_to_master_and_slave(node:0)

of the node watcher...

I've repeated this 10 times, without fail...

memory leak?

Andrew

**Alexei** · 08-06-2007, 19:36

Please could you apply this patch and let me know if it helped.

Attached Files

dm.patch (457 Bytes, 260 views)

**agehring** · 08-06-2007, 21:55

I applied the patch, and the only change is that now it processes 103941 calls instead of 103942.

This is the log output just before the crash dump...

19307:20070608:134636 In get_master_node(0)
19307:20070608:134636 Query [select masterid from nodes where nodeid=0]
19307:20070608:134636 In get_slave_node(0)
19307:20070608:134636 Query [select masterid from nodes where nodeid=0]
19307:20070608:134636 Query [select nodeid from nodes where masterid=0]
19307:20070608:134636 In process_node(node:0)
19307:20070608:134636 In send_to_master_and_slave(node:0)
19292:20070608:134637 One child process died. Exiting ...
19293:20070608:134637 Got signal. Exiting ...
19294:20070608:134637 Got signal. Exiting ...
19295:20070608:134637 Got signal. Exiting ...
19296:20070608:134637 Got signal. Exiting ...
19297:20070608:134637 Got signal. Exiting ...
19306:20070608:134637 Got signal. Exiting ...
19308:20070608:134637 Got signal. Exiting ...
19299:20070608:134637 Got signal. Exiting ...
19298:20070608:134637 Got signal. Exiting ...
19302:20070608:134637 Got signal. Exiting ...
19304:20070608:134637 Got signal. Exiting ...
19301:20070608:134637 Got signal. Exiting ...
19300:20070608:134637 Got signal. Exiting ...
19305:20070608:134637 Got signal. Exiting ...
19303:20070608:134637 Got signal. Exiting ...
19292:20070608:134639 ZABBIX Server stopped

I also compiled the code with -g (CCFLAGS="-g"), and ran it under gdb.

What is interesting in that scenario (to me anyway) is that it DOES NOT crash while either it or its parent is running under gdb, but the minute I exit gdb, it immediately crashes.

Thanks,
Andrew

**agehring** · 08-06-2007, 22:19

GDB Output

GDB reported the following...

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffec
0x0001b1dc in zabbix_log (level=4, fmt=0x3979c "Query [%s]") at log.c:180
180 {

**agehring** · 08-06-2007, 22:36

More GDB

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffdc
0x0001b1dc in zabbix_log (level=4, fmt=0x3979c "Query [%s]") at log.c:180
180 {

(gdb) x 0x0001b1dc
0x1b1dc <zabbix_log+12>: 0x01ec3be8
(gdb)

And from here I'm lost...

Thanks,
Andrew

**agehring** · 08-06-2007, 22:52

Yet more GDB

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffec
0x9000319e in szone_malloc ()

0x9000319e <szone_malloc+12>: 0x16786de8

Program starts at approx 161M in size, and grows to 161G in about 46 seconds...

**Alexei** · 09-06-2007, 09:25

I am pretty sure that the configuration of nodes is messed up on your system. Could you please send me the results of:

select * from nodes
NodeID from zabbix_server.log

Thanks.

**agehring** · 11-06-2007, 14:42

mysql> select * from nodes;
+--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
| nodeid | name       | timezone | ip        | port  | slave_history | slave_trends | event_lastid | history_lastid | history_str_lastid | history_uint_lastid | nodetype | masterid |
+--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
|      0 | Local node |        0 | 127.0.0.1 | 10051 |            30 |          365 |            0 |              0 |                  0 |                   0 |        1 |        0 |
+--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
1 row in set (0.00 sec)

zabbix:~ root# cat /etc/zabbix/zabbix_server.conf
# This is config file for ZABBIX server process
# To get more information about ZABBIX,
# go http://www.zabbix.com

############ GENERAL PARAMETERS #################

# This defines unique NodeID in distributed setup,
# Default value 0 (standalone server)
# This parameter must be between 0 and 999
#NodeID=0

**agehring** · 11-06-2007, 17:56

libMallocDebug

I enabled libMallocDebug, and am getting the following...

libMallocDebug[zabbix_server-719]: frame pointer goes from bffee098 to bfffe758 -- assuming invalid.

relevant?

Thanks,
Andrew

**Alexei** · 11-06-2007, 19:16

Please could you try the latest code from svn://svn.zabbix.com/branches/1.4.1 ? We fixed several memory related issues, it could affect the reported problem.

**agehring** · 11-06-2007, 23:03

It still crashes...

I'm going to compile with -g again...

Andrew

**Alexei** · 12-06-2007, 06:57

I'm very very interested in getting more details about this crash! Can you get a backtrace of executed functions? Thank you.

1.4 Node Watcher Thread
