Hello
Has anyone else run into the same problem?
I have a brand-new setup with Zabbix 1.8.3 and PostgreSQL.
The host OS is a fresh, clean install of CentOS 5.5.
Both machines have exactly the same setup.
I then set up distributed monitoring as follows:
Node 10=172.22.0.10, Master
Node 20=172.23.0.10, Slave
I set up the nodes in both web interfaces and everything seemed to work.
Today I updated to 1.8.4, and since then I get the following error, only on the master node:
21207:20110105:143017.479 Starting Zabbix Server. Zabbix 1.8.4 (revision 16604).
21207:20110105:143017.481 ****** Enabled features ******
21207:20110105:143017.481 SNMP monitoring: YES
21207:20110105:143017.481 IPMI monitoring: YES
21207:20110105:143017.481 WEB monitoring: YES
21207:20110105:143017.481 Jabber notifications: YES
21207:20110105:143017.481 Ez Texting notifications: YES
21207:20110105:143017.481 ODBC: NO
21207:20110105:143017.481 SSH2 support: YES
21207:20110105:143017.481 IPv6 support: NO
21207:20110105:143017.481 ******************************
21210:20110105:143017.628 server #1 started [DB Cache]
21212:20110105:143017.697 server #2 started [Poller. SNMP:YES]
21213:20110105:143017.803 server #3 started [Poller. SNMP:YES]
21214:20110105:143017.915 server #4 started [Poller. SNMP:YES]
21221:20110105:143018.074 server #8 started [Trapper]
21218:20110105:143018.102 server #5 started [Poller. SNMP:YES]
21219:20110105:143018.153 server #6 started [Poller. SNMP:YES]
21220:20110105:143018.227 server #7 started [Poller for unreachable hosts. SNMP:YES]
21227:20110105:143018.371 server #10 started [Trapper]
21229:20110105:143018.415 server #11 started [Trapper]
21231:20110105:143018.460 server #12 started [Trapper]
21233:20110105:143018.491 server #13 started [ICMP pinger]
21222:20110105:143018.508 server #9 started [Trapper]
21234:20110105:143018.595 server #14 started [Alerter]
21238:20110105:143018.657 server #15 started [Housekeeper]
21238:20110105:143018.658 Executing housekeeper
21240:20110105:143018.728 server #16 started [Timer]
21242:20110105:143019.135 server #17 started [Node watcher. Node ID:10]
21246:20110105:143019.269 server #20 started [DB Syncer]
21243:20110105:143019.731 server #18 started [HTTP Poller]
21245:20110105:143019.812 server #19 started [Discoverer. SNMP:YES]
21251:20110105:143019.931 server #21 started [DB Syncer]
21252:20110105:143019.973 server #22 started [DB Syncer]
21258:20110105:143020.011 server #23 started [DB Syncer]
21260:20110105:143020.057 server #24 started [Escalator]
21262:20110105:143020.085 server #25 started [Proxy Poller]
21207:20110105:143020.127 server #0 started [Watchdog]
21221:20110105:143021.827 NODE 10: Received history from node 20 for node 20 datalen 9878
21231:20110105:143022.117 NODE 10: Received history_uint from node 20 for node 20 datalen 3526
21231:20110105:143022.423 NODE 10: Received auditlog from node 20 for node 20 datalen 329
21231:20110105:143022.495 NODE 10: Received auditlog_details from node 20 for node 20 datalen 10356
21231:20110105:143022.495 Got signal [signal:11(SIGSEGV),reason:128,refaddr:(nil)]. Crashing ...
21231:20110105:143022.496 ====== Fatal information: ======
21231:20110105:143022.496 Program counter: 0x3d35479a10
21231:20110105:143022.496 === Registers: ===
21231:20110105:143022.496 r8 = 0 = 0 = 0
21231:20110105:143022.496 r9 = adadadadadadadad = 12514849900987264429 = -5931894172722287187
21231:20110105:143022.496 r10 = 22 = 34 = 34
21231:20110105:143022.496 r11 = 246 = 582 = 582
21231:20110105:143022.496 r12 = adadadadadadadad = 12514849900987264429 = -5931894172722287187
21231:20110105:143022.496 r13 = 73 = 115 = 115
21231:20110105:143022.496 r14 = d = 13 = 13
21231:20110105:143022.497 r15 = 7fff9bac92ec = 140735805166316 = 140735805166316
21231:20110105:143022.497 rdi = adadadadadadadad = 12514849900987264429 = -5931894172722287187
21231:20110105:143022.497 rsi = 1 = 1 = 1
21231:20110105:143022.497 rbp = 7fff9bac8b10 = 140735805164304 = 140735805164304
21231:20110105:143022.497 rbx = 7fff9bac8ca0 = 140735805164704 = 140735805164704
21231:20110105:143022.497 rdx = 7fff9bac8ce8 = 140735805164776 = 140735805164776
21231:20110105:143022.497 rax = adadadadadadadad = 12514849900987264429 = -5931894172722287187
21231:20110105:143022.497 rcx = 3 = 3 = 3
21231:20110105:143022.498 rsp = 7fff9bac8468 = 140735805162600 = 140735805162600
21231:20110105:143022.498 rip = 3d35479a10 = 262886890000 = 262886890000
21231:20110105:143022.498 efl = 10217 = 66071 = 66071
21231:20110105:143022.498 csgsfs = 33 = 51 = 51
21231:20110105:143022.498 err = 0 = 0 = 0
21231:20110105:143022.498 trapno = d = 13 = 13
21231:20110105:143022.498 oldmask = 0 = 0 = 0
21231:20110105:143022.498 cr2 = 0 = 0 = 0
21231:20110105:143022.498 === Backtrace: ===
21231:20110105:143022.504 15: zabbix_server(print_fatal_info+0xcd) [0x43bc3d]
21231:20110105:143022.505 14: zabbix_server(child_signal_handler+0xeb) [0x43b48b]
21231:20110105:143022.505 13: /lib64/libc.so.6 [0x3d354302d0]
21231:20110105:143022.505 12: /lib64/libc.so.6(strlen+0x10) [0x3d35479a10]
21231:20110105:143022.505 11: /lib64/libc.so.6(_IO_vfprintf+0x4479) [0x3d35446b69]
21231:20110105:143022.505 10: /lib64/libc.so.6(vsnprintf+0x9a) [0x3d3546988a]
21231:20110105:143022.505 9: zabbix_server(zbx_vsnprintf+0x16) [0x443bf6]
21231:20110105:143022.505 8: zabbix_server(__zbx_zbx_snprintf_alloc+0x112) [0x443d42]
21231:20110105:143022.505 7: zabbix_server [0x42040d]
21231:20110105:143022.505 6: zabbix_server(node_history+0x409) [0x420cb9]
21231:20110105:143022.505 5: zabbix_server(process_trapper_child+0x2b4) [0x41ea64]
21231:20110105:143022.506 4: zabbix_server(child_trapper_main+0xa2) [0x41f272]
21231:20110105:143022.506 3: zabbix_server(MAIN_ZABBIX_ENTRY+0x5a8) [0x4106f8]
21231:20110105:143022.506 2: zabbix_server(daemon_start+0x1fe) [0x43b27e]
21231:20110105:143022.506 1: /lib64/libc.so.6(__libc_start_main+0xf4) [0x3d3541d994]
21231:20110105:143022.506 0: zabbix_server [0x40cd29]
It seems that the trapper crashes right after receiving auditlog_details from node 20.
The error is reproducible: it happens again every time I restart node 10.
This is not a production environment yet, but it should become one within the next few days.
Can anyone help with this, or should I open a bug report with Zabbix?
Thanks
Peter