Zabbix Server 2 Crash with node

  • spaww
    Senior Member
    Zabbix Certified Specialist
    • May 2009
    • 178

    #1

    Hi,

    The setup:

    Both boxes run Ubuntu Server 12.04 and Zabbix 2.

    Box 1 (master node) runs fine on its own and with proxies.

    Box 2 (first node, database number 2) runs without problems and tries to send its data to box 1, but gets TCP errors after a few seconds:
    Code:
     19639:20120622:090416.798 cannot send list of active checks to [127.0.0.1]: host [zabbix_node1] not monitored
     19647:20120622:090536.807 NODE 2: Sending configuration changes to master node 1 for node 2 datalen 5760
     19647:20120622:090543.173 NODE 2: Error while receiving answer from Node [1] error: ZBX_TCP_READ() failed: [104] Connection reset by peer
     19647:20120622:090545.477 NODE 2: Error while receiving answer from Node [1] error: ZBX_TCP_READ() failed: [104] Connection reset by peer
     19647:20120622:090545.510 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.521 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.532 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.538 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.544 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.551 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.561 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.577 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.582 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
     19647:20120622:090545.589 NODE 2: Unable to connect to Node [1] error: cannot connect to [[10.11.210.241]:10051]: [111] Connection refused
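    "Connection refused" on port 10051 normally means nothing is listening there any more, which would fit the master having exited. A minimal sketch for checking this from box 2; the helper name port_is_open and the 3-second timeout are only illustrative, and the address and port are the ones from the log above:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake; we close right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111), timeouts and unreachable hosts.
        return False
```

    Calling port_is_open("10.11.210.241", 10051) from box 2 right after the errors start should return False if the master process is really gone, and True again once zabbix_server has been restarted on box 1.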
    Last log lines on box 1 (master node):
    Code:
    root@zabbixone:~# tail /tmp/zabbix_server.log  -n100
      1470:20120622:053054.113 ******************************
      1472:20120622:053054.632 server #1 started [configuration syncer #1]
      1480:20120622:053054.655 server #9 started [trapper #1]
      1481:20120622:053054.656 server #10 started [trapper #2]
      1473:20120622:053054.659 server #2 started [db watchdog #1]
      1482:20120622:053054.664 server #11 started [trapper #3]
      1474:20120622:053054.670 server #3 started [poller #1]
      1484:20120622:053054.691 server #13 started [trapper #5]
      1485:20120622:053054.692 server #14 started [icmp pinger #1]
      1486:20120622:053054.693 server #15 started [alerter #1]
      1487:20120622:053054.694 server #16 started [housekeeper #1]
      1487:20120622:053054.695 executing housekeeper
      1483:20120622:053054.697 server #12 started [trapper #4]
      1477:20120622:053054.700 server #6 started [poller #4]
      1476:20120622:053054.701 server #5 started [poller #3]
      1478:20120622:053054.703 server #7 started [poller #5]
      1475:20120622:053054.707 server #4 started [poller #2]
      1494:20120622:053054.716 server #18 started [node watcher #1]
      1493:20120622:053054.723 server #17 started [timer #1]
      1479:20120622:053054.725 server #8 started [unreachable poller #1]
      1499:20120622:053054.732 server #21 started [history syncer #1]
      1500:20120622:053054.739 server #22 started [history syncer #2]
      1496:20120622:053054.740 server #19 started [http poller #1]
      1501:20120622:053054.741 server #23 started [history syncer #3]
      1507:20120622:053054.742 server #26 started [ipmi poller #1]
      1508:20120622:053054.743 server #27 started [ipmi poller #2]
      1505:20120622:053054.747 server #24 started [history syncer #4]
      1506:20120622:053054.748 server #25 started [escalator #1]
      1470:20120622:053054.757 server #0 started [main process]
      1516:20120622:053054.757 server #28 started [proxy poller #1]
      1497:20120622:053054.758 server #20 started [discoverer #1]
      1517:20120622:053054.763 server #29 started [self-monitoring #1]
      1497:20120622:053054.788 fping failed: "10.11.210.0 :"
      1487:20120622:053104.017 housekeeper deleted: 38 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
      1482:20120622:053151.326 NODE 1: Received configuration changes from slave node 2 for node 2 datalen 5760
    *** stack smashing detected ***: /usr/local/sbin/zabbix_server terminated
      1482:20120622:053156.997 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x65373234]. Crashing ...
      1482:20120622:053156.998 ====== Fatal information: ======
      1482:20120622:053156.998 Program counter: 0xb7034b19
      1482:20120622:053156.998 === Registers: ===
      1482:20120622:053156.998 gs      =               33 =                   51 =                   51
      1482:20120622:053156.998 fs      =                0 =                    0 =                    0
      1482:20120622:053156.998 es      =               7b =                  123 =                  123
      1482:20120622:053156.998 ds      =               7b =                  123 =                  123
      1482:20120622:053156.998 edi     =         bfb83190 =           3216519568 =          -1078447728
      1482:20120622:053156.998 esi     =         bfb830d0 =           3216519376 =          -1078447920
      1482:20120622:053156.999 ebp     =         bfb83218 =           3216519704 =          -1078447592
      1482:20120622:053156.999 esp     =         bfb83050 =           3216519248 =          -1078448048
      1482:20120622:053156.999 ebx     =         b703bff4 =           3070476276 =          -1224491020
      1482:20120622:053157.000 edx     =         65373234 =           1698116148 =           1698116148
      1482:20120622:053157.000 ecx     =         bfb84470 =           3216524400 =          -1078442896
      1482:20120622:053157.000 eax     =         bfb83190 =           3216519568 =          -1078447728
      1482:20120622:053157.000 trapno  =                e =                   14 =                   14
      1482:20120622:053157.001 err     =                4 =                    4 =                    4
      1482:20120622:053157.001 eip     =         b7034b19 =           3070446361 =          -1224520935
      1482:20120622:053157.001 cs      =               73 =                  115 =                  115
      1482:20120622:053157.001 efl     =           210246 =              2163270 =              2163270
      1482:20120622:053157.002 uesp    =         bfb83050 =           3216519248 =          -1078448048
      1482:20120622:053157.002 ss      =               7b =                  123 =                  123
      1482:20120622:053157.002 === Stack frame: ===
      1482:20120622:053157.003 +0x40(%ebp) = ebp + 64 = 00000000 =          0 =           0
      1482:20120622:053157.003 +0x3c(%ebp) = ebp + 60 = 00000040 =         64 =          64
      1482:20120622:053157.003 +0x38(%ebp) = ebp + 56 = 00000004 =          4 =           4
      1482:20120622:053157.003 +0x34(%ebp) = ebp + 52 = bfb837a0 = 3216521120 = -1078446176
      1482:20120622:053157.004 +0x30(%ebp) = ebp + 48 = 00000000 =          0 =           0
      1482:20120622:053157.004 +0x2c(%ebp) = ebp + 44 = 080cebce =  135064526 =   135064526
      1482:20120622:053157.004 +0x28(%ebp) = ebp + 40 = bfb83908 = 3216521480 = -1078445816
      1482:20120622:053157.004 +0x24(%ebp) = ebp + 36 = 00000000 =          0 =           0
      1482:20120622:053157.005 +0x20(%ebp) = ebp + 32 = 00000000 =          0 =           0
      1482:20120622:053157.005 +0x1c(%ebp) = ebp + 28 = 080cebcf =  135064527 =   135064527
      1482:20120622:053157.005 +0x18(%ebp) = ebp + 24 = 00000000 =          0 =           0
      1482:20120622:053157.005 +0x14(%ebp) = ebp + 20 = b70b590b = 3070974219 = -1223993077
      1482:20120622:053157.006 +0x10(%ebp) = ebp + 16 = 081225d4 =  135407060 =   135407060
      1482:20120622:053157.006 +0x0c(%ebp) = ebp + 12 = bfb8324c = 3216519756 = -1078447540
      1482:20120622:053157.006 +0x08(%ebp) = ebp +  8 = b7173f00 = 3071753984 = -1223213312 <--- call arguments
      1482:20120622:053157.007 +0x04(%ebp) = ebp +  4 = b7174007                            <--- return address
      1482:20120622:053157.007      (%ebp) = ebp      = bfb83278                            <--- saved ebp value
      1482:20120622:053157.007 -0x04(%ebp) = ebp -  4 = bfb837a0 = 3216521120 = -1078446176 <--- local variables
      1482:20120622:053157.008 -0x08(%ebp) = ebp -  8 = 00000040 =         64 =          64
      1482:20120622:053157.008 -0x0c(%ebp) = ebp - 12 = 00000000 =          0 =           0
      1482:20120622:053157.008 -0x10(%ebp) = ebp - 16 = 00000000 =          0 =           0
      1482:20120622:053157.008 -0x14(%ebp) = ebp - 20 = 00000000 =          0 =           0
      1482:20120622:053157.009 -0x18(%ebp) = ebp - 24 = 00000000 =          0 =           0
      1482:20120622:053157.009 -0x1c(%ebp) = ebp - 28 = 00000000 =          0 =           0
      1482:20120622:053157.009 -0x20(%ebp) = ebp - 32 = 00000000 =          0 =           0
      1482:20120622:053157.009 -0x24(%ebp) = ebp - 36 = 00000000 =          0 =           0
      1482:20120622:053157.010 -0x28(%ebp) = ebp - 40 = 40000000 = 1073741824 =  1073741824
      1482:20120622:053157.010 -0x2c(%ebp) = ebp - 44 = 0806fc60 =  134675552 =   134675552
      1482:20120622:053157.010 -0x30(%ebp) = ebp - 48 = 0812fff4 =  135462900 =   135462900
      1482:20120622:053157.011 -0x34(%ebp) = ebp - 52 = 00000000 =          0 =           0
      1482:20120622:053157.011 -0x38(%ebp) = ebp - 56 = 00000000 =          0 =           0
      1482:20120622:053157.011 -0x3c(%ebp) = ebp - 60 = 65373234 = 1698116148 =  1698116148
      1482:20120622:053157.012 -0x40(%ebp) = ebp - 64 = bfb84470 = 3216524400 = -1078442896
      1482:20120622:053157.012 === Backtrace: ===
      1470:20120622:053157.013 One child process died (PID:1482,exitcode/signal:11). Exiting ...
      1470:20120622:053159.015 syncing history data...
      1470:20120622:053159.122 syncing history data done
      1470:20120622:053159.122 syncing trends data...
      1470:20120622:053159.945 syncing trends data done
      1470:20120622:053159.946 Zabbix Server stopped. Zabbix 2.0.0 (revision 27675).
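    One detail in the dump that may help narrow this down (a hint, not a diagnosis): the faulting address 0x65373234, which also shows up in edx and at ebp - 60, is made up entirely of printable ASCII bytes. Together with the "stack smashing detected" message, that would be consistent with text data overrunning a buffer on the stack in the trapper that received the node 2 configuration. A small sketch for decoding such a value, assuming 32-bit little-endian x86 as the register names suggest; the function name address_as_ascii is made up for this example:

```python
def address_as_ascii(value: int, width: int = 4) -> str:
    """Interpret a little-endian machine word as text; non-printable bytes become '.'."""
    raw = value.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# refaddr / edx value taken from the crash dump above
print(address_as_ascii(0x65373234))
```

    For the value above this prints 427e, which looks like a fragment of a string rather than a real pointer. It does not say which function overflowed; the backtrace section in the log is empty.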
    Adail Horst
    OCA/OCP - Oracle Application Server
    ZABBIX Certified Specialist
    http://www.spinola.net.br/blog (Blog about Zabbix and technology)