zabbix_server dies after receiving child node history

  • claytronic
    Member
    • Nov 2006
    • 52

    #1

    I'm moving Zabbix 1.4.3 into our production environment today and have run into a strange problem: the master zabbix_server dies right after it starts receiving history from the child node.

    Does anyone have recommendations for troubleshooting this?

    System Info

    Ubuntu 7.10
    2.6.22-14-server #1 SMP Sun Oct 14 23:34:23 GMT 2007 i686 GNU/Linux

    Code:
      5381:20071215:115633 Starting zabbix_server. ZABBIX 1.4.3.
      5381:20071215:115633 **** Enabled features ****
      5381:20071215:115633 SNMP monitoring:       YES
      5381:20071215:115633 WEB monitoring:        YES
      5381:20071215:115633 Jabber notifications:   NO
      5381:20071215:115633 IPv6 support:           NO
      5381:20071215:115633 **************************
      5382:20071215:115633 server #1 started [Poller. SNMP:ON]
      5383:20071215:115633 server #2 started [Poller. SNMP:ON]
      5384:20071215:115633 server #3 started [Poller. SNMP:ON]
      5385:20071215:115633 server #4 started [Poller. SNMP:ON]
      5387:20071215:115633 server #6 started [Trapper]
      5388:20071215:115633 server #7 started [Trapper]
      5386:20071215:115633 server #5 started [Poller. SNMP:ON]
      5389:20071215:115633 server #8 started [Trapper]
      5390:20071215:115633 server #9 started [Trapper]
      5391:20071215:115633 server #10 started [Trapper]
      5392:20071215:115633 server #11 started [ICMP pinger]
      5394:20071215:115633 server #12 started [Alerter]
      5395:20071215:115633 server #13 started [Housekeeper]
      5395:20071215:115633 Executing housekeeper
      5399:20071215:115634 server #16 started [Node watcher. Node ID:1]
      5396:20071215:115634 server #14 started [Timer]
      5398:20071215:115634 server #15 started [Poller for unreachable hosts. SNMP:ON]
      5400:20071215:115634 server #17 started [HTTP Poller]
      5402:20071215:115634 server #18 started [HTTP Poller]
      5403:20071215:115634 server #19 started [HTTP Poller]
      5404:20071215:115635 server #20 started [HTTP Poller]
      5405:20071215:115635 server #21 started [HTTP Poller]
      5381:20071215:115635 server #0 started [Watchdog]
      5406:20071215:115636 server #22 started [Discoverer. SNMP:ON]
      5399:20071215:115647 NODE 1: Sending configuration changes of node 5 to node 5 datalen 28532
      5395:20071215:115658 Deleted 20374 records from history and trends
      5389:20071215:115658 NODE 1: Received data from node 5 for node 5 datalen 14302
      5390:20071215:115658 NODE 1: Received events from node 5 for node 5 datalen 4090
      5391:20071215:115659 NODE 1: Received history from node 5 for node 5 datalen 7925
      5387:20071215:115659 NODE 1: Received history from node 5 for node 5 datalen 3663
      5388:20071215:115659 NODE 1: Received history from node 5 for node 5 datalen 2943
      5381:20071215:115659 One child process died. Exiting ...
      5381:20071215:115701 ZABBIX Server stopped
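The watchdog (PID 5381) only reports that some child exited, not which one. A rough way to narrow it down is to list the last line each process logged before "One child process died". Below is a minimal Python sketch, assuming the `PID:YYYYMMDD:HHMMSS message` layout seen in the log above; the helper name `last_activity_before_crash` is hypothetical, and its output is a list of candidates, not proof of which process crashed.

```python
import re
import sys

# Assumed log layout, taken from the zabbix_server.log excerpt above:
# "PID:YYYYMMDD:HHMMSS message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def last_activity_before_crash(lines, marker="One child process died"):
    """Hypothetical helper: map each PID to the (timestamp, message) of the
    last line it logged before the first line containing `marker`."""
    last = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore lines that don't fit the assumed layout
        pid, date, hms, msg = m.groups()
        if marker in msg:
            break  # stop at the watchdog's crash notice
        last[int(pid)] = (f"{date} {hms}", msg)
    return last


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python last_activity.py /path/to/zabbix_server.log
    with open(sys.argv[1]) as f:
        activity = last_activity_before_crash(f)
    # Sort by timestamp so the most recently active processes print last
    for pid, (ts, msg) in sorted(activity.items(), key=lambda kv: kv[1][0]):
        print(pid, ts, msg)
```

On the log above, the listing would end with the three trappers (PIDs 5387, 5388 and 5391) that logged "Received history" at 11:56:59, which points at history processing rather than at the pollers.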
  • boy01
    Junior Member
    • Dec 2007
    • 24

    #2
    Originally posted by claytronic
    I'm moving 1.4.3 into our production environment today and have run into a strange problem. The master zabbix-server dies right after it starts receiving history from the child node.
    I saw this after upgrading to 1.4.3. I don't know which of these helped,
    but I killed zabbix_agentd and increased the timeout parameters:
    Timeout=15
    TrapperTimeout=15

    I think it was the timeouts.

    However, my zabbix_server process died again last weekend, so I've
    increased DebugLevel to 4. Hopefully next time I'll get enough
    information to figure out why it died. The log only said something
    like "One server process died. Shutting down...".

    I'm using MySQL 4.1.20 (InnoDB) on CentOS 4.3 with PHP 4.3.9.

    • claytronic
      Member
      • Nov 2006
      • 52

      #3
      Thanks for the info on the timeout values. I changed zabbix_server.conf on both the master and the child and restarted both servers. This did not fix the problem.

      Master Log
      Code:
      14309:20071217:092612 NODE 1: Received data from node 5 for node 5 datalen 14270
      14313:20071217:092612 NODE 1: Received events from node 5 for node 5 datalen 2509
      14310:20071217:092612 NODE 1: Received history from node 5 for node 5 datalen 3581
      14311:20071217:092612 NODE 1: Received history from node 5 for node 5 datalen 846
      14312:20071217:092613 NODE 1: Received history from node 5 for node 5 datalen 28211
      14303:20071217:092613 One child process died. Exiting ...
      14303:20071217:092615 ZABBIX Server stopped
      Child Log
      Code:
      4367:20071217:092631 NODE 5: Sending new history_str of node 5 to node 1 datalen 28211
      4367:20071217:092632 NOT OK
      4367:20071217:092632 NODE 5: Sending new history_str of node 5 to node 1 datalen 28211
      4367:20071217:092634 Error while receiving answer from Node [1]
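The two logs in this post can be lined up by datalen: the child's 28211-byte history_str batch matches the last "Received history" line on the master (PID 14312), the final transfer logged before the watchdog exits. The clocks on the two machines appear to be roughly 18 seconds apart, so matching on size is more reliable than matching on time. Here is a minimal Python sketch of that cross-check, assuming the line formats shown above; `match_transfers` is a hypothetical helper, and because two batches can share a size, its output is a hint rather than a proof.

```python
import re

# Line formats assumed from the master and child logs in this thread
MASTER_RE = re.compile(
    r"^\s*(\d+):\d{8}:\d{6} NODE \d+: Received (\w+) from node \d+ "
    r"for node \d+ datalen (\d+)"
)
CHILD_RE = re.compile(
    r"^\s*\d+:\d{8}:\d{6} NODE \d+: Sending new (\w+) of node \d+ "
    r"to node \d+ datalen (\d+)"
)


def match_transfers(master_lines, child_lines):
    """Hypothetical helper: pair each child 'Sending new <table>' line with
    master 'Received' lines of the same datalen.

    Returns unique (table, datalen, master_pid) tuples in child-log order.
    Matching on size alone is a heuristic: different batches can collide.
    """
    # datalen -> master PIDs that logged a received batch of that size
    received = {}
    for line in master_lines:
        m = MASTER_RE.match(line)
        if m:
            pid, _kind, datalen = m.groups()
            received.setdefault(int(datalen), []).append(int(pid))

    matches = []
    for line in child_lines:
        m = CHILD_RE.match(line)
        if not m:
            continue
        table, datalen = m.group(1), int(m.group(2))
        for pid in received.get(datalen, []):
            entry = (table, datalen, pid)
            if entry not in matches:  # the child retries, so skip repeats
                matches.append(entry)
    return matches
```

Given the master and child excerpts above, it returns a single pair, `('history_str', 28211, 14312)`, which is consistent with the crash occurring while the master processed that history_str batch.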

      • claytronic
        Member
        • Nov 2006
        • 52

        #4
        [Update]

        I just upgraded to 1.4.4 on both machines and the problem persists.

        • claytronic
          Member
          • Nov 2006
          • 52

          #5
          [Resolved]

          I didn't want to, but I finally gave in, deleted node 5, and recreated it as node 6. That got things running again until I loaded my PIX template: the master node died as soon as the remote node started transmitting the data for the PIX.

          This led me to suspect the template import. I tried removing the host, but that did not stop the remote node from crashing the master. Next, I stopped the remote server, reloaded the MySQL database from a backup, reran the zabbix -n 6 command, and configured the remote node to communicate with the master. This time I manually created the four SNMP items I'm interested in, and they did not crash the master.

          • ReeD
            Junior Member
            • Jul 2007
            • 9

            #6
            It works. I had the same problem. Thanks for the solution!
