Node Sync Issue

  • side_control
    Member
    • Mar 2008
    • 37

    #1

    Node Sync Issue

    I've had this working before, but we recently upgraded all of our machines and moved to a new datacenter, and now I'm receiving only partial updates from all nodes.

    Zabbix 1.6.6
    MySQL 5.x

    After extensive reading and research, I added the following index to all of my Zabbix databases:

    Code:
    create index node_cksum_index_2 on node_cksum (nodeid, cksumtype);
    This helped: the nodes started transferring data regularly. However, not all of the data is transferred.

    I've modified server.c to increase my TrapperTimeout to 3600, recompiled, and edited /etc/zabbix/zabbix_server.conf, on the assumption that the node sync was timing out.

    This is the error message.

    Code:
     20679:20091013:121836 NODE 1: Received history_uint from node 3 for node 3 datalen 238629
     20679:20091013:122428 NODE 1: Received data from slave node 2 for node 2 datalen 8
     20679:20091013:122937 NODE 1: Error while receiving answer from Node [2] error: ZBX_TCP_READ() failed [Interrupted system call]
     20678:20091013:122937 NODE 1: Received data from slave node 3 for node 3 datalen 1752
     20678:20091013:123454 NODE 1: Error while receiving answer from Node [3] error: ZBX_TCP_READ() failed [Interrupted system call]
     20677:20091013:123800 NODE 1: Received history from node 3 for node 3 datalen 167417
     20678:20091013:123815 NODE 1: Received history_uint from node 3 for node 3 datalen 280809
    The symptom is that my overview and dashboard do not show trigger changes, although configuration changes are sent. So my dashboard and overview are not up to date.
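
One way to test the timeout theory against the log above is to measure how long each node process waited between "Received data from slave node" and the following ZBX_TCP_READ() failure. The Python sketch below is only an illustration: the helper name `read_gaps` is made up, the line pattern is inferred from the excerpt, and pairing the two messages by process ID assumes the same process logs both.

```python
import re
from datetime import datetime

# Four lines copied from the log excerpt in this post.
LOG = """\
 20679:20091013:122428 NODE 1: Received data from slave node 2 for node 2 datalen 8
 20679:20091013:122937 NODE 1: Error while receiving answer from Node [2] error: ZBX_TCP_READ() failed [Interrupted system call]
 20678:20091013:122937 NODE 1: Received data from slave node 3 for node 3 datalen 1752
 20678:20091013:123454 NODE 1: Error while receiving answer from Node [3] error: ZBX_TCP_READ() failed [Interrupted system call]
"""

# Assumed layout: pid:YYYYMMDD:HHMMSS message
LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")
NODE = re.compile(r"Node \[(\d+)\]")


def read_gaps(text):
    """Return (node, seconds) pairs: time from 'Received data' to the read failure."""
    last_received = {}  # pid -> timestamp of its latest "Received data" line
    gaps = []
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        pid, day, clock, message = match.groups()
        stamp = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
        if "Received data from slave node" in message:
            last_received[pid] = stamp
        elif "ZBX_TCP_READ() failed" in message and pid in last_received:
            node = NODE.search(message)
            if node:
                waited = stamp - last_received.pop(pid)
                gaps.append((int(node.group(1)), int(waited.total_seconds())))
    return gaps


if __name__ == "__main__":
    for node, seconds in read_gaps(LOG):
        print(f"node {node}: read failed after {seconds} s")
```

On these four lines the gaps come out at 309 and 317 seconds, well short of 3600, so the raised TrapperTimeout does not look like the limit being hit here.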

    At debug level 4 I see several of the following messages, but no errors or timeouts.

    Code:
     20927:20091013:125037 Starting sync with nodes
     20914:20091013:125037 End substitute_functions() [1=0]
     20914:20091013:125037 In evaluate(1=0)
     20914:20091013:125037 In evaluate_simple(1=0)
     20914:20091013:125037 In evaluate_simple(1)
     20914:20091013:125037 In evaluate_simple(0)
     20914:20091013:125037 End evaluate(result:0.000000)
     20914:20091013:125037 End evaluate_expression(result:0)
    Any help at all would be greatly appreciated.
  • side_control
    Member
    • Mar 2008
    • 37

    #2
    I hate to say this, but I didn't have any luck finding a resolution on my own and I had to get this working by any means necessary. In a last-ditch attempt to fix it, I blew away all three of my Zabbix nodes and reimported the XML exports of my host information, and now it's working.

    I still have my logs and my sql dump if you want to investigate further.

    • side_control
      Member
      • Mar 2008
      • 37

      #3
      I spoke too soon. The issue cropped up again after I added all my hosts. I've raised my trapper threads to 60 to see whether the data gets processed faster. Any thoughts on this?

      • side_control
        Member
        • Mar 2008
        • 37

        #4
        Well, for anybody out there who might run into this issue: it looks like the data packets (datalen) were too big and the server was timing out on the query into the database. After much tinkering with the MySQL database using mysqltuner.pl (a wonderful tool), I tuned the database to use 90% of my available memory, and it seems to be working. Keeping my fingers crossed.
