1.8.2 DM Configuration Sync Error

  • Evan.Anderson
    Member
    • Jun 2009
    • 42

    #1

    1.8.2 DM Configuration Sync Error

    I have 4 servers: 1 master and 3 children. The master is nodeid 1; the children are 2, 3, and 4. The children operate fine by themselves, but 2 and 4 fail to sync with the master, giving me:

    NODE 1: Sending configuration changes to slave node 4 for node 4 datalen 1738670
    NODE 1: Sending configuration changes to slave node 2 for node 2 datalen 2577922
    NODE 1: Error while receiving answer from Node [2] error: ZBX_TCP_READ() failed [Interrupted system call]
    NODE 1: Error while receiving answer from Node [4] error: ZBX_TCP_READ() failed [Interrupted system call]

    I've changed the Zabbix trapper timeout to 300, but the sync doesn't finish in time, and I think the connection is getting dropped.
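For a rough sense of scale, the datalen values in the log lines above can be turned into payload sizes and a minimum transfer rate. This is only a sketch: it assumes datalen is a byte count and that the 300-second timeout covers the whole exchange, and the regex and function names are made up for illustration rather than taken from Zabbix.

```python
import re

# Log lines copied from the post above.
LOG = """\
NODE 1: Sending configuration changes to slave node 4 for node 4 datalen 1738670
NODE 1: Sending configuration changes to slave node 2 for node 2 datalen 2577922
"""

# Hypothetical pattern for the "Sending configuration changes" line.
PATTERN = re.compile(r"to slave node (\d+) for node \d+ datalen (\d+)")


def payloads(log_text):
    """Return {slave node id: datalen} for every matching log line."""
    return {int(node): int(size) for node, size in PATTERN.findall(log_text)}


def min_rate(datalen, timeout_s):
    """Bytes per second needed to move datalen bytes within timeout_s."""
    return datalen / timeout_s


TIMEOUT_S = 300  # trapper timeout mentioned in the post (assumed seconds)

sizes = payloads(LOG)
for node, size in sorted(sizes.items()):
    rate = min_rate(size, TIMEOUT_S)
    print(f"node {node}: {size / 1e6:.2f} MB, needs >= {rate / 1e3:.1f} kB/s")
```

If datalen really is in bytes, both payloads would need well under 10 kB/s to fit in the timeout, so it may be worth checking what the receiving side does with the data rather than only the link itself.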

    Background info.
    I was creating some large templates on node 2 for switch gear, applying and removing templates, and I think it just got behind.
    I'm not totally sure about node 4, but I have added and removed a lot of hosts at a time and made large configuration changes, and I suspect that it just got behind too.
    Node info.
    1:
    Number of hosts (monitored/not monitored/templates): 1059 (195 / 666 / 198)
    Number of items (monitored/disabled/not supported): 2003 (1843 / 131 / 29)
    Number of triggers (enabled/disabled)[true/unknown/false]: 2594 (2567 / 27) [6 / 1223 / 1338]
    Required server performance, new values per second: 25.6744
    -- at one point performance was up to 54
    2:
    Number of hosts (monitored/not monitored/templates): 567 (163 / 6 / 398)
    Number of items (monitored/disabled/not supported): 1109 (1085 / 0 / 24)
    Number of triggers (enabled/disabled)[true/unknown/false]: 1634 (1626 / 8) [13 / 14 / 1599]
    Required server performance, new values per second: 6.6571
    3:
    Number of hosts (monitored/not monitored/templates): 76 (24 / 1 / 51)
    Number of items (monitored/disabled/not supported): 476 (476 / 0 / 0)
    Number of triggers (enabled/disabled)[true/unknown/false]: 768 (768 / 0) [0 / 0 / 768]
    Required server performance, new values per second: 9.4667
    4:
    Number of hosts (monitored/not monitored/templates): 763 (666 / 0 / 97)
    Number of items (monitored/disabled/not supported): 1079 (933 / 131 / 15)
    Number of triggers (enabled/disabled)[true/unknown/false]: 1040 (1021 / 19) [0 / 844 / 177]
    Required server performance, new values per second: 19.9187
    I think the transfer is just not finishing within the timeout. I've tried stopping and starting the servers, and stopping node 2 to see if node 1 would finish, with no luck. I'm looking for a way to purge the configuration data that the master is trying to send to the children. Is this possible? I don't want to rebuild my master if I don't have to. Is there a way to start the daemon so that all it does is sync? Can I manually export the data from the master and import it into the child? Any help will be appreciated. Thanks.
  • Evan.Anderson
    Member
    • Jun 2009
    • 42

    #2
    Oh, for 1.8.3 I did see a note to check for this:

    MySQL:
    DROP INDEX node_cksum_cksum_1 ON node_cksum;
    CREATE INDEX node_cksum_1 on node_cksum (nodeid,cksumtype,tablename,recordid);

    mysql> show index from node_cksum;
    +------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table      | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | node_cksum |          1 | node_cksum_1 |            1 | nodeid      | A         |          16 |     NULL | NULL   |      | BTREE      |         |
    | node_cksum |          1 | node_cksum_1 |            2 | cksumtype   | A         |          16 |     NULL | NULL   |      | BTREE      |         |
    | node_cksum |          1 | node_cksum_1 |            3 | tablename   | A         |          16 |     NULL | NULL   |      | BTREE      |         |
    | node_cksum |          1 | node_cksum_1 |            4 | recordid    | A         |      229905 |     NULL | NULL   |      | BTREE      |         |
    +------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    4 rows in set (0.01 sec)

    So my servers are already configured this way.

    • Evan.Anderson
      Member
      • Jun 2009
      • 42

      #3
      Bumpin'

      So, how do I purge pending configuration changes?

      • Evan.Anderson
        Member
        • Jun 2009
        • 42

        #4
        Can I move the configuration changes manually?

        Is it possible to export from mysql what the master is trying to send and then import it into the child?

        • Evan.Anderson
          Member
          • Jun 2009
          • 42

          #5
          Rebuild?

          Aside from a rebuild of the master, is there anything I can do to keep my current implementation? I don't want to rebuild the master. The child nodes function fine on their own, it's just the master configuration changes that won't sync. Anyone have any thoughts?

          • Evan.Anderson
            Member
            • Jun 2009
            • 42

            #6
            Is there no way to fix this?

            It seems like I should be able to purge something on the master. I'm not a guru of Zabbix or MySQL, but I get along with guidance. If I knew which data to get rid of, I'm sure I could figure out how.

            • NOB
              Senior Member
              Zabbix Certified Specialist
              • Mar 2007
              • 469

              #7
              Hi

              I don't know 1.8 in detail.
              But we and a lot of others faced this problem after migrating from
              1.4.x to 1.6.x. An initial update of more than 1 MB will eventually
              time out and the synchronization will never finish.

              The solution we found at that time is mentioned in http://www.zabbix.com/forum/showthre...t=12226&page=4

              Perhaps this will help in this case, too.

              Regards

              Norbert.

              • Evan.Anderson
                Member
                • Jun 2009
                • 42

                #8
                Thanks for the reply

                I have a couple of concerns. It looks like the mainstream version was 1.6.6 at the time this possible solution was found. I did find this post (http://www.zabbix.com/forum/showthre...t=12226&page=4) by searching, but did not attempt it, as I'm on 1.8 and I would have thought the fix would have been included in the versions since 1.6.6.

                Can you elaborate on what this new index is supposed to do?

                Also, will "delete from node_cksum where nodeid=8 and cksumtype=1" delete pending configuration changes for a particular node? I would be happy with purging anything "stuck" in the queue to be sent to the child nodes...

                • NOB
                  Senior Member
                  Zabbix Certified Specialist
                  • Mar 2007
                  • 469

                  #9
                  Hi

                  The SQL statement causing the trouble - delete from node_cksum where nodeid=xx and cksumtype=yy - is still in Zabbix 1.8.3!
                  See zabbix-1.8.3/src/zabbix_server/trapper/nodesync.c, which
                  means that the receiver (trapper) is executing it.
                  And the proposed index is NOT in the schema creation script for MySQL in Zabbix 1.8.3!

                  I think there is no harm in creating this additional index.
                  If it doesn't show the expected results, just drop it.
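To illustrate what a composite index starting with (nodeid, cksumtype) is meant to do for that delete, here is a small sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, so the planner output and any timing are not those of a real Zabbix database, and the table definition is a simplified assumption containing only the columns named in this thread plus a payload column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical stand-in for node_cksum (not the real Zabbix schema).
conn.execute(
    "CREATE TABLE node_cksum ("
    " nodeid INTEGER, cksumtype INTEGER,"
    " tablename TEXT, recordid INTEGER, cksum TEXT)"
)

DELETE_SQL = "DELETE FROM node_cksum WHERE nodeid = ? AND cksumtype = ?"


def plan(sql, params):
    """Return SQLite's query-plan detail text for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return "; ".join(str(row[-1]) for row in rows)


# Plan for the delete before the index exists.
before = plan(DELETE_SQL, (2, 1))

# Same column order as the index discussed in this thread.
conn.execute(
    "CREATE INDEX node_cksum_1 ON node_cksum"
    " (nodeid, cksumtype, tablename, recordid)"
)

# Plan for the same delete once the index is available.
after = plan(DELETE_SQL, (2, 1))

print("without index:", before)
print("with index:   ", after)
```

In this stand-in, the second plan names node_cksum_1, meaning the rows for one nodeid/cksumtype pair are located through the index instead of by reading the whole table. Whether MySQL makes the same choice on a given server is best confirmed there with EXPLAIN on an equivalent SELECT.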

                  If you don't know exactly what you are doing, I wouldn't try to manipulate the
                  contents of the DB manually.
                  Sometimes we lose the relation of an item to its template and are able to fix it manually.
                  But that's about all we can do and know.

                  Best regards

                  Norbert.
                  Last edited by NOB; 24-08-2010, 16:28. Reason: Added check for creation of the index / warning for manual fixing

                  • Evan.Anderson
                    Member
                    • Jun 2009
                    • 42

                    #10
                    Any other way to purge pending configuration changes?

                    This didn't seem to work. Restarting a daemon now and then will sometimes get some changes across, but for the most part, it's just stuck.

                    • Evan.Anderson
                      Member
                      • Jun 2009
                      • 42

                      #11
                      Master Node Rebuild?

                      If I rebuild, will the child nodes push their existing configuration to a new master?

                      • Evan.Anderson
                        Member
                        • Jun 2009
                        • 42

                        #12
                        Has anyone rebuilt a master server?

                        What happens if the master is rebuilt, ya know like drop the database and create a new one?
