node communication

  • stage
    Member
    • Sep 2009
    • 34

    #1

    node communication

    I have a strange problem. I have 3 nodes: 1 master and 2 client nodes.
    One client node communicates normally with the master and back again.

    But the new node does this strange thing with its communication.

    It sends its history data to the master, which receives it, and the last communication is “node 3: sending configuration changes to master node 1 for node 3 datalen 4099036”
    I expect that node 3 is waiting for a reply, because it stops all communication. (I have waited for over one hour.)

    When I stop the Zabbix services on the master, I get an error on the client node “node 3: Error while receiving answer from node[1] error: ZBX_TCP_READ() failed [Connection reset by peer]” and a lot of “unable to connect to node [1] errors”.

    After the errors, the communication from node 3 starts again with sending the history, until there’s a configuration communication.

    The communication is over a network without a firewall or any other obstruction.

    Master = 1.6.6
    Both nodes = 1.6.7
    All are Ubuntu: the master and node 2 are 9.04, and node 3 is 9.10.

    I hope someone can help me,
    Greets,
    Dominic
  • nelsonab
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • Sep 2006
    • 1233

    #2
    What is the speed of the links between the nodes? Also, what does the system load look like on the nodes, and how loaded is the network between them?

    One thing I noted earlier is that when a node is transmitting change data to another node, those nodes seem to be blocked. In this case, while the master is receiving config changes from one child node, it is blocked from receiving from the other node.

    This was especially acute and painful when we were testing with 5 VM nodes on one machine. Ya, it was "painful".

    Normally this isn't an issue if the machines are not overly loaded and the inter-node links are fast.
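
    A minimal sketch of the blocking described above, assuming the master handles node transfers strictly one at a time. The `Transfer` class, the `serve_sequentially` function and all timings are hypothetical and for illustration only; this is not taken from the Zabbix source.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    node: int        # sending node id
    arrival: float   # second at which the node starts sending
    duration: float  # seconds the receiver needs to process the transfer


def serve_sequentially(transfers):
    """Return {node: (start, finish)} for a receiver that handles one transfer at a time."""
    clock = 0.0
    schedule = {}
    for t in sorted(transfers, key=lambda t: t.arrival):
        start = max(clock, t.arrival)  # wait until the receiver is free
        finish = start + t.duration
        schedule[t.node] = (start, finish)
        clock = finish
    return schedule


# Hypothetical timings: a long config sync from node 3, then a short upload from node 2.
transfers = [
    Transfer(node=3, arrival=0.0, duration=3600.0),
    Transfer(node=2, arrival=10.0, duration=1.0),
]
schedule = serve_sequentially(transfers)
wait_node2 = schedule[2][0] - 10.0  # seconds node 2 spends waiting behind node 3
print(f"node 2 waits {wait_node2:.0f} s before the master serves it")
```

    With these made-up numbers, the short upload from node 2 cannot start until the long sync from node 3 has finished.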
    RHCE, author of zbxapi
    Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
    Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM


    • stage
      Member
      • Sep 2009
      • 34

      #3
      think it's not the hardware

      This is a 1Gbit network. Node 2 monitors over 20 machines and sends only events; node 3 monitors only itself and sends everything.

      The CPU load of the master averages 30%, node 2 averages 60% and node 3 averages 5%.

      I’m positive that it’s not the hardware.
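
      As a rough sanity check on that claim, the sketch below estimates how long the payload from the log line in the first post (datalen 4099036, assumed here to be in bytes) would need on an ideal 1Gbit link. Protocol overhead, latency and database work are ignored, and the function name is made up for illustration.

```python
DATALEN_BYTES = 4_099_036             # "datalen" from the node 3 log line, assumed to be bytes
LINK_BITS_PER_SECOND = 1_000_000_000  # nominal 1Gbit link


def ideal_transfer_seconds(size_bytes, link_bps):
    """Time to push size_bytes over a link of link_bps, ignoring all overhead."""
    return size_bytes * 8 / link_bps


seconds = ideal_transfer_seconds(DATALEN_BYTES, LINK_BITS_PER_SECOND)
print(f"{seconds * 1000:.1f} ms")  # prints "32.8 ms"
```

      Under these assumptions the raw transfer takes a few tens of milliseconds, so the network alone would not account for a stall of an hour or more.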

      I have upgraded all the machines to Zabbix 1.6.7.

      The only difference is that node 3 is Ubuntu 9.10 (64-bit) and the others are Ubuntu 9.04 (32-bit).

      What I find very strange is that it only blocks on the “configuration changes” communication, and all the other communication is no problem.

      If I exclude node 3, the communication between node 2 and the master is normal. If I put node 3 back in and it sends a “configuration changes” communication, all communication blocks, but the nodes keep monitoring the machines.

      All help is appreciated,
      Dominic


      • stage
        Member
        • Sep 2009
        • 34

        #4
        Happy new year

        Does anyone have an idea how I can resolve this problem?

        All machines are now Ubuntu 9.04 (32-bit) and have Zabbix 1.6.7 installed.
        But the problem still exists.

        Every idea is welcome,
        Dominic


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified SpecialistZabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          i'd give 99% that node sync is blocking all other node communication.
          unfortunately, there is no user friendly way to see what the current status is.
          for how long did you have node3 up and syncing ? give it some longer time, it should return to semi-normal state once the syncing finishes
          Zabbix 3.0 Network Monitoring book


          • stage
            Member
            • Sep 2009
            • 34

            #6
            It works

            Yes, indeed it comes to a normal state.
            After an enormously long time, 28 hours.

            But the important thing is, it’s alive.

            Thanks for your help,
            Dominic


            • richlv
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified SpecialistZabbix Certified Professional
              • Oct 2005
              • 3112

              #7
              now that's extreme.
              is any of the databases on a virtual host ?
              Zabbix 3.0 Network Monitoring book


              • stage
                Member
                • Sep 2009
                • 34

                #8
                Standard

                No, every node has a standard installation.

                Everything is on localhost, without any bells or whistles:

                Ubuntu 9.04 and 9.10
                Zabbix 1.6.7
                MySQL database

                over a flat 1Gb network

                The first node has been running for over a year now, and so has the master node,
                but I expect that makes no difference.


                • data7
                  Junior Member
                  • May 2008
                  • 18

                  #9
                  Solution and a short story...

                  stage, I have already had this problem, which was solved in this thread thanks to Norbert.

                  Simply log into your zabbix database and create an index with:

                  Code:
                  create index node_cksum_index_2 on node_cksum (nodeid, cksumtype);
                  Synchronizations that took over an hour (mostly the first ones) finished in 5 minutes after that.
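
                  To see why such an index can help, here is a small sketch that uses SQLite as a stand-in for MySQL. The column list of node_cksum, the sample rows and the SELECT statement are assumptions made for illustration (the real Zabbix schema and queries may differ); only the index definition is the one given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout of node_cksum; the real Zabbix table may differ.
conn.execute(
    "CREATE TABLE node_cksum ("
    " nodeid INTEGER, tablename TEXT, recordid INTEGER,"
    " cksumtype INTEGER, cksum TEXT)"
)
rows = [(n, "items", r, r % 2, "abc") for n in (2, 3) for r in range(1000)]
conn.executemany("INSERT INTO node_cksum VALUES (?, ?, ?, ?, ?)", rows)

# Hypothetical lookup filtering on the two indexed columns.
query = "SELECT recordid, cksum FROM node_cksum WHERE nodeid = ? AND cksumtype = ?"


def plan(sql, params):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


plan_before = plan(query, (3, 0))  # no index yet: full table scan
conn.execute("CREATE INDEX node_cksum_index_2 ON node_cksum (nodeid, cksumtype)")
plan_after = plan(query, (3, 0))   # planner can now search via the new index

print(plan_before)
print(plan_after)
```

                  On a real MySQL server, running EXPLAIN on the slow query before and after creating the index should show a comparable change, although the output format is different.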

                  Sadly, it looks like these patches never made it into a production release.

                  In addition, Palmertree's great suggestions also seem to have been forgotten. I'm still stuck with all these deadlocks, and occasionally corruption occurs on nodes, forcing me to delete all content related to the affected node from the node_cksum table. The housekeeper process also tends to decrease the master node's performance very much. The bad side is that I can't set its interval to more than the default 1 hour, or my database size grows abnormally in no time.

                  Nelsonab, in my case I have a nice master node (tweaked MySQL database, 18GB RAM, etc.) with 8 slaves, of which 3 (the large ones) normally spend a minute on the configuration changes check. In the end it takes ~7 minutes to update any node's data.

                  When the Zabbix Firefox plugin starts supporting multiple GUIs, I'll definitely take the distributed monitoring system out of production, since my only intention is to have all the children's alerts on one screen in the smallest possible interval. I would rather increase the configuration check interval to an hour on every child node, so that my master becomes only a history and event centralizer.

                  Sorry for the long post, but I had to say it.

                  Edit: stage, I strongly recommend an update to 1.6.8, as 1.6.7 has a known bug (search the forums) where the housekeeper doesn't clean all history data properly.
                  Last edited by data7; 07-01-2010, 20:08. Reason: Missing 1.6.7 info
