DM New master node, sync issues

  • xs-
    Senior Member
    Zabbix Certified Specialist
    • Dec 2007
    • 393

    #1

    DM New master node, sync issues

    Hi all,

    I think I've found a bug in distributed monitoring (DM) synchronization. The scenario and the error are below.
    Env:
    Both Zabbix servers are mostly identical:
    - Version 1.6.0 (not trunk)
    - MySQL 5
    - Server binary is identical, a copy of the same build.
    - Ubuntu 8 LTS

    I've just set up a new 'master' node, to eventually link several standalone Zabbix nodes.
    This master node will not get any hosts of its own; it will just combine all child node information for viewing and alerting.

    After setting up the new master node and emptying everything (hosts, templates, triggers, actions, etc.), I configured it to have 1 child node. This child node is an existing Zabbix node with around 600 hosts.
    Then I configured the child node and restarted both zabbix_server daemons (without a restart, nothing really happens sync-wise).

    After some time, configuration updates start (I was tailing the logfile).
    So far, the following updates have occurred on the master side:
    - 1 successful
    - 1 timeout
    - 3 query errors


    Query errors are all the same:
    7502:20080929:152555 NODE 10: Received data from slave node 1 for node 1 datalen 28130879
    7502:20080929:153055 Timeout while answering request
    7502:20080929:154426 Query failed: [insert into hosts_profiles_ext (hostid,hostid,device_alias,device_type,device_chassis,device_os,device_os_short,device_hw_arch,device_serial,device_model,device_tag,device_vendor,device_contract,device_who,device_status,device_app_01,device_app_02,device_app_03,device_app_04,device_app_05,device_url_1,device_url_2,device_url_3,device_networks,device_notes,device_hardware,device_software,ip_subnet_mask,ip_router,ip_macaddress,oob_ip,oob_subnet_mask,oob_router,date_hw_buy,date_hw_install,date_hw_expiry,date_hw_decomm,site_street_1,site_street_2,site_street_3,site_city,site_state,site_country,site_zip,site_rack,site_notes,poc_1_name,poc_1_email,poc_1_phone_1,poc_1_phone_2,poc_1_cell,poc_1_screen,poc_1_notes,poc_2_name,poc_2_email,poc_2_phone_1,poc_2_phone_2,poc_2_cell,poc_2_screen,poc_2_notes) values(100100000010741,100100000010741,'','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','')] Column 'hostid' specified twice [1110]
    7502:20080929:154426 Query failed: [insert into hosts_profiles_ext (hostid,hostid,device_alias,device_type,device_chassis,device_os,device_os_short,device_hw_arch,device_serial,device_model,device_tag,device_vendor,device_contract,device_who,device_status,device_app_01,device_app_02,device_app_03,device_app_04,device_app_05,device_url_1,device_url_2,device_url_3,device_networks,device_notes,device_hardware,device_software,ip_subnet_mask,ip_router,ip_macaddress,oob_ip,oob_subnet_mask,oob_router,date_hw_buy,date_hw_install,date_hw_expiry,date_hw_decomm,site_street_1,site_street_2,site_street_3,site_city,site_state,site_country,site_zip,site_rack,site_notes,poc_1_name,poc_1_email,poc_1_phone_1,poc_1_phone_2,poc_1_cell,poc_1_screen,poc_1_notes,poc_2_name,poc_2_email,poc_2_phone_1,poc_2_phone_2,poc_2_cell,poc_2_screen,poc_2_notes) values(100100000010767,100100000010767,'','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','')] Column 'hostid' specified twice [1110]
    7502:20080929:154426 Query failed: [insert into hosts_profiles_ext (hostid,hostid,device_alias,device_type,device_chassis,device_os,device_os_short,device_hw_arch,device_serial,device_model,device_tag,device_vendor,device_contract,device_who,device_status,device_app_01,device_app_02,device_app_03,device_app_04,device_app_05,device_url_1,device_url_2,device_url_3,device_networks,device_notes,device_hardware,device_software,ip_subnet_mask,ip_router,ip_macaddress,oob_ip,oob_subnet_mask,oob_router,date_hw_buy,date_hw_install,date_hw_expiry,date_hw_decomm,site_street_1,site_street_2,site_street_3,site_city,site_state,site_country,site_zip,site_rack,site_notes,poc_1_name,poc_1_email,poc_1_phone_1,poc_1_phone_2,poc_1_cell,poc_1_screen,poc_1_notes,poc_2_name,poc_2_email,poc_2_phone_1,poc_2_phone_2,poc_2_cell,poc_2_screen,poc_2_notes) values(100100000010768,100100000010768,'','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','')] Column 'hostid' specified twice [1110]
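
    MySQL error 1110 in the log lines above comes from the generated column list naming `hostid` twice. As a rough illustration only (this is not Zabbix code, and the helper names `dedupe_columns` and `build_insert` are made up for this sketch), a column list could be deduplicated before the statement is built:

```python
def dedupe_columns(columns, values):
    """Drop repeated column names, keeping the first occurrence and its value."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    seen = set()
    kept_columns, kept_values = [], []
    for name, value in zip(columns, values):
        if name in seen:
            continue  # a second 'hostid' would trigger error 1110
        seen.add(name)
        kept_columns.append(name)
        kept_values.append(value)
    return kept_columns, kept_values


def build_insert(table, columns, values):
    """Return a parameterized INSERT statement and its parameter list."""
    cols, vals = dedupe_columns(columns, values)
    placeholders = ",".join(["%s"] * len(cols))
    sql = f"insert into {table} ({','.join(cols)}) values ({placeholders})"
    return sql, vals


if __name__ == "__main__":
    sql, params = build_insert(
        "hosts_profiles_ext",
        ["hostid", "hostid", "device_alias"],
        [100100000010741, 100100000010741, ""],
    )
    print(sql)
    print(params)
```

    The `%s` placeholders assume a MySQL driver that uses that parameter style. The actual fix in the server may well be different.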
  • thissolution
    Junior Member
    • Jun 2008
    • 7

    #2
    G'day

    I am having the same issue. From your log, yours is trying to send 28 MB across to the other node, and I am guessing your 2 nodes are not on a LAN together. I have tried over and over again and keep getting the same issue. Even if I don't add any hosts or new templates, the very first sync tries to send 4.1 MB across! That's a lot of data considering both servers are brand-new setups with nothing added (other than setting up the nodes in the admin section).

    Is no one else trying to use DM in 1.6 where the 2 nodes are not on a LAN next to each other?

    Paul

    Comment

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Registered as ZBX-540.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga

      Comment

      • stalker
        Junior Member
        • Aug 2008
        • 29

        #4
        Possibly this is a timeout problem. How do I change the timeouts?

        The link between my nodes is 2 Mbps, and after adding a new switch to the slave from the master node, I repeatedly see this on the master node:

        Code:
        16675:20081009:130543 Timeout while answering request
        16675:20081009:130543 NODE 1: Error while receiving answer from Node [2] error: ZBX_TCP_READ() failed [Interrupted system call]
        and on the slave node, repeatedly:
        Code:
        13537:20081009:130217 NODE 2: Received data from master node 1 for node 2 datalen 1944392
        On the slave node, after receiving these 1944392 bytes, MySQL uses 100% CPU for 20-30 minutes; then the request times out and the same data is received again. A perpetuum mobile.

        During the initial sync, node 1 received 4 MB of data and fell into the same perpetuum mobile.

        How do I change the timeouts?
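
        To see how large each sync attempt is and how often it repeats, the "Received data" lines can be pulled out of the server log. A small sketch, assuming the log format shown in this thread (PID:date:time, then the NODE message); the function name `parse_sync_line` is hypothetical:

```python
import re

# Matches lines such as:
# 13537:20081009:130217 NODE 2: Received data from master node 1 for node 2 datalen 1944392
SYNC_LINE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}) "
    r"NODE (?P<local>\d+): Received data from (?P<role>master|slave) "
    r"node (?P<sender>\d+) for node (?P<target>\d+) datalen (?P<datalen>\d+)\s*$"
)


def parse_sync_line(line):
    """Return the fields of a 'Received data' log line, or None if it does not match."""
    match = SYNC_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; date and time stay as the raw log strings.
    for key in ("pid", "local", "sender", "target", "datalen"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = (
        "13537:20081009:130217 NODE 2: Received data from master node 1 "
        "for node 2 datalen 1944392"
    )
    print(parse_sync_line(sample))
```

        If the same datalen shows up again and again, the receiving node is probably retrying the same sync rather than making progress.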

        Comment

        • teferi
          Member
          • Jul 2008
          • 93

          #5
          Originally posted by stalker
          Possibly this is a timeout problem. How do I change the timeouts?
          Set the Timeout variable in your conf file.

          Comment

          • thissolution
            Junior Member
            • Jun 2008
            • 7

            #6
            Teferi

            But the variable has a max of only 30 seconds, and syncing 4-19 MB over an ADSL2 link takes a lot longer than 30 seconds.

            Also, how do we track when bug ZBX-540 is fixed?

            Thanks
            Paul
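
            A quick back-of-the-envelope check of the point above. This sketch computes only a lower bound on transfer time (payload bits divided by link rate, ignoring protocol overhead and database processing). The 30-second limit is the one quoted in this thread; the 1 Mbit/s rate used for ADSL2 is an assumption for illustration, not a measured value, and the function names are hypothetical:

```python
TIMEOUT_LIMIT_SECONDS = 30  # maximum Timeout value mentioned in this thread


def min_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time: payload bits divided by the link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return size_bytes * 8 / link_bits_per_second


def exceeds_timeout(size_bytes, link_bits_per_second, timeout=TIMEOUT_LIMIT_SECONDS):
    """True if even an ideal transfer cannot finish within the timeout."""
    return min_transfer_seconds(size_bytes, link_bits_per_second) > timeout


if __name__ == "__main__":
    assumed_adsl2_rate = 1_000_000  # bits per second, an assumption
    for size in (4_000_000, 19_000_000, 28_130_879):
        seconds = min_transfer_seconds(size, assumed_adsl2_rate)
        print(size, round(seconds, 1), exceeds_timeout(size, assumed_adsl2_rate))
```

            On a 2 Mbps link the 1944392-byte sync from post #4 needs under 8 seconds of pure transfer time, so in that case the time MySQL spends applying the data is likely the larger part of the problem.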

            Comment

            • teferi
              Member
              • Jul 2008
              • 93

              #7
              Originally posted by thissolution
              Teferi

              But the variable has a max of only 30 seconds, and syncing 4-19 MB over an ADSL2 link takes a lot longer than 30 seconds.

              Also, how do we track when bug ZBX-540 is fixed?

              Thanks
              Paul
              About the timeout: well, you will probably want to hack the code to raise the limit.

              About bug:

              Comment

              • thissolution
                Junior Member
                • Jun 2008
                • 7

                #8
                Alex

                I have tried build 6180, but the same issue happens. I posted this, along with the log, on the bug report on 15/10, but have had no correspondence from the Zabbix team.

                Paul

                Comment

                • thissolution
                  Junior Member
                  • Jun 2008
                  • 7

                  #9
                  Hi

                  Is there any update on this bug? I am not sure how anyone could use DM in a WAN-based setup with this issue outstanding.

                  Paul

                  Comment

                  • xs-
                    Senior Member
                    Zabbix Certified Specialist
                    • Dec 2007
                    • 393

                    #10
                    This specific bug has been fixed in the 1.6 branch in SVN (or get the nightly build).

                    Comment
