
Master / Remote Servers out of Sync

  • morgan
    Junior Member
    • Jul 2007
    • 24

    #1

    Master / Remote Servers out of Sync

    It appears that my master is now out of sync with the remote server.

    I am having an issue where the remote server does not delete ANYTHING the master is instructing it to delete. However, it is adding things without issue. This is a problem because I cannot reliably make a change on the master and see it pushed out to the slaves.

    Any advice?

    Version 1.4.1. It was working, then it "just stopped".

    The clocks on the servers are within milliseconds (at most) of each other; both are synced to the same stratum 2 time server.

    The logs indicate that the config changes are being pushed to the slave; they just don't appear to be applied.

    Is an "ACK" sent back to the central server indicating that the delete succeeded, or does it simply (and blindly) delete and hope it works? If there is an ACK, is it sent after the delete occurs?

    Thanks,

    --Morgan
    Last edited by morgan; 15-07-2007, 02:48.
  • morgan
    Junior Member
    • Jul 2007
    • 24

    #2
    Doing further research, I am consistently finding the following when deleting items:

    1) It was never working
    2) The logs show this:
    19835:20070714:173521 Cannot select (null) from table [(null)]
    19835:20070714:173521 NODE 1: Sending configuration changes of node 3 to node 3 datalen 8
    19835:20070714:173521 Data [Data?1?3]
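
    A minimal Python sketch for spotting this pattern in a log file, assuming the `<pid>:<YYYYMMDD>:<HHMMSS> <message>` layout inferred from the lines above (the helper names `parse_line` and `suspicious_sends` are hypothetical, not part of Zabbix). It flags each "Sending configuration changes" line that follows one or more "Cannot select (null)" errors from the same process, which is the combination shown here.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Layout inferred from the sample lines in this thread (an assumption):
# "<pid>:<YYYYMMDD>:<HHMMSS> <message>"
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")
NULL_SELECT = "Cannot select (null) from table [(null)]"
SENDING = re.compile(
    r"NODE (\d+): Sending configuration changes of node (\d+) "
    r"to node (\d+) datalen (\d+)"
)


class LogEntry(NamedTuple):
    pid: int
    when: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into pid, timestamp and message; None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    pid, day, clock, msg = m.groups()
    when = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
    return LogEntry(int(pid), when, msg)


def suspicious_sends(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (target_node, datalen) for each send preceded by null-select errors."""
    nulls_seen = {}  # pid -> number of null-select errors since that pid's last send
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.message == NULL_SELECT:
            nulls_seen[entry.pid] = nulls_seen.get(entry.pid, 0) + 1
            continue
        m = SENDING.match(entry.message)
        if m and nulls_seen.pop(entry.pid, 0) > 0:
            yield int(m.group(3)), int(m.group(4))
```

    Fed the three lines above, it yields a single hit, `(3, 8)`: node 3, datalen 8. Pointing it at the full server log is a quick way to see how often the sync sends incomplete data.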

    This basically tells me that the zabbix_server process loses track of the items to be deleted before they are actually deleted. From what I can tell (and I am not 100% sure), either the frontend PHP is not writing the proper values to node_configlog to handle the delete (this is a guess; I am not sure where the delete calls are pulled from because, well, I haven't seen one due to this bug), or the data is removed from the database but leaves stub data behind, so the data string in the "update" sent to the remote node is obviously invalid.
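
    To illustrate the hypothesis (this is not Zabbix's actual code or schema), here is a toy Python model in which the remote node only learns about changes that were recorded in a change log standing in for node_configlog. `Node`, `ChangeLog`, `ui_add`, `ui_delete`, and `replay` are all hypothetical names. If the delete path skips the log entry, the master loses the row but the remote keeps it, which matches the symptom of adds propagating while deletes never do.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: config rows keyed by id (hypothetical model)."""
    rows: dict = field(default_factory=dict)


@dataclass
class ChangeLog:
    """Hypothetical stand-in for a change log such as node_configlog."""
    entries: list = field(default_factory=list)

    def record(self, op, key, value=None):
        self.entries.append((op, key, value))


def ui_add(master, log, key, value):
    """Add a row on the master and log it for replication."""
    master.rows[key] = value
    log.record("add", key, value)


def ui_delete(master, log, key, log_delete=True):
    """Delete a row on the master; log_delete=False mimics the suspected bug."""
    del master.rows[key]
    if log_delete:
        log.record("delete", key)


def replay(log, remote):
    """Apply every logged change to the remote node, then clear the log."""
    for op, key, value in log.entries:
        if op == "add":
            remote.rows[key] = value
        elif op == "delete":
            remote.rows.pop(key, None)
    log.entries.clear()
```

    With `ui_add` followed by `replay`, the row reaches the remote; a following `ui_delete(..., log_delete=False)` and `replay` leaves the master empty while the remote still holds the row.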

    I'll do some more research into exactly what is going on. However, this seems untested; further QA might have caught this bug before release.

    I'll post another patch once I figure out what is missing and broken. I simply want to get a solid monitoring solution in place side by side with the one I'm working to replace, and these bugs are showstoppers. Adding from the master but having to delete from BOTH nodes is not proper.

    --Morgan

    Side note: I wish there were an easy-to-use API that would let me script adding all the hosts and be sure they propagated. Another thing to add to the wishlist.
    Last edited by morgan; 15-07-2007, 02:48.


    • morgan
      Junior Member
      • Jul 2007
      • 24

      #3
      It also appears that if multiple people are working on the Zabbix server, syncing the config from the remote node to the central (master) node can cause conflicting/duplicate key values in the MySQL database.

      Coupled with that problem, it appears that adding triggers via the PHP frontend ALSO has the potential to leave the configuration in the database essentially broken, making errors of the following type prevalent:
      5979:20070714:193151 Cannot select (null) from table [(null)]
      5979:20070714:193151 Cannot select (null) from table [(null)]
      5979:20070714:193151 Cannot select (null) from table [(null)]
      5979:20070714:193151 NODE 1: Sending configuration changes of node 2 to node 2 datalen 26629

      When this happens, any associated changes from the master node fail to push to the slave, and the slave appears to push its config back to the master each time. While that is all well and good for making sure you have a "good" config, I'm trying to figure out what is so broken in my config that it prevents the master from sending a good data set to the remote node. Each time, the push back from the slave overwrites the addition of a valid dependency for a trigger.

      Any insight into why this string of bugs may be happening would be appreciated. When I can pick this apart a bit more, I'll post further. I don't believe my setup/config is broken from a setup standpoint; bad data has just managed to make its way into the configuration within the database.


      • morgan
        Junior Member
        • Jul 2007
        • 24

        #4
        And now it just works again.

        I don't get it. I'm going to add some more debugging to try to pin it down if it resurfaces.

        The issue of deletes NOT making it to the remote node has not been fixed; the other odd issues have been.


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Could you please post a complete list of the DM-related problems you currently have? You said that some of the problems were fixed. How? Automagically?
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga
          My Twitter


          • morgan
            Junior Member
            • Jul 2007
            • 24

            #6
            It appears that some "bad" data had somehow gotten into the database. I cleared the duplicate entries from some of the tables (by hand), and the sync started back up.
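
            Before clearing duplicates by hand, a query along these lines can list them. This is a sketch in Python against an in-memory SQLite database, assuming hosts_groups has the columns `hostgroupid`, `hostid`, and `groupid` (check your actual schema first); `redundant_rows` is a hypothetical helper. It keeps the row with the lowest `hostgroupid` for each (hostid, groupid) pair and reports the rest, without deleting anything.

```python
import sqlite3

# Column names are an assumption for illustration; verify them against
# your real schema before running anything similar on production data.
FIND_DUPES = """
    SELECT hostid, groupid, MIN(hostgroupid) AS keep_id, COUNT(*) AS n
    FROM hosts_groups
    GROUP BY hostid, groupid
    HAVING COUNT(*) > 1
"""


def redundant_rows(conn):
    """Return hostgroupids of extra rows: every row of a duplicated
    (hostid, groupid) pair except the one with the lowest hostgroupid."""
    extras = []
    for hostid, groupid, keep_id, _n in conn.execute(FIND_DUPES).fetchall():
        rows = conn.execute(
            "SELECT hostgroupid FROM hosts_groups "
            "WHERE hostid = ? AND groupid = ? AND hostgroupid <> ?",
            (hostid, groupid, keep_id),
        ).fetchall()
        extras.extend(r[0] for r in rows)
    return sorted(extras)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosts_groups "
        "(hostgroupid INTEGER PRIMARY KEY, hostid INTEGER, groupid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hosts_groups VALUES (?, ?, ?)",
        [(1, 10, 2), (2, 11, 2), (3, 10, 2), (4, 10, 2)],
    )
    print(redundant_rows(conn))  # [3, 4]
```

            Reporting first and deleting later keeps the by-hand cleanup reviewable, which matters when it is not yet clear which copy the other node expects.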

            Here is a definitive list of the problems I've run across in the DM configuration. I'll also comment on how severe each problem is from a usability standpoint in my configuration.

            1) Deletes do not occur on the remote server when a delete is done on the master (and vice versa). This appears to be because no entry is placed in node_configlog when the DELETE is done from the UI (an assumption, since I don't see how else it would propagate in a "timed" fashion). You get an error indicating that the data is missing:
            19835:20070714:173521 Cannot select (null) from table [(null)]
            19835:20070714:173521 NODE 1: Sending configuration changes of node 3 to node 3 datalen 8
            19835:20070714:173521 Data [Data?1?3]

            The UI only does a "delete" from the table, with no sync data propagated. node_configlog looks to be mostly "unused" in the current 1.4.1 PHP frontend.

            This is fairly severe, as it breaks the sync and makes fully removing the data a fairly large amount of work, especially from remote (firewalled) nodes.

            2) There appears to be a consistent issue with duplicate entries being added to tables without unique constraints (such as the hosts_groups table) on the remote servers (and on masters in some cases) when deletes/adds/changes are done en masse. Basically, the entries are duplicated with new hostgroupids. This only shows up when viewing the templates/hosts with a "group" specified.

            This is cosmetic in nature, unless it can cause more far-reaching duplications. I think I have seen dupes of the actual hosts, but because of the aforementioned delete bug, I cannot tell whether this is part of the delete bug or a bug of its own. It does appear to be separate, because not "everything" is duplicated.

            3) When making updates to items, triggers, etc., the UI consistently tries to insert "new" values into the database instead of doing updates, resulting in "duplicate key" errors from the MySQL connection in the UI. This doesn't always cause an issue, but in limited cases it has forced me to do multiple updates to get the update right.

            This is moderately annoying, but not terrible.

            4) The sync from the remote to the master appears to have the potential to cause a configuration mismatch if changes are made in both locations: the first sync tends to cause the second set of changes to be overwritten. Though I cannot reproduce this consistently, I have had someone else working on the Zabbix configuration, and their update caused other changes to be overwritten. Perhaps there should be an option that enforces a simple one-way sync of config from the master node to ensure the config is "correct", instead of the two-way sync method that is currently the only one in place.

            This basically means I need to restrict use of the UI to the master node and rely on the sync between the nodes to handle the config, so that nothing is overwritten.
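
            The one-way option suggested in item 4 could work roughly as follows. This is a minimal Python sketch, assuming each node's config can be viewed as a dict of rows keyed by id (a hypothetical simplification; `plan_one_way_sync` and `apply_ops` are illustrative names, not Zabbix functions). Because the operations come from a full diff against the master's snapshot, deletes cannot be lost the way they are when they depend on a separate log entry, and edits made on the slave are always overwritten.

```python
def plan_one_way_sync(master: dict, slave: dict) -> list:
    """Return the operations that make `slave` identical to `master`.

    The master is authoritative: slave-only rows are deleted and slave
    edits are overwritten; nothing flows back in the other direction.
    """
    ops = []
    for key, value in master.items():
        if key not in slave:
            ops.append(("insert", key, value))
        elif slave[key] != value:
            ops.append(("update", key, value))
    for key in slave:
        if key not in master:
            ops.append(("delete", key, None))
    return ops


def apply_ops(slave: dict, ops: list) -> None:
    """Apply the planned operations to the slave's rows in place."""
    for op, key, value in ops:
        if op == "delete":
            del slave[key]
        else:  # insert or update
            slave[key] = value
```

            The trade-off is the one described above: anything changed directly on the slave is discarded, so the UI effectively has to be restricted to the master.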

            ---

            This is the currently active list of bugs I've run across in the 1.4.1 DM configuration. I have tried 1.4.2 and confirmed that the major bug (the delete issue) is the same. However, since 1.4.2 is essentially a nightly SVN build, I'm holding off on using it until release.
            Last edited by morgan; 18-07-2007, 02:42.


            • Niels
              Senior Member
              • May 2007
              • 239

              #7
              I really appreciate your posting all this -- thanks a lot!!!


              • morgan
                Junior Member
                • Jul 2007
                • 24

                #8
                Originally posted by Niels
                I really appreciate your posting all this -- thanks a lot!!!
                Happy to help, especially if it's helping others as well. However, I am hoping to see (at the least) a good fix for the delete bug soon, because it's the biggest showstopper keeping me from reliably putting Zabbix into production monitoring (beyond the other changes that come with moving from one system to another).

                Until the delete bug is fixed, be careful when editing distributed configurations. If you're not careful, it is possible to get the nodes so far out of sync that they no longer update properly, which requires wiping the slave and reimporting via the config sync (not exactly ideal). If you're careful and make exactly the same changes on both nodes, you'll be "ok". My PHP skills are (very) rusty, or I'd take a swing at fixing the UI delete bug (among others).
                Last edited by morgan; 18-07-2007, 11:38.


                • morgan
                  Junior Member
                  • Jul 2007
                  • 24

                  #9
                  Alexei,

                  Could you give me an idea of what it will take to get at least the delete bug taken care of? I'm up against a deadline to roll out a new monitoring core. Everyone here loves Zabbix, but with some of these issues I'm going to have to go with an alternative solution that allows me to run distributed without having to worry about propagation.

                  Cheers,

                  Morgan


                  • Alexei
                    Founder, CEO
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Sep 2004
                    • 5654

                    #10
                    Morgan,

                    I take all reported issues very seriously. We are working hard on fixing the reported bugs. I expect the most critical DM-related issues to be fixed within the next week (perhaps earlier), and the fixes will be available in pre-1.4.2 for early testing. ZABBIX 1.4.2 will be officially released in early August if everything goes well.


                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #11
                      The reported problems have been fixed.

                      Please try the latest pre-1.4.2 and report back on whether it works for you. Thank you.

