Zabbix server crashing on node sync

  • Niels
    Senior Member
    • May 2007
    • 239

    #1

    Yesterday I spent several hours troubleshooting and eventually trying to salvage my installation. We are not amused.

    The important part of my installation is my main Zabbix server (MZS) and a remote node (RN). This setup has been running just fine for months now, but after I added a host to RN, the MZS started crashing. Apparently the RN had picked up some data that the MZS couldn't stomach.

    Logs on RN and MZS gave no insight into the particular cause, because at DebugLevel==3 nothing important gets logged, and at DebugLevel==4 far too much gets logged. No discernible errors or suspicious-looking strings turned up with DebugLevel==4. I did not save any logs, and I'm not going to provoke the same error again.

    Both servers were running 1.4.5, and downgrading the MZS to 1.4.3 made no difference. The host I added to the RN was a Cisco SOHO 91 with the standard SNMPv2 template. Both machines run Debian 4.0, the database is MySQL, and the nodes communicate through an SSH tunnel.

    I eventually truncated all the history tables on the RN, and that worked, but obviously I lost data. Clearing the history of the new host's items alone wasn't enough. It would be very, very nice to have features like "start housekeeping", "truncate all history", "sync node A to node B" and more along those lines.

    So now I have two big problems: I can't monitor my remote Cisco host, and I'm basically afraid of adding anything new to the system. That's pretty bad.

    Zabbix developers:

    - Please comment on the above.
    - Please tell me how you're going to ensure that this won't happen again.
    - Please tell me what you're going to do _right now_.
  • xs-
    Senior Member
    Zabbix Certified Specialist
    • Dec 2007
    • 393

    #2
    I've had a similar issue not too long ago.

    If a remote node crashes during an update, it will also lock up the node it's sending to (issue still exists).
    But the reason the remote node crashed in the first place was an SQL query that MySQL 4 didn't like.
    If you have any nodes running on MySQL 4, try upgrading them to MySQL 5 and check whether the problem remains.

    • ptietjens
      Junior Member
      • May 2008
      • 5

      #3
      Not running any trace of MySQL 4

      Invalid data from a node has also b0rk3d my Zabbix server. We're running Ubuntu 8.04 servers in our test environment, with no MySQL older than version 5.

      • vinny
        Senior Member
        • Jan 2008
        • 145

        #4
        I think some data casting would help solve the problem, because Zabbix often tries to update/insert values into history that don't match the history columns' data types.
        This would reduce the errors. What do you think?
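A minimal sketch of the kind of casting described above, in Python. Everything here is hypothetical: the `cast_for_history()` helper, the four value-type names (`float`, `uint`, `str`, `text`) and the 255-character limit for `str` are assumptions for illustration, not taken from Zabbix's code or schema. A value that doesn't fit its target column is rejected (returned as `None`) before it ever reaches the database.

```python
from typing import Optional, Union

# Assumed maximum length for short string values; illustrative only,
# not read from any actual Zabbix schema.
MAX_STR_LEN = 255

HistoryValue = Union[float, int, str]


def cast_for_history(raw: str, value_type: str) -> Optional[HistoryValue]:
    """Hypothetical helper: coerce a raw item value to the type of the
    history column it is meant for, or return None if it does not fit."""
    if value_type == "float":
        try:
            number = float(raw)
        except ValueError:
            return None
        # Reject NaN and infinities, which a numeric column may not store.
        if number != number or number in (float("inf"), float("-inf")):
            return None
        return number
    if value_type == "uint":
        try:
            number = int(raw, 10)
        except ValueError:
            return None
        # Accept only the unsigned 64-bit range.
        if number < 0 or number >= 2**64:
            return None
        return number
    if value_type == "str":
        # Truncate rather than let the insert fail on an over-long value.
        return raw[:MAX_STR_LEN]
    if value_type == "text":
        return raw
    raise ValueError(f"unknown value type: {value_type!r}")


if __name__ == "__main__":
    samples = [("3.5", "float"), ("up", "uint"), ("-1", "uint"), ("42", "uint")]
    for raw, vtype in samples:
        print(vtype, repr(raw), "->", cast_for_history(raw, vtype))
```

Whether a mismatched value should be truncated, rounded or dropped is a policy choice; the point of the sketch is only that a malformed value gets caught up front instead of failing inside an insert.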
        -------
        Zabbix 1.8.3, 1200+ Hosts, 40 000+ Items...zabbix's everywhere

        • cstackpole
          Senior Member
          Zabbix Certified Specialist
          • Oct 2006
          • 225

          #5
          "Logs on RN and MZS gave no insight into the particular cause, because at DebugLevel==3 nothing important gets logged, and at DebugLevel==4 far too much gets logged. No discernible errors or suspicious-looking strings turned up with DebugLevel==4. I did not save any logs, and I'm not going to provoke the same error again.
          ...
          So now I have two big problems: I can't monitor my remote Cisco host, and I'm basically afraid of adding anything new to the system. That's pretty bad"
          - Niels

          I know exactly what you mean. I have an error that crops up randomly about three times a month. I have absolutely no way of catching what is going on without setting the debug level way up, and at the same time I can't leave the debug level that high for that long. This has been mentioned many times before, but I have not seen a solution.

          I was also in the same position as you: afraid to add anything new and unable to monitor the things I needed. My solution was to build a completely independent secondary system for testing. Every patch, every upgrade and every new system gets added to it first. I even test new monitoring items on it first.

          It is kind of scary when I think about it. Zabbix does so much for me that I completely depend on it, and for me that is a good thing. But if I really do depend on it, then I have to know how it will react, and the best way to test that is on a secondary system. That works for me, but it isn't the best solution for everyone.

          vinny,
          I agree. I have seen this personally and it has been mentioned many times on the forums.

          I am kind of curious what Alexei's take on this is.

          • Niels
            Senior Member
            • May 2007
            • 239

            #6
            Originally posted by cstackpole
            I am kind of curious what Alexei's take on this is.
            In my original post from 14-05-2008 08:44 I asked the developers for comments. There's really no need to be curious anymore; we got our answer.