[1.4.4] zabbix_server doesn't crash, but no longer collects data

  • sdwilders
    Member
    • Feb 2008
    • 33

    #31
    I currently have both the server and agent running with logging set to debug level 4. As soon as it dies I will email these over to you Alexei.

    Thanks for your help with this.

    Comment

    • bbrendon
      Senior Member
      • Sep 2005
      • 870

      #32
      Any updates on this? This is my favorite thread on the internet ...
      Unofficial Zabbix Expert
      Blog, Corporate Site

      Comment

      • sdwilders
        Member
        • Feb 2008
        • 33

        #33
        The server stopped collecting sometime after midnight. Looking at the latest data, it appears to have died sometime between 00:10 and 00:20.

        It took a little longer to get a log file than expected because the first time round it used up all the space in my /tmp partition! The raw log file is 2GB! I have tarred the file but obviously can't email it because it's still 125MB, so I've emailed you a link to download it, Alexei. Don't envy you looking through a log so large.

        Hopefully we can now work out what is wrong as reading the forums there seems to be quite a few people with a similar problem.

        Comment

        • sdwilders
          Member
          • Feb 2008
          • 33

          #34
          Alexei replied:

          Thanks for the log files.

          Unfortunately I do not see anything wrong in the log. It seems that ZABBIX was killed, not crashed.

          It was stopped exactly at 07:20:00 am. It is very suspicious to me!
          Please check your system, it seems that some periodic (?) process killed ZABBIX.


          Problem is that the zabbix server is still running (I can see several zabbix processes by doing ps aux at the command line). Any other ideas?

          Comment

          • sdwilders
            Member
            • Feb 2008
            • 33

            #35
            Sorry, I just had a thought. 7:20 is when I restarted the service myself to get it going again. It stopped collecting data around midnight.

            Comment

            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #36
              Originally posted by sdwilders
              Sorry, I just had a thought. 7:20 is when I restarted the service myself to get it going again. It stopped collecting data around midnight.
              Argh, good to know... I think I know what's going on. A patch will be created soon and release of 1.4.5 and 1.5.1 is on the way.
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga
              My Twitter

              Comment

              • sdwilders
                Member
                • Feb 2008
                • 33

                #37
                I'm assuming this means you found something in the logs? If so, I for one and infinity005 will be very happy

                Comment

                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #38
                  Originally posted by sdwilders
                  I'm assuming this means you found something in the logs? If so, I for one and infinity005 will be very happy
                  Yes, I found something! Actually this was a known problem, which suddenly came up. The problem affects all ZABBIX systems, especially those with heavy use of active checks for monitoring of unreliable networks and remote locations.
                  Alexei Vladishev
                  Creator of Zabbix, Product manager
                  New York | Tokyo | Riga
                  My Twitter

                  Comment

                  • bbrendon
                    Senior Member
                    • Sep 2005
                    • 870

                    #39
                    Originally posted by Alexei
                    Yes, I found something! Actually this was a known problem, which suddenly came up. The problem affects all ZABBIX systems, especially those with heavy use of active checks for monitoring of unreliable networks and remote locations.
                    Sounds like it applies to me 100%!! Please attach the patch to this thread, I need it ASAP, I don't even care if it hasn't been tested!!
                    Unofficial Zabbix Expert
                    Blog, Corporate Site

                    Comment

                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #40
                      The patch is attached for your convenience. It has been tested!
                      Attached Files
                      Last edited by Alexei; 26-03-2008, 10:21. Reason: The patch was corrected.
                      Alexei Vladishev
                      Creator of Zabbix, Product manager
                      New York | Tokyo | Riga
                      My Twitter

                      Comment

                      • bbrendon
                        Senior Member
                        • Sep 2005
                        • 870

                        #41
                        Check out ZBX-343 in svn. I think it's the patch.

                        Installed it on my 1.4.4 setup. FINGERS CROSSED!!
                        Last edited by bbrendon; 25-03-2008, 19:08.
                        Unofficial Zabbix Expert
                        Blog, Corporate Site

                        Comment

                        • sdwilders
                          Member
                          • Feb 2008
                          • 33

                          #42
                          I was looking at ZBX-323.

                          Can you tell me how to install a patch?

                          Comment

                          • sdwilders
                            Member
                            • Feb 2008
                            • 33

                            #43
                            I've managed to install the patch.

                            I'll let you know how it goes, but I too will sit with my fingers crossed. If it makes it through the night I'll be happy, though I probably won't be convinced until it's run for a month.

                            Comment

                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #44
                              No need to wait a month. Feel free to keep me updated every new week of ZABBIX uptime
                              Alexei Vladishev
                              Creator of Zabbix, Product manager
                              New York | Tokyo | Riga
                              My Twitter

                              Comment

                              • bbrendon
                                Senior Member
                                • Sep 2005
                                • 870

                                #45
                                My zabbix server just crashed.

                                It's 1.4.4 + load patch and trapper patch. It was fine with the load patch, except that it would suddenly hang. The trapper patch causes it to crash.

                                I'm upgrading to 1.4.5 and bumping debugging back up to 4. I have the debug=3 log from the 1.4.4 + patch setup if you're interested; a summary is below:

                                Code:
                                # tail zabbix_server.log.crash1.144_patched 
                                ] is not suitable for [[email protected]]
                                 32331:20080325:150003 Expression [{21451}>50] cannot be evaluated [Unable to get value for functionid [21451]]
                                 32331:20080325:150003 Expression [{21456}>50] cannot be evaluated [Unable to get value for functionid [21456]]
                                 32331:20080325:150003 Expression [{21457}>50] cannot be evaluated [Unable to get value for functionid [21457]]
                                 32322:20080325:150020 Active parameter [system.run[mysqladmin --defaults-file=/etc/zabbix/agent.mycnf status|cut -f4 -d":"|cut -f1 -d"S"]] is not supported by agent on host [arts.web2]
                                 32319:20080325:150021 One child process died. Exiting ...
                                 32319:20080325:150023 ZABBIX Server stopped
                                
                                # grep 32319 zabbix_server.log.crash1.144_patched 
                                 32319:20080325:150021 One child process died. Exiting ...
                                 32319:20080325:150023 ZABBIX Server stopped
                                #
                                Last edited by bbrendon; 26-03-2008, 01:14.
                                Unofficial Zabbix Expert
                                Blog, Corporate Site
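
The `grep 32319` step in the post above can be generalized: a minimal Python sketch, assuming every log line follows the `PID:YYYYMMDD:HHMMSS message` layout seen in the excerpt, that records the last entry written by each process. The function names and the sample list are illustrative only and are not part of Zabbix. A process whose last entry is much older than the others is a candidate for the one that hung or died.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt above: "PID:YYYYMMDD:HHMMSS message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None for lines that do not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # e.g. the tail end of a wrapped message
    pid, date, time, message = match.groups()
    return int(pid), datetime.strptime(date + time, "%Y%m%d%H%M%S"), message


def last_seen_by_pid(lines):
    """Map each PID to the (timestamp, message) of its last log entry."""
    last = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            pid, timestamp, message = parsed
            last[pid] = (timestamp, message)
    return last


# Sample lines copied from the log excerpt in the post above.
SAMPLE_LOG = [
    "] is not suitable for [[email protected]]",
    " 32331:20080325:150003 Expression [{21457}>50] cannot be evaluated "
    "[Unable to get value for functionid [21457]]",
    " 32319:20080325:150021 One child process died. Exiting ...",
    " 32319:20080325:150023 ZABBIX Server stopped",
]

if __name__ == "__main__":
    for pid, (timestamp, message) in sorted(last_seen_by_pid(SAMPLE_LOG).items()):
        print(pid, timestamp.isoformat(), message)
```

For a multi-gigabyte debug log, passing an open file object to `last_seen_by_pid` instead of a list keeps memory use flat, since the lines are read one at a time.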

                                Comment
