MySQL server has gone away

  • befortin
    Member
    • Jul 2005
    • 48

    #16
    I just found out that the things in my /var/log/messages are caused by /etc/cron.daily/sysklogd. However, it's strange that MySQL crashes a few seconds later...

    • xming
      Junior Member
      • Jul 2006
      • 3

      #17
      Do you use logrotate to rotate your MySQL logs? If you do, try disabling that; it might be that the logs are moved and mysqld can't find the files.

      • befortin
        Member
        • Jul 2005
        • 48

        #18
        xming : Yay!! I just removed the files from cron.daily and it didn't crash. Then I ran the logrotate script and it crashed!! Thank you very much!

        I guess others having the same problem could resolve it the same way.

        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #19
          Originally posted by schneck
          What the ZABBIX developers can do:
          * make zabbix_server recover gracefully from database failures (i.e., wait a few seconds and reconnect after failure)

          Has there been any progress on this? 1.1.2 still silently quits if the MySQL server is unavailable for a moment.

          Writing shell scripts that monitor a monitoring server process... seems strange.
          Zabbix 3.0 Network Monitoring book

          • Tractor
            Junior Member
            • Sep 2005
            • 13

            #20
            Zabbix shuts down after mysql goes away

            This Zabbix problem has existed since at least 2004. It was reported but is still not solved. The issue is fairly hard on any Zabbix user, especially when Zabbix is used for high-availability monitoring. Correctly working monitoring software should buffer all the data and, when the SQL server becomes available again, write the temporarily stored values into data storage. So this issue still prevents the use of Zabbix in our company. Sorry, Alexey, we regularly test Zabbix but still use Nagios; it is simply a reliability issue.

            002934:20061027:161137 Query::select hostid from hosts where host='localhost'
            002934:20061027:161137 Query failed:MySQL server has gone away [2006]
            002928:20061027:161137 One server process died. Shutting down...
            002928:20061027:161137 ZABBIX server is down.

            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #21
              Originally posted by Tractor
              This Zabbix problem has existed since at least 2004. It was reported but is still not solved. The issue is fairly hard on any Zabbix user, especially when Zabbix is used for high-availability monitoring. Correctly working monitoring software should buffer all the data and, when the SQL server becomes available again, write the temporarily stored values into data storage. So this issue still prevents the use of Zabbix in our company. Sorry, Alexey, we regularly test Zabbix but still use Nagios; it is simply a reliability issue.

              002934:20061027:161137 Query::select hostid from hosts where host='localhost'
              002934:20061027:161137 Query failed:MySQL server has gone away [2006]
              002928:20061027:161137 One server process died. Shutting down...
              002928:20061027:161137 ZABBIX server is down.

              What problem?! Unreliability of your database (power supply, disk storage, whatever) is not a ZABBIX problem!

              Well, it is very simple. ZABBIX relies on a database, as does most (all?) mission-critical software. Can SAP, HP OV, banking systems, your company's web site, you name it, work without a database by "buffering data" somewhere? I doubt it.

              Can Nagios or any other software work if I come and unmount the partition where Nagios keeps all its data? Will it buffer data in memory? I doubt it.

              Make your database reliable and use ZABBIX...
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga
              My Twitter

              • Tractor
                Junior Member
                • Sep 2005
                • 13

                #22
                Alexei, please do not consider this an attack on Zabbix. Personally I like your software, I check the status of Zabbix development regularly, and we do Zabbix installs twice a year hoping that it will meet our requirements... I want to migrate our monitoring to your software, but I can't do this because of reliability.

                At the very least, Zabbix SHOULD reconnect to MySQL instead of shutting down. And if you want good monitoring, it SHOULD buffer received data. The comparison with Nagios may be incorrect because Nagios is not intended for high-availability monitoring and statistics at all. For a good example, see commercial monitoring products like Nimbus, which can work and collect all the data without a database for weeks or even months. The buffering and reconnection issue is also much wider: when you have a distributed monitoring system, data should be stored somewhere for periods when the network is unavailable or the server is under maintenance, so a buffering system is needed anyway.

                Also, monitoring software SHOULD BE MORE RELIABLE than any monitored device or service. For this reason it SHOULD compensate for data storage issues and SHOULD restore monitoring automatically, as soon as possible. So this is not a question about SQL server reliability, but a question about the MONITORING SOFTWARE's capability to do high-availability monitoring.

                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #23
                  I apologise if I was too blunt in my previous post. I didn't realise that the reasoning behind your post was completely different from that of people moaning "my database is down, why has ZABBIX stopped?".

                  Anyway, I'm very interested in hearing any ideas, real-life experience, and requirements that would make the product better. I'm open to serious dialogue either here or privately. High availability and distributed monitoring is something we are trying to achieve in the next stable release, ZABBIX 1.4.

                  I also fully agree that ZABBIX must not silently die if the database is down. I believe that the best thing ZABBIX could do in this case is:

                  - notify administrators about lack of database connectivity
                  - try to reconnect to the database
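
For illustration only, the two steps listed above could be sketched roughly as follows in Python. This is not ZABBIX code: `connect_with_retry`, `connect`, `notify`, and the 30-second default delay are hypothetical names and values, and `connect` is assumed to raise `ConnectionError` when the database is unreachable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    notify: Callable[[str], None],
    delay_seconds: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, notifying once on the first failure.

    All names are hypothetical. connect() stands in for whatever opens the
    database connection and is assumed to raise ConnectionError on failure.
    """
    attempt = 0
    notified = False
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError as exc:
            if not notified:
                # Tell the administrators once, not on every retry.
                notify(f"database unavailable: {exc}")
                notified = True
            if max_attempts is not None and attempt >= max_attempts:
                # Give up and let the caller decide what to do next.
                raise
            # Wait before the next attempt instead of exiting the process.
            sleep(delay_seconds)
```

With `max_attempts` left at `None` the loop retries forever, which matches the "wait and reconnect" behaviour requested in this thread; passing a `sleep` function makes the sketch easy to exercise without real delays.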

                  Note that the functionality of ZABBIX depends very much on the availability of a back-end database, so it is nearly impossible to make ZABBIX work in a no-database mode. Neither the frontend nor triggers, actions, polling, or anything else can work without a database.

                  As for buffering, it is already implemented in SVN code on per-node level. It means that nodes can work independently of communications and when comms are back the information is transferred automatically without any manual work.

                  Please keep your interest in ZABBIX and feel free to talk to me if needed.
                  • richlv
                    Senior Member
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Oct 2005
                    • 3112

                    #24
                    "I believe that the best thing ZABBIX could do in this case is:

                    - notify administrators about lack of database connectivity
                    - try to reconnect to the database"

                    indeed. i suppose this would cover most cases, and complaints about this problem would vanish.

                    "As for buffering, it is already implemented in SVN code on per-node level. It means that nodes can work independently of communications and when comms are back the information is transferred automatically without any manual work."

                    that covers even more of the problem.
                    now, having server buffer some data when db is unavailable would also cover snmp trap information received during that period.
                    i suppose that even if the first part (server reconnecting to the db) is implemented, data sent by clients during that time would be lost unless it is rejected and buffered at the client. this has the potential of overloading clients, but i'd guess most users would like to keep client load as low as possible, choosing to overload the monitoring server instead of the clients.
                    • Tractor
                      Junior Member
                      • Sep 2005
                      • 13

                      #25
                      Hi, Alexei, it is good that we have found a common understanding :-)

                      I have used many different monitoring systems in large environments, and I will be glad to help you develop something that can compete with commercial products :-)

                      Actually, there is a partial workaround for the server shutdown, which we have employed on our test system: any daemon control system that can restart the zabbix_server process automagically, like daemontools (which works on most Unices) or SMF on Solaris. This is better than service-starting scripts run from cron because the delay is seconds instead of minutes. Still, this is not a solution, because data and agent control can be lost, especially in big deployments.

                      Of course I understand that buffering will require some rewriting of the server and agents. I see that it can be solved by doing two things:
                      1. Pre-SQL buffer to store inserts. It can be an additional process used instead of a direct connection to MySQL, or simply a text file used to store data. This can be implemented without deep changes, as a kind of hack. But the best way would be a transaction system, where the agent sends data to the server and the server not only puts that data into the DB but also informs the agent that the data is stored and can now be deleted from the agent's buffer.
                      2. Configuration cache. It will be needed by the agents and by the server in case the SQL server becomes inaccessible. Maybe it can be done as a hack to the server and agents?
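
As a rough sketch of the acknowledged-delivery idea in point 1 (written in Python purely for illustration, not taken from ZABBIX): samples stay in the sender's buffer until the receiver confirms they were stored. `AckBuffer`, the `Sample` layout, and the `send` callback are hypothetical; `send` is assumed to return `True` only after the server has written the sample to the database.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# Hypothetical sample layout: (item key, timestamp, value).
Sample = Tuple[str, float, float]


class AckBuffer:
    """Keep samples until the receiver confirms they were stored."""

    def __init__(self, max_size: int = 10000) -> None:
        # Bounded, so a long outage drops the oldest samples instead of
        # exhausting memory on the agent.
        self._pending: Deque[Sample] = deque(maxlen=max_size)

    def add(self, sample: Sample) -> None:
        self._pending.append(sample)

    def flush(self, send: Callable[[Sample], bool]) -> int:
        """Send pending samples in order; stop at the first unacknowledged one.

        Returns the number of samples acknowledged and removed.
        """
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break  # server or database unavailable; keep the rest for later
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

A sample is removed only after `send` reports success, so a failure in the middle of a flush loses nothing. The price is that a sample may be sent twice if the acknowledgement itself is lost, so the receiving side would have to tolerate duplicates.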

                      And one other thing: you've mentioned something called SVN and per-node buffering, but I was unable to find anything about that. Can you point me to the right place to read about it?

                      • Alexei
                        Founder, CEO
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Sep 2004
                        • 5654

                        #26
                        Originally posted by Tractor
                        And one other thing: you've mentioned something called SVN and per-node buffering but I was unable to find anything about that. Can you point me to right place to read?

                        SVN is our internal Subversion server we use for ZABBIX development. Currently it has no public access. Distributed monitoring is part of the next beta release (1.3). Documentation will appear on our WEB site prior to official release of the beta (next week?).
                        • overrider
                          Member
                          • Oct 2006
                          • 36

                          #27
                          same here

                          Hi,

                          I have the same issue. Zabbix is not on the same server as the MySQL database, so sometimes when there is a connection issue, even just for a second, Zabbix dies. I think Zabbix should not die, but wait 30 seconds and try to connect again. In the meantime I made this:

                          Code:
                          #!/usr/bin/perl -w
                          
                          use strict;
                          
                          # Paths are site-specific; adjust them for your system.
                          # The pipeline exits with 0 only if a zabbix_server process is listed.
                          my $command = "/bin/ps -ax | /usr/bin/grep zabbix_server | /usr/bin/grep -v grep";
                          my $mail = "/bin/echo ALERT | /usr/bin/mail -s \"Zabbix isnt running\" overrider\@domain.com";
                          my $restart_zabbix = "/usr/local/etc/rc.d/zabbix_server start";
                          
                          # system() returns the wait status, so compare it numerically.
                          if ( system($command) != 0 ) {
                                  print "Zabbix aint running so im gonna restart it\n";
                                  system($mail);
                                  system($restart_zabbix);
                          } else {
                                  print "Zabbix is running\n";
                          }
                          I run it from a crontab; if the Zabbix server isn't there, it will restart it.

                          • Tractor
                            Junior Member
                            • Sep 2005
                            • 13

                            #28
                            overrider, try daemontools on linux or bsd, smf on solaris or similar service management systems - you will get restarting in seconds instead of minutes.

                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #29
                              The issue (unavailability of DB, ZABBIX stops) will be addressed in ZABBIX 1.4. I won't release it in 1.1.x.
                              • gmarsal
                                Junior Member
                                • Nov 2006
                                • 1

                                #30
                                I had the same problem: zabbix_server going down every day at 6:25 AM, which is when my cron runs the logrotate process. It seems that MySQL restarts or doesn't accept connections during log rotation. To correct the problem I created a new file, /etc/logrotate.d/zabbix-server:

                                /var/log/zabbix/zabbix_server.log {
                                    daily
                                    compress
                                    ifempty
                                    postrotate
                                        if [ ! -f /var/run/zabbix/zabbix_server.pid ]; then
                                            sleep 30
                                            /etc/init.d/zabbix_server start
                                        fi
                                    endscript
                                }



                                (Please adapt paths and file names to your installation)
