I just found out that the things in my /var/log/messages are caused by /etc/cron.daily/sysklogd. However, it's strange that MySQL crashes a few seconds later...
MySQL server has gone away
-
Originally posted by schneck:
What the ZABBIX developers can do:
* make zabbix_server recover gracefully from database failures (i.e., wait a few seconds and reconnect after failure)

Has there been any progress on this? 1.1.2 still silently quits if the MySQL server is unavailable for a moment.
Writing shell scripts that monitor a monitoring server process... seems strange.
-
Zabbix shuts down after mysql goes away
This Zabbix problem has persisted since at least 2004. It was reported, but it is still not solved. This issue is fairly hard on any Zabbix user, especially when Zabbix is used for high availability monitoring. Correctly working monitoring software should buffer all the data and, when the SQL server becomes available again, write all the temporarily stored values into data storage. So this issue still prevents the use of Zabbix in our company. Sorry, Alexey, we regularly test Zabbix, but still use Nagios. It is simply a reliability issue.
002934:20061027:161137 Query::select hostid from hosts where host='localhost'
002934:20061027:161137 Query failed:MySQL server has gone away [2006]
002928:20061027:161137 One server process died. Shutting down...
002928:20061027:161137 ZABBIX server is down.
-
Originally posted by Tractor:
This Zabbix problem is lasting at least from 2004. It was reported but it is still not solved. Actually, this issue is fairly hard to any Zabbix user, especially when Zabbix is used for high availability monitoring. Correctly working monitoring software should buffer all the data and when SQL server becomes available should put all the temporary stored values into data storage. So, this issue still prevents use of Zabbix in our company. Sorry, Alexey, we regularly test Zabbix, but still use Nagios - it is simply reliability issue.
002934:20061027:161137 Query::select hostid from hosts where host='localhost'
002934:20061027:161137 Query failed:MySQL server has gone away [2006]
002928:20061027:161137 One server process died. Shutting down...
002928:20061027:161137 ZABBIX server is down.

What problem?! Unreliability of your database (power supply, disk storage, whatever) is not a ZABBIX problem!
Well, it is very simple. ZABBIX relies on a database, as does most (all?) of the mission-critical software around. Can SAP, HP OV, banking systems, your company's web site, you name it, work without a database by "buffering data" somewhere? I doubt it.
Can Nagios or any other software work if I come and unmount the partition where Nagios keeps all its data? Will it buffer the data in memory? I doubt it.
Make your database reliable and use ZABBIX...
-
Alexei, please do not take this as an attack on Zabbix. Personally I like your software, I check the status of Zabbix development regularly, and we do Zabbix installs twice a year hoping that it will meet our requirements... I want to migrate our monitoring to your software, but I can't do this because of reliability.
At the very least, Zabbix SHOULD reconnect to MySQL instead of shutting down. And if you want good monitoring, it SHOULD buffer received data. The comparison with Nagios may be unfair, because Nagios is not intended for high availability monitoring and statistics at all. For a good example, see commercial monitoring products like Nimbus, which can work and collect all the data without a database for weeks or even months. The buffering and reconnection issue is also much wider: in a distributed monitoring system, data has to be stored somewhere for periods when the network is unavailable or the server is under maintenance, etc., so a buffering system is needed anyway. Also, monitoring software SHOULD BE MORE RELIABLE than any monitored device or service. For this reason it SHOULD compensate for data storage issues and SHOULD restore monitoring automatically, as soon as possible. So this is not a question of SQL server reliability, but of the MONITORING SOFTWARE's capability to do high availability monitoring.
-
I apologise if I was too blunt in my previous post. I didn't realise that the reasoning behind your post was completely different from people moaning "my database is down, why has ZABBIX stopped?".
Anyway, I'm very interested in hearing any ideas, real-life experience and requirements which would make the product better. I'm open to serious dialogue either here or privately. High availability and distributed monitoring is something we are trying to achieve in the next stable release, ZABBIX 1.4.
I also fully agree that ZABBIX must not silently die if the database is down. I believe that the best thing ZABBIX could do in this case is:
- notify administrators about lack of database connectivity
- try to reconnect to the database
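The two steps above (notify, then try to reconnect) can be sketched roughly as follows. This is only an illustration in Python, not ZABBIX code: `run_with_reconnect`, `DatabaseUnavailable` and the `connect`, `query` and `notify` callables are all hypothetical names, and the retry count and doubling delay are assumptions.

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical error raised by connect()/query() when the database is away."""


def run_with_reconnect(connect, query, notify,
                       max_attempts=5, delay=1.0, sleep=time.sleep):
    """Run query(conn), reconnecting instead of exiting when the database is away.

    connect() -> conn, query(conn) -> result and notify(message) are
    caller-supplied callables; none of them is part of ZABBIX.
    """
    conn = None
    for attempt in range(1, max_attempts + 1):
        try:
            if conn is None:
                conn = connect()
            return query(conn)
        except DatabaseUnavailable as exc:
            conn = None  # drop the dead connection and start over
            notify(f"database unavailable (attempt {attempt}/{max_attempts}): {exc}")
            if attempt < max_attempts:
                sleep(delay)
                delay *= 2  # back off so a struggling server is not hammered
    raise DatabaseUnavailable(f"gave up after {max_attempts} attempts")
```

The caller decides what `notify` does (mail, log entry, pager), and only after `max_attempts` failures does the error reach the caller, which can then choose whether shutting down is really warranted.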
Note that functionality of ZABBIX depends on availability of a back-end database very much, so it is nearly impossible to make ZABBIX work in no-database mode. Neither frontend, nor triggers, actions, polling, whatever can work without database.
As for buffering, it is already implemented in SVN code on per-node level. It means that nodes can work independently of communications and when comms are back the information is transferred automatically without any manual work.
Please keep your interest in ZABBIX and feel free to talk to me if needed.
-
"I believe that the best thing ZABBIX could do in this case is:
- notify administrators about lack of database connectivity
- try to reconnect to the database"
indeed. i suppose this would cover most cases, and complaints about this problem would vanish
"As for buffering, it is already implemented in SVN code on per-node level. It means that nodes can work independently of communications and when comms are back the information is transferred automatically without any manual work."
that covers even more of the problem.
now, having the server buffer some data when the db is unavailable would also cover snmp trap information received during that period.
i suppose even if the first part (server reconnecting to the db) is implemented, data sent by clients during that time would be lost, unless it is rejected and buffered at the client. this has the potential of overloading clients, but i'd guess most users would like to keep client load as low as possible, choosing to overload the monitoring server instead of the clients.
-
Hi, Alexei, it is good that we found common understanding :-)
I have used many different monitoring systems in large environments and I will be glad to help you develop something that will be able to compete with commercial products :-)
Actually, there is a partial workaround for the current server shutdown situation, which we have employed on our testing system: any daemon control system that can restart the zabbix_server process automagically, like daemontools, which works on most unices, or SMF on Solaris, etc. This can be better than service starting scripts run from cron, because the delay will be seconds instead of minutes. Anyway, this is not a solution, because data and agent control can be lost, especially in big deployments.
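To show what such a daemon control system does in principle, here is a rough Python sketch of a restart loop. It is not daemontools or SMF. `supervise` and its parameters are hypothetical, and it assumes the supervised command stays in the foreground so that its exit can be observed.

```python
import subprocess
import time


def supervise(command, max_restarts=3, delay=1.0,
              run=subprocess.call, sleep=time.sleep):
    """Start command and start it again each time it exits.

    command is an argument list such as ["/path/to/daemon", "--some-flag"]
    (placeholder values). Returns the exit codes of every run. The loop is
    bounded by max_restarts so a permanently broken service does not spin
    forever; real supervisors usually keep going indefinitely.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        if attempt > 0:
            sleep(delay)  # seconds, not the minutes a cron job would take
        exit_codes.append(run(command))
    return exit_codes
```

As noted in the post, this only shortens the outage: anything the agents send while the process is down is still lost.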
Of course I understand that buffering will require some rewrite of server and agents. I see that it can be solved by doing two things:
1. Pre-SQL buffer to store inserts - it can be some additional process which can be used instead of direct connection to mysql or simply it can be text file which can be used to store data. This can be implemented without deep changes, like some hack. But the best way would be some transaction system, when agent sends data to server and server not only puts that data into DB but also informs agent that data is stored and can be deleted from agent buffer now.
2. Configuration cache - it will be needed for agents and for server in case of SQL server becomes inaccessible. Maybe it can be done as some hack to server and agents?
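The transaction idea in point 1 (keep each value in the sender's buffer until the receiver confirms it is stored) could look roughly like this. It is a minimal in-memory Python sketch, not the ZABBIX implementation: `AckBuffer` and the `store` callable are hypothetical, and a real buffer would also need to survive a restart, for example by spooling to a file.

```python
from collections import deque


class AckBuffer:
    """Hold values until the receiver confirms they have been stored."""

    def __init__(self):
        self._pending = deque()

    def add(self, value):
        self._pending.append(value)

    def flush(self, store):
        """Try to hand over every pending value, oldest first.

        store(value) is a caller-supplied callable that returns True once the
        value is safely stored. Delivery stops at the first failure, so the
        remaining values keep their order. Returns the number delivered.
        """
        delivered = 0
        while self._pending:
            try:
                acknowledged = store(self._pending[0])
            except OSError:  # covers ConnectionError and socket errors
                acknowledged = False
            if not acknowledged:
                break
            self._pending.popleft()  # delete only after the acknowledgement
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

A value is removed only after `store` reports success, so a database or network outage leaves the queue intact and the next `flush` call picks up where the last one stopped.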
And one other thing: you've mentioned something called SVN and per-node buffering, but I was unable to find anything about that. Can you point me to the right place to read?
-
Originally posted by Tractor:
And one other thing: you've mentioned something called SVN and per-node buffering but I was unable to find anything about that. Can you point me to right place to read?

SVN is our internal Subversion server we use for ZABBIX development. Currently it has no public access. Distributed monitoring is part of the next beta release (1.3). Documentation will appear on our web site prior to the official release of the beta (next week?).
-
same here
Hi,
i have the same issue. Zabbix is not on the same server as the mysql database, so sometimes when there is a connection issue, just for a second, zabbix dies. I think zabbix should not die, but wait 30 seconds and try to connect again. in the meantime i made this script, which i run from a crontab: if the zabbix server isnt there, it will restart it.
Code:
#!/usr/bin/perl -w
use strict;

# The pipeline exits with status 0 only if a zabbix_server process is found.
my $command = "/bin/ps -ax | /usr/bin/grep zabbix_server | /usr/bin/grep -v grep";
my $mail = "/bin/echo ALERT | /usr/bin/mail -s \"Zabbix isnt running\" overrider\@domain.com";
my $restart_zabbix = "/usr/local/etc/rc.d/zabbix_server start";

# system() returns the wait status as a number, so compare it numerically.
if ( system($command) != 0 ) {
    print "Zabbix aint running so im gonna restart it\n";
    system $mail;
    system $restart_zabbix;
} else {
    print "Zabbix is running\n";
}
-
The issue (unavailability of DB, ZABBIX stops) will be addressed in ZABBIX 1.4. I won't release it in 1.1.x.
-
I had the same problem: zabbix_server going down every day at 6:25 AM. 6:25 is when my cron runs the logrotate process. It seems that mysql restarts or doesn't accept connections during log rotation. To correct the problem I created a new file, /etc/logrotate.d/zabbix-server:
/var/log/zabbix/zabbix_server.log {
    daily
    compress
    ifempty
    postrotate
        if [ ! -f /var/run/zabbix/zabbix_server.pid ]; then
            sleep 30
            /etc/init.d/zabbix_server start
        fi
    endscript
}
(Please adapt paths and file names to your installation)