I have an issue where the zabbix_server process simply stops on its own, and I have to restart it manually each time it dies. It runs for a bit [sometimes a few minutes, sometimes a few hours] and then it dies again. Below is an excerpt of the log. I did notice that there seems to be a loop, although the SQL may differ between iterations; I haven't looked into it that much. My entire log [all 9000 lines of it] covers only about 3 seconds [013504-013506].
Any suggestions? Thanks.
<!--- START OF LOOP [SO IT SEEMS - MANY ENTRIES IN LOG OVER AND OVER] --->
001536:20070123:013505 In add_service_alarm()
001536:20070123:013505 In latest_service_alarm()
001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=3]
001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=3
001536:20070123:013505 SQL [select value from service_alarms where serviceid=3 and clock=1169534098]
001536:20070123:013505 Executing query:select value from service_alarms where serviceid=3 and clock=1169534098
001536:20070123:013505 Executing query:update services set status=2 where serviceid=3
001536:20070123:013505 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=8 and$
001536:20070123:013505 In add_service_alarm()
001536:20070123:013505 In latest_service_alarm()
001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=8]
001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=8
001536:20070123:013505 SQL [select value from service_alarms where serviceid=8 and clock=1169534098]
001536:20070123:013505 Executing query:select value from service_alarms where serviceid=8 and clock=1169534098
001536:20070123:013505 Executing query:update services set status=2 where serviceid=8
001536:20070123:013505 Executing query:select serviceupid from services_links where servicedownid=6
001536:20070123:013505 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
001536:20070123:013505 Executing query:select serviceupid from services_links where servicedownid=3
001536:20070123:013505 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
001536:20070123:013505 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=3 and$
<!--- END OF LOOP [SO IT SEEMS - MANY ENTRIES IN LOG OVER AND OVER] --->
001536:20070123:013505 In add_service_alarm()
001536:20070123:013505 In latest_service_alarm()
001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=3]
001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=3
001536:20070123:013505 SQL [select value from service_alarms where serviceid=3 and clock=1169534098]
001536:20070123:013506 Executing query:select value from service_alarms where serviceid=8 and clock=1169534098
001536:20070123:013506 Executing query:update services set status=2 where serviceid=8
001536:20070123:013506 Executing query:select serviceupid from services_links where servicedownid=6
001536:20070123:013506 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
001536:20070123:013506 Executing query:select serviceupid from services_links where servicedownid=3
001536:20070123:013506 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
001536:20070123:013506 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=3 and$
001536:20070123:013506 In add_service_alarm()
001533:20070123:013506 One server process died. Shutting down...
001533:20070123:013506 0. Killing PID=[1534]
001533:20070123:013506 1. Killing PID=[1535]
001533:20070123:013506 2. Killing PID=[1536]
001533:20070123:013506 3. Killing PID=[1537]
001533:20070123:013506 4. Killing PID=[1538]
001540:20070123:013506 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001533:20070123:013506 5. Killing PID=[1540]
001533:20070123:013506 6. Killing PID=[1541]
001533:20070123:013506 7. Killing PID=[1542]
001533:20070123:013506 8. Killing PID=[1543]
001544:20070123:013506 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001533:20070123:013506 9. Killing PID=[1544]
001533:20070123:013506 10. Killing PID=[1545]
001533:20070123:013506 11. Killing PID=[1546]
001533:20070123:013506 12. Killing PID=[1549]
001533:20070123:013506 13. Killing PID=[1550]
001533:20070123:013506 14. Killing PID=[1551]
001534:20070123:013506 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001535:20070123:013506 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001537:20070123:013506 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001538:20070123:013506 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001541:20070123:013506 Server [7]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001542:20070123:013506 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001543:20070123:013506 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001545:20070123:013506 Server [11]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001546:20070123:013506 Server [12]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001549:20070123:013506 Server [13]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001550:20070123:013506 Server [14]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001551:20070123:013506 Server [15]. Got QUIT or INT or TERM or PIPE signal. Exiting...
001533:20070123:013506 ZABBIX server is down.
When I attempted to restart it, it still failed, so I logged into the MySQL database and checked, and lo and behold, there were still two services listed there even though they didn't show up in my interface. Removing those services allowed the server to restart, and so far so good.
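For anyone hitting the same thing: the log above looks like the server walking from service 3 up to service 8 and then back to service 3, which is what a circular entry in services_links would produce. That is only my reading of the log, not a confirmed diagnosis. Below is a rough Python sketch of how one might check for such a loop before deleting anything. It assumes you have already dumped the rows of `select serviceupid, servicedownid from services_links` into a list of pairs; the function name `find_service_cycle` and the sample rows are made up for illustration and are not part of Zabbix.

```python
from collections import defaultdict


def find_service_cycle(links):
    """Return one cycle of service IDs as a list, or None if there is none.

    links: iterable of (serviceupid, servicedownid) pairs, e.g. the rows
    of the services_links table exported by hand (an assumption here).
    """
    # Map each child service to the parent services it reports to.
    parents = defaultdict(list)
    for up, down in links:
        parents[down].append(up)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)

    def visit(node, path):
        # Depth-first walk upward; meeting a node that is still
        # IN_PROGRESS means we came back to where we started.
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in parents.get(node, ()):
            if state[parent] == IN_PROGRESS:
                return path[path.index(parent):] + [parent]
            if state[parent] == UNSEEN:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for start in list(parents):
        if state[start] == UNSEEN:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical rows: 3 reports to 8, 8 reports to 3, 6 reports to 3.
    sample_links = [(8, 3), (3, 8), (3, 6)]
    print(find_service_cycle(sample_links))
```

If this reports a cycle, the service IDs it lists are the ones to inspect in the services and services_links tables. Back up the database before removing rows by hand.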