Zabbix Server Randomly Stops

  • richerm
    Junior Member
    • Jan 2007
    • 3

    #1

    Zabbix Server Randomly Stops

    I have an issue where the zabbix_server process simply stops on its own, and I have to restart it manually each time it dies. It runs for a while (sometimes a few minutes, sometimes a few hours) and then dies again. Below is an excerpt of the log. There appears to be a loop, although the SQL may differ between iterations; I haven't looked into it closely. My entire log, all 9000 lines of it, covers only about 3 seconds [013504-013506].

    Any suggestions? Thanks.

    <!--- START OF LOOP [SO IT SEEMS - MANY ENTRIES IN LOG OVER AND OVER] --->
    001536:20070123:013505 In add_service_alarm()
    001536:20070123:013505 In latest_service_alarm()
    001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=3]
    001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=3
    001536:20070123:013505 SQL [select value from service_alarms where serviceid=3 and clock=1169534098]
    001536:20070123:013505 Executing query:select value from service_alarms where serviceid=3 and clock=1169534098
    001536:20070123:013505 Executing query:update services set status=2 where serviceid=3
    001536:20070123:013505 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=8 and$
    001536:20070123:013505 In add_service_alarm()
    001536:20070123:013505 In latest_service_alarm()
    001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=8]
    001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=8
    001536:20070123:013505 SQL [select value from service_alarms where serviceid=8 and clock=1169534098]
    001536:20070123:013505 Executing query:select value from service_alarms where serviceid=8 and clock=1169534098
    001536:20070123:013505 Executing query:update services set status=2 where serviceid=8
    001536:20070123:013505 Executing query:select serviceupid from services_links where servicedownid=6
    001536:20070123:013505 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
    001536:20070123:013505 Executing query:select serviceupid from services_links where servicedownid=3
    001536:20070123:013505 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
    001536:20070123:013505 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=3 and$
    <!--- END OF LOOP [SO IT SEEMS - MANY ENTRIES IN LOG OVER AND OVER] --->
    001536:20070123:013505 In add_service_alarm()
    001536:20070123:013505 In latest_service_alarm()
    001536:20070123:013505 SQL [select max(clock) from service_alarms where serviceid=3]
    001536:20070123:013505 Executing query:select max(clock) from service_alarms where serviceid=3
    001536:20070123:013505 SQL [select value from service_alarms where serviceid=3 and clock=1169534098]
    001536:20070123:013506 Executing query:select value from service_alarms where serviceid=8 and clock=1169534098
    001536:20070123:013506 Executing query:update services set status=2 where serviceid=8
    001536:20070123:013506 Executing query:select serviceupid from services_links where servicedownid=6
    001536:20070123:013506 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
    001536:20070123:013506 Executing query:select serviceupid from services_links where servicedownid=3
    001536:20070123:013506 Executing query:select l.serviceupid,s.algorithm from services_links l,services s where s.serviceid=l.$
    001536:20070123:013506 Executing query:select count(*),max(status) from services s,services_links l where l.serviceupid=3 and$
    001536:20070123:013506 In add_service_alarm()
    001533:20070123:013506 One server process died. Shutting down...
    001533:20070123:013506 0. Killing PID=[1534]
    001533:20070123:013506 1. Killing PID=[1535]
    001533:20070123:013506 2. Killing PID=[1536]
    001533:20070123:013506 3. Killing PID=[1537]
    001533:20070123:013506 4. Killing PID=[1538]
    001540:20070123:013506 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001533:20070123:013506 5. Killing PID=[1540]
    001533:20070123:013506 6. Killing PID=[1541]
    001533:20070123:013506 7. Killing PID=[1542]
    001533:20070123:013506 8. Killing PID=[1543]
    001544:20070123:013506 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001533:20070123:013506 9. Killing PID=[1544]
    001533:20070123:013506 10. Killing PID=[1545]
    001533:20070123:013506 11. Killing PID=[1546]
    001533:20070123:013506 12. Killing PID=[1549]
    001533:20070123:013506 13. Killing PID=[1550]
    001533:20070123:013506 14. Killing PID=[1551]
    001534:20070123:013506 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001535:20070123:013506 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001537:20070123:013506 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001538:20070123:013506 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001541:20070123:013506 Server [7]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001542:20070123:013506 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001543:20070123:013506 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001545:20070123:013506 Server [11]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001546:20070123:013506 Server [12]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001549:20070123:013506 Server [13]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001550:20070123:013506 Server [14]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001551:20070123:013506 Server [15]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    001533:20070123:013506 ZABBIX server is down.
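
For anyone who wants to confirm a suspected loop in a log like the one above, here is a minimal sketch that counts entries per process per second and per distinct message. It assumes every entry has the shape `PID:YYYYMMDD:HHMMSS message`, as in the excerpt; the function name `summarize` and the short `sample` list are illustrative only and not part of Zabbix.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above: PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")


def summarize(lines):
    """Count log entries per (pid, date, time) and per distinct message."""
    per_second = Counter()
    per_message = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a log entry
        pid, date, time, message = m.groups()
        per_second[(pid, date, time)] += 1
        per_message[message] += 1
    return per_second, per_message


# A few lines copied from the excerpt, used only to exercise the function.
sample = [
    "001536:20070123:013505 In add_service_alarm()",
    "001536:20070123:013505 In latest_service_alarm()",
    "001536:20070123:013505 In add_service_alarm()",
    "001533:20070123:013506 One server process died. Shutting down...",
]
per_second, per_message = summarize(sample)

# The busiest (pid, second) pairs and the most repeated messages come first.
print(per_second.most_common(3))
print(per_message.most_common(3))
```

A single PID producing thousands of entries within one second, with the same handful of messages repeating, is a strong hint that one process is spinning rather than doing normal work.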
  • qix
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2006
    • 423

    #2
    I see the long lines from your logfiles have been cut off ($ signs).
    Is there any message like "You have an error in your SQL syntax" or similar?
    With kind regards,

    Raymond


    • richerm
      Junior Member
      • Jan 2007
      • 3

      #3
      Originally posted by qix
      I see the long lines from your logfiles have been cut off ($ signs).
      Is there any message like "You have an error in your SQL syntax" or similar?
      Sorry about the truncated log; it was a copy/paste from a PuTTY session. To answer your question, there is no mention of any SQL errors. To confirm, I ran several greps and they all returned nothing [ cat * | grep -E "syntax" - returns nothing ]

      You can download the log if you prefer from:

      Last edited by richerm; 23-01-2007, 18:12.


      • qix
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Oct 2006
        • 423

        #4
        I have taken a look at your log, but I'm afraid I don't see an apparent reason why a process would die.

        Have you checked your database server logs for anything unusual around the time your Zabbix process dies?
        With kind regards,

        Raymond


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          I believe you have a loop in the configuration of IT Services (one service points to another and vice versa). ZABBIX doesn't currently check for such situations, so it just goes into an endless loop and obviously crashes.
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga
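
Since the server does not check for this itself, one way to look for such a loop is to read the parent/child pairs out of `services_links` (the `serviceupid` and `servicedownid` columns appear in the log above) and run a cycle search over them. The following is a minimal sketch, not Zabbix code: the function name `find_cycle` is hypothetical, and the example pairs are made up for illustration rather than taken from the poster's database.

```python
def find_cycle(links):
    """Return one cycle as a list of service IDs, or None if there is none.

    links: iterable of (serviceupid, servicedownid) pairs.
    """
    # Build an adjacency list: parent service -> list of child services.
    children = {}
    for up, down in links:
        children.setdefault(up, []).append(down)
        children.setdefault(down, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in children}

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for child in children[node]:
            if color[child] == GREY:
                # The child is already on the current path: a loop is closed.
                return path[path.index(child):] + [child]
            if color[child] == WHITE:
                found = visit(child, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in children:
        if color[node] == WHITE:
            found = visit(node, [])
            if found:
                return found
    return None


# Made-up example: service 3 is a parent of 8, and 8 is a parent of 3.
print(find_cycle([(3, 8), (8, 3)]))   # a loop through 3 and 8
print(find_cycle([(1, 2), (2, 3)]))   # no loop
```

The search is recursive, which is fine for a small service tree; a very deep hierarchy would need an iterative version.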

          • richerm
            Junior Member
            • Jan 2007
            • 3

            #6
            Originally posted by Alexei
            I believe you have a loop in the configuration of IT Services (one service points to another and vice versa). ZABBIX doesn't currently check for such situations, so it just goes into an endless loop and obviously crashes.
            Alexei, thank you for putting me on the right track. I remembered making some changes on the IT services screen, so I went in and deleted all the IT services (nothing critical anyway). When I attempted to restart the server, it still failed, so I logged into the MySQL database and checked. Lo and behold, there were still two services listed there, yet they didn't show up in my interface. Removing those services allowed the server to restart, and so far so good.

            Is there any other cleanup I should do, since I ran a DELETE SQL statement directly on the services table?

            Thanks again,

            Matt.


            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist, Zabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              Originally posted by richerm
              Is there any other cleanup I should do, since I ran a DELETE SQL statement directly on the services table?
              No, everything should be fine.
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga
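
The answer above says no further cleanup is needed. For anyone who still wants to verify that nothing references the manually deleted services, a rough sketch of an orphan check follows. It uses an in-memory SQLite database so that it runs on its own; the table and column names come from the queries in the log, but the schema here is deliberately simplified, the rows are made up, and the real installation in this thread runs on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (assumption, not the full schema).
    CREATE TABLE services (serviceid INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE services_links (serviceupid INTEGER, servicedownid INTEGER);
    CREATE TABLE service_alarms (serviceid INTEGER, clock INTEGER, value INTEGER);
    -- Made-up rows: service 6 still exists, services 3 and 8 were deleted by hand.
    INSERT INTO services VALUES (6, 0);
    INSERT INTO services_links VALUES (3, 8), (8, 3), (3, 6);
    INSERT INTO service_alarms VALUES (3, 1169534098, 2), (6, 1169534098, 0);
""")

# Links whose parent or child no longer exists in the services table.
orphan_links = conn.execute("""
    SELECT l.serviceupid, l.servicedownid
    FROM services_links l
    WHERE l.serviceupid NOT IN (SELECT serviceid FROM services)
       OR l.servicedownid NOT IN (SELECT serviceid FROM services)
    ORDER BY l.serviceupid, l.servicedownid
""").fetchall()

# Alarm history rows that point at a service that no longer exists.
orphan_alarms = conn.execute("""
    SELECT a.serviceid, a.clock
    FROM service_alarms a
    WHERE a.serviceid NOT IN (SELECT serviceid FROM services)
""").fetchall()

print(orphan_links)
print(orphan_alarms)
```

The two SELECT statements are plain SQL and can be tried in a MySQL client as read-only checks; whether leftover rows matter depends on the Zabbix version, so treat any result as something to ask about rather than delete blindly.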