One server process died. Shutting down...

  • peaceofcrap2001
    Junior Member
    • May 2006
    • 10

    #1

    One server process died. Shutting down...

    I am having a problem starting 'zabbix_server'. Please see below for part of my log file (log level = 5). I've spent quite a bit of time trying to resolve this issue, but no luck so far.

    I am testing out Zabbix for the first time (zabbix-1.1beta11/MySQL 5.0.21/RedHat 7.3). Even though I've had a few problems, I like what I see so far! (I will post the problems I encountered and their solutions once I have a running environment.)

    Code:
    005680:20060529:175725 server #1 started [Alerter]
    005681:20060529:175725 server #2 started [Timer]
    005682:20060529:175725 server #3 started [ICMP pinger]
    005688:20060529:175725 In child_main()
    005688:20060529:175725 server #6 started [Trapper]
    005688:20060529:175725 Before DBconnect()
    005689:20060529:175725 In child_main()
    005689:20060529:175725 server #7 started [Trapper]
    005689:20060529:175725 Before DBconnect()
    005678:20060529:175725 server #0 started [Housekeeper]
    005678:20060529:175725 0. PID=[5680]
    005678:20060529:175725 1. PID=[5681]
    005678:20060529:175725 2. PID=[5682]
    005678:20060529:175725 3. PID=[5683]
    005678:20060529:175725 4. PID=[5687]
    005678:20060529:175725 5. PID=[5688]
    005678:20060529:175725 6. PID=[5689]
    005678:20060529:175725 7. PID=[5690]
    005678:20060529:175725 8. PID=[5693]
    005678:20060529:175725 9. PID=[5694]
    005678:20060529:175725 ZABBIX server is up.
    005690:20060529:175725 In child_main()
    005690:20060529:175725 server #8 started [Trapper]
    005690:20060529:175725 Before DBconnect()
    005693:20060529:175725 In child_main()
    005693:20060529:175725 server #9 started [Trapper]
    005693:20060529:175725 Before DBconnect()
    005694:20060529:175725 In child_main()
    005694:20060529:175725 server #10 started [Trapper]
    005694:20060529:175725 Before DBconnect()
    005678:20060529:175726 One server process died. Shutting down...
    005678:20060529:175726 0. Killing PID=[5680]
    005678:20060529:175726 1. Killing PID=[5681]
    005678:20060529:175726 2. Killing PID=[5682]
    005678:20060529:175726 3. Killing PID=[5683]
    005678:20060529:175726 4. Killing PID=[5687]
    005678:20060529:175726 5. Killing PID=[5688]
    005678:20060529:175726 6. Killing PID=[5689]
    005678:20060529:175726 7. Killing PID=[5690]
    005678:20060529:175726 8. Killing PID=[5693]
    005678:20060529:175726 9. Killing PID=[5694]
    005678:20060529:175726 ZABBIX server is down.
    005680:20060529:175726 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005681:20060529:175726 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005682:20060529:175726 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005688:20060529:175726 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005689:20060529:175726 Server [7]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005690:20060529:175726 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005693:20060529:175726 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005694:20060529:175726 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    005687:20060529:175726 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    I would like to present Zabbix in a testing environment to my team by Wednesday. Does anybody know a solution to my problem?

    Ambex
    Last edited by peaceofcrap2001; 30-05-2006, 21:27. Reason: More Info
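
One way to narrow down which child died is to compare the PIDs the master lists at startup with the PIDs that later log a "Got QUIT or INT or TERM or PIPE signal" line. A child that crashed would not get the chance to log that line. The sketch below is only an illustration: the function name and regular expressions are hypothetical helpers written for the log format shown in this post, not part of Zabbix. Fed the excerpt from this post, it reports PID 5683, though a missing log line is circumstantial evidence at best.

```python
import re

# Hypothetical helpers for the log format shown in this thread (not part of Zabbix).
# Matches master lines such as "005678:20060529:175725 3. PID=[5683]".
PID_LIST = re.compile(r"^\s*\d+:\d+:\d+ \d+\. PID=\[(\d+)\]")
# Matches child lines such as "005682:... Server [3]. Got QUIT or INT ...".
SIGNAL_EXIT = re.compile(r"^\s*(\d+):\d+:\d+ Server \[\d+\]\. Got QUIT")


def silent_children(log_text):
    """Return child PIDs the master listed that never logged a signal exit."""
    started, exited = [], set()
    for line in log_text.splitlines():
        m = PID_LIST.match(line)
        if m:
            started.append(int(m.group(1)))
            continue
        m = SIGNAL_EXIT.match(line)
        if m:
            # The first log field is the zero-padded PID of the writing process.
            exited.add(int(m.group(1)))
    return [pid for pid in started if pid not in exited]


# Abridged lines in the same shape as the log above.
sample = """\
005678:20060529:175725 2. PID=[5682]
005678:20060529:175725 3. PID=[5683]
005678:20060529:175725 4. PID=[5687]
005682:20060529:175726 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
005687:20060529:175726 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
"""

if __name__ == "__main__":
    print(silent_children(sample))
```

Knowing the PID lets you look for that process's "server #N started [...]" line, which hints at the component involved.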
  • erisan500
    Senior Member
    Zabbix Certified Specialist
    • Aug 2005
    • 285

    #2
    Adjust your debug level

    I dunno if logging is affected by your setting (debuglevel=5) but looking at the manual the maximum debuglevel=4.
    • 0 - none
    • 1 - critical
    • 2 - error
    • 3 - warnings
    • 4 - debug
    So put it (back) to 4 and maybe you'll have more info in the log.

    (Configuration files: http://www.zabbix.com/manual/v1.1/config_files.php)
    Greetings
    EriSan
    Zabbix Certified Specialist
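
A quick sanity check for the setting discussed in this post could look like the sketch below. It assumes the parameter is spelled `DebugLevel` in zabbix_server.conf and that `#` starts a comment; both are assumptions about the file format rather than something stated in this thread. The level names are copied from the list above.

```python
# Levels as listed in the post above (that poster's reading of the manual).
DEBUG_LEVELS = {0: "none", 1: "critical", 2: "error", 3: "warnings", 4: "debug"}


def check_debug_level(conf_text):
    """Return (level, name) for the last DebugLevel line, or raise ValueError."""
    level = None
    for raw in conf_text.splitlines():
        # Assumption: '#' starts a comment in the configuration file.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DebugLevel":
            level = int(value.strip())
    if level is None:
        raise ValueError("no DebugLevel line found")
    if level not in DEBUG_LEVELS:
        raise ValueError(f"DebugLevel={level} is outside 0-4")
    return level, DEBUG_LEVELS[level]


if __name__ == "__main__":
    print(check_debug_level("# example\nDebugLevel=4\n"))
```

A value of 5, as in the first post, would be rejected by this check.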


    • peaceofcrap2001
      Junior Member
      • May 2006
      • 10

      #3
      Sorry... the above output was produced at debug level 4.

      Problem still not resolved. Any ideas?
      Last edited by peaceofcrap2001; 30-05-2006, 16:15.


      • peaceofcrap2001
        Junior Member
        • May 2006
        • 10

        #4
        I am receiving the same error when trying to start the zabbix_server on zabbix 1.1beta10 too.

        Does anybody know what version I need to try?

        I have tried

        - zabbix 1.1beta4
        - zabbix 1.1beta5
        - zabbix 1.1beta8
        - zabbix 1.1beta9

        Still nothing. I am about to give up.

        Please help me... I am spending more time on this than I should.

        Ambex
        Last edited by peaceofcrap2001; 31-05-2006, 02:35. Reason: more info....


        • peaceofcrap2001
          Junior Member
          • May 2006
          • 10

          #5
          New installation from scratch.

          I removed all files and started the installation fresh. The zabbix_server is still shutting down. Here is my whole log file:

          Code:
          010834:20060530:210121 Starting zabbix_server. ZABBIX 1.1beta11.
          010834:20060530:210121 Executing query:select refresh_unsupported from config
          010834:20060530:210121 In DBupdate_triggers_after_restart()
          010834:20060530:210121 SQL [select distinct t.triggerid,t.value from hosts h,items i,triggers t,functions f where f.triggerid=t.triggerid and f.itemid=i.itemid and h.hostid=i.hostid and i.nextcheck+i.delay<1149040881 and i.key_<>'status' and h.status not in (4,3)]
          010834:20060530:210121 Executing query:select distinct t.triggerid,t.value from hosts h,items i,triggers t,functions f where f.triggerid=t.triggerid and f.itemid=i.itemid and h.hostid=i.hostid and i.nextcheck+i.delay<1149040881 and i.key_<>'status' and h.status not in (4,3)
          010834:20060530:210121 End of DBupdate_triggers_after_restart()
          010836:20060530:210121 server #1 started [Alerter]
          010836:20060530:210121 Executing query:select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,a.retries,mt.mediatypeid,mt.type,mt.description,mt.smtp_server,mt.smtp_helo,mt.smtp_email,mt.exec_path,a.delay from alerts a,media_type mt where a.status=0 and a.retries<3 and (a.repeats<a.maxrepeats or a.maxrepeats=0) and a.nextcheck<=1149040881 and a.mediatypeid=mt.mediatypeid order by a.clock
          010837:20060530:210121 server #2 started [Timer]
          010837:20060530:210121 Executing query:select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid from hosts h, items i, functions f where h.hostid=i.hostid and h.status=0 and i.status=0 and f.function in ('nodata','date','dayofweek','time','now') and i.itemid=f.itemid
          010840:20060530:210121 server #3 started [ICMP pinger]
          010840:20060530:210121 In create_host_file()
          010840:20060530:210121 Executing query:select distinct h.ip from hosts h,items i where i.hostid=h.hostid and (h.status=0 or (h.status=0 and h.available=2 and h.disable_until<=1149040881)) and (i.key_='icmpping' or i.key_='icmppingsec') and i.type=3 and i.status=0 and h.useip=1
          010840:20060530:210121 Executing query:select distinct h.host from hosts h,items i where i.hostid=h.hostid and (h.status=0 or (h.status=0 and h.available=2 and h.disable_until<=1149040881)) and (i.key_='icmpping' or i.key_='icmppingsec') and i.type=3 and i.status=0 and h.useip=0
          010840:20060530:210121 In do_ping()
          010846:20060530:210121 In child_main()
          010846:20060530:210121 server #6 started [Trapper]
          010846:20060530:210121 Before DBconnect()
          010846:20060530:210121 After DBconnect()
          010846:20060530:210121 Before accept()
          010847:20060530:210121 In child_main()
          010847:20060530:210121 server #7 started [Trapper]
          010847:20060530:210121 Before DBconnect()
          010847:20060530:210121 After DBconnect()
          010847:20060530:210121 Before accept()
          010848:20060530:210121 In child_main()
          010848:20060530:210121 server #8 started [Trapper]
          010848:20060530:210121 Before DBconnect()
          010848:20060530:210121 After DBconnect()
          010848:20060530:210121 Before accept()
          010834:20060530:210121 server #0 started [Housekeeper]
          010834:20060530:210121 0. PID=[10836]
          010834:20060530:210121 1. PID=[10837]
          010834:20060530:210121 2. PID=[10840]
          010834:20060530:210121 3. PID=[10841]
          010834:20060530:210121 4. PID=[10842]
          010834:20060530:210121 5. PID=[10846]
          010834:20060530:210121 6. PID=[10847]
          010834:20060530:210121 7. PID=[10848]
          010834:20060530:210121 8. PID=[10853]
          010834:20060530:210121 9. PID=[10854]
          010834:20060530:210121 ZABBIX server is up.
          010853:20060530:210121 In child_main()
          010853:20060530:210121 server #9 started [Trapper]
          010853:20060530:210121 Before DBconnect()
          010853:20060530:210121 After DBconnect()
          010853:20060530:210121 Before accept()
          010834:20060530:210121 In housekeeping_process_log()
          010834:20060530:210121 Executing query:select housekeeperid, tablename, field, value from housekeeper order by tablename
          010834:20060530:210121 In housekeeping_alarms(1149040881)
          010834:20060530:210121 Executing query:select alarm_history from config
          010834:20060530:210121 Executing query:select alarmid from alarms where clock<1117504881
          010834:20060530:210121 In housekeeping_alerts(1149040881)
          010834:20060530:210121 Executing query:select alert_history from config
          010834:20060530:210121 Executing query:delete from alerts where clock<1117504881
          010854:20060530:210121 In child_main()
          010854:20060530:210121 server #10 started [Trapper]
          010854:20060530:210121 Before DBconnect()
          010834:20060530:210121 Deleted [0] records from table [alerts]
          010834:20060530:210121 In housekeeping_sessions(1149040881)
          010834:20060530:210121 Executing query:delete from sessions where lastaccess<1148954481
          010834:20060530:210121 Deleted [0] records from table [sessions]
          010834:20060530:210121 Sleeping for 1 hours
          010854:20060530:210121 After DBconnect()
          010854:20060530:210121 Before accept()
          010834:20060530:210121 One server process died. Shutting down...
          010834:20060530:210121 0. Killing PID=[10836]
          010834:20060530:210121 1. Killing PID=[10837]
          010840:20060530:210121 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010836:20060530:210121 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010837:20060530:210121 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010834:20060530:210121 2. Killing PID=[10840]
          010834:20060530:210121 3. Killing PID=[10841]
          010834:20060530:210121 4. Killing PID=[10842]
          010834:20060530:210121 5. Killing PID=[10846]
          010834:20060530:210121 6. Killing PID=[10847]
          010834:20060530:210121 7. Killing PID=[10848]
          010834:20060530:210121 8. Killing PID=[10853]
          010834:20060530:210121 9. Killing PID=[10854]
          010834:20060530:210121 ZABBIX server is down.
          010846:20060530:210121 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010847:20060530:210121 Server [7]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010848:20060530:210121 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010853:20060530:210121 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010854:20060530:210121 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          010841:20060530:210121 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
          As far as I can tell, no one in this user group has experienced this issue. Should I submit the log file to Alexei? Please ... please ... please help.

          Oh yeah... create/mysql/schema.sql contains an error at line 384: it is missing a ( at the beginning of the 'create table ... ' statement.

          ERROR 1064 (42000) at line 384: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'mediaid int(4) NOT NULL auto_increment,
          userid int(4) DEFAULT '0' NOT NULL,
          ' at line 2


          • schneck
            Member
            • May 2006
            • 62

            #6
            check core dump?

            Usually on any Unix/Linux/BSD system, programs that die leave core dumps in their current working directory (if they can). You can use GDB (or any other debugger of your choice) to check why and where the process died.

            On my system, zabbix does a chdir(/) on startup and runs as an unprivileged user, so it can't write a core dump (no write permission). Temporarily changing the permissions of / to allow write access and then running zabbix created a core dump file, which allowed me to figure out what went wrong (in this case, my fault :-)

            \B.

            PS: don't forget to repair the permissions on / after you get what you need!
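
Before loosening permissions on /, it may be worth checking the two usual reasons a core file never appears: a core size limit of 0 and a working directory the process cannot write to. The sketch below is a rough check under those assumptions; `core_dump_obstacles` is a hypothetical helper name, and it only tells you about the process that runs it, so it would need to be run as the zabbix user from the same environment that starts the server.

```python
import os
import resource


def core_dump_obstacles(workdir="/"):
    """List reasons this process probably could not leave a core file in workdir."""
    problems = []
    # The soft limit is the one that applies; 0 means no core file is written.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == 0:
        problems.append("core size limit is 0 (try 'ulimit -c unlimited')")
    # Checked against the real user ID of the process running this script.
    if not os.access(workdir, os.W_OK):
        problems.append(f"no write permission on {workdir}")
    return problems


if __name__ == "__main__":
    for reason in core_dump_obstacles("/"):
        print(reason)
```

On newer Linux kernels, /proc/sys/kernel/core_pattern can also redirect where cores are written; this sketch does not look at it.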


            • peaceofcrap2001
              Junior Member
              • May 2006
              • 10

              #7
              Thank you...

              schneck

              Thank you for your response. I will do some reading on core dumps as you have suggested. Still working on solution ...

              Ambex


              • peaceofcrap2001
                Junior Member
                • May 2006
                • 10

                #8
                I used gdb to debug zabbix_server. The program is executing normally. See the debugging result below.

                Code:
                [root@tst01 bin]# gdb zabbix_server
                GNU gdb Red Hat Linux (5.2-2)
                Copyright 2002 Free Software Foundation, Inc.
                GDB is free software, covered by the GNU General Public License, and you are
                welcome to change it and/or distribute copies of it under certain conditions.
                Type "show copying" to see the conditions.
                There is absolutely no warranty for GDB.  Type "show warranty" for details.
                This GDB was configured as "i386-redhat-linux"...
                (gdb) run
                Starting program: /u01/app/zabbix/bin/zabbix_server
                
                Program exited normally.
                (gdb)
                I am on the verge of giving up, but I have spent so much time on this. Even though you may not have a solution, do you have any suggestions for what I should do next?

                My team was extremely excited to see zabbix running.


                • schneck
                  Member
                  • May 2006
                  • 62

                  #9
                  Originally posted by peaceofcrap2001
                  I used gdb to debug zabbix_server. The program is executing normally. See the debugging result below.

                  Code:
                  [root@tst01 bin]# gdb zabbix_server
                  ...
                  Of course ... this is the exit of the master process you see, not the one which dies. Gdb (usually) doesn't care much about forked children.

                  To figure out what happens, I'd recommend looking at the dump:

                  Code:
                  $ chmod 777 /
                  [... run zabbix until it dies ...]
                  $ chmod 755 /
                  $ gdb /usr/local/bin/zabbix_server /zabbix_server.core
                  (gdb) where
                  (or whatever pathnames fit your system, the core should be in /)

                  \B.
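
The point about forked children can be seen in a small stand-alone sketch: the fate of a parent process says nothing about a child, and only whoever waits on the child learns that it was killed by a signal. This is a generic fork/wait illustration, not Zabbix code; `run_child_that_dies` is a made-up name, and SIGKILL stands in for a real crash only so that the demo does not write a core file.

```python
import os
import signal


def run_child_that_dies():
    """Fork a child that kills itself; return how the parent sees it end."""
    pid = os.fork()
    if pid == 0:
        # Child: simulate a crash. SIGKILL is used only to avoid a core file here.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached
    # Parent: reap the child and inspect its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "signal", os.WTERMSIG(status)
    return "exit", os.WEXITSTATUS(status)


if __name__ == "__main__":
    # The parent itself finishes normally even though the child was killed.
    print(run_child_that_dies())
```

A debugger attached only to the parent reports the parent's normal exit. Newer gdb releases offer `set follow-fork-mode child` to follow the child instead; whether a gdb as old as the one in this thread supports that on Linux is not something this sketch assumes.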


                  • jhigley
                    Junior Member
                    • May 2006
                    • 2

                    #10
                    Hello. I just installed beta 12, am a new user and am getting the same thing:

                    Code:
                    004780:20060531:160349 One server process died. Shutting down...
                    004780:20060531:160349 0. Killing PID=[4782]
                    004780:20060531:160349 1. Killing PID=[4784]
                    004780:20060531:160349 2. Killing PID=[4786]
                    004780:20060531:160349 3. Killing PID=[4789]
                    004780:20060531:160349 4. Killing PID=[4790]
                    004780:20060531:160349 5. Killing PID=[4791]
                    004780:20060531:160349 6. Killing PID=[4798]
                    004780:20060531:160349 7. Killing PID=[4800]
                    004780:20060531:160349 8. Killing PID=[4802]
                    004780:20060531:160349 9. Killing PID=[4804]
                    004780:20060531:160349 ZABBIX server is down
                    Running SuSE 9.2/ mysql5/ zabbix 1beta12 running as user zabbix

                    I also tried to install stable 1.0 and got a problem on the make install:

                    /home/zabbix/zabbix-1.0# make install
                    make: *** No rule to make target `install-recursive', needed by `install'. Stop.

                    Many thanks.


                    • peaceofcrap2001
                      Junior Member
                      • May 2006
                      • 10

                      #11
                      may be a bug?

                      I installed zabbix1.1beta12 only to receive the same error again. I am starting to think that this is a bug rather than a configuration error.

                      Oh yeah, I am getting the following error on zabbix-1.0 too.

                      Code:
                      [root@tst01 bin]# make install
                      make: *** No rule to make target `install-recursive', needed by `install'. Stop.
                      Please let me know if you find a solution.

                      Ambex
                      ----------------------------------------
                      Redhat 7.3/mysql 5.0.21/ zabbix1.1beta12


                      • KarmaPolice
                        Member
                        • Oct 2005
                        • 95

                        #12
                        I've had this problem with various versions and settings of MySQL. I don't know why one causes the other, nor have I been able to pin down exactly which settings were responsible, but I am currently running MySQL 5.0.18, and things got a lot better when I upgraded to it from 5.0.13. I've had the problem in the past as well, and got it to go away by changing as much as I could.


                        • proximo
                          Junior Member
                          • Jun 2006
                          • 4

                          #13
                          I have had this issue since b10 and have just now figured out, if not the specific cause, at least a workaround. If I compile with --with-net-snmp, I get "One server process died ..." without any explanation, even with debug level 4 in zabbix_server.conf. If I compile without it (no SNMP --with option), zabbix_server starts without a problem. I'm running RHEL4 with kernel 2.6.9-34.107 smp.

                          Hope this helps some of you ..

                          {P}


                          • jhigley
                            Junior Member
                            • May 2006
                            • 2

                            #14
                            that fixed it for me. thanks for the tip!
                            This now at least gives me something to play with.


                            • peaceofcrap2001
                              Junior Member
                              • May 2006
                              • 10

                              #15
                              Proximo... thanks. Compiling it without --with-net-snmp seems to have fixed it. I don't know what functionality I lose by leaving it out, but I can work with what I have for now.

                              If anybody else resolves this issue, please let us know. I'd like to thank every one of you for trying to help.

                              Oh happy day!

                              Ambex

