zabbix server shutting down

  • dantheman
    Senior Member
    • May 2006
    • 209

    #1

    zabbix server shutting down

    I first started having problems with beta12: the server was crashing at random times. I had no problems from beta9 up to beta12. Now that 1.1 has been released, I thought it might fix the problem, so I upgraded, but the server shuts down shortly after I start it. I turned on the highest level of debugging; the last chunk of the resulting log file is below. (If you need more of it, let me know and I can try to post more; it's apparently a little over the size limit for posts here.) If anyone can help me figure out what is going on, or what else I can do to troubleshoot this, I'd appreciate it.

    Thanks,


    014902:20060602:143400 RESULT_STR [Ð]
    014902:20060602:143400 In process_new_value()
    014902:20060602:143400 In add_history(system.cpu.load[,avg15],,0,2)
    014902:20060602:143400 In add_history(18099,DOUBLE:1.805556)
    014902:20060602:143400 In add_history()
    014902:20060602:143400 Executing query:insert into history (clock,itemid,value) values (1149284040,18099,1.805556)
    014902:20060602:143400 In add_trend()
    014902:20060602:143400 SQL [select num,value_min,value_avg,value_max from trends where itemid=18099 and clock=1149282000]
    014902:20060602:143400 Executing query:select num,value_min,value_avg,value_max from trends where itemid=18099 and clock=1149282000
    014902:20060602:143400 Executing query:update trends set num=2, value_min=1.805556, value_avg=2.091128, value_max=2.376700 where itemid=18099 and clock=1149282000
    014902:20060602:143400 End of add_history
    014902:20060602:143400 In update_item()
    014902:20060602:143400 Executing query:update items set nextcheck=1149284700,prevvalue=lastvalue,lastvalue ='1.805556',lastclock=1149284040 where itemid=18099
    014902:20060602:143400 In update_functions(18099)
    014902:20060602:143400 Executing query:select distinct function,parameter,itemid,lastvalue from functions where itemid=18099
    014902:20060602:143400 In update_triggers [18099]
    014902:20060602:143400 Executing query:select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=18099
    014902:20060602:143400 End of update_triggers [18099]
    014902:20060602:143400 GOT VALUE TYPE [0x0]
    014902:20060602:143400 get_value_agent: host[Calypso] ip[10.2.1.92] key [system.cpu.load[,avg15]]
    014902:20060602:143400 Sending [system.cpu.load[,avg15]
    ]
    014902:20060602:143400 RESULT_STR [Ð]
    014902:20060602:143400 In process_new_value()
    014902:20060602:143400 In add_history(system.cpu.load[,avg15],,0,2)
    014902:20060602:143400 In add_history(18105,DOUBLE:0.133333)
    014902:20060602:143400 In add_history()
    014902:20060602:143400 Executing query:insert into history (clock,itemid,value) values (1149284040,18105,0.133333)
    014899:20060602:143401 Update IP [10.28.1.1 is alive (1738 ms)]
    014899:20060602:143401 Mseconds [1738.000000]
    014899:20060602:143401 IP [10.28.1.1] alive [1]
    014899:20060602:143401 In process_value([email protected])
    014899:20060602:143401 In process_ip([10.28.1.1])
    014899:20060602:143401 End of process_ip([0])
    014899:20060602:143401 SQL [select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.value_type,i.trapper_hosts,i.delta,i.units,i.multiplier,i.formula from items i,hosts h where h.status=0 and h.hostid=i.hostid and h.ip='10.28.1.1' and i.key_='icmpping' and i.status=0 and i.type=3]
    014899:20060602:143401 Executing query:select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.value_type,i.trapper_hosts,i.delta,i.units,i.multiplier,i.formula from items i,hosts h where h.status=0 and h.hostid=i.hostid and h.ip='10.28.1.1' and i.key_='icmpping' and i.status=0 and i.type=3
    014903:20060602:143401 After accept()
    014903:20060602:143401 Before read()
    014903:20060602:143401 After read() 2 [104]
    014903:20060602:143401 Got line:<req><host>YXViMTA0</host><key>RXZlbnRsb2dbU3lzdGVtXQ==</key><data>WkJYX05PVFNVUFBPUlRFRAo=</data></req
    014903:20060602:143401 Trapper got [<req><host>YXViMTA0</host><key>RXZlbnRsb2dbU3lzdGVtXQ==</key><data>WkJYX05PVFNVUFBPUlRFRAo=</data></req]
    014903:20060602:143401 XML received [<req><host>YXViMTA0</host><key>RXZlbnRsb2dbU3lzdGVtXQ==</key><data>WkJYX05PVFNVUFBPUlRFRAo=</data></req]
    014903:20060602:143401 In process_data([aub104],[Eventlog[System]],[ZBX_NOTSUPPORTED
    ],[])
    014903:20060602:143401 Executing query:select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid from hosts h, items i where h.status=0 and h.hostid=i.hostid and h.host='aub104' and i.key_='Eventlog[System]' and i.status=0 and i.type in (2,7)
    014903:20060602:143401 In check_security()
    014903:20060602:143401 Processing [ZBX_NOTSUPPORTED
    ]
    014903:20060602:143401 In process_new_value()
    014903:20060602:143401 In add_history(Eventlog[System],,2,4)
    014903:20060602:143401 In add_history(19185,STRING:ZBX_NOTSUPPORTED
    )
    014903:20060602:143401 In add_history_log()
    014893:20060602:143401 One server process died. Shutting down...
    014893:20060602:143401 0. Killing PID=[14895]
    014893:20060602:143401 1. Killing PID=[14896]
    014893:20060602:143401 2. Killing PID=[14899]
    014893:20060602:143401 3. Killing PID=[14901]
    014893:20060602:143401 4. Killing PID=[14902]
    014893:20060602:143401 5. Killing PID=[14903]
    014893:20060602:143401 6. Killing PID=[14905]
    014893:20060602:143401 7. Killing PID=[14907]
    014893:20060602:143401 8. Killing PID=[14909]
    014893:20060602:143401 9. Killing PID=[14911]
    014893:20060602:143401 ZABBIX server is down.
    014895:20060602:143401 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014896:20060602:143401 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014905:20060602:143401 Server [7]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014907:20060602:143401 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014911:20060602:143401 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014901:20060602:143401 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014899:20060602:143401 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014902:20060602:143401 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    014909:20060602:143401 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
  • rondeniable
    Junior Member
    • Jun 2006
    • 14

    #2
    I just noticed that I'm having the same problem.

    • dcrandall
      Member
      • Apr 2006
      • 59

      #3
      I am having the exact same problem in beta12. Same pattern of errors in the server log.

      In my case I have found out that the history table seems to be corrupt.

      bash-2.05b# mysql -uroot zabbix -e 'check table history'
      ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

      I have set the following directive in my.cnf and am attempting to dump the table.
      [mysqld]
      innodb_force_recovery = 4

      database corruption = bad.

      • rondeniable
        Junior Member
        • Jun 2006
        • 14

        #4
        That is a little different for me, as my database checks OK.

        [root@gw-app2 ~]# mysqlcheck -uroot -p zabbix
        Enter password:
        zabbix.acknowledges OK
        zabbix.actions OK
        zabbix.alarms OK
        zabbix.alerts OK
        zabbix.applications OK
        zabbix.auditlog OK
        zabbix.autoreg OK
        zabbix.conditions OK
        zabbix.config OK
        zabbix.escalation_log OK
        zabbix.escalation_rules OK
        zabbix.escalations OK
        zabbix.functions OK
        zabbix.graphs OK
        zabbix.graphs_items OK
        zabbix.groups OK
        zabbix.help_items OK
        zabbix.history OK
        zabbix.history_log OK
        zabbix.history_str OK
        zabbix.history_text OK
        zabbix.history_uint OK
        zabbix.hosts OK
        zabbix.hosts_groups OK
        zabbix.hosts_profiles OK
        zabbix.hosts_templates OK
        zabbix.housekeeper OK
        zabbix.images OK
        zabbix.items OK
        zabbix.items_applications OK
        zabbix.mappings OK
        zabbix.media OK
        zabbix.media_type OK
        zabbix.profiles OK
        zabbix.rights OK
        zabbix.screens OK
        zabbix.screens_items OK
        zabbix.service_alarms OK
        zabbix.services OK
        zabbix.services_links OK
        zabbix.sessions OK
        zabbix.stats OK
        zabbix.sysmaps OK
        zabbix.sysmaps_elements OK
        zabbix.sysmaps_links OK
        zabbix.trends OK
        zabbix.trigger_depends OK
        zabbix.triggers OK
        zabbix.users OK
        zabbix.users_groups OK
        zabbix.usrgrp OK
        zabbix.valuemaps OK
        [root@gw-app2 ~]#

        Regardless, my server still crashes every few days.


        [root@gw-app2 ~]# tail -50 /tmp/zabbix_server.log
        026185:20060604:223501 SQL [select max(clock) from alarms where triggerid=12791 and clock<1149286052]
        026185:20060604:223501 Executing query:select max(clock) from alarms where triggerid=12791 and clock<1149286052
        026185:20060604:223501 SQL [select value from alarms where triggerid=12791 and clock=1149285480]
        026185:20060604:223501 Executing query:select value from alarms where triggerid=12791 and clock=1149285480
        026185:20060604:223501 In add_alarm(12791,0,0)
        026185:20060604:223501 In latest_alarm()
        026185:20060604:223501 SQL [select value from alarms where triggerid=12791 order by clock desc]
        026185:20060604:223501 Executing query:select value from alarms where triggerid=12791 order by clock desc
        026185:20060604:223501 Executing query:insert into alarms(triggerid,clock,value) values(12791,1149478501,0)
        026185:20060604:223501 In DBinsert_id()
        026185:20060604:223501 Executing query:update alerts set retries=3,error='Trigger changed its status. WIll not send repeats.' where triggerid=12791 and repeats>0 and status=0
        026185:20060604:223501 End of add_alarm()
        026185:20060604:223501 Executing query:update triggers set value=0,lastchange=1149478501,error='' where triggerid=12791
        026185:20060604:223501 In update_trigger_value. Before apply_actions. Triggerid [12791]
        026185:20060604:223501 In apply_actions(triggerid:12791,alarmid:221,trigger_value:0)
        026185:20060604:223501 Applying actions
        026185:20060604:223501 Executing query:select actionid,userid,delay,subject,message,recipient,maxrepeats,repeatdelay,scripts,actiontype from actions where nextcheck<=1149478501 and status=0
        026185:20060604:223501 Query::select actionid,userid,delay,subject,message,recipient,maxrepeats,repeatdelay,scripts,actiontype from actions where nextcheck<=1149478501 and status=0
        026185:20060604:223501 Query failed:Unknown column 'delay' in 'field list' [1054]
        026174:20060604:223501 One server process died. Shutting down...
        026174:20060604:223501 0. Killing PID=[26176]
        026174:20060604:223501 1. Killing PID=[26178]
        026174:20060604:223501 2. Killing PID=[26180]
        026182:20060604:223501 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026174:20060604:223501 3. Killing PID=[26182]
        026183:20060604:223501 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026174:20060604:223501 4. Killing PID=[26183]
        026184:20060604:223501 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026174:20060604:223501 5. Killing PID=[26184]
        026174:20060604:223501 6. Killing PID=[26185]
        026174:20060604:223501 7. Killing PID=[26186]
        026174:20060604:223501 8. Killing PID=[26187]
        026174:20060604:223501 9. Killing PID=[26188]
        026174:20060604:223501 10. Killing PID=[26189]
        026174:20060604:223501 11. Killing PID=[26191]
        026174:20060604:223501 12. Killing PID=[26193]
        026174:20060604:223501 13. Killing PID=[26195]
        026174:20060604:223501 14. Killing PID=[26197]
        026176:20060604:223501 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026178:20060604:223501 Server [2]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026180:20060604:223501 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026186:20060604:223501 Server [8]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026187:20060604:223501 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026188:20060604:223501 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026189:20060604:223501 Server [11]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026191:20060604:223501 Server [12]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026193:20060604:223501 Server [13]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026195:20060604:223501 Server [14]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026197:20060604:223501 Server [15]. Got QUIT or INT or TERM or PIPE signal. Exiting...
        026174:20060604:223501 ZABBIX server is down.
        [root@gw-app2 ~]#

        • rondeniable
          Junior Member
          • Jun 2006
          • 14

          #5
          Looks like my actions table is missing a field.

          026185:20060604:223501 Query failed:Unknown column 'delay' in 'field list' [1054]
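
One way to spot the process that died in a log like this is to compare the PIDs the parent kills against the PIDs that log their own exit message. The sketch below is only an illustration: the sample is an abridged copy of the log above (three children instead of fifteen), the function names are made up for this example, and it assumes the child that never logs "Got QUIT or INT or TERM or PIPE signal" is the one that had already died.

```python
import re

# Abridged sample in the same format as the server log above
# (PID:YYYYMMDD:HHMMSS message); only three child processes are kept.
SAMPLE_LOG = """\
026185:20060604:223501 Query failed:Unknown column 'delay' in 'field list' [1054]
026174:20060604:223501 One server process died. Shutting down...
026174:20060604:223501 4. Killing PID=[26183]
026174:20060604:223501 5. Killing PID=[26184]
026174:20060604:223501 6. Killing PID=[26185]
026183:20060604:223501 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
026184:20060604:223501 Server [6]. Got QUIT or INT or TERM or PIPE signal. Exiting...
026174:20060604:223501 ZABBIX server is down.
"""

LINE_RE = re.compile(r"^(\d+):\d{8}:\d{6} (.*)$")
KILL_RE = re.compile(r"Killing PID=\[(\d+)\]")
EXIT_RE = re.compile(r"^Server \[\d+\]\. Got .* signal\. Exiting")


def silent_pids(log_text):
    """Return PIDs that were killed but never logged an exit message."""
    killed, exited = set(), set()
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # continuation lines without a PID prefix
        pid, message = int(m.group(1)), m.group(2)
        kill = KILL_RE.search(message)
        if kill:
            killed.add(int(kill.group(1)))
        if EXIT_RE.match(message):
            exited.add(pid)
    return sorted(killed - exited)


def last_message(log_text, pid):
    """Return the last message logged by the given PID, or None."""
    last = None
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m and int(m.group(1)) == pid:
            last = m.group(2)
    return last


suspects = silent_pids(SAMPLE_LOG)
for pid in suspects:
    print(pid, "->", last_message(SAMPLE_LOG, pid))
```

On the sample this singles out PID 26185, whose last line is the failed query on the actions table. Run against the log in post #1, the same comparison would point at PID 14903, the process that was inside add_history_log() when the server went down.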

          • rondeniable
            Junior Member
            • Jun 2006
            • 14

            #6
            OK, in my case it looks like the delay field was removed in the beta12 patch, but there is still an SQL query against the database that uses the delay field.

            This looks like a bug to me.

            • dantheman
              Senior Member
              • May 2006
              • 209

              #7
              I had an item enabled for the zabbixw32 agent that polled an event log as an active check (my only active-agent item). Apparently having that item enabled is what was making my server crash. After I disabled it, the server is up and running again.
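
For anyone reading the log in post #1: the host, key and data fields in the trapper request are base64-encoded, and decoding them shows which item the crashing process was handling. This is just a quick sketch; the request line is copied from that log (its closing tag is cut off there as well, so a regular expression is used instead of an XML parser), and decode_trapper is a name made up for this example.

```python
import base64
import re

# Trapper request copied from the log in post #1; the closing </req> tag
# is truncated in the log itself, so it is left truncated here.
LINE = (
    "<req><host>YXViMTA0</host>"
    "<key>RXZlbnRsb2dbU3lzdGVtXQ==</key>"
    "<data>WkJYX05PVFNVUFBPUlRFRAo=</data></req"
)


def decode_trapper(line):
    """Decode the base64 payload of each host/key/data element."""
    fields = {}
    for tag, payload in re.findall(r"<(host|key|data)>([^<]*)</\1>", line):
        fields[tag] = base64.b64decode(payload).decode("ascii")
    return fields


decoded = decode_trapper(LINE)
for name in ("host", "key", "data"):
    print(name, "=", repr(decoded[name]))
```

The decoded values match the process_data() line in the log: host aub104, key Eventlog[System], and the value ZBX_NOTSUPPORTED followed by a newline, which is the event-log item reported by the active agent.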

              • Eugene
                Member
                • Feb 2006
                • 57

                #8
                Originally posted by rondeniable
                OK, in my case it looks like the delay field was removed in the beta12 patch, but there is still an SQL query against the database that uses the delay field.

                This looks like a bug to me.

                It looks like an old ZABBIX server binary is being used.
                Try recompiling and reinstalling the ZABBIX server.

                • rondeniable
                  Junior Member
                  • Jun 2006
                  • 14

                  #9
                  I'll be! Not sure how I did that...

                  [root@gw-app2 zabbix-1.1beta12]# /usr/local/zabbix/bin/zabbix_server --version
                  ZABBIX Server (daemon) v1.1beta11 (23 May 2006)
                  Compilation time: May 24 2006 00:09:44

                  Installed the new version:
                  [root@gw-app2 zabbix-1.1beta12]# /usr/local/zabbix/bin/zabbix_server --version
                  ZABBIX Server (daemon) v1.1beta12 (23 May 2006)
                  Compilation time: Jun 7 2006 19:01:16

                  That fixed it... thanks
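
A small check like the one below could catch this kind of mismatch after an upgrade. It is a sketch only: the two outputs are pasted from the commands above, the expected version is taken from the source directory name, and the function names are invented for this example.

```python
import re

# Output of `zabbix_server --version` before and after reinstalling,
# copied from the shell session above.
OLD_OUTPUT = (
    "ZABBIX Server (daemon) v1.1beta11 (23 May 2006)\n"
    "Compilation time: May 24 2006 00:09:44\n"
)
NEW_OUTPUT = (
    "ZABBIX Server (daemon) v1.1beta12 (23 May 2006)\n"
    "Compilation time: Jun 7 2006 19:01:16\n"
)

# Version of the source tree that was supposed to be installed
# (from the directory name zabbix-1.1beta12).
EXPECTED = "1.1beta12"


def server_version(version_output):
    """Extract the version string from the --version banner, or None."""
    m = re.search(r"ZABBIX Server \(daemon\) v(\S+)", version_output)
    return m.group(1) if m else None


def is_stale(version_output, expected):
    """True if the installed binary does not report the expected version."""
    return server_version(version_output) != expected


print("before reinstall:", server_version(OLD_OUTPUT), "stale =", is_stale(OLD_OUTPUT, EXPECTED))
print("after reinstall: ", server_version(NEW_OUTPUT), "stale =", is_stale(NEW_OUTPUT, EXPECTED))
```

With the old binary the check reports a stale install, which fits the missing delay column: the database had been patched to beta12 while the running server was still beta11.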
