Zabbix server suddenly died

  • sphaero
    Junior Member
    • Nov 2005
    • 11

    #1

    My Zabbix server has suddenly died and I don't know what went wrong. MySQL is still running and shows no errors. Running on Debian Sarge.

    Code:
    020083:20060727:142512 In add_history_uint()
    020083:20060727:142512 Executing query:insert into history_uint (clock,itemid,value) values (1154003112,17480,5785)
    020083:20060727:142512 In add_trend()
    020083:20060727:142512 SQL [select num,value_min,value_avg,value_max from trends where itemid=17480 and clock=1154001600]
    020083:20060727:142512 Executing query:select num,value_min,value_avg,value_max from trends where itemid=17480 and clock=1154001600
    020083:20060727:142512 Executing query:update trends set num=22, value_min=4740.000000, value_avg=5601.909018, value_max=6035.000000 where itemid=17480 and clock=1154001600
    020087:20060727:142512 Status send [0]
    020087:20060727:142512 In get_value_SNMP() 0.4
    020087:20060727:142512 In get_value_SNMP() 1
    020087:20060727:142512 In get_value_SNMP() 2
    020087:20060727:142512 AV loop()
    020087:20060727:142512 OID [IF-MIB::ifOutOctets.51] Type [65] UI64[2089708213]
    020087:20060727:142512 OID [IF-MIB::ifOutOctets.51] Type [65] ULONG[2089708213]
    020087:20060727:142512 In process_new_value()
    020087:20060727:142512 In add_history(IF-MIB::ifOutOctets.51,,3,1)
    020087:20060727:142512 In add_history(17481,UINT64:16717665704)
    020087:20060727:142512 ITEM_STORE_SPEED_PER_SECOND(IF-MIB::ifOutOctets.51,16717296144.000000,0.000000)
    020087:20060727:142512 In add_history_uint()
    020087:20060727:142512 Executing query:insert into history_uint (clock,itemid,value) values (1154003112,17481,3079)
    020087:20060727:142512 In add_trend()
    020087:20060727:142512 SQL [select num,value_min,value_avg,value_max from trends where itemid=17481 and clock=1154001600]
    020087:20060727:142512 Executing query:select num,value_min,value_avg,value_max from trends where itemid=17481 and clock=1154001600
    020087:20060727:142512 Executing query:update trends set num=23, value_min=2075.000000, value_avg=2842.521809, value_max=3419.000000 where itemid=17481 and clock=1154001600
    020083:20060727:142512 End of add_history
    020083:20060727:142512 In update_item()
    020083:20060727:142512 Executing query:update items set nextcheck=1154003160,prevvalue=lastvalue,prevorgvalue=10950064400.000000,lastvalue='5785.200000',lastclock=1154003112 where itemid=17480
    020087:20060727:142512 End of add_history
    020087:20060727:142512 In update_item()
    020087:20060727:142512 Executing query:update items set nextcheck=1154003160,prevvalue=lastvalue,prevorgvalue=16717665704.000000,lastvalue='3079.666667',lastclock=1154003112 where itemid=17481
    020083:20060727:142512 In update_functions(17480)
    020083:20060727:142512 Executing query:select distinct function,parameter,itemid,lastvalue from functions where itemid=17480
    020083:20060727:142512 Query::select distinct function,parameter,itemid,lastvalue from functions where itemid=17480
    020083:20060727:142512 Query failed:Can't create/write to file '/tmp/#sql_6437_0.MYI' (Errcode: 13) [1]
    020087:20060727:142512 In update_functions(17481)
    020087:20060727:142512 Executing query:select distinct function,parameter,itemid,lastvalue from functions where itemid=17481
    020087:20060727:142512 Query::select distinct function,parameter,itemid,lastvalue from functions where itemid=17481
    020087:20060727:142512 Query failed:Can't create/write to file '/tmp/#sql_6437_0.MYI' (Errcode: 13) [1]
    020088:20060727:142512 server #9 started [Poller. SNMP:ON]
    020088:20060727:142512 Executing query:select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid from hosts h, items i where i.nextcheck<=1154003112 and i.status in (0,3) and i.type not in (2,7) and h.status=0 and h.disable_until<=1154003112 and h.errors_from=0 and h.hostid=i.hostid and mod(i.itemid,6)=4 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]') order by i.nextcheck
    020088:20060727:142512 Spent 0 seconds while updating values
    020088:20060727:142512 Executing query:select count(*),min(nextcheck) from items i,hosts h where h.status=0 and h.disable_until<1154003112 and h.errors_from=0 and h.hostid=i.hostid and i.status in (0,3) and i.type not in (2,7) and mod(i.itemid,6)=4 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]')
    020088:20060727:142512 No items to update for minnextcheck.
    020088:20060727:142512 Nextcheck:-1 Time:1154003112
    020088:20060727:142512 Sleeping for 5 seconds
    020089:20060727:142512 server #10 started [Poller. SNMP:ON]
    020089:20060727:142512 Executing query:select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid from hosts h, items i where i.nextcheck<=1154003112 and i.status in (0,3) and i.type not in (2,7) and h.status=0 and h.disable_until<=1154003112 and h.errors_from=0 and h.hostid=i.hostid and mod(i.itemid,6)=5 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]') order by i.nextcheck
    020089:20060727:142512 Spent 0 seconds while updating values
    020089:20060727:142512 Executing query:select count(*),min(nextcheck) from items i,hosts h where h.status=0 and h.disable_until<1154003112 and h.errors_from=0 and h.hostid=i.hostid and i.status in (0,3) and i.type not in (2,7) and mod(i.itemid,6)=5 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]')
    020089:20060727:142512 No items to update for minnextcheck.
    020089:20060727:142512 Nextcheck:-1 Time:1154003112
    020089:20060727:142512 Sleeping for 5 seconds
    020094:20060727:142512 After DBconnect()
    020094:20060727:142512 Before accept()
    020071:20060727:142512 One server process died. Shutting down...
    020074:20060727:142512 Server [1]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 0. Killing PID=[20074]
    020071:20060727:142512 1. Killing PID=[20076]
    020078:20060727:142512 Server [3]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 2. Killing PID=[20078]
    020080:20060727:142512 Server [4]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 3. Killing PID=[20080]
    020081:20060727:142512 Server [5]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 4. Killing PID=[20081]
    020071:20060727:142512 5. Killing PID=[20082]
    020071:20060727:142512 6. Killing PID=[20083]
    020071:20060727:142512 7. Killing PID=[20087]
    020088:20060727:142512 Server [9]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 8. Killing PID=[20088]
    020071:20060727:142512 9. Killing PID=[20089]
    020090:20060727:142512 Server [11]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 10. Killing PID=[20090]
    020092:20060727:142512 Server [12]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 11. Killing PID=[20092]
    020094:20060727:142512 Server [13]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 12. Killing PID=[20094]
    020071:20060727:142512 Server [0]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020071:20060727:142512 0. Killing PID=[20074]
    020071:20060727:142512 1. Killing PID=[20076]
    020071:20060727:142512 2. Killing PID=[20078]
    020071:20060727:142512 3. Killing PID=[20080]
    020071:20060727:142512 4. Killing PID=[20081]
    020071:20060727:142512 5. Killing PID=[20082]
    020071:20060727:142512 6. Killing PID=[20083]
    020071:20060727:142512 7. Killing PID=[20087]
    020071:20060727:142512 8. Killing PID=[20088]
    020071:20060727:142512 9. Killing PID=[20089]
    020071:20060727:142512 10. Killing PID=[20090]
    020071:20060727:142512 11. Killing PID=[20092]
    020071:20060727:142512 12. Killing PID=[20094]
    020071:20060727:142512 13. Killing PID=[0]
    020071:20060727:142512 14. Killing PID=[0]
    020071:20060727:142512 ZABBIX server is down.
    020089:20060727:142512 Server [10]. Got QUIT or INT or TERM or PIPE signal. Exiting...
    020096:20060727:142512 Server [14]. Got QUIT or INT or TERM or PIPE signal. Exiting...
  • sphaero
    Junior Member
    • Nov 2005
    • 11

    #2
    That's what you get when you mess around on a server: the permissions on /tmp had been changed, hence MySQL's Errcode 13 ("Permission denied"). Fixed now.
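Since the fix was restoring the permissions on `/tmp`, here is a small Python sketch for checking that a temp directory still has the usual `drwxrwxrwt` mode (octal 1777: world-writable plus the sticky bit), which is the mode shown in the `ls` output later in this thread. `tmp_mode_ok` is a hypothetical helper; it only inspects the permission bits, so it will not catch other causes such as a read-only mount.

```python
import os
import stat

# drwxrwxrwt: everyone may create files, the sticky bit stops users
# from deleting each other's files.
EXPECTED_TMP_MODE = 0o1777


def tmp_mode_ok(path="/tmp"):
    """Return (ok, mode_string) for a temp directory's permission bits."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits incl. sticky/setuid
    return mode == EXPECTED_TMP_MODE, stat.filemode(st.st_mode)


if __name__ == "__main__":
    ok, shown = tmp_mode_ok()
    print(shown, "OK" if ok else "unexpected; MySQL may fail with Errcode 13")
```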

    • bbrendon
      Senior Member
      • Sep 2005
      • 870

      #3
      Zabbix server shutting down - 1.1.6

      Very strange. /tmp has the usual permissions and the disk has plenty of free space. What gives?
      Code:
      drwxrwxrwt 50 root root 9144 2007-06-04 18:57 /tmp
      
      Filesystem            Size  Used Avail Use% Mounted on
      /dev/evms/tmp         3.0G  1.8G  1.3G  60% /tmp
      Code:
      008490:20070604:181145 Executing housekeeper
      008490:20070604:181150 Deleted 17536 records from history and trends
      008490:20070604:181150 Next housekeeper run is after 1h
      008498:20070604:184146 Query::select distinct t.triggerid,t.expression,t.description,t.status,t.priority,t.value,t.url,t.comments from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=21369
      008498:20070604:184146 Query failed:Can't create/write to file '/tmp/#sql_1442_0.MYD' (Errcode: 24) [1]
      008490:20070604:184146 One server process died. Shutting down...
      008490:20070604:184146 ZABBIX server is down.
      Unofficial Zabbix Expert
      Blog, Corporate Site

      • bbrendon
        Senior Member
        • Sep 2005
        • 870

        #4
        Adding more files to /etc/security/limits.conf seems to be the answer.
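The limit involved is the per-process open-file limit (`RLIMIT_NOFILE`, the `nofile` item in `/etc/security/limits.conf`). A minimal Python sketch for inspecting it and lifting the soft limit up to the hard limit; the helper names are hypothetical. Note that the failing call in the log above is MySQL's, so it is the `mysqld` process whose limit matters, and since `pam_limits` applies limits.conf to login sessions, a daemon started from an init script may not pick those entries up.

```python
import resource

# A limits.conf entry has the form "<domain> <type> <item> <value>", e.g.
#   mysql  soft  nofile  8192     (hypothetical user and value)


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_to_hard():
    """Try to lift the soft open-file limit to the hard limit.

    An unprivileged process may raise its soft limit up to the hard
    limit; going past the hard limit needs root. Returns the new limits.
    """
    soft, hard = nofile_limits()
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # some platforms reject an unlimited soft value
    return nofile_limits()


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("after: ", raise_soft_to_hard())
```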

        • NOB
          Senior Member
          Zabbix Certified Specialist
          • Mar 2007
          • 469

          #5
          Originally posted by infinity005
          Adding more files to /etc/security/limits.conf seems to be the answer.
          Sure, error code 24 means "Too many open files".

          Regards,

          Norbert.
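To confirm that errno 24 really is "Too many open files" (`EMFILE`), you can provoke it: lower the process's soft `RLIMIT_NOFILE` and keep opening descriptors until the kernel refuses. A Python sketch for Unix-like systems; `hit_emfile` and the limit of 16 are arbitrary illustration choices, and the numeric value of `EMFILE` is platform-defined (24 on Linux).

```python
import errno
import os
import resource


def hit_emfile(limit=16):
    """Open /dev/null under a small soft limit until opening fails.

    Returns the errno of the failure (expected: errno.EMFILE). All opened
    descriptors are closed and the original limits restored afterwards.
    """
    old = resource.getrlimit(resource.RLIMIT_NOFILE)
    fds = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, old[1]))
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, old)


if __name__ == "__main__":
    code = hit_emfile()
    print(code, errno.errorcode[code], os.strerror(code))
```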
