zabbix_server crashes after 30 min

  • rreilly
    Member
    • May 2007
    • 61

    #1

    zabbix_server crashes after 30 min

    Hi, I compiled and installed Zabbix 1.3.7, and the zabbix_server dies after about 30 to 45 minutes.
    Environment:
    SUSE 10.2
    Pentium 4, 2.8 GHz
    2 GB RAM
    monitoring only 4 or 5 hosts

    Here is the log entry:
    5516:20070516:102305 server #16 started [Poller for unreachable hosts. SNMP:ON]
    5521:20070516:102305 server #19 started [HTTP Poller]
    5526:20070516:102305 server #20 started [HTTP Poller]
    5528:20070516:102305 server #21 started [HTTP Poller]
    5530:20070516:102305 server #22 started [HTTP Poller]
    5532:20070516:102306 server #23 started [Discoverer]
    5480:20070516:102306 server #0 started [Watchdog]
    5510:20070516:102307 Deleted 0 records from history and trends
    5480:20070516:105305 One child process died. Exiting ...
    5480:20070516:105307 ZABBIX Server stopped
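The uptime before the crash can be read straight off the log prefix. The sketch below assumes each line has the form `PID:YYYYMMDD:HHMMSS message`, as in the excerpt above; the helper name `parse_stamp` is made up for illustration and is not part of Zabbix.

```python
from datetime import datetime

def parse_stamp(line):
    """Return the timestamp of a 'PID:YYYYMMDD:HHMMSS message' log line."""
    _pid, date, rest = line.strip().split(":", 2)
    # The first six characters after the second colon are HHMMSS.
    return datetime.strptime(date + rest[:6], "%Y%m%d%H%M%S")

# Two lines copied from the log above, both written by the parent process 5480.
started = parse_stamp("5480:20070516:102306 server #0 started [Watchdog]")
died = parse_stamp("5480:20070516:105305 One child process died. Exiting ...")

uptime = died - started
print(uptime.total_seconds())  # seconds between start-up and the child dying
```

For these two lines the difference is 1799 seconds, so the child died almost exactly 30 minutes after start-up. The log alone does not show what triggered it.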
  • rreilly
    Member
    • May 2007
    • 61

    #2
    I have cranked up the debug level and will post the log here when it dies again.
    Rob


    • rreilly
      Member
      • May 2007
      • 61

      #3
      added debug log

      nothing obvious to me...
      21416:20070516:111644 End of add_history
      21416:20070516:111644 In update_item()
      21416:20070516:111644 In calculate_item_nextcheck (18659,5,,1179328604)
      21416:20070516:111644 End calculate_item_nextcheck (result:1179328609)
      21416:20070516:111644 Query [update items set nextcheck=1179328609,prevvalue=lastvalue,prevorgvalue='3953869790.000000',lastvalue='4342.600000',lastclock=1179328604 where itemid=18659]
      21416:20070516:111644 End update_item()
      21416:20070516:111644 In update_functions(18659)
      21416:20070516:111644 Query [select distinct function,parameter,itemid,lastvalue from functions where itemid=18659]
      21416:20070516:111644 End update_functions()
      21416:20070516:111644 In update_triggers [itemid:18659]
      21416:20070516:111644 Query [select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=18659]
      21416:20070516:111644 End update_triggers [18659]
      21416:20070516:111644 Query [commit;]
      21410:20070516:111644 End get_value_agent(result:584592)
      21410:20070516:111644 End get_value()
      21410:20070516:111644 Query [begin;]
      21410:20070516:111644 In process_new_value(vfs.fs.size[/tmp,free])
      21410:20070516:111644 In add_history(vfs.fs.size[/tmp,free],,3,1)
      21410:20070516:111644 In add_history(18734,UINT64:2571062806019309568)
      21410:20070516:111644 In add_history_uint()
      21410:20070516:111644 Query [insert into history_uint (clock,itemid,value) values (1179328604,18734,598622208)]
      21410:20070516:111644 In add_trend()
      21410:20070516:111644 Query [select num,value_min,value_avg,value_max from trends where itemid=18734 and clock=1179327600]
      21410:20070516:111644 Query [update trends set num=19, value_min=597594112.000000, value_avg=597702332.631537, value_max=598622208.000000 where itemid=18734 and clock=1179327600]
      21410:20070516:111644 End of add_history
      21410:20070516:111644 In update_item()
      21410:20070516:111644 In calculate_item_nextcheck (18734,30,,1179328604)
      21410:20070516:111644 End calculate_item_nextcheck (result:1179328634)
      21410:20070516:111644 Query [update items set nextcheck=1179328634,prevvalue=lastvalue,lastvalue='598622208',lastclock=1179328604 where itemid=18734]
      21410:20070516:111644 End update_item()
      21410:20070516:111644 In update_functions(18734)
      21410:20070516:111644 Query [select distinct function,parameter,itemid,lastvalue from functions where itemid=18734]
      21410:20070516:111644 End update_functions()
      21410:20070516:111644 In update_triggers [itemid:18734]
      21410:20070516:111644 Query [select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=18734]
      21410:20070516:111644 End update_triggers [18734]
      21410:20070516:111644 Query [commit;]
      21416:20070516:111644 In get_value()
      21416:20070516:111644 In get_value_agent(host:appP1.nyc1.xxxxxx.com,ip:0.0.0.0,key:net.if.in[eth0,bytes]
      21410:20070516:111644 In get_value()
      21410:20070516:111644 In get_value_agent(host:appP1.nyc1.xxxxxx.com,ip:0.0.0.0,key:vfs.fs.size[/opt,pused]
      21405:20070516:111644 One child process died. Exiting ...
      21408:20070516:111644 Got signal. Exiting ...
      21416:20070516:111644 Sending [net.if.in[eth0,bytes]
      ]
      21416:20070516:111644 Before read
      21410:20070516:111644 Sending [vfs.fs.size[/opt,pused]
      ]
      21410:20070516:111644 Before read
      21409:20070516:111644 Got signal. Exiting ...
      21410:20070516:111644 Got signal. Exiting ...
      21416:20070516:111644 End get_value_agent(result:271648510)
      21416:20070516:111644 End get_value()
      21416:20070516:111644 Query [begin;]
      21416:20070516:111644 In process_new_value(net.if.in[eth0,bytes])
      21416:20070516:111644 In add_history(net.if.in[eth0,bytes],,0,2)
      21416:20070516:111644 In add_history(18869,DOUBLE:-83711609936427134449095706957812641450109750914494813081542999091433675869135634569781123344976238916218333821683839595717745725444712034656129512302332615655738810740814304573602145352049774545921517048070675585809233916151552871555980812078727054020087472481926110684847108059786128022165669281792.000000)
      21416:20070516:111644 In add_history()
      21416:20070516:111644 Query [insert into history (clock,itemid,value) values (1179328604,18869,16947.600000)]
      21416:20070516:111644 In add_trend()
      21416:20070516:111644 Query [select num,value_min,value_avg,value_max from trends where itemid=18869 and clock=1179327600]
      21416:20070516:111644 Query [update trends set num=109, value_min=1292.800000, value_avg=6635.006378, value_max=44207.600000 where itemid=18869 and clock=1179327600]
      21416:20070516:111644 End of add_history
      21416:20070516:111644 In update_item()
      21416:20070516:111644 In calculate_item_nextcheck (18869,5,,1179328604)
      21416:20070516:111644 End calculate_item_nextcheck (result:1179328609)
      21416:20070516:111644 Query [update items set nextcheck=1179328609,prevvalue=lastvalue,prevorgvalue='271648510.000000',lastvalue='16947.600000',lastclock=1179328604 where itemid=18869]
      21416:20070516:111644 End update_item()
      21416:20070516:111644 In update_functions(18869)
      21416:20070516:111644 Query [select distinct function,parameter,itemid,lastvalue from functions where itemid=18869]
      21416:20070516:111644 End update_functions()
      21416:20070516:111644 In update_triggers [itemid:18869]
      21416:20070516:111644 Query [select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=18869]
      21416:20070516:111644 End update_triggers [18869]
      21416:20070516:111644 Query [commit;]
      21411:20070516:111644 Got signal. Exiting ...
      21416:20070516:111644 End get_values()
      21416:20070516:111644 Spent 0 seconds while updating values
      21416:20070516:111644 Query [select count(*),min(nextcheck) from items i,hosts h where h.status=0 and h.disable_until<1179328604 and h.errors_from=0 and h.hostid=i.hostid and i.status in (0,3) and i.type not in (2,7,9) and mod(i.itemid,6)=5 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]') and h.hostid>=100000000000000*0 and h.hostid<=(100000000000000*0+99999999999999) ]
      21416:20070516:111644 Nextcheck:1179328607 Time:1179328604
      21416:20070516:111644 Sleeping for 3 seconds
      21415:20070516:111644 Got signal. Exiting ...
      21416:20070516:111644 Got signal. Exiting ...
      21419:20070516:111644 Got signal. Exiting ...
      21421:20070516:111644 Got signal. Exiting ...
      21428:20070516:111645 Got signal. Exiting ...
      21441:20070516:111645 Got signal. Exiting ...
      21444:20070516:111645 Got signal. Exiting ...
      21445:20070516:111645 Got signal. Exiting ...
      21447:20070516:111645 Got signal. Exiting ...
      21449:20070516:111645 Got signal. Exiting ...
      21426:20070516:111645 Got signal. Exiting ...
      21430:20070516:111645 Got signal. Exiting ...
      21433:20070516:111645 Got signal. Exiting ...
      21435:20070516:111645 Got signal. Exiting ...
      21439:20070516:111645 Got signal. Exiting ...
      21442:20070516:111645 Got signal. Exiting ...
      21450:20070516:111645 Got signal. Exiting ...
      21417:20070516:111645 Got signal. Exiting ...
      21405:20070516:111647 ZABBIX Server stopped
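One way to narrow down which child died is to list the PIDs that appear in the log but never write "Got signal. Exiting", since the surviving children all log that line when the parent shuts them down. The following is a minimal sketch, assuming the `PID:YYYYMMDD:HHMMSS message` layout seen above; the function name `summarize` and the sample list are illustrative only.

```python
import re

# PID, date, time, then the message text.
LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")

def summarize(lines):
    """Return (silent_pids, parent_pids) for a list of server log lines.

    silent_pids: PIDs that logged something but never "Got signal. Exiting".
    parent_pids: PIDs that reported "One child process died".
    """
    seen, signalled, parents = set(), set(), set()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # continuation lines such as a lone "]" have no prefix
        pid, message = match.group(1), match.group(4)
        seen.add(pid)
        if message.startswith("Got signal. Exiting"):
            signalled.add(pid)
        elif message.startswith("One child process died"):
            parents.add(pid)
    return seen - signalled - parents, parents

# A few lines copied from the debug log above.
sample = [
    "21410:20070516:111644 Before read",
    "21405:20070516:111644 One child process died. Exiting ...",
    "21408:20070516:111644 Got signal. Exiting ...",
    "21410:20070516:111644 Got signal. Exiting ...",
]
silent, parents = summarize(sample)
print(silent, parents)
```

In the excerpt posted here, every PID other than the parent 21405 logs "Got signal", so the process that actually died seems to have written nothing in the quoted portion. Running the same check over the full log file, including the "server #N started" lines, would be needed to name it.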


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Could you please test the latest code from svn://svn.zabbix.com/trunk? We recently fixed a memory corruption bug in the server code.
        Alexei Vladishev
        Creator of Zabbix, Product manager
        New York | Tokyo | Riga

        • rreilly
          Member
          • May 2007
          • 61

          #5
          Alexei, that appears to have worked! Thanks. I have found another issue:
          under Configuration -> Items,
          the group drop-down has an "all" option but the host drop-down does not (I think this was the case in the past).
          However, if I go to Triggers, set both drop-downs to "all", and then navigate to the Items section, I get all items.
          Rob


          • cperera
            Junior Member
            • May 2007
            • 19

            #6
            Running Zabbix 1.3.8, and the server crashes almost immediately.

            Here's the interesting part of the log file. What types of attachments does the forum accept? It won't let me attach the log (tried txt, gz, tar.gz).

            13146:20070516:164353 End process_value(result:FAIL)
            13146:20070516:164353 End process_step_data()
            13146:20070516:164353 out of memory. requested '-1211838143' bytes.
            13149:20070516:164353 server #23 started [Discoverer]
            13109:20070516:164353 server #0 started [Watchdog]
            13114:20070516:164353 In get_value()
            13114:20070516:164353 In get_value_agent(host:webP2.nyc1.freshdirect.com,ip:0.0.0.0,key:vfs.fs.inode[/,pfree]
            13109:20070516:164353 One child process died. Exiting ...
            13114:20070516:164353 Got signal. Exiting ...

            Managed to get the server started after a few tries though.
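The negative byte count in the "out of memory" line is worth a second look. A negative request usually means a length was held in a signed integer and either overflowed or was never initialized. The sketch below only reinterprets the logged number as an unsigned 32-bit value; that the server used a 32-bit signed length here is an assumption, not something the log confirms.

```python
import struct

requested = -1211838143  # value copied from the "out of memory" log line

# Reinterpret the same 32 bits as an unsigned integer.
as_unsigned = requested & 0xFFFFFFFF

# Cross-check with an explicit pack/unpack round trip (little-endian int32 -> uint32).
round_trip = struct.unpack("<I", struct.pack("<i", requested))[0]

print(as_unsigned)                     # size if the value was meant to be unsigned
print(round(as_unsigned / 2**30, 2))   # the same size in GiB
```

Read this way, the request is 3083129153 bytes, roughly 2.87 GiB, which is implausible for processing a single item value. That would fit a corrupted or uninitialized length rather than a genuine shortage of memory, though the log alone cannot prove it.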


            • rreilly
              Member
              • May 2007
              • 61

              #7
              Alexei, the out-of-memory error that Charith mentions above occurs after we checked out the latest version from svn.
              Rob
