Zabbix hang every 1 or 2 minutes

  • pierre-hoffmann
    Senior Member
    • Jan 2008
    • 133

    #1

    Zabbix hang every 1 or 2 minutes

    Hi,

    I have a big problem: my Zabbix server hangs every 1 or 2 minutes after startup. MySQL works fine and I can't see why. Please help me!

    Code:
      9247:20080804:171208 Parameter [en0.ifInOctets.1] will be checked after 120 seconds on host [uxopas01]
      9260:20080804:171210 Active parameter [cobol.compilation] is not supported by agent on host [uxdist01]
      9260:20080804:171214 Active parameter [vfs.fs.size[/archiv,free]] is not supported by agent on host [uxopas01]
      9254:20080804:171214 Active parameter [vfs.fs.size[/archiv,pfree]] is not supported by agent on host [uxopas01]
      9261:20080804:171214 Active parameter [vfs.fs.size[/dbdump,free]] is not supported by agent on host [uxopas01]
      9258:20080804:171214 Active parameter [vfs.fs.size[/dbdump,pfree]] is not supported by agent on host [uxopas01]
      9255:20080804:171214 Active parameter [vfs.fs.size[/dbwork,free]] is not supported by agent on host [uxopas01]
      9259:20080804:171214 Active parameter [vfs.fs.size[/dbwork,pfree]] is not supported by agent on host [uxopas01]
      9254:20080804:171217 Active parameter [perf_counter[\System\threads]] is not supported by agent on host [sv_eram]
      9259:20080804:171217 Active parameter [perf_counter[\System\threads]] is not supported by agent on host [ntseip01]
      9245:20080804:171218 One child process died. Exiting ...
      9245:20080804:171220 ZABBIX Server stopped
    So I tried enabling debug logging:

    Code:
      6379:20080804:165039 In add_history()
      6374:20080804:165039 Query [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
      6379:20080804:165039 Query [insert into history (clock,itemid,value) values (1217861439,24572,0.000000)]
      6390:20080804:165039 In delete_history(history_uint,18839,5230675030089662495,-1211101196)
      6390:20080804:165039 Query [select min(clock) from history_uint where itemid=18839]
      6379:20080804:165039 In add_trend()
      6379:20080804:165039 Query [select num,value_min,value_avg,value_max from trends where itemid=24572 and clock=1217858400]
      6375:20080804:165039 In add_history(key:sys.uptime,value_type:2,type:4)
      6375:20080804:165039 In add_history(itemid:25707,STRING:31 days)
      6375:20080804:165039 In add_history_log()
      6390:20080804:165039 In delete_history(history_str,18839,5230675030089662495,-1211101196)
      6390:20080804:165039 Query [select min(clock) from history_str where itemid=18839]
      6374:20080804:165039 Query [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
      6390:20080804:165039 In delete_history(history_text,18839,5230675030089662495,-1211101196)
      6390:20080804:165039 Query [select min(clock) from history_text where itemid=18839]
      6379:20080804:165039 Query [update trends set num=61, value_min=0.000000, value_avg=0.852492, value_max=6.000000 where itemid=24572 and clock=1217858400]
      6390:20080804:165039 In delete_history(history_log,18839,5230675030089662495,-1211101196)
      6374:20080804:165039 Query [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
      6390:20080804:165039 Query [select min(clock) from history_log where itemid=18839]
      6358:20080804:165039 One child process died. Exiting ...
      6382:20080804:165039 Trapper got [<req><host>dXhkZXZ0MDM=</host><key>dmZzLmZpbGUuY2tzdW1bL2V0Yy9ncm91cF0=</key><data>MjUxNDc3Mjc3OA==</data></req>] len 112
      6382:20080804:165039 XML received [<req><host>dXhkZXZ0MDM=</host><key>dmZzLmZpbGUuY2tzdW1bL2V0Yy9ncm91cF0=</key><data>MjUxNDc3Mjc3OA==</data></req>]
      6382:20080804:165039 Value [2514772778]
      6382:20080804:165039 Query [begin;]
      6379:20080804:165039 End of add_history
      6390:20080804:165039 In delete_history(trends,18839,5230675030089662829,-1211101196)
      6379:20080804:165039 In update_item()
      6390:20080804:165039 Query [select min(clock) from trends where itemid=18839]
      6382:20080804:165039 In process_data([uxdevt03],[vfs.file.cksum[/etc/group]],[2514772778],[])
      6374:20080804:165039 Query [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
      6382:20080804:165039 Query [select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns from hosts h, items i where h.status=0 and h.hostid=i.hostid and h.host='uxdevt03' and i.key_='vfs.file.cksum[/etc/group]' and i.status in (0,3) and i.type in (2,7) and h.hostid>=100000000000000*0 and h.hostid<=(100000000000000*0+99999999999999) ]
      6379:20080804:165039 In calculate_item_nextcheck (24572,30,,1217861439)
      6390:20080804:165039 Query [delete from trends where itemid=18839 and clock<1186325434]
      6379:20080804:165039 End calculate_item_nextcheck (result:1217861462)
      6390:20080804:165039 In delete_history(history,18840,5230675030089662471,-1211101196)
      6379:20080804:165039 Query [update items set nextcheck=1217861462,prevvalue=lastvalue,lastvalue='0.000000',lastclock=1217861439 where itemid=24572]
      6390:20080804:165039 Query [select min(clock) from history where itemid=18840]
      6361:20080804:165039 Got signal. Exiting ...
      6374:20080804:165039 1317926
      6390:20080804:165039 In delete_history(history_uint,18840,5230675030089662471,-1211101196)
      6374:20080804:165039 Query [insert into events(eventid,source,object,objectid,clock,value) values(1317926,0,0,18269,1217861439,0)]
      6390:20080804:165039 Query [select min(clock) from history_uint where itemid=18840]
      6374:20080804:165039 Query [update services set status=0 where triggerid=18269]
      6390:20080804:165039 In delete_history(history_str,18840,5230675030089662471,-1211101196)
      6374:20080804:165039 Query [select serviceid,algorithm from services where triggerid=18269]
      6382:20080804:165039 In check_security()
      6390:20080804:165039 Query [select min(clock) from history_str where itemid=18840]
      6382:20080804:165039 Processing [2514772778]
      6382:20080804:165039 In process_new_value(vfs.file.cksum[/etc/group])
      6382:20080804:165039 In add_history(key:vfs.file.cksum[/etc/group],value_type:3,type:1)
      6382:20080804:165039 In add_history(itemid:22041,UINT64:2514772778)
      6382:20080804:165039 In add_history_uint()
      6382:20080804:165039 Query [insert into history_uint (clock,itemid,value) values (1217861439,22041,2514772778)]
      6374:20080804:165039 End of process_event()
      6364:20080804:165039 Got signal. Exiting ...
      6374:20080804:165039 Event processed OK
      6374:20080804:165039 End update_trigger_value()
      6374:20080804:165039 End update_triggers [28426]
      6390:20080804:165039 Query [delete from history_str where itemid=18840 and clock<1217256634]
      6374:20080804:165039 Query [commit;]
      6362:20080804:165039 Got signal. Exiting ...
      6379:20080804:165039 End update_item()
      6363:20080804:165039 Got signal. Exiting ...
      6379:20080804:165039 In update_functions(24572)
      6382:20080804:165039 In add_trend()
      6379:20080804:165039 Query [select distinct function,parameter,itemid,lastvalue from functions where itemid=24572]
      6382:20080804:165039 Query [select num,value_min,value_avg,value_max from trends where itemid=22041 and clock=1217858400]
      6365:20080804:165039 Got signal. Exiting ...
      6379:20080804:165039 ItemId:24572 Evaluating nodata(300)
      6367:20080804:165039 Got signal. Exiting ...
      6366:20080804:165039 Got signal. Exiting ...
      6382:20080804:165039 Query [update trends set num=4, value_min=2514772778.000000, value_avg=2514772778.000000, value_max=2514772778.000000 where itemid=22041 and clock=1217858400]
      6374:20080804:165039 After write()
      6371:20080804:165039 Got signal. Exiting ...
      6370:20080804:165039 Got signal. Exiting ...
      6382:20080804:165039 End of add_history
      6382:20080804:165039 In update_item()
      6379:20080804:165039 In evaluate_function(nodata)
      6368:20080804:165039 Got signal. Exiting ...
      6379:20080804:165039 In evaluate_NODATA()
      6374:20080804:165039 Got signal. Exiting ...
      6382:20080804:165039 In calculate_item_nextcheck (22041,600,,1217861439)
      6382:20080804:165039 End calculate_item_nextcheck (result:1217862039)
      6382:20080804:165039 Query [update items set nextcheck=1217862039,prevvalue=lastvalue,lastvalue='2514772778',lastclock=1217861439 where itemid=22041]
      6372:20080804:165039 Got signal. Exiting ...
      6384:20080804:165039 Got signal. Exiting ...
      6376:20080804:165039 Got signal. Exiting ...
      6389:20080804:165039 Got signal. Exiting ...
      6387:20080804:165039 Got signal. Exiting ...
      6390:20080804:165039 In delete_history(history_text,18840,5230675030089662471,-1211101196)
      6390:20080804:165039 Query [select min(clock) from history_text where itemid=18840]
      6379:20080804:165039 End of evaluate_NODATA()
      6392:20080804:165039 Got signal. Exiting ...
      6379:20080804:165039 Got signal. Exiting ...
      6382:20080804:165039 Got signal. Exiting ...
      6390:20080804:165039 Got signal. Exiting ...
      6394:20080804:165039 Got signal. Exiting ...
      6400:20080804:165039 Got signal. Exiting ...
      6401:20080804:165039 Got signal. Exiting ...
      6395:20080804:165039 Got signal. Exiting ...
      6404:20080804:165039 Got signal. Exiting ...
      6358:20080804:165041 ZABBIX Server stopped
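The trapper request in the debug log above carries base64-encoded fields. A minimal sketch for decoding such a request while reading the log is shown here; the helper name `decode_trapper_request` is hypothetical and not part of Zabbix, and it assumes the `<req><host>…</host><key>…</key><data>…</data></req>` layout seen in the log.

```python
import base64
import re


def decode_trapper_request(xml_text):
    """Decode the base64 host/key/data fields of a logged trapper request.

    Hypothetical helper for log reading only; it assumes the element
    names seen in the log excerpt (host, key, data).
    """
    fields = {}
    for tag in ("host", "key", "data"):
        match = re.search(r"<{0}>(.*?)</{0}>".format(tag), xml_text)
        if match:
            fields[tag] = base64.b64decode(match.group(1)).decode("utf-8")
    return fields


request = (
    "<req><host>dXhkZXZ0MDM=</host>"
    "<key>dmZzLmZpbGUuY2tzdW1bL2V0Yy9ncm91cF0=</key>"
    "<data>MjUxNDc3Mjc3OA==</data></req>"
)
print(decode_trapper_request(request))
```

The decoded values match what the server itself logs a few lines later in `process_data([uxdevt03],[vfs.file.cksum[/etc/group]],[2514772778],[])`.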
    Best regards,
    Pierre.
    P.Hoffmann
    System & Network Admin.
    __________________________
    Zabbix version 1.8.1
    Hosts monitored 1300
    OS Novell SLES 10 SP2
    __________________________
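One way to narrow down which child died is to look for a PID that wrote to the debug log but never logged `Got signal. Exiting ...` before the server stopped. The sketch below is only a heuristic under that assumption: the function name `find_silent_pids` is hypothetical, the log format `PID:YYYYMMDD:HHMMSS message` is taken from the excerpts above, and a child that was idle (or whose lines fall outside the excerpt) will not show up at all.

```python
import re

# Log line layout taken from the excerpts: PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")

# Messages that indicate an orderly shutdown of a process.
EXIT_PREFIXES = ("Got signal. Exiting", "ZABBIX Server stopped")


def find_silent_pids(lines):
    """Return PIDs that logged something but never logged a shutdown line."""
    seen = set()
    exited = set()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a server log line
        pid = int(match.group(1))
        message = match.group(4)
        seen.add(pid)
        if message.startswith(EXIT_PREFIXES):
            exited.add(pid)
    return sorted(seen - exited)


sample = [
    "6375:20080804:165039 In add_history_log()",
    "6358:20080804:165039 One child process died. Exiting ...",
    "6361:20080804:165039 Got signal. Exiting ...",
    "6379:20080804:165039 End of add_history",
    "6379:20080804:165039 Got signal. Exiting ...",
    "6358:20080804:165041 ZABBIX Server stopped",
]
print(find_silent_pids(sample))
```

Applied to the debug excerpt in post #1, this heuristic would single out PID 6375, whose last logged line is `In add_history_log()`. That is a lead worth checking, not a diagnosis.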
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Please consider upgrading to 1.4.6, which contains fixes for one or two crash-related problems. Not sure if this is your case, though...
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • pierre-hoffmann
      Senior Member
      • Jan 2008
      • 133

      #3
      Hi,

      (Thanks)

      Since yesterday evening Zabbix has hung 433 times...
      I have an automatic restart script.

      So my question is: why is there nothing in the Zabbix server logs
      to diagnose what happened when Zabbix dies?

      I'm trying 1.4.6, but I'm worried by this thread:
      http://www.zabbix.com/forum/showthread.php?t=10169 (replacing one problem with another...)

      Regards,
      P.Hoffmann
      System & Network Admin.
      __________________________
      Zabbix version 1.8.1
      Hosts monitored 1300
      OS Novell SLES 10 SP2
      __________________________


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        I would appreciate it if you could send the full log file to s u p p o r t @ z a b b i x . c o m. Thank you.
        Alexei Vladishev
        Creator of Zabbix, Product manager
        New York | Tokyo | Riga

        • pierre-hoffmann
          Senior Member
          • Jan 2008
          • 133

          #5
          Hi,

          I've sent you the log files.

          But I've noticed that with
          Code:
          DebugLevel=4
          Zabbix hangs after 30 seconds, and with
          Code:
          DebugLevel=3
          after only 5 minutes.

          Why? Is my problem linked to log file creation?

          Best regards,
          Pierre.
          Last edited by pierre-hoffmann; 05-08-2008, 11:07.
          P.Hoffmann
          System & Network Admin.
          __________________________
          Zabbix version 1.8.1
          Hosts monitored 1300
          OS Novell SLES 10 SP2
          __________________________


          • pierre-hoffmann
            Senior Member
            • Jan 2008
            • 133

            #6
            Hi,

            After a lot of tests I've found this:
            • I've set DebugLevel to 2 (Error)
            • Only 3 hosts generate errors
              Code:
                 484:20080805:110620 Timeout while connecting to [uxopas01:161]
                 485:20080805:110622 Timeout while connecting to [uxowms02:161]
                 485:20080805:110632 Timeout while connecting to [uxowms01:161]
                 483:20080805:110805 Timeout while connecting to [uxopas01:161]
                 484:20080805:110817 Timeout while connecting to [uxowms01:161]
                 486:20080805:110822 Timeout while connecting to [uxowms02:161]
                 484:20080805:110832 Timeout while connecting to [uxopas01:161]
                 485:20080805:110840 Timeout while connecting to [uxowms01:161]
                 485:20080805:110850 Timeout while connecting to [uxowms02:161]
                 484:20080805:111038 Timeout while connecting to [uxowms01:161]
                 483:20080805:111042 Timeout while connecting to [uxopas01:161]
                 486:20080805:111049 Timeout while connecting to [uxowms02:161]
                 484:20080805:111057 Timeout while connecting to [uxopas01:161]
                 514:20080805:111100 Timeout while connecting to [uxowms01:161]
                 481:20080805:111110 ZABBIX Server stopped
                1530:20080805:115204 Timeout while connecting to [uxowms02:161]
                1529:20080805:115243 Timeout while connecting to [uxowms01:161]
                1529:20080805:115249 Timeout while connecting to [uxopas01:161]
                1528:20080805:115249 Timeout while connecting to [uxopas01:161]
                1531:20080805:115302 Timeout while connecting to [uxowms02:161]
                1530:20080805:115320 Timeout while connecting to [uxowms01:161]
                1530:20080805:115429 Timeout while connecting to [uxowms02:161]
                1529:20080805:115501 Timeout while connecting to [uxowms01:161]
                1529:20080805:115510 Timeout while connecting to [uxopas01:161]
                1528:20080805:115534 Timeout while connecting to [uxopas01:161]
                1531:20080805:115538 Timeout while connecting to [uxowms02:161]
                1530:20080805:115552 Timeout while connecting to [uxowms01:161]
                1530:20080805:115641 Timeout while connecting to [uxowms02:161]
                1522:20080805:115655 ZABBIX Server stopped
            • I've disabled these hosts

            And Zabbix works...

            What's the problem with SNMP?
            Port 161 (SNMP) on these servers was not open (SNMP not started), but why does this cause Zabbix to crash?

            Regards,
            Pierre.
            Last edited by pierre-hoffmann; 05-08-2008, 12:40.
            P.Hoffmann
            System & Network Admin.
            __________________________
            Zabbix version 1.8.1
            Hosts monitored 1300
            OS Novell SLES 10 SP2
            __________________________
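To check whether the stops really follow the SNMP timeouts, the error log can be summarized per server run. This is a minimal sketch, assuming the line formats shown in post #6; the function name `summarize_runs` and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Line layouts taken from the error log in post #6.
TIMEOUT_RE = re.compile(
    r"^\s*\d+:(\d{8}):(\d{6})\s+Timeout while connecting to \[([^\]:]+):(\d+)\]"
)
STOP_RE = re.compile(r"^\s*\d+:(\d{8}):(\d{6})\s+ZABBIX Server stopped")


def _parse_time(date_part, time_part):
    """Turn the YYYYMMDD and HHMMSS log fields into a datetime."""
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


def summarize_runs(lines):
    """For each run ending in 'ZABBIX Server stopped', count timeouts per host
    and measure the seconds between the last timeout and the stop."""
    runs = []
    counts = Counter()
    last_timeout = None
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group(3)] += 1
            last_timeout = _parse_time(match.group(1), match.group(2))
            continue
        match = STOP_RE.match(line)
        if match:
            stopped = _parse_time(match.group(1), match.group(2))
            gap = None
            if last_timeout is not None:
                gap = (stopped - last_timeout).total_seconds()
            runs.append({"timeouts": dict(counts), "seconds_to_stop": gap})
            counts = Counter()  # start counting for the next run
            last_timeout = None
    return runs


sample = [
    " 484:20080805:111057 Timeout while connecting to [uxopas01:161]",
    " 514:20080805:111100 Timeout while connecting to [uxowms01:161]",
    " 481:20080805:111110 ZABBIX Server stopped",
]
print(summarize_runs(sample))
```

A short, consistent gap between the last timeout and the stop would support the SNMP theory, but it only shows correlation; the timeouts may simply be the only messages written at this log level.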


            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              I have absolutely no idea whether the SNMP timeouts could cause these problems. This is very common functionality used by thousands of users, so I doubt there is an SNMP-related problem in the ZABBIX code. Very strange indeed...

              Restarting ZABBIX hundreds of times per day is not an option. I would suggest trying a different platform, as this could also be an issue related to faulty net-snmp libs, glibc, or something else.
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga