zabbix server crash

  • jarek
    Member
    • May 2005
    • 35

    #1

    zabbix server crash

    Hi all!
    I have a brand new installation of Zabbix 1.6.1. Today I was playing with IPMI and found that when an IPMI device was not accessible (a mistake in the IP address), the server crashed. I had no time to analyze this deeply, but it happened a few times and went away once I removed the problematic items.
    More details tomorrow.

    regards
    Jarek
  • jarek
    Member
    • May 2005
    • 35

    #2
    Zabbix crash - more details...

    It looks like zabbix_server has a problem handling disconnected IPMI devices. When an IPMI device disappears from the network,
    the server dies:

    3627:20081218:013331 In substitute_simple_macros (data:"host1.ipmi.abc.local")
    3627:20081218:013331 End substitute_simple_macros (result:host1.ipmi.abc.local)
    3627:20081218:013331 In int_in_list(list:,value:10062)
    3627:20081218:013331 End int_in_list(ret:FAIL)
    3627:20081218:013331 In get_value(key
    3627:20081218:013331 In get_value_ipmi(key
    3627:20081218:013331 In init_ipmi_host([host1.ipmi.abc.local]:623)
    3627:20081218:013331 In get_ipmi_host([host1.ipmi.abc.local]:623)
    3627:20081218:013331 In get_ipmi_sensor_by_name() Fan 1 Tach@[host1.ipmi.abc.local]:623
    3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623
    3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
    3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
    3627:20081218:013331 In setup_done() [host2.ipmi.abc.local]:623
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Fan 1 Tach@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Planar 1.5V@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Planar 1.8V @[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Planar 12V @[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Planar 5V @[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Planar 3.3V @[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor RSA II Detect0@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Ambient Temp@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor CPU OverTemp0@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor CPU PFA0@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Fan 5 Tach@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor SEL Fullness@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor CPU Vtt@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Fan 2 Tach@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Fan 3 Tach@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In sensor_change()
    3627:20081218:013331 In delete_ipmi_sensor()
    3627:20081218:013331 Sensor Fan 4 Tach@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In control_change()
    3627:20081218:013331 In delete_ipmi_control()
    3627:20081218:013331 Control power@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In control_change()
    3627:20081218:013331 In delete_ipmi_control()
    3627:20081218:013331 Control reset@[host2.ipmi.abc.local]:623 deleted
    3627:20081218:013331 In entity_change()
    3627:20081218:013331 In domain_closed() [host2.ipmi.abc.local]:623
    3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
    3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
    3578:20081218:013331 One child process died. Exiting ...
    3587:20081218:013331 Got signal. Exiting ...
    3589:20081218:013331 Got signal. Exiting ...
    3593:20081218:013331 Got signal. Exiting ...
    3590:20081218:013331 Got signal. Exiting ...
    3599:20081218:013331 Got signal. Exiting ...
    3595:20081218:013331 Got signal. Exiting ...
    3619:20081218:013331 Got signal. Exiting ...
    3597:20081218:013331 Got signal. Exiting ...
    3601:20081218:013331 Got signal. Exiting ...
    3628:20081218:013331 Got signal. Exiting ...
    3604:20081218:013331 Got signal. Exiting ...
    3585:20081218:013331 Got signal. Exiting ...
    3615:20081218:013331 Got signal. Exiting ...
    3581:20081218:013331 Got signal. Exiting ...
    3610:20081218:013331 Got signal. Exiting ...
    3580:20081218:013331 Got signal. Exiting ...
    3613:20081218:013331 Got signal. Exiting ...
    3617:20081218:013331 Got signal. Exiting ...
    3583:20081218:013331 Got signal. Exiting ...
    3623:20081218:013331 Got signal. Exiting ...
    3621:20081218:013331 Got signal. Exiting ...
    3607:20081218:013331 Got signal. Exiting ...
    3611:20081218:013331 Got signal. Exiting ...
    3602:20081218:013331 Got signal. Exiting ...
    3625:20081218:013331 Got signal. Exiting ...
    3579:20081218:013331 Got signal. Exiting ...
    3631:20081218:013331 Got signal. Exiting ...
    3578:20081218:013333 Query [SET CHARACTER SET utf8]
    3578:20081218:013333 In free_ipmi_handler()
    3578:20081218:013333 ZABBIX Server stopped. ZABBIX 1.6.1.
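
    For anyone trying to narrow down a similar crash, here is a minimal sketch (not part of Zabbix) that scans a log like the one above and lists the PIDs that logged a `lost_connection` message before the parent reported a dead child. It assumes the `PID:YYYYMMDD:HHMMSS message` layout seen in this thread; the function names are made up for illustration, and a match only shows which process saw the connection loss, not that it is the one that died.

```python
import re

# Matches "PID:YYYYMMDD:HHMMSS[.mmm] message", the layout seen in this thread.
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}(?:\.\d{3})?)\s+(.*)$")


def parse_line(line):
    """Split one log line into (pid, date, time, message), or None if it does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, date, time, msg = m.groups()
    return int(pid), date, time, msg


def pids_with_lost_connection(lines):
    """PIDs that logged lost_connection before the first 'One child process died' line."""
    suspects = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        pid, _, _, msg = parsed
        if "One child process died" in msg:
            break  # everything after this is shutdown noise
        if "lost_connection" in msg and pid not in suspects:
            suspects.append(pid)
    return suspects


# A few lines copied from the log above.
sample_log = [
    "3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623",
    "3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down",
    "3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down",
    "3578:20081218:013331 One child process died. Exiting ...",
    "3587:20081218:013331 Got signal. Exiting ...",
]

print(pids_with_lost_connection(sample_log))
```

    On a real system you would pass the lines of your own zabbix_server log file instead of `sample_log`.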


    • oddie
      Junior Member
      • Oct 2008
      • 11

      #3
      I think that my problem is related to this as well:


      • Antras
        Junior Member
        • Oct 2007
        • 12

        #4
        I have the same issue with Zabbix 1.6.5: IPMI agents kill zabbix-server.
        All system and IPMI libraries are up to date (CentOS 5.3, Intel S5000PAL servers).


        • lukus
          Junior Member
          • May 2008
          • 9

          #5
          I have the very same problem with version 1.8.1. IPMI is unusable if the server crashes any time a device is temporarily unreachable!

          Code:
           14512:20100223:095817.040 Starting zabbix_server. Zabbix 1.8.1 (revision 9702).
           14512:20100223:095817.040 **** Enabled features ****
           14512:20100223:095817.040 SNMP monitoring:       YES
           14512:20100223:095817.040 IPMI monitoring:       YES
           14512:20100223:095817.041 WEB monitoring:        YES
           14512:20100223:095817.041 Jabber notifications:  YES
           14512:20100223:095817.041 ODBC:                  YES
           14512:20100223:095817.041 SSH2 support:           NO
           14512:20100223:095817.041 IPv6 support:          YES
          ...
             418:20100222:180706.860 One child process died (PID:443). Exiting ...
          And here is the output from PID:443:

          Code:
             443:20100222:180626.462 In get_values()
             443:20100222:180626.462 In DCinit_nextchecks()
             443:20100222:180626.462 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180626.462 End of DCconfig_get_poller_items():0
             443:20100222:180626.462 In DCflush_nextchecks()
             443:20100222:180626.462 End of get_values()
             443:20100222:180626.463 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180626.463 End of DCconfig_get_normal_poller_nextcheck():1266890796
             443:20100222:180626.463 Poller #3 spent 0.000424 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180631.463 In get_values()
             443:20100222:180631.463 In DCinit_nextchecks()
             443:20100222:180631.463 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180631.463 End of DCconfig_get_poller_items():0
             443:20100222:180631.463 In DCflush_nextchecks()
             443:20100222:180631.463 End of get_values()
             443:20100222:180631.463 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180631.463 End of DCconfig_get_normal_poller_nextcheck():1266890796
             443:20100222:180631.463 Poller #3 spent 0.000407 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180636.464 In get_values()
             443:20100222:180636.464 In DCinit_nextchecks()
             443:20100222:180636.464 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180636.464 End of DCconfig_get_poller_items():1
             443:20100222:180636.464 In substitute_simple_macros (data:'ambient.temperature')
             443:20100222:180636.464 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180636.464 In get_value() key:'ambient.temperature'
             443:20100222:180636.464 In get_value_ipmi(key:ambient.temperature)
             443:20100222:180636.464 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180636.464 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180636.464 In get_ipmi_sensor_by_name() Ambient Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180636.464 In read_ipmi_sensor() Ambient Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180636.527 In got_thresh_reading()
             443:20100222:180636.527 In get_ipmi_sensor()
             443:20100222:180636.527 Value [Ambient Temp | front_panel_board | temperature | threshold | 16.000000 C]
             443:20100222:180636.527 End of get_value():SUCCEED
             443:20100222:180636.527 In calculate_item_nextcheck (22476,60,"",1266890796)
             443:20100222:180636.527 End calculate_item_nextcheck (result:1266890856)
             443:20100222:180636.527 In DCflush_nextchecks()
             443:20100222:180636.528 End of get_values()
             443:20100222:180636.528 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180636.528 End of DCconfig_get_normal_poller_nextcheck():1266890797
             443:20100222:180636.528 Poller #3 spent 0.064082 seconds while updating   1 values. Sleeping for 1 seconds
             443:20100222:180637.528 In get_values()
             443:20100222:180637.529 In DCinit_nextchecks()
             443:20100222:180637.529 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180637.529 End of DCconfig_get_poller_items():1
             443:20100222:180637.529 In substitute_simple_macros (data:'fan_cooling[1]')
             443:20100222:180637.529 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180637.529 In get_value() key:'fan_cooling[1]'
             443:20100222:180637.529 In get_value_ipmi(key:fan_cooling[1])
             443:20100222:180637.529 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180637.529 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180637.529 In get_ipmi_sensor_by_name() Fan 1 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180637.529 In read_ipmi_sensor() Fan 1 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180637.592 In got_thresh_reading()
             443:20100222:180637.592 In get_ipmi_sensor()
             443:20100222:180637.592 Value [Fan 1 Tach | fan_cooling | fan | threshold | 3150.000000 RPM]
             443:20100222:180637.592 End of get_value():SUCCEED
             443:20100222:180637.592 In calculate_item_nextcheck (22477,60,"",1266890797)
             443:20100222:180637.593 End calculate_item_nextcheck (result:1266890857)
             443:20100222:180637.593 In DCflush_nextchecks()
             443:20100222:180637.593 End of get_values()
             443:20100222:180637.593 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180637.593 End of DCconfig_get_normal_poller_nextcheck():1266890798
             443:20100222:180637.593 Poller #3 spent 0.064242 seconds while updating   1 values. Sleeping for 1 seconds
             443:20100222:180638.593 In get_values()
             443:20100222:180638.593 In DCinit_nextchecks()
             443:20100222:180638.593 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180638.593 End of DCconfig_get_poller_items():1
             443:20100222:180638.593 In substitute_simple_macros (data:'fan_cooling[2]')
             443:20100222:180638.593 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180638.594 In get_value() key:'fan_cooling[2]'
             443:20100222:180638.594 In get_value_ipmi(key:fan_cooling[2])
             443:20100222:180638.594 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180638.594 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180638.594 In get_ipmi_sensor_by_name() Fan 2 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180638.594 In read_ipmi_sensor() Fan 2 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180638.656 In got_thresh_reading()
             443:20100222:180638.656 In get_ipmi_sensor()
             443:20100222:180638.656 Value [Fan 2 Tach | fan_cooling | fan | threshold | 2925.000000 RPM]
             443:20100222:180638.656 End of get_value():SUCCEED
             443:20100222:180638.657 In calculate_item_nextcheck (22478,60,"",1266890798)
             443:20100222:180638.657 End calculate_item_nextcheck (result:1266890858)
             443:20100222:180638.657 In DCflush_nextchecks()
             443:20100222:180638.657 End of get_values()
             443:20100222:180638.657 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180638.657 End of DCconfig_get_normal_poller_nextcheck():1266890799
             443:20100222:180638.657 Poller #3 spent 0.063836 seconds while updating   1 values. Sleeping for 1 seconds
             443:20100222:180639.657 In get_values()
             443:20100222:180639.657 In DCinit_nextchecks()
             443:20100222:180639.657 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180639.657 End of DCconfig_get_poller_items():1
             443:20100222:180639.657 In substitute_simple_macros (data:'fan_cooling[3]')
             443:20100222:180639.657 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180639.658 In get_value() key:'fan_cooling[3]'
             443:20100222:180639.658 In get_value_ipmi(key:fan_cooling[3])
             443:20100222:180639.658 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180639.658 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180639.658 In get_ipmi_sensor_by_name() Fan 3 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180639.658 In read_ipmi_sensor() Fan 3 Tach@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180639.721 In got_thresh_reading()
             443:20100222:180639.721 In get_ipmi_sensor()
             443:20100222:180639.721 Value [Fan 3 Tach | fan_cooling | fan | threshold | 2925.000000 RPM]
             443:20100222:180639.721 End of get_value():SUCCEED
             443:20100222:180639.721 In calculate_item_nextcheck (22479,60,"",1266890799)
             443:20100222:180639.721 End calculate_item_nextcheck (result:1266890859)
             443:20100222:180639.721 In DCflush_nextchecks()
             443:20100222:180639.721 End of get_values()
             443:20100222:180639.721 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180639.721 End of DCconfig_get_normal_poller_nextcheck():1266890800
             443:20100222:180639.721 Poller #3 spent 0.064072 seconds while updating   1 values. Sleeping for 1 seconds
             443:20100222:180640.721 In get_values()
             443:20100222:180640.722 In DCinit_nextchecks()
             443:20100222:180640.722 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180640.722 End of DCconfig_get_poller_items():1
             443:20100222:180640.722 In substitute_simple_macros (data:'processor[1,temp]')
             443:20100222:180640.722 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180640.722 In get_value() key:'processor[1,temp]'
             443:20100222:180640.722 In get_value_ipmi(key:processor[1,temp])
             443:20100222:180640.722 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180640.722 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180640.722 In get_ipmi_sensor_by_name() CPU 1 Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180640.722 In read_ipmi_sensor() CPU 1 Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180640.785 In got_thresh_reading()
             443:20100222:180640.785 In get_ipmi_sensor()
             443:20100222:180640.785 Value [CPU 1 Temp | processor | temperature | threshold | 20.000000 C]
             443:20100222:180640.785 End of get_value():SUCCEED
             443:20100222:180640.785 In calculate_item_nextcheck (22480,60,"",1266890800)
             443:20100222:180640.785 End calculate_item_nextcheck (result:1266890860)
             443:20100222:180640.785 In DCflush_nextchecks()
             443:20100222:180640.785 End of get_values()
             443:20100222:180640.785 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180640.785 End of DCconfig_get_normal_poller_nextcheck():1266890801
             443:20100222:180640.786 Poller #3 spent 0.063902 seconds while updating   1 values. Sleeping for 1 seconds
             443:20100222:180641.786 In get_values()
             443:20100222:180641.786 In DCinit_nextchecks()
             443:20100222:180641.786 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180641.786 End of DCconfig_get_poller_items():1
             443:20100222:180641.786 In substitute_simple_macros (data:'processor[2,temp]')
             443:20100222:180641.786 In substitute_simple_macros (data:'node063-lom.cluster.zymeworks.com')
             443:20100222:180641.786 In get_value() key:'processor[2,temp]'
             443:20100222:180641.786 In get_value_ipmi(key:processor[2,temp])
             443:20100222:180641.786 In init_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180641.786 In get_ipmi_host([node063-lom.cluster.zymeworks.com]:623)
             443:20100222:180641.786 In get_ipmi_sensor_by_name() CPU 2 Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180641.786 In read_ipmi_sensor() CPU 2 Temp@[node063-lom.cluster.zymeworks.com]:623
             443:20100222:180641.849 In got_thresh_reading()
             443:20100222:180641.849 In get_ipmi_sensor()
             443:20100222:180641.849 Value [CPU 2 Temp | processor | temperature | threshold | 20.000000 C]
             443:20100222:180641.849 End of get_value():SUCCEED
             443:20100222:180641.849 In calculate_item_nextcheck (22481,60,"",1266890801)
             443:20100222:180641.849 End calculate_item_nextcheck (result:1266890861)
             443:20100222:180641.849 In DCflush_nextchecks()
             443:20100222:180641.849 End of get_values()
             443:20100222:180641.849 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180641.850 End of DCconfig_get_normal_poller_nextcheck():1266890826
             443:20100222:180641.850 Poller #3 spent 0.063793 seconds while updating   1 values. Sleeping for 5 seconds
             443:20100222:180646.850 In get_values()
             443:20100222:180646.850 In DCinit_nextchecks()
             443:20100222:180646.850 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180646.850 End of DCconfig_get_poller_items():0
             443:20100222:180646.850 In DCflush_nextchecks()
             443:20100222:180646.850 End of get_values()
             443:20100222:180646.850 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180646.850 End of DCconfig_get_normal_poller_nextcheck():1266890826
             443:20100222:180646.850 Poller #3 spent 0.000423 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180651.850 In get_values()
             443:20100222:180651.851 In DCinit_nextchecks()
             443:20100222:180651.851 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180651.851 End of DCconfig_get_poller_items():0
             443:20100222:180651.851 In DCflush_nextchecks()
             443:20100222:180651.851 End of get_values()
             443:20100222:180651.851 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180651.851 End of DCconfig_get_normal_poller_nextcheck():1266890826
             443:20100222:180651.851 Poller #3 spent 0.000440 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180656.851 In get_values()
             443:20100222:180656.851 In DCinit_nextchecks()
             443:20100222:180656.851 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180656.851 End of DCconfig_get_poller_items():0
             443:20100222:180656.851 In DCflush_nextchecks()
             443:20100222:180656.852 End of get_values()
             443:20100222:180656.852 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180656.852 End of DCconfig_get_normal_poller_nextcheck():1266890826
             443:20100222:180656.852 Poller #3 spent 0.000425 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180701.852 In get_values()
             443:20100222:180701.852 In DCinit_nextchecks()
             443:20100222:180701.852 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180701.852 End of DCconfig_get_poller_items():0
             443:20100222:180701.852 In DCflush_nextchecks()
             443:20100222:180701.852 End of get_values()
             443:20100222:180701.852 In DCconfig_get_normal_poller_nextcheck() poller_type:2 poller_num:3
             443:20100222:180701.852 End of DCconfig_get_normal_poller_nextcheck():1266890826
             443:20100222:180701.852 Poller #3 spent 0.000400 seconds while updating   0 values. Sleeping for 5 seconds
             443:20100222:180706.853 In get_values()
             443:20100222:180706.853 In DCinit_nextchecks()
             443:20100222:180706.853 In DCconfig_get_poller_items() poller_type:2 poller_num:3
             443:20100222:180706.853 End of DCconfig_get_poller_items():1
             443:20100222:180706.853 In substitute_simple_macros (data:'ambient.temperature')
             443:20100222:180706.853 In substitute_simple_macros (data:'node058-lom.cluster.zymeworks.com')
             443:20100222:180706.853 In get_value() key:'ambient.temperature'
             443:20100222:180706.853 In get_value_ipmi(key:ambient.temperature)
             443:20100222:180706.853 In init_ipmi_host([node058-lom.cluster.zymeworks.com]:623)
             443:20100222:180706.853 In get_ipmi_host([node058-lom.cluster.zymeworks.com]:623)
             443:20100222:180706.853 In get_ipmi_sensor_by_name() Ambient Temp@[node058-lom.cluster.zymeworks.com]:623
             443:20100222:180706.853 In read_ipmi_sensor() Ambient Temp@[node058-lom.cluster.zymeworks.com]:623
             443:20100222:180706.854 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
             443:20100222:180706.854 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
             443:20100222:180706.854 In setup_done() [(null)]:657228304
             443:20100222:180706.854 EINF: (12.1).Ambient Temp sensor.c(reading_get_start):Error sending reading get command: 16
             443:20100222:180706.854 In got_thresh_reading()
             443:20100222:180706.854 EINF: (f.f)(m,0) sdr.c(handle_sdr_info): IPMI Error getting SDR info: ff
             443:20100222:180706.854 In sensor_change()
             443:20100222:180706.854 In entity_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In delete_ipmi_sensor()
             443:20100222:180706.855 In entity_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In delete_ipmi_sensor()
             443:20100222:180706.855 In entity_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In delete_ipmi_sensor()
             443:20100222:180706.855 In entity_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.855 In delete_ipmi_sensor()
             443:20100222:180706.855 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In entity_change()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In entity_change()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.856 In sensor_change()
             443:20100222:180706.856 In delete_ipmi_sensor()
             443:20100222:180706.857 In entity_change()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In delete_ipmi_sensor()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In entity_change()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In delete_ipmi_sensor()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In delete_ipmi_sensor()
             443:20100222:180706.857 In sensor_change()
             443:20100222:180706.857 In entity_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In entity_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In delete_ipmi_sensor()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In delete_ipmi_sensor()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In entity_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In delete_ipmi_sensor()
             443:20100222:180706.858 In entity_change()
             443:20100222:180706.858 In sensor_change()
             443:20100222:180706.858 In delete_ipmi_sensor()
             443:20100222:180706.859 In entity_change()
             443:20100222:180706.859 In sensor_change()
             443:20100222:180706.859 In entity_change()
             443:20100222:180706.859 In sensor_change()
             443:20100222:180706.859 In entity_change()
             443:20100222:180706.859 In entity_change()
             443:20100222:180706.859 In entity_change()
             443:20100222:180706.859 In control_change()
             443:20100222:180706.859 In delete_ipmi_control()
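
          Pulling the dead child's last lines out of a large debug log by hand is tedious, so here is a rough sketch of how it could be scripted. It assumes the 1.8.1-style message `One child process died (PID:NNN)` shown above (the 1.6.1 log earlier in this thread does not include the PID, so the lookup returns `None` there); `dead_child_pid` and `tail_for_pid` are hypothetical helper names, not Zabbix tools.

```python
import re
from collections import deque

# The parent logs the PID of the dead child in this form (seen in the 1.8.1 log above).
DIED_RE = re.compile(r"One child process died \(PID:(\d+)\)")


def dead_child_pid(lines):
    """Return the PID named in the first 'One child process died (PID:N)' line, or None."""
    for line in lines:
        m = DIED_RE.search(line)
        if m:
            return int(m.group(1))
    return None


def tail_for_pid(lines, pid, count=5):
    """Return the last `count` log lines written by `pid`, in log order."""
    prefix = f"{pid}:"
    tail = deque(maxlen=count)  # keeps only the most recent `count` matches
    for line in lines:
        if line.lstrip().startswith(prefix):
            tail.append(line.strip())
    return list(tail)


# A few lines copied from the logs above, in timestamp order.
sample_log = [
    "   443:20100222:180706.854 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down",
    "   443:20100222:180706.859 In control_change()",
    "   443:20100222:180706.859 In delete_ipmi_control()",
    "   418:20100222:180706.860 One child process died (PID:443). Exiting ...",
]

pid = dead_child_pid(sample_log)
for entry in tail_for_pid(sample_log, pid, count=2):
    print(entry)
```

          The deque keeps memory use flat even on a log with millions of debug lines, since only the last few entries for the PID are retained.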


          • lukus
            Junior Member
            • May 2008
            • 9

            #6
            *bump*

            It seems that this is caused by flakey IPMI interfaces. Still, this shouldn't cause the Zabbix server to outright crash.

            Is there anything else I can provide in order to help diagnose this issue?

            I can take one of my nodes and put its IPMI interface on a publicly-routable IP for further diagnosis by Zabbix developers if that helps. Then you can query it, turn it on and off, etc.
