IBM

International Business Machines (IBM) is a global technology company that provides hardware, software, cloud-based services, and cognitive computing.

This template is for Zabbix version: 5.4
Also available for: 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/server/ibm_imm_snmp?at=release/5.4

IBM IMM SNMP

Overview

For Zabbix version: 5.4 and higher.
Supports IBM System x servers with IMM1 or IMM2.

This template was tested on:

  • IBM System x3550 M2 with IMM1
  • IBM x3250M3 with IMM1
  • IBM x3550M5 with IMM2
  • System x3550 M3 with IMM1

Setup

Refer to the vendor documentation.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name                       Description  Default
{$DISK_OK_STATUS}          -            Normal
{$FAN_OK_STATUS}           -            Normal
{$HEALTH_CRIT_STATUS}      -            2
{$HEALTH_DISASTER_STATUS}  -            0
{$HEALTH_WARN_STATUS}      -            4
{$PSU_OK_STATUS}           -            Normal
{$TEMP_CRIT:"Ambient"}     -            35
{$TEMP_CRIT_LOW}           -            5
{$TEMP_CRIT}               -            60
{$TEMP_WARN:"Ambient"}     -            30
{$TEMP_WARN}               -            50
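
Some of the macros above carry a context, for example {$TEMP_CRIT:"Ambient"}. In Zabbix, a user macro with context falls back to the macro without context when no context-specific value is defined. The Python sketch below illustrates that lookup for the temperature thresholds only. The dictionary layout and the function name resolve_macro are illustrative assumptions, not part of Zabbix or of this template.

```python
# Default values copied from the macro table above.
# Keys are (macro name, context); a context of None means "no context".
DEFAULTS = {
    ("TEMP_CRIT", None): 60,
    ("TEMP_CRIT", "Ambient"): 35,
    ("TEMP_CRIT_LOW", None): 5,
    ("TEMP_WARN", None): 50,
    ("TEMP_WARN", "Ambient"): 30,
}


def resolve_macro(name, context=None, macros=DEFAULTS):
    """Return the context-specific value if defined, else the plain macro value.

    Hypothetical helper that mimics the fallback rule; it is not a Zabbix API.
    """
    if (name, context) in macros:
        return macros[(name, context)]
    return macros[(name, None)]


print(resolve_macro("TEMP_CRIT", "Ambient"))      # context-specific default
print(resolve_macro("TEMP_CRIT", "CPU"))          # falls back to {$TEMP_CRIT}
print(resolve_macro("TEMP_CRIT_LOW", "Ambient"))  # falls back to {$TEMP_CRIT_LOW}
```

With the defaults above, only the Ambient sensor gets tighter warning and critical thresholds; every other context, including "CPU", uses the plain {$TEMP_WARN}, {$TEMP_CRIT}, and {$TEMP_CRIT_LOW} values unless a context-specific macro is added on the host.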

Template links

Name
Generic SNMP

Discovery rules

  • Name: Temperature Discovery

    • Description: Scanning IMM-MIB::tempTable
    • Type: SNMP
    • Key: tempDescr.discovery
    • Filter (AND_OR): B: {#SNMPVALUE} MATCHES_REGEX (DIMM|PSU|PCH|RAID|RR|PCI).*
  • Name: Temperature Discovery Ambient

    • Description: Scanning IMM-MIB::tempTable with Ambient filter
    • Type: SNMP
    • Key: tempDescr.discovery.ambient
    • Filter (AND_OR): B: {#SNMPVALUE} MATCHES_REGEX Ambient.*
  • Name: Temperature Discovery CPU

    • Description: Scanning IMM-MIB::tempTable with CPU filter
    • Type: SNMP
    • Key: tempDescr.discovery.cpu
    • Filter (AND_OR): B: {#SNMPVALUE} MATCHES_REGEX CPU [0-9]* Temp
  • Name: PSU Discovery

    • Description: IMM-MIB::powerFruName
    • Type: SNMP
    • Key: psu.discovery
  • Name: FAN Discovery

    • Description: IMM-MIB::fanDescr
    • Type: SNMP
    • Key: fan.discovery
  • Name: Physical Disk Discovery

    • Description: -
    • Type: SNMP
    • Key: physicalDisk.discovery
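
The three temperature discovery rules scan the same table and separate sensors by name with regular-expression filters. The Python sketch below shows which rule would pick up a given sensor description. It uses Python's re.search as a stand-in for the unanchored regular-expression match that Zabbix performs, and the sensor names in the loop are hypothetical examples rather than values taken from a real IMM.

```python
import re

# Filter patterns copied from the discovery rules above, keyed by rule key.
FILTERS = {
    "tempDescr.discovery": r"(DIMM|PSU|PCH|RAID|RR|PCI).*",
    "tempDescr.discovery.ambient": r"Ambient.*",
    "tempDescr.discovery.cpu": r"CPU [0-9]* Temp",
}


def matching_rules(sensor_name):
    """Return the keys of the discovery rules whose filter matches the name."""
    return [key for key, pattern in FILTERS.items()
            if re.search(pattern, sensor_name)]


# Hypothetical sensor descriptions, for illustration only.
for name in ("Ambient Temp", "CPU 1 Temp", "PCH Temp", "Planar 3.3V"):
    print(name, "->", matching_rules(name))
```

A sensor whose description matches none of the patterns is not discovered by any of the three rules, so no temperature item is created for it.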

Items collected

  • Name: {#FAN_DESCR}: Fan status

    • Group: Fans
    • MIB: IMM-MIB
    • Description: A description of the fan component status.
    • Type: SNMP
    • Key: sensor.fan.status[fanHealthStatus.{#SNMPINDEX}]
  • Name: {#FAN_DESCR}: Fan speed, %

    • Group: Fans
    • MIB: IMM-MIB
    • Description: Fan speed expressed in percent (%) of maximum RPM. An octet string expressed as 'ddd% of maximum', where d is a decimal digit or a blank space for a leading zero. If the fan is determined not to be running or the fan speed cannot be determined, the string will indicate 'Offline'.
    • Type: SNMP
    • Key: sensor.fan.speed.percentage[fanSpeed.{#SNMPINDEX}]
    • Preprocessing: REGEX: (\d{1,3}) *%( of maximum)? \1
  • Name: Hardware model name

    • Group: Inventory
    • MIB: IMM-MIB
    • Type: SNMP
    • Key: system.hw.model
    • Preprocessing: DISCARD_UNCHANGED_HEARTBEAT: 1d
  • Name: Hardware serial number

    • Group: Inventory
    • MIB: IMM-MIB
    • Description: Machine serial number VPD information
    • Type: SNMP
    • Key: system.hw.serialnumber
    • Preprocessing: DISCARD_UNCHANGED_HEARTBEAT: 1d
  • Name: {#SNMPINDEX}: Physical disk status

    • Group: Physical_disks
    • MIB: IMM-MIB
    • Type: SNMP
    • Key: system.hw.physicaldisk.status[diskHealthStatus.{#SNMPINDEX}]
  • Name: {#SNMPINDEX}: Physical disk part number

    • Group: Physical_disks
    • MIB: IMM-MIB
    • Description: Disk module FRU name.
    • Type: SNMP
    • Key: system.hw.physicaldisk.part_number[diskFruName.{#SNMPINDEX}]
  • Name: {#PSU_DESCR}: Power supply status

    • Group: Power_supply
    • MIB: IMM-MIB
    • Description: A description of the power module status.
    • Type: SNMP
    • Key: sensor.psu.status[powerHealthStatus.{#SNMPINDEX}]
  • Name: Overall system health status

    • Group: Status
    • MIB: IMM-MIB
    • Description: Indicates the status of system health for the system in which the IMM resides. A value of 'nonRecoverable' indicates that a severe error has occurred and the system may not be functioning. A value of 'critical' indicates that an error has occurred but the system is currently functioning properly. A value of 'nonCritical' indicates that a condition has occurred that may change the state of the system in the future but currently the system is working properly. A value of 'normal' indicates that the system is operating normally.
    • Type: SNMP
    • Key: system.status[systemHealthStat.0]
  • Name: {#SNMPVALUE}: Temperature

    • Group: Temperature
    • MIB: IMM-MIB
    • Description: Temperature readings of testpoint: {#SNMPVALUE}
    • Type: SNMP
    • Key: sensor.temp.value[tempReading.{#SNMPINDEX}]
  • Name: Ambient: Temperature

    • Group: Temperature
    • MIB: IMM-MIB
    • Description: Temperature readings of testpoint: Ambient
    • Type: SNMP
    • Key: sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}]
  • Name: CPU: Temperature

    • Group: Temperature
    • MIB: IMM-MIB
    • Description: Temperature readings of testpoint: CPU
    • Type: SNMP
    • Key: sensor.temp.value[tempReading.CPU.{#SNMPINDEX}]
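
The fan speed item receives a string such as 'ddd% of maximum' and reduces it to a number with the REGEX preprocessing step. The Python sketch below applies the same pattern and the same \1 output template. The function name fan_speed_percent is an illustrative assumption, and returning None for a non-matching value such as 'Offline' is a choice made for this sketch; how Zabbix treats a failed preprocessing step depends on the item configuration.

```python
import re

# Pattern copied from the REGEX preprocessing step of the fan speed item.
PATTERN = re.compile(r"(\d{1,3}) *%( of maximum)?")


def fan_speed_percent(raw):
    """Extract the percentage digits from a fanSpeed string.

    Returns the first capture group (the \\1 output) as a string,
    or None when the value does not match, for example 'Offline'.
    """
    match = PATTERN.search(raw)
    return match.expand(r"\1") if match else None


for value in ("45% of maximum", " 45% of maximum", "100 %", "Offline"):
    print(repr(value), "->", fan_speed_percent(value))
```

The leading blank that the MIB allows in place of a leading zero is skipped because the pattern is searched anywhere in the string rather than anchored at its start.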

Triggers

  • Name: {#FAN_DESCR}: Fan is not in normal state

    • Description: Please check the fan unit
    • Expression: {TEMPLATE_NAME:sensor.fan.status[fanHealthStatus.{#SNMPINDEX}].count(#1,{$FAN_OK_STATUS},ne)}=1
    • Severity: INFO
  • Name: Device has been replaced (new serial number received)

    • Description: Device serial number has changed. Ack to close
    • Expression: {TEMPLATE_NAME:system.hw.serialnumber.diff()}=1 and {TEMPLATE_NAME:system.hw.serialnumber.strlen()}>0
    • Severity: INFO
    • Manual close: YES
  • Name: {#SNMPINDEX}: Physical disk is not in OK state

    • Description: Please check physical disk for warnings or errors
    • Expression: {TEMPLATE_NAME:system.hw.physicaldisk.status[diskHealthStatus.{#SNMPINDEX}].count(#1,{$DISK_OK_STATUS},ne)}=1
    • Severity: WARNING
  • Name: {#PSU_DESCR}: Power supply is not in normal state

    • Description: Please check the power supply unit for errors
    • Expression: {TEMPLATE_NAME:sensor.psu.status[powerHealthStatus.{#SNMPINDEX}].count(#1,{$PSU_OK_STATUS},ne)}=1
    • Severity: INFO
  • Name: System is in unrecoverable state!

    • Description: Please check the device for faults
    • Expression: {TEMPLATE_NAME:system.status[systemHealthStat.0].count(#1,{$HEALTH_DISASTER_STATUS},eq)}=1
    • Severity: HIGH
  • Name: System status is in critical state

    • Description: Please check the device for errors
    • Expression: {TEMPLATE_NAME:system.status[systemHealthStat.0].count(#1,{$HEALTH_CRIT_STATUS},eq)}=1
    • Severity: HIGH
    • Depends on: System is in unrecoverable state!
  • Name: System status is in warning state

    • Description: Please check the device for warnings
    • Expression: {TEMPLATE_NAME:system.status[systemHealthStat.0].count(#1,{$HEALTH_WARN_STATUS},eq)}=1
    • Severity: WARNING
    • Depends on: System is in unrecoverable state!; System status is in critical state
  • Name: {#SNMPVALUE}: Temperature is above warning threshold: >{$TEMP_WARN:""}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].avg(5m)}>{$TEMP_WARN:""}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].max(5m)}<{$TEMP_WARN:""}-3
    • Severity: WARNING
    • Depends on: {#SNMPVALUE}: Temperature is above critical threshold: >{$TEMP_CRIT:""}
  • Name: {#SNMPVALUE}: Temperature is above critical threshold: >{$TEMP_CRIT:""}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].avg(5m)}>{$TEMP_CRIT:""}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].max(5m)}<{$TEMP_CRIT:""}-3
    • Severity: HIGH
  • Name: {#SNMPVALUE}: Temperature is too low: <{$TEMP_CRIT_LOW:""}

    • Description: -
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].avg(5m)}<{$TEMP_CRIT_LOW:""}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.{#SNMPINDEX}].min(5m)}>{$TEMP_CRIT_LOW:""}+3
    • Severity: AVERAGE
  • Name: Ambient: Temperature is above warning threshold: >{$TEMP_WARN:"Ambient"}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].avg(5m)}>{$TEMP_WARN:"Ambient"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].max(5m)}<{$TEMP_WARN:"Ambient"}-3
    • Severity: WARNING
    • Depends on: Ambient: Temperature is above critical threshold: >{$TEMP_CRIT:"Ambient"}
  • Name: Ambient: Temperature is above critical threshold: >{$TEMP_CRIT:"Ambient"}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].avg(5m)}>{$TEMP_CRIT:"Ambient"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].max(5m)}<{$TEMP_CRIT:"Ambient"}-3
    • Severity: HIGH
  • Name: Ambient: Temperature is too low: <{$TEMP_CRIT_LOW:"Ambient"}

    • Description: -
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].avg(5m)}<{$TEMP_CRIT_LOW:"Ambient"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.Ambient.{#SNMPINDEX}].min(5m)}>{$TEMP_CRIT_LOW:"Ambient"}+3
    • Severity: AVERAGE
  • Name: CPU: Temperature is above warning threshold: >{$TEMP_WARN:"CPU"}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].avg(5m)}>{$TEMP_WARN:"CPU"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].max(5m)}<{$TEMP_WARN:"CPU"}-3
    • Severity: WARNING
    • Depends on: CPU: Temperature is above critical threshold: >{$TEMP_CRIT:"CPU"}
  • Name: CPU: Temperature is above critical threshold: >{$TEMP_CRIT:"CPU"}

    • Description: This trigger uses temperature sensor values as well as temperature sensor status if available
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].avg(5m)}>{$TEMP_CRIT:"CPU"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].max(5m)}<{$TEMP_CRIT:"CPU"}-3
    • Severity: HIGH
  • Name: CPU: Temperature is too low: <{$TEMP_CRIT_LOW:"CPU"}

    • Description: -
    • Expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].avg(5m)}<{$TEMP_CRIT_LOW:"CPU"}
    • Recovery expression: {TEMPLATE_NAME:sensor.temp.value[tempReading.CPU.{#SNMPINDEX}].min(5m)}>{$TEMP_CRIT_LOW:"CPU"}+3
    • Severity: AVERAGE
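
The temperature triggers pair a problem expression on avg(5m) with a recovery expression on max(5m) that is offset by 3 degrees, which gives the trigger hysteresis and avoids flapping around the threshold. The Python sketch below models that behaviour for the warning trigger with the default {$TEMP_WARN} value of 50. The function name next_state is an illustrative assumption, the sample windows are hypothetical, and passing the window as a plain list is a simplification of how Zabbix selects the last 5 minutes of history.

```python
from statistics import mean

TEMP_WARN = 50   # default of {$TEMP_WARN} from the macro table
HYSTERESIS = 3   # offset used in the recovery expressions


def next_state(in_problem, window, threshold=TEMP_WARN):
    """Return True if the trigger is in PROBLEM state after this evaluation.

    `window` stands in for the samples collected in the last 5 minutes.
    """
    if not in_problem:
        # Problem expression: avg(5m) > threshold
        return mean(window) > threshold
    # Recovery expression: max(5m) < threshold - 3.
    # When this holds, the average is necessarily below the threshold too.
    recovered = max(window) < threshold - HYSTERESIS
    return not recovered


# Hypothetical sequence of 5-minute windows, for illustration only.
state = False
for window in ([48, 49, 50], [51, 52, 53], [48, 49, 50], [45, 46, 46]):
    state = next_state(state, window)
    print(window, "PROBLEM" if state else "OK")
```

In the third window the average has already dropped below 50, but the trigger stays in PROBLEM state because the maximum has not yet fallen below 47. It recovers only in the fourth window.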

Feedback

Please report any issues with the template at https://support.zabbix.com

Known Issues

  • Description: Some IMMs (IMM1) do not return disks

    • Version: IMM1
    • Device: IBM x3250M3
  • Description: Some IMMs (IMM1) do not return fan status: fanHealthStatus

    • Version: IMM1
    • Device: IBM x3250M3
  • Description: The sysObjectID of IMM1 servers (M2, M3 generations) is NET-SNMP-MIB::netSnmpAgentOIDs.10

    • Version: IMM1
    • Device: IMM1 servers (M2,M3 generations)
  • Description: On IMM1 servers (M2, M3 generations), only the Ambient temperature sensor is available

    • Version: IMM1
    • Device: IMM1 servers (M2,M3 generations)
