Hello,
Yesterday I set up IPMI for the first time.
I added a single item and everything was fine. Then I added a total of five sensors, and all IPMI polls started to fail.
Today I realized that I had also created an item for the second CPU, which is not present in most of our systems. As soon as I manually disable the item for CPU2, everything works again as expected.
I also tried the most recent version of OpenIPMI, because I first thought that was the problem.
But it looks like Zabbix doesn't handle sensors correctly when the device is not present, and makes all other IPMI checks on the host fail because of that.
For example:
CPU1:
Code:
Sensor ID              : CPU1 (0x4)
 Entity ID             : 3.0
 Sensor Type (Analog)  : Temperature
 Sensor Reading        : 42 (+/- 0) degrees C
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : na
 Lower Non-Critical    : na
 Upper Non-Critical    : 93.000
 Upper Critical        : 97.000
 Upper Non-Recoverable : na
 Assertion Events      :
 Assertions Enabled    : unc+ ucr+
 Deassertions Enabled  : unc+ ucr+
CPU2 (not present):
Code:
Sensor ID              : CPU2 (0x5)
 Entity ID             : 3.1
 Sensor Type (Analog)  : Temperature
 Sensor Reading        : Unable to read sensor: Device Not Present
 Event Status          : Unavailable
 Assertions Enabled    : unc+ ucr+
 Deassertions Enabled  : unc+ ucr+
As soon as the check for the second CPU is enabled, I get these log messages:
Code:
IPMI item [temp.cpu2] on host [ABCDEFGHIJK] failed: first network error, wait for 15 seconds
IPMI item [temp.cpu2] on host [ABCDEFGHIJK] failed: another network error, wait for 15 seconds
IPMI item [temp.cpu2] on host [ABCDEFGHIJK] failed: another network error, wait for 15 seconds
temporarily disabling IPMI checks on host [ABCDEFGHIJK]: host unavailable
(~ 2min later)
resuming IPMI checks on host [ABCDEFGHIJK]: connection restored
In the frontend I get:
error 0x10000d5 while reading threshold sensor
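For reference, the error value can be taken apart by hand. The sketch below assumes the OpenIPMI convention, as I understand it from its ipmi_err.h header, that errors in the 0x01000000 class carry an IPMI completion code in the low byte. The function name and the small lookup table are illustrative only and not part of Zabbix or OpenIPMI.

```python
# Assumption: OpenIPMI marks "IPMI completion code" errors with this class value
# in the upper bytes (IPMI_IPMI_ERR_TOP in ipmi_err.h).
IPMI_IPMI_ERR_TOP = 0x01000000
ERR_CLASS_MASK = 0xFFFFFF00

# A few completion codes from the IPMI specification (deliberately not exhaustive).
COMPLETION_CODES = {
    0xC0: "node busy",
    0xC1: "invalid command",
    0xC3: "timeout while processing command",
    0xCB: "requested sensor, data, or record not present",
    0xD5: "command not supported in present state",
    0xFF: "unspecified error",
}


def decode_openipmi_error(err: int) -> str:
    """Hypothetical helper: describe an OpenIPMI-style error value."""
    if (err & ERR_CLASS_MASK) == IPMI_IPMI_ERR_TOP:
        cc = err & 0xFF  # low byte is the IPMI completion code
        meaning = COMPLETION_CODES.get(cc, "unknown completion code")
        return f"IPMI completion code 0x{cc:02x}: {meaning}"
    return f"not an IPMI completion-code error: 0x{err:x}"


print(decode_openipmi_error(0x10000D5))
```

Under that assumption, 0x10000d5 is completion code 0xd5, which means the BMC did answer the request and refused it for this sensor. That matches the "Device Not Present" reading above and would not indicate a network failure.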
The expected behavior would be to disable only this single item, not all IPMI checks on the host.
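To make the expected behavior concrete, here is a minimal sketch of the policy. It is not Zabbix code: `Outcome`, `classify`, `HostState`, the set of "sensor-level" completion codes, and the limit of three errors (chosen only because the log shows three failures before the host is disabled) are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

IPMI_IPMI_ERR_TOP = 0x01000000     # assumed OpenIPMI class for completion-code errors
SENSOR_LEVEL_CODES = {0xCB, 0xD5}  # assumed: codes that describe the sensor, not the link
MAX_NETWORK_ERRORS = 3             # hypothetical limit before the host is marked unavailable


class Outcome(Enum):
    OK = "ok"
    ITEM_NOT_SUPPORTED = "item not supported"
    NETWORK_ERROR = "network error"


def classify(err: int) -> Outcome:
    """Decide whether an error concerns one item or the whole host."""
    if err == 0:
        return Outcome.OK
    is_completion_code = (err & 0xFFFFFF00) == IPMI_IPMI_ERR_TOP
    if is_completion_code and (err & 0xFF) in SENSOR_LEVEL_CODES:
        return Outcome.ITEM_NOT_SUPPORTED
    return Outcome.NETWORK_ERROR


@dataclass
class HostState:
    network_errors: int = 0
    unsupported_items: set = field(default_factory=set)

    @property
    def available(self) -> bool:
        return self.network_errors < MAX_NETWORK_ERRORS

    def record(self, item_key: str, err: int) -> Outcome:
        outcome = classify(err)
        if outcome is Outcome.NETWORK_ERROR:
            self.network_errors += 1
        else:
            # The BMC answered, so the connection itself is fine.
            self.network_errors = 0
            if outcome is Outcome.ITEM_NOT_SUPPORTED:
                self.unsupported_items.add(item_key)
        return outcome
```

With this policy, three failed polls of temp.cpu2 with error 0x10000d5 would leave the host available and only mark temp.cpu2 as not supported, so the other IPMI items keep being polled.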
Thank you
Urs