Good day. I have been working with HPE to find out why Zabbix SNMP monitoring of HPE iLO fails after some time.
When it works, it works great. However, after about 12-24 hours, the iLO stops responding to SNMP requests.
I opened a case with HPE and submitted logs, Wireshark captures, and examples of what we are doing, showing them that the problem was NOT specific to Zabbix: regular CLI SNMP queries eventually failed as well.
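For anyone who wants to reproduce the check outside Zabbix, here is a minimal sketch of an SNMP v3 probe that shells out to the net-snmp `snmpget` tool. It is not the exact command from the HPE case. The host name, user name, passphrases, and the SHA/AES choices are placeholders and assumptions, so adjust them to match your iLO SNMP v3 settings.

```python
import subprocess

# sysUpTime.0 is a standard MIB-II object, so any SNMP agent should answer it.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmpget_cmd(host, user, auth_pass, priv_pass,
                      oid=SYS_UPTIME_OID, timeout_s=5):
    """Build a net-snmp snmpget command line for an SNMP v3 authPriv query.

    SHA and AES are assumptions; use whatever is configured on the iLO.
    """
    return [
        "snmpget", "-v3", "-l", "authPriv",
        "-u", user,
        "-a", "SHA", "-A", auth_pass,
        "-x", "AES", "-X", priv_pass,
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", "1",             # one retry
        host, oid,
    ]


def ilo_snmp_alive(host, user, auth_pass, priv_pass):
    """Return True if the iLO answers the query, False otherwise."""
    cmd = build_snmpget_cmd(host, user, auth_pass, priv_pass)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # No answer in time, or snmpget is not installed on this machine.
        return False
    return result.returncode == 0
```

Running `ilo_snmp_alive(...)` from cron every few minutes and logging the result would show roughly when the iLO stops answering, independent of Zabbix.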
I did finally get a response, and I hope it helps others who are seeing the same issue. The current iLO firmware version is 3.10. Version 3.11 is due out (maybe) in April, so I presume we can expect 3.12, with these iLO memory issues resolved, in late Q2 or early Q3.
=-=-=-=-=
The L3 Engineering review is completed, and the resolution was shared as below.
[L3 Problem Description]: SNMP v3 monitoring via Zabbix stops responding within a time frame of several hours to several days. The only temporary workaround is to reset the iLO, but eventually, the issue will return.
[L3 Solution Description]: The issue was found to be related to a bug in memory management on the iLO, and a fix will be provided in iLO 5 firmware version 3.12.
=-=-=-=-=
In the meantime, I have switched to the iLO by HTTP template, which uses the Redfish API. I don't believe this template collects as many metrics by default, but it's pretty good.
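To give an idea of what the Redfish route looks like underneath, here is a minimal sketch that reads temperature sensors over HTTPS. It is not how the Zabbix template is implemented. It assumes the standard Redfish Thermal resource at `/redfish/v1/Chassis/1/Thermal`, a chassis ID of `1`, and HTTP Basic authentication; the host and credentials are placeholders.

```python
import base64
import json
import ssl
import urllib.request

# Assumed path: standard Redfish Thermal resource, chassis ID "1".
THERMAL_PATH = "/redfish/v1/Chassis/1/Thermal"


def fetch_redfish(host, path, user, password, verify_tls=True):
    """GET a Redfish resource with HTTP Basic auth and return the parsed JSON."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # iLOs often ship with self-signed certificates; only skip
        # verification on a trusted management network.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def temperatures(thermal):
    """Map sensor name to reading in Celsius, skipping sensors with no reading."""
    readings = {}
    for sensor in thermal.get("Temperatures", []):
        value = sensor.get("ReadingCelsius")
        if value is not None:
            readings[sensor.get("Name", "unknown")] = value
    return readings
```

Usage would be along the lines of `temperatures(fetch_redfish("ilo.example.net", THERMAL_PATH, "monitor", "secret"))`, where the host and account are made up for the example.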
-Tom