I went down a rabbit hole and I'm very confused. This is on Zabbix 6.4.4. Yes, I know I need to upgrade; we will be rolling to the latest 6.4 release soon.
I am monitoring about 20 switches of the same family via SNMPv3, through a proxy. Two of them frequently log errors like this:
SNMP agent item "system.cpu.util" on host "hostname" failed: first network error, wait for 15 seconds
However, they consistently show CPU utilization data that appears accurate and current. Data is being gathered from several templates; the CPU utilization comes from "Template Module HOST-RESOURCES-MIB CPU SNMP".
(Sometimes they also throw an error on the system uptime check, but my guess is that this is caused by the 15-second wait: it always happens right after two CPU utilization check failures, and our SNMP timeout is set to 30 seconds. Probably irrelevant, but I am mentioning it just in case.)
While trying to figure this out, I used snmpget to check whether there was a problem with the item/OID, and discovered that the system.cpu.util key is based on CPU discovery (discovery[{#CPU.UTIL},1.3.6.1.2.1.25.3.3.1.2])... but there is no CPU discovery rule in any of my templates, and it looks like this is something generally handled by the Zabbix agent, which these switches don't have.
So, here are my questions:
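For reference, here is a rough Python sketch of how I *think* that item turns the discovery[] OID into one number. My assumption (not confirmed) is that the SNMP poller walks 1.3.6.1.2.1.25.3.3.1.2 itself and hands back a JSON array with one object per hrProcessorLoad row, and that a JSONPath preprocessing step along the lines of $..['{#CPU.UTIL}'].avg() then averages the rows. The JSON sample and the function name average_cpu_util are made up for illustration.

```python
import json

# Hypothetical sample of what discovery[{#CPU.UTIL},1.3.6.1.2.1.25.3.3.1.2]
# might return: one object per hrProcessorLoad row (assumed shape).
SAMPLE = '''[
  {"{#SNMPINDEX}": "196608", "{#CPU.UTIL}": "12"},
  {"{#SNMPINDEX}": "196609", "{#CPU.UTIL}": "4"}
]'''


def average_cpu_util(discovery_json: str, macro: str = "{#CPU.UTIL}") -> float:
    """Average the macro's value across all discovered rows.

    Meant to mimic an avg() JSONPath preprocessing step; an empty result
    raises ValueError instead of producing a value.
    """
    rows = json.loads(discovery_json)
    values = [float(row[macro]) for row in rows if macro in row]
    if not values:
        raise ValueError("no rows returned; hrProcessorLoad may not be implemented")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(average_cpu_util(SAMPLE))  # 8.0
```

If this picture is right, no agent would be involved at all, which is part of what I'm trying to confirm in question 1.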
1) If we are only using SNMP, with no agent, how is CPU discovery being executed? I am unable to get a response from any of these switches using snmpget against 1.3.6.1.2.1.25.3.3.1.2 or hrProcessorLoad, but the utilization numbers in the dashboard match UCD-SNMP-MIB::ssCpuUser, which is not present in the template.
2) Is there a way to get CPU discovery via SNMP to populate this key? I could just replace it with UCD-SNMP-MIB::ssCpuUser (this returns correct values for each thread), but I would rather rely on discovery if I can get it working properly.
3) Why would I be getting errors from just two of these switches out of our entire system? They are all in the same subnet, with the same software version and the same SNMP credentials, and the SNMPv3 EngineIDs have been changed from the defaults to unique values. No issues appear inside Zabbix itself; the errors only show up in the log.
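One thing I still need to rule out for question 1: hrProcessorLoad (1.3.6.1.2.1.25.3.3.1.2) is a table column, so an snmpget on the bare column OID returns nothing even on devices that implement it; a get needs a row index appended, and walking the column is the fairer test. Below is a small, hypothetical Python helper for reading the output of something like snmpwalk -On <host> 1.3.6.1.2.1.25.3.3.1.2. The sample output and the name parse_hr_processor_load are invented for illustration, not captured from these switches.

```python
HR_PROCESSOR_LOAD = ".1.3.6.1.2.1.25.3.3.1.2"

# Made-up examples of snmpwalk -On output (numeric OIDs, leading dot).
SAMPLE_WALK = """.1.3.6.1.2.1.25.3.3.1.2.196608 = INTEGER: 12
.1.3.6.1.2.1.25.3.3.1.2.196609 = INTEGER: 4
"""
MISSING = ".1.3.6.1.2.1.25.3.3.1.2 = No Such Object available on this agent at this OID"


def parse_hr_processor_load(walk_output: str) -> dict[str, int]:
    """Map each hrProcessorLoad row index to its load percentage."""
    loads = {}
    for line in walk_output.splitlines():
        oid, sep, value = line.partition(" = ")
        # Skip blank lines and error lines on the bare column OID (no index).
        if not sep or not oid.startswith(HR_PROCESSOR_LOAD + "."):
            continue
        index = oid[len(HR_PROCESSOR_LOAD) + 1:]
        type_name, _, number = value.partition(": ")
        if type_name == "INTEGER":
            loads[index] = int(number)
    return loads


if __name__ == "__main__":
    print(parse_hr_processor_load(SAMPLE_WALK))  # {'196608': 12, '196609': 4}
    print(parse_hr_processor_load(MISSING))      # {}
```

If a real walk against the switches comes back empty like the MISSING case, that would suggest the table simply isn't implemented on them.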