SNMPv3 monitoring is flapping

  • bziegler
    Junior Member
    • Oct 2019
    • 6

    #1

    Hello Zabbix community,

    We monitor hundreds of devices with Zabbix. In recent months we started monitoring the BMCs (aka XClarity Controller) of a few new Lenovo ThinkSystem SR650 servers. The BMCs run SNMPv3 daemons, and they are the first devices we monitor via SNMPv3. There was a bug in the BMC firmware that caused SNMP requests to time out. This issue was fixed in a later version. We updated the firmware and the issue became worse: more requests failed. So I opened a support case with Lenovo. They told me to also update the UEFI (and, since we were doing maintenance anyway, some other firmware). This was done today, and the issue remains. I did a bit of digging and I am not sure what exactly is wrong, other than Zabbix not getting an answer to every request it sends to those devices. I am also not sure whether what I see today is still the original problem.

    We are currently using Zabbix 4.4.1, but the issue has been present since at least version 4.2.x.

    In /var/log/zabbix/zabbix_server.log I could find the following:
    ...
    1075:20191030:155125.205 enabling SNMP agent checks on host "<Host>": host became available
    Error: passphrase chosen is below the length requirements of the USM (min=8).
    1075:20191030:155125.295 SNMP agent item "system.hw.physicaldisk.status[diskHealthStatus.27]" on host "<Host>" failed: first network error, wait for 15 seconds
    1075:20191030:155140.818 resuming SNMP agent checks on host "<Host>": connection restored
    Error: passphrase chosen is below the length requirements of the USM (min=8).
    1075:20191030:155140.941 SNMP agent item "system.hw.physicaldisk.status[diskHealthStatus.27]" on host "<Host>" failed: first network error, wait for 15 seconds
    Error: passphrase chosen is below the length requirements of the USM (min=8).
    1075:20191030:155155.991 SNMP agent item "system.hw.physicaldisk.status[diskHealthStatus.27]" on host "<Host>" failed: another network error, wait for 15 seconds
    Error: passphrase chosen is below the length requirements of the USM (min=8).
    1075:20191030:155210.043 SNMP agent item "system.hw.physicaldisk.status[diskHealthStatus.27]" on host "<Host>" failed: another network error, wait for 15 seconds
    1075:20191030:155225.102 temporarily disabling SNMP agent checks on host "<Host>": host unavailable
    ...
    This is logged over and over again. In the Zabbix frontend, hovering over the host availability shows:
    Error generating Ku from privacy pass phrase
    This is rather surprising since the passphrases are 8 or more characters long:
    {$SNMP_AUTH_PASS} = <length=8>
    {$SNMP_PRIV_PASS} = <length=16>
    Note that not all requests fail; the SNMP availability stays at around 50% on average. I attached a picture for comparison.
    All requests made to the BMCs from the monitoring server's command line with net-snmp-utils work without any issue.
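
The length requirement quoted in the log can be checked against the configured values with a minimal sketch. It assumes the check is a plain character count against the minimum of 8 given in the error message. The placeholder strings only match the lengths listed above and are not the real passphrases, and the `<unset>` entry is a hypothetical case added for contrast.

```python
# Minimum passphrase length, taken from the "(min=8)" in the log message.
USM_MIN_PASSPHRASE_LEN = 8


def passphrase_ok(passphrase: str) -> bool:
    """Return True if the passphrase meets the minimum length."""
    return len(passphrase) >= USM_MIN_PASSPHRASE_LEN


# Placeholders with the same lengths as the macros (not the real secrets).
candidates = {
    "{$SNMP_AUTH_PASS}": "a" * 8,
    "{$SNMP_PRIV_PASS}": "p" * 16,
    "<unset>": "",  # hypothetical: what an empty value would look like
}

results = {name: passphrase_ok(value) for name, value in candidates.items()}

for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'too short'}")
```

Both configured lengths pass this check, so under this assumption the error would have to come from a request that does not carry the configured values.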

    Does anyone have an idea what could be wrong?

    Thanks and best regards
    Benjamin
    Attached Files
  • bziegler
    Junior Member
    • Oct 2019
    • 6

    #2
    So after some more digging I have resolved this issue. In one item of one discovery rule for the physical disks, the privacy passphrase macro "{$SNMP_PRIV_PASS}" was missing. I believe this only affected some snmpget requests that were not sent as snmpbulkget.
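
A check like the following could help find such an item without clicking through every discovery rule. It is only a sketch: it works on a list of dictionaries and assumes they use the item field names of the Zabbix 4.x API (`type`, `key_`, `snmpv3_securitylevel`, `snmpv3_authpassphrase`, `snmpv3_privpassphrase`) with type `6` meaning "SNMPv3 agent". The function name and the sample item keys are hypothetical, and fetching the items from the API is left out.

```python
from typing import Dict, List, Tuple

# Assumed values, following the Zabbix 4.x item object.
SNMPV3_ITEM_TYPE = "6"  # "SNMPv3 agent"
AUTH_NO_PRIV = "1"      # authentication only
AUTH_PRIV = "2"         # authentication and privacy


def find_incomplete_snmpv3_items(items: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (item key, empty field) pairs for SNMPv3 items lacking credentials."""
    problems = []
    for item in items:
        if str(item.get("type")) != SNMPV3_ITEM_TYPE:
            continue  # not an SNMPv3 item
        level = str(item.get("snmpv3_securitylevel", "0"))
        key = item.get("key_", "<no key>")
        # Both authNoPriv and authPriv need an authentication passphrase.
        if level in (AUTH_NO_PRIV, AUTH_PRIV) and not item.get("snmpv3_authpassphrase"):
            problems.append((key, "snmpv3_authpassphrase"))
        # Only authPriv needs a privacy passphrase.
        if level == AUTH_PRIV and not item.get("snmpv3_privpassphrase"):
            problems.append((key, "snmpv3_privpassphrase"))
    return problems


# Hypothetical sample data: the first prototype lacks the privacy passphrase.
sample_items = [
    {"type": "6", "key_": "hypothetical.disk.status[{#SNMPINDEX}]",
     "snmpv3_securitylevel": "2",
     "snmpv3_authpassphrase": "{$SNMP_AUTH_PASS}",
     "snmpv3_privpassphrase": ""},
    {"type": "6", "key_": "hypothetical.disk.size[{#SNMPINDEX}]",
     "snmpv3_securitylevel": "2",
     "snmpv3_authpassphrase": "{$SNMP_AUTH_PASS}",
     "snmpv3_privpassphrase": "{$SNMP_PRIV_PASS}"},
    {"type": "4", "key_": "hypothetical.snmpv2.item"},
]

problems = find_incomplete_snmpv3_items(sample_items)
for key, field in problems:
    print(f"{key}: {field} is empty")
```

The same function can be run over item prototypes as well as regular items, since the credentials are stored on each of them separately.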

    The configuration of SNMPv3 hosts in Zabbix is just terribly complex and unnecessarily redundant.
