Issue with zabbix internal host availability checks

  • raulk89
    Junior Member
    • Nov 2023
    • 7

    #1

    Issue with zabbix internal host availability checks

    Hi

    Zabbix server 6.0.29
    Rocky Linux 8
    Using SNMPv3

    I have an issue with Zabbix SNMP checks.
    I am using the "Generic by SNMP" template for Dell hardware on Nutanix.

    The template has an item, "Generic SNMP: SNMP agent availability", which uses the Zabbix internal check "zabbix[host,snmp,available]".

    Whenever I patch the software on that host, there are network errors on the host (probably because the host is unavailable for a brief period).
    That in itself is not the problem. The main issue is the last row of this log:

    Code:
     98218:20240603:184913.462 temporarily disabling SNMP agent checks on host "host1.domain.com": interface unavailable
    I have confirmed in zabbix_server.conf that UnavailableDelay=60 (the default), so checks should resume about 60 seconds after such a message.
    Instead, the interface never becomes available again (at least not until I restart the zabbix-server systemd service).
    After that row, nothing more is written to zabbix_server.log for that host. Even when I manually initiate a check on that host, nothing appears there.
    Why is that?
    The errors I get for that host are SNMP "Not available" and "Timeout while connecting to "ip:161"".

    But I can confirm there is no timeout. I ran snmpwalk from the Zabbix server command line and can retrieve the items just fine. I can also check SNMP items via the Zabbix GUI without any issue.
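
To rule out a non-default retry setting, the related availability parameters can be listed in one go. This is only a sketch: the function name `show_availability_params` is made up for illustration, the config path is the usual package default (an assumption), and `UnreachablePeriod`, `UnreachableDelay` and `Timeout` are not mentioned in the post; they are included here because they control the same retry logic.

```shell
# List explicitly set availability-related parameters from a Zabbix server config.
# Commented-out lines are skipped; a parameter that is not printed is not set
# in the file, so the server's built-in default applies to it.
show_availability_params() {
    conf="${1:-/etc/zabbix/zabbix_server.conf}"   # assumed default path
    grep -E '^[[:space:]]*(UnavailableDelay|UnreachablePeriod|UnreachableDelay|Timeout)[[:space:]]*=' "$conf"
}
```

Calling `show_availability_params` with no argument reads the assumed default path. The "wait for 15 seconds" rows in the log below look consistent with the default `UnreachableDelay` of 15 seconds.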

    Code:
    [root@ee02-zabbix ~]# cat /var/log/zabbix/zabbix_server.log | grep host1.domain.com
     98172:20240602:044927.500 SNMP agent item "citAvgLatencyUsecs[NutanixManagementShare.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240602:044942.047 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98214:20240602:170227.450 SNMP agent item "citAvgLatencyUsecs[HYCU-cd12ff48-ecce-448f-9a57-f853483b9f7f.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240602:170242.075 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98190:20240602:172627.174 SNMP agent item "citIOPerSecond[HYCU-9a37afc8-92f2-4f21-82e3-f74193258c89.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240602:172642.392 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98202:20240602:173727.537 SNMP agent item "dstAverageLatency[2]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98219:20240602:173742.160 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98172:20240603:120557.571 SNMP agent item "system.net.uptime[sysUpTime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:120612.312 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98213:20240603:151327.299 SNMP agent item "citAvgLatencyUsecs[NTNX_d1-res-nas_ctr.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:151342.066 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98178:20240603:153727.710 SNMP agent item "citAvgLatencyUsecs[HYCU-525e34df-7503-4f03-b18f-cc1e22b38f96.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:153742.371 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98210:20240603:163227.550 SNMP agent item "hypervisorAverageLatency[2]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:163242.207 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98209:20240603:164927.490 SNMP agent item "dstNumFreeBytes[28]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:164942.085 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98195:20240603:170027.134 SNMP agent item "dstIOBandwidth[22]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:170042.067 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98211:20240603:170131.946 SNMP agent item "citIOPerSecond[default-container-55284636057635.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:170150.138 SNMP agent item "citIOPerSecond[default-container-55284636057635.]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98216:20240603:170207.029 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98162:20240603:170207.442 item "host1.domain.com:citIOPerSecond[default-container-55284636057635.]" became not supported: Value of type "string" is not suitable for value type "Numeric (unsigned)". Value "NULL"
     98178:20240603:170327.844 SNMP agent item "dstNumberIops[22]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:170346.125 SNMP agent item "dstNumberIops[11]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98216:20240603:170405.150 SNMP agent item "hypervisorIOBandwidth[2]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98216:20240603:170424.179 temporarily disabling SNMP agent checks on host "host1.domain.com": interface unavailable
     98217:20240603:171048.618 enabling SNMP agent checks on host "host1.domain.com": interface became available
     98163:20240603:171123.604 item "host1.domain.com:citIOPerSecond[default-container-55284636057635.]" became supported
     98176:20240603:171331.918 SNMP agent item "dstAverageLatency[33]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:171350.829 SNMP agent item "dstAverageLatency[33]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98216:20240603:171409.851 SNMP agent item "dstAverageLatency[33]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98217:20240603:171428.865 temporarily disabling SNMP agent checks on host "host1.domain.com": interface unavailable
     98217:20240603:172432.621 enabling SNMP agent checks on host "host1.domain.com": interface became available
     98177:20240603:172531.747 SNMP agent item "citAvgLatencyUsecs[NTNX_d1-res-nas_ctr.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:172546.920 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98201:20240603:173043.761 SNMP agent item "hypervisorCpuUsagePercent[3]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:173058.273 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98205:20240603:173131.976 SNMP agent item "dstAverageLatency[5]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:173146.360 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98209:20240603:173411.722 SNMP agent item "system.net.uptime[sysUpTime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:173426.828 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98172:20240603:173751.377 SNMP agent item "dstAverageLatency[12]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:173810.255 SNMP agent item "dstAverageLatency[12]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98217:20240603:173829.282 SNMP agent item "dstNumberIops[23]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98217:20240603:173848.297 temporarily disabling SNMP agent checks on host "host1.domain.com": interface unavailable
     98215:20240603:174836.862 enabling SNMP agent checks on host "host1.domain.com": interface became available
     98199:20240603:175127.659 SNMP agent item "dstAverageLatency[8]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:175142.302 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98196:20240603:175511.244 SNMP agent item "citIOPerSecond[NutanixManagementShare.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:175526.548 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98204:20240603:180051.722 SNMP agent item "hypervisorTxBytes[3]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:180106.932 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98213:20240603:180131.971 SNMP agent item "hypervisorIOBandwidth[2]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:180146.032 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98194:20240603:180552.106 SNMP agent item "system.net.uptime[sysUpTime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:180607.376 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98201:20240603:180701.139 SNMP agent item "system.hw.uptime[hrSystemUptime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98219:20240603:180716.530 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98209:20240603:180731.968 SNMP agent item "citAvgLatencyUsecs[SelfServiceContainer.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:180746.559 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98210:20240603:181012.170 SNMP agent item "system.net.uptime[sysUpTime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:181027.849 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98178:20240603:181511.205 SNMP agent item "hypervisorTxDropCount[2]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:181526.347 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98190:20240603:182511.555 SNMP agent item "dstState[15]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:182526.430 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98203:20240603:183101.606 SNMP agent item "system.hw.uptime[hrSystemUptime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:183116.842 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98203:20240603:183131.645 SNMP agent item "dstAverageLatency[1]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:183146.869 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98202:20240603:183511.358 SNMP agent item "dstNumFreeBytes[12]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98216:20240603:183526.396 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98177:20240603:183631.600 SNMP agent item "system.hw.uptime[hrSystemUptime.0]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98218:20240603:183646.510 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98204:20240603:183731.768 SNMP agent item "dstAverageLatency[8]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98219:20240603:183746.757 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98203:20240603:184351.597 SNMP agent item "dstAverageLatency[13]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:184406.109 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98172:20240603:184511.589 SNMP agent item "dstNumberIops[16]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98217:20240603:184526.188 resuming SNMP agent checks on host "host1.domain.com": connection restored
     98195:20240603:184827.582 SNMP agent item "citAvgLatencyUsecs[HYCU-cd12ff48-ecce-448f-9a57-f853483b9f7f.]" on host "host1.domain.com" failed: first network error, wait for 15 seconds
     98215:20240603:184846.424 SNMP agent item "citAvgLatencyUsecs[HYCU-cd12ff48-ecce-448f-9a57-f853483b9f7f.]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98218:20240603:184850.429 SNMP agent item "dstIOBandwidth[17]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98215:20240603:184909.453 SNMP agent item "dstAverageLatency[7]" on host "host1.domain.com" failed: another network error, wait for 15 seconds
     98218:20240603:184913.462 temporarily disabling SNMP agent checks on host "host1.domain.com": interface unavailable
    Last edited by raulk89; 04-06-2024, 08:46.
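
The pattern in the log above (a "temporarily disabling" row that is never followed by an "enabling" row for the same host) can be spotted mechanically. A rough sketch, assuming the log line format shown above; the function name `stuck_snmp_hosts` is hypothetical.

```shell
# Print hosts whose most recent SNMP availability event in the log is
# "temporarily disabling ..." with no later "enabling ..." row.
# Reads the files given as arguments, or standard input if none are given.
stuck_snmp_hosts() {
    awk '
        / temporarily disabling SNMP agent checks on host "/ {
            split($0, p, "\"")      # p[2] is the quoted host name
            state[p[2]] = "down"
            when[p[2]] = $1         # PID:date:time stamp of the row
        }
        / enabling SNMP agent checks on host "/ {
            split($0, p, "\"")
            state[p[2]] = "up"
        }
        END {
            for (h in state)
                if (state[h] == "down") print h, when[h]
        }
    ' "$@"
}
```

Run as `stuck_snmp_hosts /var/log/zabbix/zabbix_server.log`, it prints each affected host followed by the stamp of its last "temporarily disabling" row. It only reports what the log says; it does not explain why the poller stops retrying.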
  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1782

    #2
    Whenever there are SNMP v3 issues with Zabbix, these come to mind:

    If monitoring SNMPv3 devices, make sure that msgAuthoritativeEngineID (also known as snmpEngineID or "Engine ID") is never shared by two devices. According to RFC 2571 (section 3.1.1.1) it must be unique for each device.
    RFC 3414 requires the SNMPv3 devices to persist their engineBoots. Some devices do not do that, which results in their SNMP messages being discarded as outdated after being restarted. In such situation, SNMP cache needs to be manually cleared on a server/proxy (by using -R snmp_cache_reload) or the server/proxy needs to be restarted.
    Markku
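
The cache reload mentioned in the quote can be issued against a running server without restarting the service. A minimal sketch, assuming the binary is `zabbix_server` on the PATH and the config file is at the package default path; the wrapper name `reload_snmp_cache` and the `ZABBIX_BIN` / `ZABBIX_CONF` variables are made up here for illustration. On a proxy, the binary would be `zabbix_proxy` with its own config file.

```shell
# Ask a running Zabbix server to clear its SNMP cache
# (the -R snmp_cache_reload runtime control option from the quote above).
reload_snmp_cache() {
    bin="${ZABBIX_BIN:-zabbix_server}"                    # assumed binary name
    conf="${ZABBIX_CONF:-/etc/zabbix/zabbix_server.conf}" # assumed default path
    "$bin" -c "$conf" -R snmp_cache_reload
}
```

Calling `reload_snmp_cache` as a user allowed to signal the server process sends the request. Whether this alone brings the interface back in the situation described in post #1 is not confirmed in this thread.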

    • sambhu.prakash
      Junior Member
      • Apr 2021
      • 20

      #3
      I have the same problem. I am on Rocky Linux 9 with Zabbix 7. Is there a fix for this other than reloading the SNMP cache (snmp_cache_reload) or restarting the Zabbix server?

      • raulk89
        Junior Member
        • Nov 2023
        • 7

        #4
        Not that I know of. This is a very serious problem: because of it, there is currently no way of knowing whether your infrastructure is working (zabbix-server needs regular service restarts for that).

        I also reported the issue 3 months ago: [ZBX-25408] After SNMP interface unavailable, it will never start working again (only zabbix-server restart resolves the issue) - ZABBIX SUPPORT
        No one has looked into it so far.

        Regards
        Raul
