SNMP polls being missed

  • mrgadgetnz
    Junior Member
    • Jul 2011
    • 11

    #1

    SNMP polls being missed

    We have nodata triggers on a device uptime item (5m interval) for each SNMP device we monitor. Recently, a handful of these triggers have started firing, and closer inspection reveals that these items are failing to get data, sometimes for extended periods of time. See graph attached/below.

    [Attached image: zbx_uptime_graph.JPG]
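
    For anyone wanting to quantify gaps like these from exported item history, here is a minimal sketch. It assumes the poll timestamps are already available as Python datetime objects; the function name find_gaps, the tolerance factor, and the sample data are illustrative and do not come from Zabbix or from the original post.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=5), tolerance=2.0):
    """Return (previous, current, missed_polls) for each gap in the series.

    A gap is any spacing between consecutive values that exceeds
    tolerance * interval. The 5 minute default matches the item interval
    mentioned in the post; the tolerance of 2.0 is an arbitrary assumption.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval * tolerance:
            # Number of polls that should have landed strictly inside the gap.
            missed = int(delta / interval) - 1
            gaps.append((prev, cur, missed))
    return gaps


if __name__ == "__main__":
    # Hypothetical sample data: values at 0, 5, 10, 40 and 45 minutes.
    start = datetime(2019, 5, 1, 12, 0)
    sample = [start + timedelta(minutes=m) for m in (0, 5, 10, 40, 45)]
    for prev, cur, missed in find_gaps(sample):
        print(f"gap from {prev} to {cur}: about {missed} polls missed")
```

    Comparing the gaps across hosts before and after a server restart would show whether the affected devices really do change each time.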

    I can snmpget this OID from the CLI (or even snmpwalk the target host) during one of these 'failure' periods without issue.
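
    A one-off snmpget can succeed by luck, so a repeated probe during a failure window may give a clearer picture. The sketch below wraps the net-snmp snmpget CLI from Python. It assumes SNMPv2c with a community string and the standard sysUpTime OID; the host address, community, timeout, and helper names are all placeholders, and an SNMPv3 setup would need different authentication options.

```python
import subprocess
import time

# sysUpTimeInstance from SNMPv2-MIB; assumed to be the OID behind the uptime item.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmpget_cmd(host, community="public", oid=SYS_UPTIME_OID,
                      timeout=3, retries=0):
    """Build an snmpget command line (SNMPv2c is an assumption here)."""
    return [
        "snmpget", "-v2c", "-c", community,
        "-t", str(timeout),   # per-request timeout in seconds
        "-r", str(retries),   # no retries, so a single lost reply is visible
        host, oid,
    ]


def probe(host, count=10, pause=30.0, **kwargs):
    """Run snmpget `count` times and record (start_time, ok, elapsed_seconds)."""
    results = []
    for _ in range(count):
        started = time.time()
        proc = subprocess.run(build_snmpget_cmd(host, **kwargs),
                              capture_output=True, text=True)
        results.append((started, proc.returncode == 0, time.time() - started))
        time.sleep(pause)
    return results


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; replace it with a real device.
    for started, ok, elapsed in probe("192.0.2.10", count=3, pause=5.0):
        print(f"{time.ctime(started)} ok={ok} elapsed={elapsed:.2f}s")
```

    Setting retries to 0 is a deliberate choice in this sketch: with the CLI's default retries, an occasional dropped reply would be hidden.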

    Pollers are set to 150 and typically stay 10-30% utilised, occasionally peaking around 50%.

    Network utilisation on the NIC is negligible.

    Network connectivity to the devices is excellent (gigabit [or more] everywhere).

    Any ideas?

    We have a single-box deployment on a dual socket HP DL360 Gen9 Server with all-SSD and 128GB of RAM.
    * CentOS 7 (up to date).
    * Zabbix 3.4.15
    * net-snmp 5.7.2-28

    [Attached image: zbx_status.JPG]
  • mrgadgetnz
    Junior Member
    • Jul 2011
    • 11

    #2
    As an update to this - we have turned off bulk on some of these hosts and this has fixed the issue. An unfortunate resolution though, as it means values don't always align in graphs.

    The devices in question all support bulk queries, and the really odd thing is that each time we restart Zabbix, a different device will experience issues. E.g. deviceA might be missing data while deviceB is fine; restart Zabbix and it's reversed: deviceA is now fine and deviceB is missing data. This leads us to point the finger back at Zabbix rather than the device.

    The server is not at all busy.


    • Wolfsbane2k
      Member
      • Nov 2022
      • 48

      #3
      Originally posted by mrgadgetnz
      As an update to this - we have turned off bulk on some of these hosts and this has fixed the issue. An unfortunate resolution though, as it means values don't always align in graphs.

      The devices in question all support bulk queries, and the really odd thing is that each time we restart Zabbix, a different device will experience issues. E.g. deviceA might be missing data while deviceB is fine; restart Zabbix and it's reversed: deviceA is now fine and deviceB is missing data. This leads us to point the finger back at Zabbix rather than the device.

      The server is not at all busy.

      Did you ever get to the bottom of this in any other way?
      We're seeing the same thing on a specific set of items across a series of hosts using the same template: they are meant to be polled every minute, but are only being polled every 8 hours.


      • mrgadgetnz
        Junior Member
        • Jul 2011
        • 11

        #4
        Unfortunately not. We are now on Zabbix 5.4 on RHEL8, and the issue still exists to varying degrees.

        What version of SNMP are you using? We seem to have more issues with SNMPv3 - which is unfortunately what we're exclusively using now!


        • markosa
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
          • Aug 2022
          • 104

          #5
          Have you thought about using a zabbix-proxy and a separate frontend server?
