Gaps on the graphs, SNMPv3

  • Jason
    Senior Member
    • Nov 2007
    • 430

    #46
    We've also been having issues using SNMPv3 from proxies where we have more than a handful of devices being checked. We use SHA/DES, and once we get to more than around 5-10 devices being probed from a proxy (CentOS 7 with net-snmp) we start to see gaps in the data, and a large number of items stop returning data. We've tried playing with the number of pollers, but that doesn't seem to make any difference. snmp(bulk)walk seems to work fine every time. The log file just starts filling up with loads of timeouts, and then devices go unreachable for a few minutes. Once we move a couple of devices off v3, it all works flawlessly again. We're currently running 3.2 but are planning to upgrade to 4.0 in the not too distant future.
    Switching to SNMPv2c immediately solves the problem, but we'd like to use the more secure v3.
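
    For reference, this is the kind of manual walk that succeeds every time when run by hand from the proxy (the user, passphrases and target host below are placeholders, not our real values):

        snmpbulkwalk -v3 -l authPriv -u monuser \
            -a SHA -A 'authpassphrase' \
            -x DES -X 'privpassphrase' \
            -t 3 -r 2 192.0.2.10 1.3.6.1.2.1.2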


    • Jason
      Senior Member
      • Nov 2007
      • 430

      #47
      Looking at this in more detail...
      I've recompiled the poller with extra logging around the SNMP calls.
      I can see that net-snmp is being called with the right parameters, including the correct privacy and auth protocols. net-snmp is returning error code -24, which is a timeout.
      The timeout in the poller has been set to various values up to 30 seconds, and in all of my tests the SNMP data comes back within a few seconds.
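
      For reference, these are the proxy-side parameters I've been adjusting while testing; the path and the specific values below are just illustrative:

          # /etc/zabbix/zabbix_proxy.conf (path assumed; the same parameters exist in zabbix_server.conf)
          Timeout=30          # agent/SNMP check timeout; Zabbix allows 1-30 seconds
          StartPollers=10     # number of poller processes probing the v3 devices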


      • adaker
        Junior Member
        • Feb 2020
        • 3

        #48
        I know this is an old thread, but we are still experiencing this same issue in 4.4.5 with net-snmp 5.7.3 on Debian 10. Has anyone been able to resolve this issue while maintaining SNMPv3 with AES and SHA?


        • iand999
          Junior Member
          • May 2020
          • 4

          #49
          I'm seeing something similar to this on 4.4.7 (compiled myself) with SNMPv1 (don't ask), on Fedora 31 with net-snmp 5.8-11 (stock fc31).

          Zabbix is monitoring the interfaces of a Cisco 2921 via SNMP; interface discovery is enabled, with a 10-minute rediscovery interval.
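
          (For context, the discovery rule and the ifInOctets item prototype are set up roughly like this; the OIDs are the standard ifTable/ifXTable ones, and the exact keys and macros in my template may differ:)

              Discovery rule SNMP OID:    discovery[{#IFNAME},1.3.6.1.2.1.31.1.1.1.1]
              Discovery update interval:  10m
              Item prototype SNMP OID:    1.3.6.1.2.1.2.2.1.10.{#SNMPINDEX}   (ifInOctets)
              Item update interval:       5m, with 'Change per second' preprocessing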

          A particular interface I'm interested in is also monitored by MRTG and Cacti.

          Whenever the interface has a lot of traffic flowing through it consistently (about 50% of its bandwidth; we have 50% rate limiting set on this type of traffic), the Zabbix samples of the interface (set to 5-minute intervals) become less frequent, and there are therefore gaps in the graphs.

          I noticed this in periods where we have a large data transfer running for several hours, causing 50% utilization of the link. During those hours only a small number of ifInOctets samples are saved, nowhere near the regular 5-minute interval that is set up.

          The MRTG and Cacti graphs show no problems in this period; only Zabbix does. I was hoping to retire MRTG and Cacti, until I saw that Zabbix wasn't behaving properly in this area.

          I've done extensive tracing with tcpdump and found that there is no issue with the router responding to the SNMP bulk request for interface data; the OID for the interface is included in the results every time. It's just that Zabbix is silently not recording the value for some reason.
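
          (The captures were done with something along these lines; the interface name and router address here are placeholders:)

              tcpdump -i eth0 -s 0 -w snmp-busy.pcap host 192.0.2.1 and udp port 161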

          An snmpget of the exact same data that Zabbix fetches in the bulk request (81 items) reliably returns results for the interface of interest during this busy period.
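
          (i.e. a plain v1 get of the ifInOctets instance for that interface, with a placeholder community, address and interface index:)

              snmpget -v1 -c public 192.0.2.1 1.3.6.1.2.1.2.2.1.10.3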

          I first thought this might be related to UDP fragmentation and firewalling, as the bulk request is large and the response is split over two packets, but Wireshark reassembles them and shows everything to be correct.

          I've enabled DebugLevel=4 logging on the Zabbix server that the monitoring is done from, and nothing obvious stands out in the logs, although I suspect there may be some interference between the rediscovery process and the regular polling.
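
          (As an alternative to a global DebugLevel=4, the log level can also be raised at runtime for just the poller processes to keep the log volume down, e.g.:)

              zabbix_server -R log_level_increase=poller      # raise log level for all poller processes
              zabbix_server -R log_level_decrease=poller      # put it back afterwards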


          Below are some samples from Latest Data taken during one of these periods (I had the sample interval set to 10 minutes back then).
          (These are ifInOctets with a 'Change per second' preprocessing step applied.)

          2020-05-25 03:30:40 175627
          2020-05-25 03:20:40 168993
          2020-05-25 03:10:40 160383
          2020-05-25 03:00:40 254063 <-- note that samples have returned to every 10 minutes again
          2020-05-25 02:30:40 27974 <-- traffic returned to low level somewhere before here
          2020-05-25 02:00:40 6819605
          2020-05-25 00:00:40 6399868
          2020-05-24 22:30:40 6326711
          2020-05-24 21:00:40 6317942
          2020-05-24 19:40:40 6323955
          2020-05-24 18:10:40 6319364
          2020-05-24 16:50:40 6317775
          2020-05-24 15:10:40 6396247
          2020-05-24 13:40:40 6321557
          2020-05-24 12:20:40 6323118
          2020-05-24 10:50:40 6250206
          2020-05-24 09:30:40 6322152 <-- here is where things start getting lost
          2020-05-24 08:00:40 432232 <-- traffic ramps up here
          2020-05-24 07:50:40 29804
          2020-05-24 07:40:40 27692
          2020-05-24 07:30:40 26395
          2020-05-24 07:20:40 105373
          2020-05-24 07:10:40 106862
          2020-05-24 07:00:40 226586
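
          (For scale, these 'Change per second' values are bytes per second; assuming the link is around 100 Mbit/s, which isn't stated explicitly here, the figures work out roughly as:)

              6,320,000 bytes/s x 8 ≈ 50.6 Mbit/s   (busy window, ~50% of an assumed 100 Mbit/s link)
                 27,000 bytes/s x 8 ≈  0.2 Mbit/s   (quiet periods)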

          As mentioned, I'm using SNMPv1 for this, but reading this discussion, the issue doesn't seem to depend on which SNMP version is used.

