Gaps on the graphs, SNMPv3

  • Jason
    Senior Member
    • Nov 2007
    • 430

    #46
    We've also been having issues using SNMPv3 from proxies where we have more than a handful of devices being checked. We use SHA/DES, and once we get to more than around 5-10 devices being probed from a proxy (CentOS 7 with net-snmp) we start to see gaps in the data, and a large number of items stop returning data. We've tried playing with the number of pollers, but that doesn't seem to make any difference. snmp(bulk)walk seems to work fine every time. The log file just starts filling up with loads of timeouts, and then devices go unreachable for a few minutes. Once we move a couple of devices off v3, it all works flawlessly again. We're currently running 3.2 but are planning to upgrade to 4.0 in the not too distant future.
    Switching to SNMPv2c immediately solves the problem, but we'd like to use the more secure v3.
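
    For reference, this is the kind of manual walk that succeeds every time when run by hand from the proxy (the user, passphrases and target host below are placeholders, not our real values):

        snmpbulkwalk -v3 -l authPriv -u monuser \
            -a SHA -A 'authpassphrase' \
            -x DES -X 'privpassphrase' \
            -t 3 -r 2 192.0.2.10 1.3.6.1.2.1.2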


    • Jason
      Senior Member
      • Nov 2007
      • 430

      #47
      Looking at this in more detail...
      I've recompiled the poller with extra logging around the SNMP calls.
      I can see that net-snmp is being called with the right parameters, including the correct privacy and auth protocols. net-snmp is returning error code -24, which is a timeout.
      The timeout in the poller has been set to various values up to 30 seconds, and in all of my tests the SNMP data comes back within a few seconds.
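
      For reference, these are the proxy-side parameters I've been adjusting while testing; the path and the specific values below are just illustrative:

          # /etc/zabbix/zabbix_proxy.conf (path assumed; the same parameters exist in zabbix_server.conf)
          Timeout=30          # agent/SNMP check timeout; Zabbix allows 1-30 seconds
          StartPollers=10     # number of poller processes probing the v3 devices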


      • adaker
        Junior Member
        • Feb 2020
        • 3

        #48
        I know this is an old thread, but we are still experiencing this same issue in 4.4.5 with net-snmp 5.7.3 on Debian 10. Has anyone been able to resolve this issue while maintaining SNMPv3 with AES and SHA?


        • iand999
          Junior Member
          • May 2020
          • 4

          #49
          I'm seeing something similar to this on 4.4.7 (compiled myself) with SNMPv1 (don't ask), on Fedora 31 with net-snmp 5.8-11 (stock fc31).

          Zabbix is monitoring the interfaces of a Cisco 2921 via SNMP; interface discovery is enabled, with a 10-minute rediscovery interval.
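
          (For context, the discovery rule and the ifInOctets item prototype are set up roughly like this; the OIDs are the standard ifTable/ifXTable ones, and the exact keys and macros in my template may differ:)

              Discovery rule SNMP OID:    discovery[{#IFNAME},1.3.6.1.2.1.31.1.1.1.1]
              Discovery update interval:  10m
              Item prototype SNMP OID:    1.3.6.1.2.1.2.2.1.10.{#SNMPINDEX}   (ifInOctets)
              Item update interval:       5m, with 'Change per second' preprocessing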

          A particular interface I'm interested in is also monitored by MRTG and Cacti.

          Whenever the interface has a lot of traffic flowing through it consistently (about 50% of its bandwidth; we have 50% rate limiting set on this type of traffic), the Zabbix samples of the interface (set to 5-minute intervals) become less frequent, and there are therefore gaps in the graphs.

          I noticed this in periods where we have a large data transfer running for several hours, causing 50% utilization of the link. During those hours only a small number of ifInOctets samples are saved, nowhere near the regular 5-minute interval that is set up.

          The MRTG and Cacti graphs show no problems in this period; only Zabbix does. I was hoping to retire MRTG and Cacti, until I saw that Zabbix wasn't behaving properly in this area.

          I've done extensive tracing with tcpdump and found that there is no issue with the router responding to the SNMP bulk request for interface data; the OID for the interface is included in the results every time. It's just that Zabbix is silently not recording the value for some reason.
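
          (The captures were done with something along these lines; the interface name and router address here are placeholders:)

              tcpdump -i eth0 -s 0 -w snmp-busy.pcap host 192.0.2.1 and udp port 161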

          An snmpget of the exact same data that Zabbix fetches in the bulk request (81 items) reliably returns results for the interface of interest during this busy period.
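
          (i.e. a plain v1 get of the ifInOctets instance for that interface, with a placeholder community, address and interface index:)

              snmpget -v1 -c public 192.0.2.1 1.3.6.1.2.1.2.2.1.10.3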

          I first thought this might be related to UDP fragmentation and firewalling, as the bulk request is large and the response is split over two packets, but Wireshark reassembles them and shows everything to be correct.

          I've enabled DebugLevel=4 logging on the Zabbix server that the monitoring is done from, and nothing obvious stands out in the logs, although I suspect there may be some interference between the rediscovery process and the regular polling.
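
          (As an alternative to a global DebugLevel=4, the log level can also be raised at runtime for just the poller processes to keep the log volume down, e.g.:)

              zabbix_server -R log_level_increase=poller      # raise log level for all poller processes
              zabbix_server -R log_level_decrease=poller      # put it back afterwards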


          Below are some samples from Latest Data taken during one of these periods (I had the sample interval set to 10 minutes back then).
          (These are ifInOctets with a 'Change per second' preprocessing step applied.)

          2020-05-25 03:30:40 175627
          2020-05-25 03:20:40 168993
          2020-05-25 03:10:40 160383
          2020-05-25 03:00:40 254063 <-- note that samples have returned to every 10 minutes again
          2020-05-25 02:30:40 27974 <-- traffic returned to low level somewhere before here
          2020-05-25 02:00:40 6819605
          2020-05-25 00:00:40 6399868
          2020-05-24 22:30:40 6326711
          2020-05-24 21:00:40 6317942
          2020-05-24 19:40:40 6323955
          2020-05-24 18:10:40 6319364
          2020-05-24 16:50:40 6317775
          2020-05-24 15:10:40 6396247
          2020-05-24 13:40:40 6321557
          2020-05-24 12:20:40 6323118
          2020-05-24 10:50:40 6250206
          2020-05-24 09:30:40 6322152 <-- here is where things start getting lost
          2020-05-24 08:00:40 432232 <-- traffic ramps up here
          2020-05-24 07:50:40 29804
          2020-05-24 07:40:40 27692
          2020-05-24 07:30:40 26395
          2020-05-24 07:20:40 105373
          2020-05-24 07:10:40 106862
          2020-05-24 07:00:40 226586
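
          (For scale, these 'Change per second' values are bytes per second; assuming the link is around 100 Mbit/s, which isn't stated explicitly here, the figures work out roughly as:)

              6,320,000 bytes/s x 8 ≈ 50.6 Mbit/s   (busy window, ~50% of an assumed 100 Mbit/s link)
                 27,000 bytes/s x 8 ≈  0.2 Mbit/s   (quiet periods)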

          As mentioned, I'm using SNMPv1 for this, but reading this discussion, the issue doesn't seem to depend on which SNMP version is used.

