**BDiE8VNy** · 28-08-2015, 17:56

"Starting from 2.2.8 Zabbix server and proxy daemons will always retry at least one time: either through the SNMP library's retrying mechanism or through the internal bulk processing mechanism."
See: Item type SNMP agent

Before applying any patch or investigating further, I strongly suggest trying again with EnableSNMPBulkRequests disabled.
See: configuration parameter and ZBXNEXT-2301

**ingus.vilnis** · 01-09-2015, 16:01

Hi,

The problem might be on the device side, in the manufacturer's implementation of the net-snmp library.

Please check whether your devices allow you to set a unique EngineID for each device.

It might be a long read, but this could also be related to your problem: https://support.zabbix.com/browse/ZBX-8385.

When your SNMPv3 devices fail to communicate with Zabbix, what happens if you restart Zabbix server? Are the devices monitored fine after the server restart and, if so, how long does it take before they fail again?

Best Regards,
Ingus

**nucleusv** · 21-09-2015, 15:49

Originally posted by ingus.vilnis

Hi,

The problem might be on the device side, in the manufacturer's implementation of the net-snmp library.

I have checked the EngineIDs on all my devices; they are unique.

I have Juniper, ARISTA, ENGECORE devices.

If only one Juniper device is enabled, Zabbix server starts querying data from the network device after each Zabbix server restart, but stops querying after about 70-80 seconds.

Are there any requirements on the length of the EngineID?

And what must be configured on the network device?
Only the EngineID?

**Alessan** · 27-01-2016, 23:47

We have the same problem.

We then decided to capture traffic with tcpdump on the Zabbix server's Ethernet interface and decrypt it in Wireshark using the SNMP users table.

The result is randomly malformed requests that can't be decrypted.

Some because of an unexpected size (not a multiple of 8), others because they can't be decrypted with the key.

Same result with a newly installed Zabbix server on a VirtualBox test machine with only one SNMPv3 device and no bulk requests (~500 items).

Zabbix version: 2.4.7

Images attached.

**nucleusv** · 28-01-2016, 13:07

Originally posted by Alessan

We have the same problem.

We then decided to capture traffic with tcpdump on the Zabbix server's Ethernet interface and decrypt it in Wireshark using the SNMP users table.

Please check all SNMPv3 items on the host and make sure that EVERY item has the same authentication protocol and privacy protocol.

I had a similar problem where hosts became unavailable, but then I discovered that one item prototype had a different auth protocol from the other prototypes, and the device was not configured to serve this protocol, so Zabbix got no result from the device.
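
For anyone wanting to automate the check described above, here is a minimal sketch in Python. It assumes you already have the host's items as a list of dictionaries, for example exported through the Zabbix API; the field names `snmpv3_authprotocol` and `snmpv3_privprotocol` are an assumption modelled on the item object of Zabbix versions from that era and should be verified against your version. The sample data is hypothetical.

```python
from collections import defaultdict


def group_by_snmpv3_settings(items):
    """Group item names by their (auth protocol, privacy protocol) pair."""
    groups = defaultdict(list)
    for item in items:
        # Field names are assumptions; adjust them to your export format.
        key = (item["snmpv3_authprotocol"], item["snmpv3_privprotocol"])
        groups[key].append(item["name"])
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample_items = [
    {"name": "ifInErrors.1", "snmpv3_authprotocol": "1", "snmpv3_privprotocol": "1"},
    {"name": "ifInErrors.2", "snmpv3_authprotocol": "1", "snmpv3_privprotocol": "1"},
    {"name": "ifOutErrors.1", "snmpv3_authprotocol": "0", "snmpv3_privprotocol": "1"},
]

groups = group_by_snmpv3_settings(sample_items)
if len(groups) > 1:
    # More than one combination means at least one item differs from the rest.
    for settings, names in groups.items():
        print(settings, names)
```

If the result contains more than one key, the items in the smallest group are the first ones to inspect.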

**Alessan** · 28-01-2016, 14:21

All items are inherited from the same template with an item prototype. Items can't have different authentication options.

Inbound errors on interface $1
SNMPv3 agent
ifInErrors[{#SNMPVALUE}]
IF-MIB::ifInErrors.{#SNMPINDEX}

Authentication on interface 1 can't be different from authentication on interface 2.

Each run fails on different items; each time we go to Latest data there are random interfaces with no value in the last run.

**dampersand** · 03-06-2016, 01:23

I would kill for an update to this.

I am having a very similar issue after my network guys updated firmware on four switches (Y U DO DIS IN PROD), and I too have tried all of these things, to no avail.

The only difference I have in my problem is that our engineIDs are NOT all unique. I don't know a lot about engineIDs, so I'm curious why this would matter? It didn't matter before we had them set at all, and net-snmp doesn't ever seem to return them, that I see.

One very interesting symptom is that all switches will poll correctly except for one... and if I restart snmpd on that switch, it will start working - but a DIFFERENT switch will go down.

**brynza** · 08-06-2016, 21:00

I have the same problem with one of my Cisco routers (I also have problems with a second one, but it stops being monitored altogether). All the engineIDs are unique.
With the default timeout I get data irregularly. Meanwhile snmpget runs without pauses or timeouts. I've found a temporary solution by increasing the timeout, and now I get data every 1 to 3 minutes (the check interval is 1 minute).
But this is just an ugly workaround.

**troffasky** · 27-07-2016, 11:59

Is this still an issue in 2.4 and 3.0? I've experienced poor reliability of SNMPv3 in 2.2 with holes in graphs and eventually no data at all. Restarting the zabbix-server service always brings it back.

**colohost** · 11-05-2018, 00:29

I'm seeing this issue in 3.4 whenever there is any significant quantity of SNMPv3 OIDs to poll from a given device; i.e. if I'm watching multiple OIDs per port (status, bit rate, error rate, etc.) on a 48-port switch, and using SNMPv3, we're talking 48 ports * 6 OIDs = ~288 OIDs to collect. The polling interval doesn't seem to matter; if it's five minutes, the errors and gaps will occur at five-minute intervals; if it's an hour, you'll randomly miss items on an hourly basis. I will say I'm using authPriv; I haven't tested with lower since, if you're discarding all the security benefits, what's the point.

Just for comparison purposes, I wrote a quick bash script to snmpwalk ifDescr on a particular Cisco switch stack of five devices, 48 ports each, then used SNMP bulk requests to fetch the OIDs for admin status and oper status of each port; it took about 45 seconds. The same thing with SNMPv2 took 6 seconds.

I assume this significantly longer query time is some combination of each side having to do the hashing and encryption/decryption; perhaps the low-powered CPUs in the switches can't do the SHA/AES very fast. I've had to switch all of my network devices with 10+ OIDs back to SNMPv2 or Zabbix will simply always miss some data.
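
A comparison like this can be approximated with a short Python sketch. This is not the script from the post; it only times one `snmpbulkwalk` (from the net-snmp command-line tools) per SNMP version against the same subtree. The host name, community, user name, and passphrases are placeholders, and SHA/AES is assumed to match the device configuration.

```python
import subprocess
import time


def build_bulkwalk_cmd(host, oid, version, community=None,
                       user=None, auth_pass=None, priv_pass=None):
    """Build a net-snmp snmpbulkwalk command line for SNMPv2c or SNMPv3 authPriv."""
    if version == "2c":
        creds = ["-v", "2c", "-c", community]
    elif version == "3":
        # SHA and AES are assumptions; use whatever the device is configured for.
        creds = ["-v", "3", "-l", "authPriv", "-u", user,
                 "-a", "SHA", "-A", auth_pass, "-x", "AES", "-X", priv_pass]
    else:
        raise ValueError("version must be '2c' or '3'")
    return ["snmpbulkwalk", *creds, host, oid]


def time_command(cmd):
    """Run a command, discard its output, and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Example usage (placeholder host and credentials, requires net-snmp tools):
# v2 = time_command(build_bulkwalk_cmd("switch1", "IF-MIB::ifOperStatus", "2c",
#                                      community="public"))
# v3 = time_command(build_bulkwalk_cmd("switch1", "IF-MIB::ifOperStatus", "3",
#                                      user="monitor", auth_pass="authsecret",
#                                      priv_pass="privsecret"))
# print(f"v2c: {v2:.1f}s, v3: {v3:.1f}s")
```

Running both against the same device from the Zabbix server or proxy host gives a rough idea of how much slower SNMPv3 is on that hardware.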

Long term, the only thing I can think of as a way to get around this is for Zabbix to analyze a given host's items, and if the quantity of SNMPv3 items exceeds a certain user-defined number (which you could tune to your hardware), break them up into sequentially polled batches of that defined quantity, and adjust when they poll to keep the polling interval the same for each batch.
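
The batching idea could look roughly like the following Python sketch. It is only one interpretation of the proposal, not anything Zabbix implements: the OID list is split into fixed-size batches, and each batch gets a start offset inside the polling interval so that every batch is still polled once per interval. The function name and the even spacing of offsets are assumptions made for illustration.

```python
def plan_batches(oids, batch_size, interval_s):
    """Split OIDs into batches and spread their start offsets over one interval.

    Returns a list of (offset_seconds, batch) tuples. Each batch repeats every
    interval_s seconds, starting at its own offset.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [oids[i:i + batch_size] for i in range(0, len(oids), batch_size)]
    if not batches:
        return []
    step = interval_s / len(batches)
    return [(round(index * step, 3), batch) for index, batch in enumerate(batches)]


# 48 ports * 6 OIDs per port, as in the post; the OID names are placeholders.
oids = [f"oid-{port}-{metric}" for port in range(1, 49) for metric in range(6)]
plan = plan_batches(oids, batch_size=50, interval_s=300)
for offset, batch in plan:
    print(f"start at +{offset:g}s: {len(batch)} OIDs")
```

With a batch size of 50 and a five-minute interval, the 288 OIDs end up in six batches started 50 seconds apart, so no single request burst has to finish within one timeout.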

**kloczek** · 11-05-2018, 18:05

Originally posted by colohost

I'm seeing this issue in 3.4 whenever there is any significant quantity of SNMPv3 OIDs to poll from a given device; i.e. if I'm watching multiple OIDs per port (status, bit rate, error rate, etc.) on a 48-port switch, and using SNMPv3, we're talking 48 ports * 6 OIDs = ~288 OIDs to collect. The polling interval doesn't seem to matter; if it's five minutes, the errors and gaps will occur at five-minute intervals; if it's an hour, you'll randomly miss items on an hourly basis. I will say I'm using authPriv; I haven't tested with lower since, if you're discarding all the security benefits, what's the point.

Check your proxy logs for timeout messages for your device.
Check whether "Use bulk requests" is enabled on your host interface.
How many LLD rules do you have defined for this monitored device?

Generally, the problem with many LLD rules is that snmpd from net-snmp, which is used even on proprietary devices, is not reentrant, so when the proxy queries multiple OIDs at the same time, at least one of those queries may fail with a timeout.
If this is the case, it should always be reported as a service request to the vendor of the exact device running the SNMP agent. Maybe at least one of those companies will invest some paid developer time to rewrite the critical net-snmp code.

**colohost** · 11-05-2018, 18:32

There isn't a timeout logged; the devices are just still sending data back when Zabbix hits its own configured timeout value and stops looking for additional data. The result is that you have gaps in your item data. This isn't an LLD-specific issue, but it occurs there as well if the underlying discovery doesn't complete, so then you have items that bounce between no longer discovered and discovered. If there are too many SNMPv3 OIDs to poll, the response simply won't finish until Zabbix has already stopped waiting for the rest of the data. Interestingly, Arista devices seem to send SNMPv3 responses at about 3x the rate of Cisco devices, but even those have the same problem if there are enough OIDs that the responses can't finish in at most 30 seconds (the maximum Zabbix timeout).

I had one switch with 2000+ OIDs (a stack of several hundred ports) miss more data than it got; I switched to SNMPv2 and the problem was gone.
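
To put rough numbers on this, here is a small Python sketch that estimates how many values a device can return before the timeout expires. The 30-second timeout comes from the post; the response rate of 20 values per second is a made-up figure for illustration and has to be measured on your own hardware.

```python
import math


def max_oids_within_timeout(values_per_second, timeout_s=30.0):
    """Number of values a device can return before the timeout expires."""
    return math.floor(values_per_second * timeout_s)


def polls_needed(total_oids, values_per_second, timeout_s=30.0):
    """Number of separate polls needed so that each one fits within the timeout."""
    capacity = max_oids_within_timeout(values_per_second, timeout_s)
    if capacity < 1:
        raise ValueError("device too slow to return even one value in time")
    return math.ceil(total_oids / capacity)


# Hypothetical device answering 20 SNMPv3 values per second.
rate = 20
print(max_oids_within_timeout(rate))   # values that fit into one 30 s window
print(polls_needed(2000, rate))        # polls needed for a 2000-OID switch
```

At that assumed rate a single 30-second window covers 600 values, so a 2000-OID switch would need at least four separate polls to be collected completely.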

**kloczek** · 12-05-2018, 22:01

If SNMP bulk queries are enabled, the per-port OIDs generated by LLD are read via batched SNMP queries, so the number of values is not what matters. More important is the total number of SNMP queries per second.
You can measure this using my SNMPv2-MIB template https://github.com/kloczek/zabbix-te...MIB/SNMPv2-MIB

**steveroebuck** · 16-05-2018, 14:55

We are experiencing exactly the same issues with SNMPv3 using authPriv: some interfaces will come back fine, others will have massive gaps in the time series data for switch throughput.

The templates are fine and there is no misconfiguration there. We have the timeout set to the maximum of 30, and with some devices it's just not enough to pull back all the metrics we need to monitor. Does anyone have a workaround or solution for this? It's a deal breaker for our Zabbix deployment, as we cannot use SNMPv2 because it violates our corporate security policy.

We experience the timeout issues on the following hardware:

Trend Tipping Point NGFW
Checkpoint 4600 Appliance
HPE Flexfabric 5900AF, 5900CP and 5940
HPE 6125XLG and 6127XLG
F5 Big IP Appliance

Devices are not becoming "unsupported" and we can see no timeouts or strange errors in either the Zabbix or proxy logs.

Gaps on the graphs, SNMPv3