**iconicnetworks** · 17-02-2022, 15:30

Hi all. Just to update further. We have upgraded the whole estate to 6.0 LTS. Problem persists.

**pgatty** · 19-02-2022, 21:42

Have you checked the Zabbix template git repo to see if there are updated versions of the templates you're using?

**tikkami** · 19-02-2022, 22:55

I have exactly the same problem.
I haven't identified the root cause yet. Database performance could be one reason.

If you look at the numerical values in history, are there any gaps in the data before or after the peak values?

Is there anything weird in the Zabbix server log?

**LenR** · 20-02-2022, 04:45

I've seen this. I don't remember the exact cause, but the device returned out-of-bounds values after patching, a fail-over or something similar. There are post-processing range validation rules that might fix this.
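To make the range validation idea concrete, here is a minimal Python sketch of what such a rule does. It is not Zabbix code; the function name `in_range`, the 10 Gbit/s limit and the sample values are all made up for illustration. In Zabbix itself this kind of check is configured as a validation step on the item rather than written by hand.

```python
from typing import Optional


def in_range(value: float, low: float, high: float) -> Optional[float]:
    """Return the value if low <= value <= high, otherwise None (discard it)."""
    if low <= value <= high:
        return value
    return None


# Hypothetical 10 Gbit/s port: a rate above line speed cannot be real traffic.
LINK_BPS = 10_000_000_000

# Made-up rate samples in bits per second; the last one is an impossible spike.
samples = [200_000_000, 9_800_000_000, 4_300_000_000_000]

# Compare against None explicitly so that a legitimate 0 is not dropped.
kept = [s for s in samples if in_range(s, 0, LINK_BPS) is not None]
print(kept)
```

A rule like this hides the false peaks from graphs and triggers, but it does not explain where they come from.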

**iconicnetworks** · 20-02-2022, 12:23

Initially I thought this might be because we had modified the stock Juniper templates to improve polling frequency. However, I have since removed these templates and restored the originals, and the problem is the same. We also now know of someone else having the same issue on a different database setup (AWS).

Nothing of any interest in the Zabbix server logs at all.

Re numerical values: no, no gaps. Raw data from the device dumped to a log file shows the correct integers being generated by the device as well.

**tikkami** · 21-02-2022, 11:55

Some time ago I ran snmpwalk to read the same values from the switch. All were OK there.
If this is not about database performance, could the change per second preprocessing be messing something up?
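For reference, change per second is documented as (value - prev_value) / (time - prev_time), with a sample dropped when the counter is lower than the previous one. The Python sketch below is only an illustration of that arithmetic, not the Zabbix implementation; `change_per_second` and all sample numbers are hypothetical, and the guard against non-increasing timestamps is an assumption of this sketch. It shows how the same counter delta turns into a much higher rate if the two samples are stamped closer together than they really were.

```python
from typing import Optional, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, raw counter value)


def change_per_second(prev: Sample, cur: Sample) -> Optional[float]:
    """Rate between two counter samples, or None if no rate can be derived."""
    (t0, v0), (t1, v1) = prev, cur
    # A lower counter value is discarded; the timestamp guard is an assumption.
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


# Made-up data: 60 000 packets sent during a real 60 second interval.
steady = change_per_second((0.0, 1_000_000), (60.0, 1_060_000))

# Same counter delta, but the samples are stamped only 1 second apart.
skewed = change_per_second((59.0, 1_000_000), (60.0, 1_060_000))

print(steady, skewed)  # 1000.0 60000.0
```

Whether anything like this actually happens in the server is not established here; it is just one way a correct raw counter can still produce a false peak.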

**iconicnetworks** · 21-02-2022, 11:58

tikkami, I think this might be what it is, or at least something along these lines. Traffic is bursty (if that's a word) on this port. It will go from 200Mb to 10Gb in a few seconds. I wonder if Zabbix is mis-calculating that burst and forecasting or projecting the data? I now know of at least 3 other people having the same issue, all with the same symptoms. Interestingly though, at this moment this 'seems' limited to Juniper devices; it's not impacting Cisco devices in the same way as of yet. Any thoughts on how to solve this?

**tikkami** · 21-02-2022, 12:14

I have seen this with Cisco Catalyst and IE series switches.

**tikkami** · 21-02-2022, 15:07

Collected some data today from a Cisco switch.

Item OID: IF-MIB::ifOutUcastPkts.
This item only has change per second preprocessing.

Maybe the next step is to add items that collect the raw data into Zabbix...

**iconicnetworks** · 22-02-2022, 00:19

I've been doing a bit of digging into patterns around this. Currently looking into why the resolution message for this issue is sent 15 mins later than the event actually 'clearing' in Zabbix. I think we know why this is, but we've had another sequence of events this evening causing more of the same high bandwidth alerts. This time though, traffic on this Juniper switch was stable throughout and wasn't ramping up.

**tikkami** · 22-02-2022, 12:43

Added a new item to collect the raw counter value from the switch.

Zabbix server/database is definitely messing up the collected data somehow.
The counter value should only increase (snmpwalk shows steady growth).

Here is a graph from the ifOutUcastPkts counter.
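The "counter should only increase" expectation can also be checked on exported history rows instead of by eye. A minimal Python sketch follows, assuming the raw item's history is available as (clock, value) pairs sorted by time; `find_regressions` and the sample rows are made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (unix timestamp, raw counter value)


def find_regressions(rows: List[Row]) -> List[Tuple[int, int, int]]:
    """Return (index, previous value, current value) wherever the counter drops."""
    return [
        (i, rows[i - 1][1], rows[i][1])
        for i in range(1, len(rows))
        if rows[i][1] < rows[i - 1][1]
    ]


# Made-up rows: the third sample is lower than the second one.
rows = [(0, 1000), (60, 1600), (120, 1550), (180, 2800)]
print(find_regressions(rows))  # [(2, 1600, 1550)]
```

Note that ifOutUcastPkts is a 32-bit counter, so a single large drop from a value near 2^32 can be a normal wrap rather than bad data. Small drops like the one in the example are the suspicious ones.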

**tikkami** · 22-02-2022, 17:14

Collected values seem to be better when "Use bulk requests" is NOT enabled.

**iconicnetworks** · 23-02-2022, 00:10

We've now got similar results to you. We had previously tried different test conditions with bulk requests, accepting the CPU difference, and had similar results. Ultimately there is definitely something not right with data processing in Zabbix here. A bit more searching shows a few more people having similar issues now too.

**tim.mooney** · 24-02-2022, 09:30

I've been following this thread with interest -- not because it's an issue that's impacting my site, but because you both have been doing a good job of debugging, which is a nice change of pace for questions on these forums. :-)

Depending on how high the new values per second (NVPS) rate is for your Zabbix server, what you might want to consider doing is increasing the debug level for your SNMP collectors, so that they log a lot more of what they're doing. It will eat a lot of log space on your Zabbix server (and cause some additional disk I/O, but hopefully you have that to spare), but it might be valuable in shedding light on the problem. If you can use graphs to find an incorrect peak and use SQL to verify that an incorrect value got inserted into your database, with sufficient debug logs you could trace it back and hopefully see what Zabbix read and what got inserted. Look at the '-R' (runtime control) option for zabbix_server, to dynamically increase debugging for just certain subprocesses, if this is something you are interested in pursuing.
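As a rough illustration of the "find the bad value, then trace it in the logs" step, here is a small Python sketch. It assumes the history rows for the item have already been pulled from the database as (clock, value) pairs, and that server log lines carry a `YYYYMMDD:HHMMSS` timestamp after the process ID. The function name, the limit and the rows are hypothetical, and UTC is used only to keep the example deterministic; the log itself uses the server's local time.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple

Row = Tuple[int, int]  # (unix timestamp, stored value)


def suspicious_log_prefixes(rows: Iterable[Row], limit: int) -> List[str]:
    """Return log-style timestamps for every stored value above the limit."""
    prefixes = []
    for clock, value in rows:
        if value > limit:
            # Assumption: UTC here; adjust to the server's local time zone.
            stamp = datetime.fromtimestamp(clock, tz=timezone.utc)
            prefixes.append(stamp.strftime("%Y%m%d:%H%M%S"))
    return prefixes


# Made-up rows for a 10 Gbit/s port; the second value is an impossible peak.
rows = [(1645533780, 950_000_000), (1645533840, 48_000_000_000)]
print(suspicious_log_prefixes(rows, limit=10_000_000_000))  # ['20220222:124400']
```

The resulting strings can be used as search terms in the debug log to see what the poller read around the time the bad value was stored.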

Good luck and please update this thread as you make progress on the issue.

Zabbix 5.4.8 False High Bandwidth Monitoring
