SNMPv3 failure after Zabbix proxy hardware reconfiguration

  • frcre
    Junior Member
    • Apr 2020
    • 10

    #1


    Hi all,

I'm currently testing Zabbix, mainly for monitoring network devices in remote locations, so all of my configuration is currently SNMPv3-based. My setup is pretty simple:

    - 1 Zabbix server (2 vCPU, 4GB RAM) in AWS
    - PostgreSQL backend for the server
    - 1 Zabbix proxy (1 vCPU, 2GB RAM) with local PostgreSQL database as a VM on VMware ESX

This configuration has been running for a few weeks, pulling a fair amount of SNMP metrics from the following devices (mainly interface data and system information):

    - 4 x Cisco switch
    - 1 x Cisco router
    - 2 x Cisco ASA (in cluster)
    - 1 x pfSense

During my testing, I noticed that the single CPU core on the proxy couldn't handle the number of StartPingers processes I had configured. I shut down the proxy VM, added a CPU core and relaunched it.

Since that modification of the proxy VM, all of my SNMPv3 monitoring has been broken.

    My zabbix_proxy.log file was full of these messages:

    1640:20200409:150627.566 SNMP agent item "system.uptime[sysUpTime.0]" on host "host1" failed: first network error, wait for 15 seconds
    1640:20200409:150638.598 SNMP agent item "system.uptime[sysUpTime.0]" on host "host2" failed: first network error, wait for 15 seconds
    1645:20200409:150642.603 resuming SNMP agent checks on host "host1": connection restored
    1645:20200409:150700.254 resuming SNMP agent checks on host "host2": connection restored
    1640:20200409:150712.654 SNMP agent item "net.if.in[ifHCInOctets.527305088]" on host "host2" failed: first network error, wait for 15 seconds
    1640:20200409:150727.693 SNMP agent item "system.uptime[sysUpTime.0]" on host "host1" failed: first network error, wait for 15 seconds
    1645:20200409:150733.616 resuming SNMP agent checks on host "host2": connection restored
    1645:20200409:150742.109 resuming SNMP agent checks on host "host1": connection restored
    1640:20200409:150757.731 SNMP agent item "snmp.engineid" on host "host1" failed: first network error, wait for 15 seconds
    1641:20200409:150808.050 SNMP agent item "snmp.engineid" on host "host2" failed: first network error, wait for 15 seconds
    1645:20200409:150812.794 resuming SNMP agent checks on host "host1": connection restored
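To get an overview of which hosts were flapping, something like this tallies the errors per host from the log (the log path is just an example; adjust it to your LogFile setting):

```shell
# Count "first network error" events per host in the proxy log
grep 'failed: first network error' zabbix_proxy.log \
  | sed -n 's/.*on host "\([^"]*\)".*/\1/p' \
  | sort | uniq -c | sort -rn
```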

Of my 8 hosts, 6 had a red SNMP status with a timeout message. Only 2 were still green at some point (I think that's where the logs above come from).

I did some further digging and troubleshooting, and the closest match I could find for this problem was the SNMPv3 engineID issue described in https://support.zabbix.com/browse/ZBX-8385

I did have the ASA cluster in my list of hosts, and apparently both ASAs share the same engineID when polled through SNMP. So I tried to resolve my issue by removing the secondary ASA, restarting services, even rebooting the Zabbix server and proxy, disabling and re-enabling hosts, disabling and re-enabling SNMP checks, ... Nothing helped.

    (As a sidenote, I replicated the issue as described in ZBX-8385 in my home setup by duplicating an engineID over 2 hosts, got a completely different error, and that error was resolved automatically by re-adjusting the engineID on the monitored device.)

So I went a bit deeper and looked at a packet capture of my SNMPv3 traffic: whenever the errors above occurred, it looks like Zabbix was sending garbage encrypted scopedPDUs. In the example below, the first request failed (with a retransmit after a few seconds), while the second one got through and received a normal response:

[Screenshot: zabbix_pcap.png - packet capture showing the failed request with garbage scopedPDU, its retransmit, and the subsequent successful request/response]
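If anyone wants to take a similar capture themselves, something like this on the proxy should do it (interface name and output path are just examples):

```
# Capture SNMP traffic (UDP port 161) on the proxy for later analysis in Wireshark
tcpdump -i eth0 -s 0 udp port 161 -w /tmp/snmpv3.pcap
```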

    The last thing I did was spin up a new proxy and move the 4 Cisco switches (out of which 3 had a red SNMP status) to that new proxy. They have been fine SNMP-wise since then, while the Cisco router, the single ASA and the pfSense are still throwing fits on the old proxy ...

I have no idea where to look next for this issue, since I'm quite sure it's not related to ZBX-8385. The proxy was also running fine for weeks with the ASAs having duplicate engineIDs (which is surprising, in retrospect). I'm mostly flabbergasted by the fact that one SNMP request can fail, while the next one to the same device (with the same auth/encryption key) can succeed.

Does changing the hardware specification of a Zabbix proxy (or server, when used for monitoring) have an impact on SNMPv3 encryption, or on Zabbix functionality in general? Is this something I should avoid, and should I just spin up a new proxy with different hardware and move the hosts to it?
    Or could it still be an issue with SNMP itself, or the implementation of SNMP in libnetsnmp?

    Thanks!

    Edit: well, it's getting worse:

    - moved Cisco router to new proxy: same issue, but now only for the Cisco router (the 4 switches already on the new proxy get monitored fine, without errors)
    - moved the 4 Cisco switches back to the old proxy: immediately the same errors for all 4 of them - moved them back to the new proxy, and all is well

    It must be something in the configuration, but I have no clue ... I have been looking for incorrect macros defined in templates or whatnot, but the templates are the same for both the switches and the router. Polling seems to go fine for a number of SNMP requests, and then the scopedPDU encryption starts going wrong.

    Edit2:

    - disabled all hosts on new proxy (4 switches, 1 router), re-enabled just the router after a few minutes: SNMP breaks after some time, so it clearly has nothing to do with engineIDs
    - created copies of my templates to use SNMPv3 with noAuth level instead of authPriv, applied it to the single router, now it's perfectly fine ...

    So it's config-related, or SNMPv3 encryption, or combination of both.

    My authPriv templates are the same and work fine on the switches, and they worked fine on the router for some weeks ... Gotta dig a bit deeper to find the probable config whoopsie.
    Last edited by frcre; 14-04-2020, 12:01.
  • frcre
    Junior Member
    • Apr 2020
    • 10

    #2
    I think I found something, so posting separately: I increased StartPollers, StartPingers, StartPollersUnreachable and StartDiscoverers to 8, and all 5 hosts on my new proxy were suddenly monitored properly. I applied the same changes to my old proxy and moved the hosts back: all 5 hosts are monitored properly on the old proxy as well ...
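For completeness, this is the relevant excerpt of my zabbix_proxy.conf after the change:

```
### zabbix_proxy.conf (excerpt)
StartPollers=8
StartPollersUnreachable=8
StartPingers=8
StartDiscoverers=8
```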

    I'm guessing there is some relation between the number of CPU cores and the number of e.g. poller processes then? Does it have to be divisible?

    Performance was never an issue, the only parameter I tweaked previously was StartPingers to keep it under the 80% threshold.


    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #3
      That's a bizarre problem and I'm really glad you stuck with it and followed-up with the "fix".

      I'm not aware of any Start*-to-cores requirement, though that doesn't mean it's not possible. Typically, if you have too many or too few, you can tell from general performance. If there's any ratio you need to keep between the various subsystems, it should be documented in the comments in zabbix_server.conf (or zabbix_proxy.conf).

      Since your original problem appeared to be intermittent encryption issues: by any chance were you monitoring the entropy pool on the proxy itself? Running out of entropy came up in another thread on these forums recently, and especially in a VM environment doing a lot of encryption, it's something I would be suspicious enough to check. If you still have the original proxy and care to test it, you could set up a monitor on your cloud server for the proxy's entropy (see something like https://major.io/2007/07/01/check-av...ropy-in-linux/ for an idea of the proc file to monitor), set the Start* settings back to their original values, and wait to see if it fails.


      • frcre
        Junior Member
        • Apr 2020
        • 10

        #4
        The entropy is an interesting suggestion. I went on to check it, but it doesn't look like it's related:

        Situation this morning (7 hosts monitored fine):

        Code:
        :~$ cat /proc/sys/kernel/random/entropy_avail
        3583
        Previous zabbix_proxy.conf with default Start* settings (output taken a few minutes after a reboot, 5 out of 7 hosts in red SNMP status):

        Code:
        :~$ cat /proc/sys/kernel/random/entropy_avail
        948
        Working settings restored (proxy service restart, no reboot, once all SNMP checks were green again):

        Code:
        :~$ cat /proc/sys/kernel/random/entropy_avail
        865
        Entropy does ramp up faster with more processes active, but SNMP was all good again when entropy was even lower than with the default settings.


        • tim.mooney
          Senior Member
          • Dec 2012
          • 1427

          #5
          I should have mentioned this in my previous post: to accurately monitor it, it would be better to do it via a Zabbix item. The reason is that by accessing the box interactively and running commands, you're actually contributing to the entropy pool. So in true Heisenberg fashion, by (manually) looking at entropy_avail, you're changing the results of the reading. Even the Zabbix item may contribute to the entropy pool, but it will likely be a much smaller contribution than you were making.

          Using a Zabbix item and not logging in to the system will give you a more accurate reading of how it's going to respond under normal use, and you'll also be able to watch it over time to see if it ever looks like it's getting depleted.
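As a sketch, assuming the Zabbix agent is running on the proxy, a standard passive item with a key like this would read the same proc file without anyone logging in:

```
vfs.file.contents[/proc/sys/kernel/random/entropy_avail]
```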

          Still, I think you're probably right that it's not the problem here.


          • frcre
            Junior Member
            • Apr 2020
            • 10

            #6
            I added an item like you said, and it's showing roughly the same values.
            Anyhow, I have been constantly connected to my proxy for the past few days for troubleshooting, so my input probably generated quite some entropy then as well


            • frcre
              Junior Member
              • Apr 2020
              • 10

              #7
              Well, I'm running into the same issue again, although now no hosts are disabled for SNMP. I'm just constantly getting the "first network error" + "connection restored" spam in zabbix_proxy.log

              I'm investigating whether this has to do with our underlying hypervisor setup by moving the VM to a separate physical machine, but any other ideas are still more than welcome!


              • frcre
                Junior Member
                • Apr 2020
                • 10

                #8
                After 1 more day of testing, I'm quite sure it's not the VM itself, but some configuration issue.

                I removed all 7 hosts and started building them from scratch again:

                - adding the 4 switches with their templates: all OK
                - adding the router with its templates: all OK
                - adding both ASA firewalls with their templates: all OK

                I then restarted the zabbix-proxy service on my VM, and the SNMP timeouts and subsequent errors reappeared all over again ...

                I'm completely lost at the moment, went back to just the 4 switches with 16 basic SNMP items on each, with SNMP credentials hardcoded on each of them instead of using macros: still the same issue.

                It seems that I have to completely remove all hosts, restart the proxy, re-add them manually, and then avoid restarting the proxy service or the VM to keep things running ...


                • frcre
                  Junior Member
                  • Apr 2020
                  • 10

                  #9
                  OK, I guess I have solved my issue, so I'm posting one last (hopefully) update about this, should others ever run into the same behaviour ...

                  The root cause seems to be very stupid: I had a few SNMP items in a default template that I copied, and in my mass update something went wrong when adjusting the encryption method to AES. Some items still had DES applied, so naturally querying those items failed big time.

                  I suspect that having multiple hundreds of SNMPv3 items on a host, with some of them having incorrect encryption settings, messes up some internal caching mechanism that handles the encryption of PDUs. This seems to have led to incorrect encryption even for items with the correct AES setting applied. I replicated the issue with ALL items incorrectly configured, and in that case Zabbix spams the log files with errors about it. When the majority of SNMP items is working fine, those errors are not present.
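For anyone who wants to verify a device's SNMPv3 privacy settings outside of Zabbix, something like this with net-snmp shows the symptom (host, user and passphrases below are placeholders):

```
# Privacy protocol matches the device (AES): normal reply
snmpget -v3 -l authPriv -u zbxuser -a SHA -A 'authpass' -x AES -X 'privpass' 192.0.2.1 1.3.6.1.2.1.1.3.0

# Privacy protocol mismatch (DES against an AES-configured user):
# typically a timeout or a usmStats/decryption error instead of sysUpTime
snmpget -v3 -l authPriv -u zbxuser -a SHA -A 'authpass' -x DES -X 'privpass' 192.0.2.1 1.3.6.1.2.1.1.3.0
```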

                  Long story short: PEBKAC, and the "SNMP settings on host level" feature from 5.0 is long overdue


                  • tim.mooney
                    Senior Member
                    • Dec 2012
                    • 1427

                    #10
                    Thanks for following up with the solution. That may indeed be a help to others in the future!

