Hello,
I've seen plenty written about Zabbix performance tuning in general, but much less about tuning distributed Zabbix proxies. Any help with this problem, which has been plaguing me, is greatly appreciated.
I have a proxy (2.2.8) in Kigali, Rwanda, monitoring about 200 hosts with 75 items per host. My Zabbix server runs on Amazon in EU West (also 2.2.8). Internet connectivity between the two is very good, though the latency is certainly higher than if the two were sitting right next to each other.
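For a sense of scale, here is a rough new-values-per-second estimate for this proxy. The host and item counts come from the post; the 60-second update interval is a hypothetical assumption, since the actual intervals are not stated.

```python
"""Rough new-values-per-second (NVPS) estimate for the Kigali proxy.

HOSTS and ITEMS_PER_HOST come from the post; ASSUMED_INTERVAL_S is a
hypothetical value chosen only for illustration.
"""

HOSTS = 200              # from the post (~200 hosts)
ITEMS_PER_HOST = 75      # from the post
ASSUMED_INTERVAL_S = 60  # hypothetical: every item polled once a minute


def new_values_per_second(hosts: int, items_per_host: int, interval_s: float) -> float:
    """Average number of new values the proxy collects (and must forward) per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return hosts * items_per_host / interval_s


if __name__ == "__main__":
    total_items = HOSTS * ITEMS_PER_HOST
    nvps = new_values_per_second(HOSTS, ITEMS_PER_HOST, ASSUMED_INTERVAL_S)
    print(f"{total_items} items, ~{nvps:.0f} values/s at a {ASSUMED_INTERVAL_S}s interval")
```

With shorter intervals on some items (bandwidth counters are often polled more often), the real rate would be higher.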
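Mostly, things are good. However, I sometimes get gaps in my bandwidth graphs. I had been getting a lot of errors like this:
Code:
2404:20150122:034606.303 SNMP agent item "If.32.ifOutErrors.["2"]" on host "AXK-CONCEPT PLUS" failed: first network error, wait for 15 seconds
2778:20150122:034621.913 resuming SNMP agent checks on host "AXK-CONCEPT PLUS": connection restored

This happens even for nodes with good connectivity that answer snmpwalk quickly. Even so, when I run snmpwalk with -r 0 (to turn off retries), I do sometimes get timeouts. My monitoring is for wireless network devices, and there are times when a single SNMP UDP request is lost. For reference, my Zabbix proxy Timeout=30, and my pollers, unreachable pollers, etc. are nowhere near busy (see image).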

I believe my gaps and errors are tied to the change in 2.2.3 where SNMP retries were eliminated. When I upgraded from 2.2.7 to 2.2.8 (which allows one SNMP retry), the issue got slightly better. (I do wish I could configure multiple retries, but I gather I can't do that without recompiling Zabbix?)
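To see why a single retry helps but may not be enough, here is a back-of-the-envelope model. It assumes, hypothetically, that each request/response exchange is lost independently with a fixed probability, so a poll only fails when the first attempt and every retry are lost. The 1% loss rate and 60-second interval are illustrative assumptions, not measurements from this network.

```python
"""Expected SNMP poll failures when single UDP exchanges are occasionally lost.

Hypothetical model: each exchange is lost independently with probability
`loss`; a poll fails only if all (retries + 1) attempts are lost.
"""


def poll_failure_probability(loss: float, retries: int) -> float:
    """Probability that the first attempt and every retry are all lost."""
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return loss ** (retries + 1)


def expected_failures_per_hour(items: int, interval_s: float, loss: float, retries: int) -> float:
    """Expected number of failed polls per hour across all items."""
    polls_per_hour = items * 3600 / interval_s
    return polls_per_hour * poll_failure_probability(loss, retries)


if __name__ == "__main__":
    ITEMS = 200 * 75   # from the post
    INTERVAL_S = 60    # hypothetical update interval
    LOSS = 0.01        # hypothetical 1% per-exchange loss on the wireless links
    for retries in (0, 1, 2):
        rate = expected_failures_per_hour(ITEMS, INTERVAL_S, LOSS, retries)
        print(f"retries={retries}: ~{rate:.1f} failed polls/hour")
```

Each extra retry divides the failure rate by roughly 1/loss under this model. Note also that the "first network error, wait for 15 seconds" message in the log means one failure pauses all SNMP checks on that host for 15 seconds, so a single lost packet can cost more than one value.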
I also know that the devices I monitor don't support bulk SNMP gets. I discovered that I could disable bulk requests for the whole proxy by setting EnableSNMPBulkRequests=0. This made a huge difference, greatly reducing the size of my proxy queue (see graph; bulk requests were disabled at 5 pm, note the change).
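A simple count of SNMP round trips shows what disabling bulk requests costs when the devices handle it correctly. It assumes, hypothetically, that with bulk enabled several OIDs are packed into one GET (up to `max_oids`, an illustrative value, not Zabbix's actual limit) and that without bulk each item needs its own GET; the 50 ms round trip is also an assumption.

```python
"""Count SNMP GET round trips per polling cycle, with and without bulk requests.

Illustrative model only: `max_oids` and the round-trip time are hypothetical.
"""
import math


def exchanges_per_cycle(items: int, bulk: bool, max_oids: int = 10) -> int:
    """Number of GET round trips needed to poll `items` OIDs on one host."""
    if items < 0 or max_oids < 1:
        raise ValueError("items must be >= 0 and max_oids must be >= 1")
    if not bulk:
        return items  # one GET per item
    return math.ceil(items / max_oids)  # several OIDs packed into each GET


def poller_seconds_per_cycle(hosts: int, items: int, bulk: bool,
                             rtt_s: float, max_oids: int = 10) -> float:
    """Total time pollers spend waiting on the network per cycle, if exchanges run sequentially."""
    return hosts * exchanges_per_cycle(items, bulk, max_oids) * rtt_s


if __name__ == "__main__":
    HOSTS, ITEMS = 200, 75  # from the post
    RTT_S = 0.05            # hypothetical round trip to a wireless device
    for bulk in (True, False):
        label = "bulk" if bulk else "no bulk"
        print(label, exchanges_per_cycle(ITEMS, bulk),
              poller_seconds_per_cycle(HOSTS, ITEMS, bulk, RTT_S))
```

By this count, turning bulk off multiplies the number of exchanges, so the drop in the proxy queue after the change suggests that bulk attempts were failing on these devices rather than saving work.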

However, the change made the queue on my Zabbix *server* balloon (see graph below; bulk requests were disabled at 5 pm, note the increase in the server queue). I'm assuming this is because SNMP data is now being sent to the server individually instead of in bulk? Is there anything I can do to bring the server queue back to normal?
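One way to test the "sent individually" theory is to compare how fast the proxy collects values with how fast they can reach the server. This sketch assumes, hypothetically, that each upload batch costs one round trip plus some server processing time, and that rates are constant. The batch sizes, round trip and rates are illustrative placeholders, not Zabbix internals or measured values.

```python
"""Fluid model of a proxy-to-server backlog (illustrative assumptions only)."""


def upload_capacity(batch_values: int, rtt_s: float, server_proc_s: float) -> float:
    """Values/s one sender can push if each batch costs one round trip plus server processing."""
    per_batch_s = rtt_s + server_proc_s
    if batch_values < 1 or per_batch_s <= 0:
        raise ValueError("need batch_values >= 1 and a positive per-batch time")
    return batch_values / per_batch_s


def backlog_after(seconds: float, arrival_rate: float, drain_rate: float,
                  start: float = 0.0) -> float:
    """Values still waiting after `seconds`; the backlog cannot go below zero."""
    return max(0.0, start + (arrival_rate - drain_rate) * seconds)


if __name__ == "__main__":
    ARRIVAL = 250.0  # hypothetical values/s collected by the proxy
    RTT_S = 0.2      # hypothetical Kigali <-> EU West round trip
    for batch in (1000, 1):  # hypothetical large batches vs one value per upload
        cap = upload_capacity(batch, RTT_S, 0.0)
        print(f"batch={batch}: capacity ~{cap:.0f} values/s, "
              f"backlog after 10 min ~{backlog_after(600, ARRIVAL, cap):.0f} values")
```

If the batches really did shrink, capacity would fall below the arrival rate and the backlog (and the server's view of late values) would grow steadily. If batches stay large, the bottleneck is more likely elsewhere.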

Any suggestions? Thanks so much!