Can DB connection errors increase unreachable poller processing time?
Unreachable pollers are responsible for host availability. So if there is some kind of network issue, it's normal for their utilization to increase, depending on the number of hosts affected and the number of pollers.
How it works:
If a host becomes unreachable, the poller checks every X seconds (UnreachableDelay, default 15) whether the host responds.
This is repeated for X seconds (UnreachablePeriod, default 45), after which the host becomes unavailable.
While the host is unavailable, its availability is checked every X seconds (UnavailableDelay, default 60).
These values can be modified in the Zabbix server configuration file, but I think the defaults are good enough.
### Option: UnreachablePeriod
# After how many seconds of unreachability treat a host as unavailable.
# UnreachablePeriod=45
### Option: UnavailableDelay
# How often host is checked for availability during the unavailability period, in seconds.
# UnavailableDelay=60
### Option: UnreachableDelay
# How often host is checked for availability during the unreachability period, in seconds.
# UnreachableDelay=15
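To make the defaults concrete, here is a minimal Python sketch of that schedule for a host that never comes back. It assumes every check fails and fires exactly on time; the function name and the `unavailable_checks` parameter are hypothetical, and real scheduling also depends on poller load and Timeout.

```python
def availability_timeline(unreachable_delay=15, unreachable_period=45,
                          unavailable_delay=60, unavailable_checks=2):
    """Return (seconds since first error, state) for a host that never recovers.

    Simplified model (assumption): each retry fails and happens exactly on schedule.
    """
    events = [(0, "unreachable")]  # the first network error
    t = 0
    # Retry every UnreachableDelay seconds; once UnreachablePeriod has elapsed,
    # the failing retry marks the host unavailable.
    while t < unreachable_period:
        t += unreachable_delay
        state = "unavailable" if t >= unreachable_period else "unreachable"
        events.append((t, state))
    # While unavailable, availability is checked every UnavailableDelay seconds.
    for _ in range(unavailable_checks):
        t += unavailable_delay
        events.append((t, "unavailable"))
    return events


if __name__ == "__main__":
    for seconds, state in availability_timeline():
        print(f"t={seconds:>3}s  {state}")
```

With the defaults this gives retries at 15, 30 and 45 seconds, the last of which declares the host unavailable, followed by checks every 60 seconds.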
First of all, thank you for your answer.
But during the network issue, only 6 network errors were logged in Zabbix.
(The network used to monitor most hosts is separate from the network used for the DB connection.)
Looking at these 6 errors in the log, does the unreachable poller process value seem reasonable?
Zabbix server log:
---
8571:20220418:084345.948 SNMP agent item "CPU_Util" on host "#" failed: first network error, wait for 15 seconds
8555:20220418:084424.457 SNMP agent item "CPU.Utilization" on host "#" failed: first network error, wait for 15 seconds
8597:20220418:084824.183 SNMP agent item "Memory.Utilization" on host "#" failed: first network error, wait for 15 seconds
8603:20220418:085157.421 SNMP agent item "CPU_Util" on host "#" failed: another network error, wait for 15 seconds
8498:20220418:085345.129 SNMP agent item "CPU_Util" on host "#" failed: first network error, wait for 15 seconds
8601:20220418:085415.403 SNMP agent item "CPU_Util" on host "#" failed: another network error, wait for 15 seconds
---
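One way to put those six entries in context is to parse them and measure how far apart they are. This is a hypothetical helper, not a Zabbix tool; it assumes the `PID:YYYYMMDD:HHMMSS.mmm message` line layout shown above and only matches the network-error wording quoted here.

```python
import re
from datetime import datetime

# The six lines quoted above ("#" is the anonymized host name).
LOG = """\
8571:20220418:084345.948 SNMP agent item "CPU_Util" on host "#" failed: first network error, wait for 15 seconds
8555:20220418:084424.457 SNMP agent item "CPU.Utilization" on host "#" failed: first network error, wait for 15 seconds
8597:20220418:084824.183 SNMP agent item "Memory.Utilization" on host "#" failed: first network error, wait for 15 seconds
8603:20220418:085157.421 SNMP agent item "CPU_Util" on host "#" failed: another network error, wait for 15 seconds
8498:20220418:085345.129 SNMP agent item "CPU_Util" on host "#" failed: first network error, wait for 15 seconds
8601:20220418:085415.403 SNMP agent item "CPU_Util" on host "#" failed: another network error, wait for 15 seconds
"""

# Zabbix server log layout: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Only the network-error wording seen in this log is matched.
ERROR_RE = re.compile(
    r'item "(?P<item>[^"]+)" on host "(?P<host>[^"]*)" failed: '
    r"(?P<kind>first|another) network error, wait for \d+ seconds"
)


def parse_network_errors(lines):
    """Yield (timestamp, item, kind) for every network-error line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        _pid, day, clock, message = m.groups()
        err = ERROR_RE.search(message)
        if err:
            ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")
            yield ts, err["item"], err["kind"]


def summarize(lines):
    """Count first/another errors and the time between the first and last one."""
    errors = list(parse_network_errors(lines))
    stamps = [ts for ts, _item, _kind in errors]
    return {
        "total": len(errors),
        "first": sum(1 for _ts, _item, kind in errors if kind == "first"),
        "another": sum(1 for _ts, _item, kind in errors if kind == "another"),
        "span_seconds": (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0,
    }


if __name__ == "__main__":
    print(summarize(LOG.splitlines()))
```

For these six lines it reports four "first" and two "another" network errors spread over roughly 10.5 minutes.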
If an ordinary poller fails to get a value for an item, UnreachableDelay kicks in and the item is retried at that interval until UnreachablePeriod expires. With the defaults of 15 and 45, that means it is retried 3 times. If it is still not reachable, the host is declared unavailable and normal polling is disabled. Its polling is handed over to the unreachable pollers, which try it every UnavailableDelay seconds. When they finally manage to reach it, it is given back to the normal pollers.
If you have a general network issue, unreachable poller usage can, and will, go up.
How many of those pollers do you have? What's the Timeout value? If you have a long timeout, like 20 s (the default is 3, I think), each unreachable poller will wait for that timeout on every check. If you don't have many unreachable pollers, they will all be occupied and usage goes up.
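The effect of Timeout on unreachable poller usage can be estimated with simple queueing arithmetic (Little's law): average busy pollers ≈ failing checks per second × time each check blocks a poller. The sketch below assumes every check against an unreachable host blocks one poller for the full Timeout and that nothing else occupies the pollers; the function names and the check rate of 30 per minute are hypothetical inputs, not values reported by Zabbix.

```python
def busy_percent(failing_checks_per_min: float, timeout_s: float, pollers: int) -> float:
    """Estimated average busy % of the unreachable pollers (capped at 100)."""
    # Little's law: average number of busy pollers = arrival rate * time per check.
    busy_pollers = failing_checks_per_min / 60.0 * timeout_s
    return min(100.0, 100.0 * busy_pollers / pollers)


def saturation_checks_per_min(timeout_s: float, pollers: int) -> float:
    """Failing checks per minute at which every unreachable poller is always busy."""
    return pollers / timeout_s * 60.0


if __name__ == "__main__":
    # Hypothetical load: 30 failing checks per minute against 10 unreachable pollers.
    for timeout in (3, 4, 20):
        print(f"Timeout={timeout:>2}s  busy~{busy_percent(30, timeout, 10):5.1f}%  "
              f"saturates at {saturation_checks_per_min(timeout, 10):.0f} checks/min")
```

Under these assumptions, 10 pollers with a 4 s timeout would be fully busy at about 150 failing checks per minute, while a 20 s timeout would saturate the same 10 pollers at about 30 checks per minute.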
StartPollersUnreachable=10, and Timeout is set to 4 (the default is 3). Does the unreachable poller process value indicate the time spent while hosts are declared unreachable?