I run Zabbix for numerous clients and am not seeing this problem on any of their installations.
I am running it on my home network, have for some time (on 4.4.5 so a bit old), and in the last month or so have been getting alerts from "Unreachable poller processes more than 75% busy".
So I did what I normally do: increased the number of unreachable pollers and moved on. The error came back. I was distracted, so I just doubled them without thinking. The error came back again.
So I looked more closely -- I was up to 100 processes. I have between zero and one item unreachable depending on whether my laptop is booted.
So... what in the world are they doing to be busy? Is there some other functionality now rolled into those processes? When I look at the history they range from 2% to 97% busy, hovering around 50% on average, but with excursions where it stays above 90%. But again -- at most 1 system is unreachable. And 66 total hosts defined.
Which actually raises another question: What does "unreachable" mean in this context -- unreachable for SNMP polls? Unreachable for pings?
If I look at ps I see them looking like this (representative sample):
zabbix 11419 11183 0 135138 10484 0 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #43 [got 1 values in 60.058949 sec, getting values]
zabbix 11424 11183 0 135138 10496 1 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #44 [got 1 values in 60.054875 sec, getting values]
zabbix 11425 11183 0 135138 11644 3 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #45 [got 1 values in 60.031330 sec, getting values]
zabbix 11426 11183 0 135138 11764 3 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #46 [got 1 values in 60.059361 sec, getting values]
zabbix 11427 11183 0 135138 10480 3 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #47 [got 1 values in 60.053022 sec, getting values]
zabbix 11428 11183 0 135138 10728 2 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #48 [got 0 values in 0.000049 sec, getting values]
zabbix 11429 11183 0 135138 11040 0 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #49 [got 1 values in 60.058609 sec, getting values]
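For anyone who wants to tally this rather than eyeball it, here is a minimal Python sketch that pulls the poller number, value count, and elapsed time out of ps lines like the ones above. The function names and the 30-second threshold are my own illustrative choices, not anything Zabbix defines, and the pattern assumes the process title format shown in this sample.

```python
import re

# Matches the status text in the process title, as it appears in the ps sample.
# Assumption: the title format is "unreachable poller #N [got V values in S sec, ...".
STATUS_RE = re.compile(
    r"unreachable poller #(\d+) \[got (\d+) values in ([\d.]+) sec"
)


def summarize_pollers(ps_output):
    """Return a list of (poller_number, values, seconds) parsed from ps output."""
    rows = []
    for line in ps_output.splitlines():
        match = STATUS_RE.search(line)
        if match:
            rows.append(
                (int(match.group(1)), int(match.group(2)), float(match.group(3)))
            )
    return rows


def slow_pollers(rows, threshold_sec=30.0):
    """Return poller numbers whose last cycle took at least threshold_sec.

    The 30-second default is a hypothetical cutoff chosen for illustration.
    """
    return [num for num, _values, secs in rows if secs >= threshold_sec]


# Two lines copied from the ps sample in this post.
SAMPLE = """\
zabbix 11419 11183 0 135138 10484 0 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #43 [got 1 values in 60.058949 sec, getting values]
zabbix 11428 11183 0 135138 10728 2 Aug31 ? 00:00:01 /usr/local/sbin/zabbix_server: unreachable poller #48 [got 0 values in 0.000049 sec, getting values]
"""

if __name__ == "__main__":
    rows = summarize_pollers(SAMPLE)
    for num, values, secs in rows:
        print(f"poller #{num}: {values} value(s) in {secs:.3f} sec")
    print("slow pollers:", slow_pollers(rows))
```

Fed the full ps output instead of SAMPLE, this would show how many pollers are sitting in roughly 60-second cycles for a single value, which is the pattern in six of the seven lines above.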
I do have discovery configured, but it is for ICMP only, in groups of 128 IPs at 10-day intervals, so I cannot see it being related to that.
Where do I start looking?
Linwood