Solving the alert: Zabbix unreachable poller processes more than 75% busy

HYPERMAN

Junior Member

Joined: Sep 2018

Posts: 4
#1

11-05-2020, 14:30

Hello,

For a long time we were plagued by the alert: Zabbix unreachable poller processes more than 75% busy

There is a lot of information about this message on the net, but none of it really helped me. My main problem was finding out what exactly the unreachable pollers were doing. So I thought I'd share what I've discovered, even if it might not be 100% correct. I am still a Zabbix newbie, so feel free to correct me where necessary or suggest a better method.

STEP 1: Cleaning up unreachable items
Go to Configuration > Hosts, click on any random 'items' link.

Open the filter and reset all fields to empty/all/.... IMPORTANT: This includes the 'Host' field that was just filled in for you.

Change State from all to Not supported. This will cause Status to change to Enabled.

Applying the filter lists all items that cannot be polled. Unfortunately, it also includes items from disabled hosts. I disabled every item that had no chance of becoming available again.

STEP 2: Cleaning up unreachable hosts.
Go again to Configuration > Hosts

Look at the 'Availability' column with the red/green indicators for ZBX|SNMP|JMX|IPMI.

Everything red takes up capacity from an unreachable poller.

Again, I disabled any host that would never come up again.

STEP 3: Finding out what the unreachable pollers are doing.

This is what led me to discover step 2.
Open a Linux terminal and run something like: ps axu | grep -i unreachable

Note the unreachable pollers that are slow. E.g. I had some reporting 1 item in 60 seconds. Note the PID of that individual poller (each poller shows up in ps as its own process, so take its PID, not the PID of the main zabbix_server process).
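To avoid scanning the ps output by eye, the slow pollers can be filtered with a small helper. This is only a sketch: slow_unreachable_pollers is a name I made up, and it assumes the process title looks like "unreachable poller #1 [got 1 values in 60.012345 sec, idle 5 sec]", which may differ between Zabbix versions.

```shell
#!/bin/sh
# Hypothetical helper: read `ps axu` lines on stdin and print "PID SECONDS"
# for every unreachable poller whose last cycle took at least MIN seconds.
# Assumption: the process title contains
#   "unreachable poller #N [got X values in Y sec, ..."
slow_unreachable_pollers() {
    min="${1:-10}"   # threshold in seconds, default 10
    awk -v min="$min" '
        /unreachable poller #/ && !/awk|grep/ {
            pid = $2                               # PID is column 2 in "ps axu"
            if (match($0, /in [0-9.]+ sec/)) {
                # Strip the leading "in " (3 chars) and trailing " sec" (4 chars).
                secs = substr($0, RSTART + 3, RLENGTH - 7)
                if (secs + 0 >= min + 0) print pid, secs
            }
        }'
}
```

Usage would be something like: ps axu | slow_unreachable_pollers 10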

Use strace to find out what that poller is doing, e.g. strace -p 1234

I got some I/O on an IP address (bingo) and a select on fd 0 with a timeout of 30 seconds.

To look up the fd number, run something like ls -hal /proc/1234/fd/0 (here for PID 1234 and FD 0). You can now see which file/socket/... is causing the slowdown.
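The fd lookup can be wrapped in a tiny function so it is easy to repeat for several descriptors. A minimal sketch, assuming a Linux system with /proc mounted; fd_target is a hypothetical name, and PID 1234 is just the example from above.

```shell
#!/bin/sh
# Hypothetical helper: print what file descriptor FD of process PID points to.
# For a regular file this is a path; for a network connection the kernel
# reports something like "socket:[inode]".
fd_target() {
    pid="$1"
    fd="$2"
    readlink "/proc/$pid/fd/$fd"
}

# Example session (run as root or as the zabbix user):
#   strace -tt -T -p 1234     # -tt adds timestamps, -T shows time spent per syscall
#   fd_target 1234 0          # resolve the fd seen in the slow select() call
```

Reading a poller's /proc entries normally requires being root or the user that owns the process.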

This also yielded an interesting fact:

In /etc/zabbix/zabbix_server.conf there was a line Timeout=30. It turns out that some of our items do, in rare circumstances, need 30 seconds to check, so we cannot lower it. But it also meant that every unreachable SNMP host took 30 seconds to check, and there were a lot of these. It would be nice to be able to tune this setting specifically for the unreachable pollers.
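To get a feeling for why Timeout=30 hurts so much, here is a back-of-the-envelope estimate. It rests on a simplified model that is my assumption, not something from the Zabbix documentation: every unavailable host is retried once per UnavailableDelay seconds (a zabbix_server.conf parameter, 60 by default as far as I know) and every retry blocks one unreachable poller for the full Timeout. The function name and the example numbers are made up.

```shell
#!/bin/sh
# Rough estimate of how busy the unreachable pollers are, in percent.
# Assumed model: each unavailable host is retried once every DELAY seconds,
# and each retry blocks one poller for TIMEOUT seconds.
#   busy% = 100 * HOSTS * TIMEOUT / (POLLERS * DELAY)
unreachable_busy_pct() {
    hosts="$1"     # number of unavailable hosts
    timeout="$2"   # Timeout from zabbix_server.conf, in seconds
    pollers="$3"   # StartPollersUnreachable
    delay="$4"     # UnavailableDelay, in seconds
    echo $(( 100 * hosts * timeout / (pollers * delay) ))
}

unreachable_busy_pct 10 30 5 60   # 10 dead hosts, Timeout=30, 5 pollers
unreachable_busy_pct 10 3 5 60    # same hosts with Timeout=3
```

Under this model, 10 dead hosts already saturate 5 unreachable pollers with Timeout=30, while with Timeout=3 the same hosts keep them only about 10% busy. If Timeout cannot be lowered, that leaves disabling dead hosts (step 2) or raising StartPollersUnreachable.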
mauriciomsr20

Junior Member

Joined: Sep 2018

Posts: 2
#2

29-04-2021, 19:37

This is explained near the end of this video by Dmitry Lambert (Zabbix team):

https://www.youtube.com/watch?v=XaTNmoGzZXM