I recently upgraded Zabbix on all of my Ubuntu servers from 4.0.14-1+bionic to 4.0.15-1+bionic, and after a couple of minutes Zabbix reported that the agent on the Zabbix server was unreachable.
tl;dr: The Zabbix agent on the Zabbix server stops responding after 1 minute when LogLevel=3. With LogLevel=4 everything works as normal.
The first thing I noticed was that when I tried to restart the agent, systemctl restart zabbix-agent.service seemed to hang for about a minute before it managed to restart the agent. Afterwards Zabbix reported that the agent was reachable, but after a few minutes the same thing happened again and the agent was unreachable.
I had a look in the logs, and there was not much more than the usual agent #1 started [collector] being logged, so I decided to increase the LogLevel from 3 (default) to 4 in /etc/zabbix/zabbix_agentd.conf.
After restarting the agent I noticed that it was no longer reported as unreachable after a few minutes, and everything seemed to work just fine. I then removed LogLevel=4 from the config file and restarted the agent again, and much to my surprise the agent was again reported unreachable after a few minutes.
I started a loop that executed zabbix_get -s 127.0.0.1 -k agent.ping every 10 seconds and restarted the agent again. After 1 minute zabbix_get started failing with a timeout error. I restarted the agent several times to make sure that it always stopped responding after 1 minute, which it did. I then added LogLevel=4 to the config file again and restarted the agent, and now the agent kept responding as normal for several hours.
I then set LogLevel=3 in the config file and restarted the agent, but again the agent stopped responding after 1 minute, and there was nothing in the logs that gave me any indication of what the problem might be.
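For anyone who wants to reproduce the test, the probe loop described above could be sketched roughly like this in Python. This is a minimal sketch under assumptions, not the exact loop from the test: the function name probe, the count of 12 probes and the 5-second subprocess timeout are made up for illustration. Only the zabbix_get command line is taken from the post.

```python
import subprocess
import time


def probe(cmd, interval=10.0, count=12, timeout=5.0):
    """Run cmd `count` times, `interval` seconds apart.

    Returns a list of (elapsed_seconds, ok) pairs, where ok is True only
    if the command exited with status 0 within `timeout` seconds.
    """
    results = []
    start = time.monotonic()
    for i in range(count):
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            ok = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # Probe hung past our own (assumed) timeout, or the command
            # could not be started at all: count both as a failure.
            ok = False
        elapsed = time.monotonic() - start
        results.append((elapsed, ok))
        print(f"{elapsed:6.1f}s  {'ok' if ok else 'FAILED'}")
        if i < count - 1:
            time.sleep(interval)
    return results


# Example call (start it right after restarting the agent):
# probe(["zabbix_get", "-s", "127.0.0.1", "-k", "agent.ping"])
```

With the defaults this probes every 10 seconds for about two minutes, which should be enough to see whether the failures start around the 1-minute mark.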
It is very weird that the LogLevel should have any impact on whether the Zabbix agent becomes unreachable after a minute. All the other servers running the same Ubuntu and Zabbix agent version work just fine. This problem only occurs on the Zabbix server.
Has anyone else encountered anything like this, or do you have any suggestions on how to troubleshoot it? I've checked the logs for both the server and the agent and cannot find anything in particular, but I suspect that it has something to do with active checks that run every minute and that for some reason fail when using the default LogLevel.
I have also checked the Zabbix bug tracker, but I haven't seen anyone with a related issue, so I figured I'd ask here. In the meantime I will continue my research and try disabling the various checks that run every minute.