Hi,
I have a Zabbix server with 18 hosts, and 5000 items.
I run Zabbix Server and Agent on all my hosts with version 4.0.3 on Debian 9.
My Zabbix Server config:
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=100
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBName=zabbix
DBUser=zabbix
StartPollers=20
StartPollersUnreachable=3
StartPingers=3
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
CacheSize=32M
Timeout=20
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
AllowRoot=1
For the past 2 days, starting around 02:00 AM, 3 of my hosts have been flapping with "Zabbix Agent is unreachable for 5 minutes".
I didn't change anything the day before, and at the same location I have 10 more hosts that are working perfectly fine.
I noticed that my Zabbix server log is filled with:
"failed: first network error, wait for 15 seconds" and seconds later, it says "connection restored".
The items that time out are completely random.
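To check whether the errors really are spread at random, the log entries can be tallied per host. This is only a minimal sketch: it assumes the server log lines contain the wording `on host "<name>" failed: first network error` (the exact text may differ between versions), and `count_network_errors` is a hypothetical helper name, not something shipped with Zabbix.

```python
import re
from collections import Counter

# Assumed log wording; adjust the pattern if the server log differs.
ERROR_RE = re.compile(r'on host "([^"]+)" failed: first network error')


def count_network_errors(lines):
    """Count 'first network error' log entries per host name."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it the open file named in the LogFile setting above and printing `most_common()` would show whether the failures cluster on the 3 flapping hosts or appear elsewhere too.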
I have no ping loss for over 4000 pings, both to and from the Zabbix Server.
I restarted Zabbix server and the Zabbix agents, rebooted several times, and removed and re-added the hosts in Zabbix Server.
There is no resource shortage on Zabbix Server, no high load or memory usage.
All my pollers seem okay, no shortage of any.
Any and all help is appreciated. I've been banging my head against this for 2 days now, with no result.