Hi,
We're experiencing a lot of agent connection flapping. In agentd.log:
[...]
3920:20181017:075423.976 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:075523.898 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:081944.260 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:082044.182 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:082506.026 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:082606.963 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:090428.187 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:090528.109 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:090949.938 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:091049.860 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:093510.831 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:093610.753 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:094431.426 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:094531.348 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:094952.177 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:095052.083 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:100513.460 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:100613.382 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
3920:20181017:101434.070 active check configuration update from [zbx_proxy.domain.tld:10051] started to fail (cannot connect to [[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)
3920:20181017:101535.008 active check configuration update from [zbx_proxy.domain.tld:10051] is working again
[...]
This goes on and on, flooding the agent log. As far as I can tell, it happens only on Windows (2016) servers: the agents on Linux (CentOS 6/7) servers in the same IP subnet (so no firewall in between) do not show this in their logs.
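To put numbers on the flapping, a small script along these lines could pair each failure with the recovery that follows it. This is only a sketch: it assumes the log line format shown in the excerpt above (PID:YYYYMMDD:HHMMSS.mmm followed by the message), and the name `outage_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Assumed prefix format, taken from the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*\d+:(\d{8}):(\d{6}\.\d{3}) (.*)$")


def outage_durations(lines):
    """Pair each 'started to fail' line with the next 'is working again'
    line and return the outage durations in seconds."""
    durations = []
    failed_at = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip "[...]" markers and unrelated lines
        stamp = datetime.strptime(match.group(1) + match.group(2),
                                  "%Y%m%d%H%M%S.%f")
        message = match.group(3)
        if "started to fail" in message:
            failed_at = stamp
        elif "is working again" in message and failed_at is not None:
            durations.append((stamp - failed_at).total_seconds())
            failed_at = None
    return durations


# First failure/recovery pair from the excerpt above.
sample = [
    "3920:20181017:075423.976 active check configuration update from "
    "[zbx_proxy.domain.tld:10051] started to fail (cannot connect to "
    "[[zbx_proxy.domain.tld]:10051]: A connection timeout occurred.)",
    "3920:20181017:075523.898 active check configuration update from "
    "[zbx_proxy.domain.tld:10051] is working again",
]
print(outage_durations(sample))  # [59.922]
```

For the excerpt above, every recovery is logged roughly one minute after the failure it follows.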
The servers are VMs on VMware. When I place a Windows VM and a Linux VM on the same hypervisor (any of them), the agent connection on the Windows server starts flapping, but the one on the Linux server does not.
Our server/proxy is 3.2.11 (we were waiting for 4.x to happen), the Windows agents are 3.2.7, and we run dual-stack IPv4/IPv6. Hostnames resolve to both IPv4 and IPv6 addresses. When I set up a ping, it gets stable replies.
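Since the hostnames resolve to both address families and ping only exercises ICMP, one thing that could be checked is whether a TCP connect to port 10051 behaves the same over IPv4 and IPv6. The following is a minimal sketch under that assumption; the helper name `probe` and the 3-second timeout are arbitrary choices for illustration, not anything taken from the agent.

```python
import socket
import time

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def probe(host, port, timeout=3.0):
    """Attempt a TCP connect to every address the name resolves to.

    Returns a list of (family, address, ok, seconds) tuples, one per
    resolved address, so IPv4 and IPv6 results can be compared.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            ok = True
        except OSError:  # includes timeouts and refused connections
            ok = False
        elapsed = time.monotonic() - start
        results.append((FAMILY_NAMES.get(family, str(family)),
                        sockaddr[0], ok, elapsed))
    return results
```

Running `probe("zbx_proxy.domain.tld", 10051)` repeatedly on an affected Windows server and on a Linux server on the same hypervisor would show whether the timeouts are tied to one address family.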
We've been searching for the cause for some time now but can't find it. Can I get some pointers on what to look for?