Hello,
I wanted to share something that has had me scratching my head for quite some time. Our agents are having trouble connecting to the proxy servers. We have 8700+ agents with 200k+ items, and the required VPS (new values per second) is over 1850.
We are hosting on RHEL 7 and have 1 frontend server, 1 Zabbix server, 3 Zabbix proxies, and MySQL databases hosted on separate servers.
The specific error we receive is as follows:
[369052]: 3244:20210316:224654.813 active check data upload to [zabbix-proxy-cluster.abc.xyz:10051] started to fail ([connect] cannot connect to [[zabbix-proxy-cluster.abc.xyz]:10051]: A connection timeout occurred.)
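Upon checking on the proxy server itself, the issue seems to be with the TCP connections. We've already increased the OS limit (net.core.somaxconn), but it looks like Zabbix isn't picking it up: the listening socket on port 10051 still shows a backlog limit of 128 (Send-Q) with 129 connections queued (Recv-Q).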
# ss -ntl '( sport = :10051 )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 129 128 10.x.x.x:10051 *:*
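For readers less familiar with this output: on Linux, for a socket in the LISTEN state, Recv-Q is the number of connections currently waiting in the accept queue and Send-Q is the backlog limit for that socket. The effective limit is the smaller of the backlog the application passes to listen() and net.core.somaxconn, so raising the sysctl alone may not change it. The following is only a minimal sketch of how the figures above could be checked automatically; the function names are hypothetical and the sample text is copied from the output above.

```python
def parse_listen_queues(ss_output):
    """Parse `ss -ntl` text into one dict per LISTEN socket."""
    rows = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a LISTEN socket.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        rows.append({
            "local": fields[3],
            "accept_queue": int(fields[1]),   # Recv-Q: connections waiting for accept()
            "backlog_limit": int(fields[2]),  # Send-Q: backlog limit of the socket
        })
    return rows


def is_overflowing(row):
    """The kernel treats the accept queue as full once it exceeds the limit."""
    return row["accept_queue"] > row["backlog_limit"]


# Sample copied from the output in this post.
sample = """State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 129 128 10.x.x.x:10051 *:*"""

for row in parse_listen_queues(sample):
    status = "FULL" if is_overflowing(row) else "ok"
    print(f'{row["local"]}: {row["accept_queue"]}/{row["backlog_limit"]} {status}')
```

With the sample above, the socket is reported as full, which is consistent with new connections from agents timing out.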
This is causing a lot of problems: new agents aren't able to connect, and we get a flood of agent heartbeat alerts. All our agents run in active mode.
Please let me know if anyone has encountered or fixed this before. Any help would be much appreciated!
Thanks,
Karan