So, I'm trying to get Zabbix running on a three-node Windows Server 2019 Hyper-V cluster, backed by Cluster Shared Volume (CSV) storage on a SAN.
Initially I did what I'd done the last time I installed a Zabbix server: fired up a Generation 2 VM, installed Debian and MySQL, set up the database, then installed Zabbix server with nginx as the web server.
It worked for a few days, until the weekend. After Cluster-Aware Updating ran, monitoring of the VMs broke, and I'm now getting the errors below in the zabbix_server log. There are three different errors in there, and I haven't found a conclusive answer to any of them. I've tested the connection between the server and the agents and it's fine; there are no issues I can trace.
Separately, on both the Debian VM (which I've rebuilt about three times) and now an Ubuntu VM, the front-end can't seem to communicate with the Zabbix server.
I have configured the agents to communicate via PSK, with the same PSK and identity on all hosts and the server. The documentation on PSK-encrypted communication doesn't make it clear whether that setup should work. I've also tried removing PSK, but I still get no availability from the agents.
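One way to rule out a configuration mismatch is to check each agent's TLS settings programmatically. Below is a minimal Python sketch, assuming plain `key=value` lines as in zabbix_agentd.conf / zabbix_agent2.conf. TLSConnect, TLSAccept, TLSPSKIdentity and TLSPSKFile are real agent parameters; the helper names are hypothetical, and the 32 to 512 hex-digit PSK bounds are my reading of the Zabbix encryption docs, so double-check them for your version.

```python
import re
from typing import Dict, List, Set

HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def parse_agent_conf(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse key=value lines, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_tls_settings(settings: Dict[str, str]) -> List[str]:
    """If PSK is used in either direction, identity and key file must both be set."""
    # TLSConnect covers agent -> server connections (active checks).
    connect = settings.get("TLSConnect", "unencrypted")
    # TLSAccept covers server -> agent connections (passive checks); may be a comma list.
    accept = {v.strip() for v in settings.get("TLSAccept", "unencrypted").split(",")}
    problems: List[str] = []
    if connect == "psk" or "psk" in accept:
        for key in ("TLSPSKIdentity", "TLSPSKFile"):
            if not settings.get(key):
                problems.append(f"PSK is enabled but {key} is empty")
    return problems


def check_psk(psk: str) -> List[str]:
    """Check the PSK file contents: hex only, assumed 32 to 512 hex digits."""
    psk = psk.strip()
    problems: List[str] = []
    if not HEX_RE.match(psk):
        problems.append("PSK is not a hex string")
    if not 32 <= len(psk) <= 512:
        problems.append(f"PSK has {len(psk)} characters, expected 32 to 512 hex digits")
    return problems


def distinct_identities(confs: Dict[str, Dict[str, str]]) -> Set[str]:
    """With one shared PSK and identity, this set should hold exactly one value."""
    return {conf.get("TLSPSKIdentity", "") for conf in confs.values()}
```

Run `check_tls_settings(parse_agent_conf(...))` on each agent's config and `check_psk` on each PSK file, then confirm `distinct_identities` returns a single value. As far as I know, the server-side identity and PSK for an agent are set per host in the front-end (Encryption tab) rather than in a config file, so this only covers the agent half.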
Code:
1488:20250122:160912.730 failed to accept an incoming connection: from xxx.xxx.xxx.47: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1485:20250122:160914.268 failed to accept an incoming connection: from xxx.xxx.xxx.159: reading first byte from connection failed: [11] Resource temporarily unavailable
1489:20250122:160915.683 failed to accept an incoming connection: from xxx.xxx.xxx.70: reading first byte from connection failed: [11] Resource temporarily unavailable
1487:20250122:160916.183 failed to accept an incoming connection: from xxx.xxx.xxx.50: reading first byte from connection failed: [11] Resource temporarily unavailable
1486:20250122:160916.704 failed to accept an incoming connection: from xxx.xxx.xxx.213: reading first byte from connection failed: [11] Resource temporarily unavailable
1488:20250122:160916.735 failed to accept an incoming connection: from xxx.xxx.xxx.137: reading first byte from connection failed: [11] Resource temporarily unavailable
1485:20250122:160918.273 failed to accept an incoming connection: from xxx.xxx.xxx.201: reading first byte from connection failed: [11] Resource temporarily unavailable
1485:20250122:160918.273 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1485:20250122:160918.273 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1489:20250122:160919.687 failed to accept an incoming connection: from xxx.xxx.xxx.159: reading first byte from connection failed: [11] Resource temporarily unavailable
1487:20250122:160920.187 failed to accept an incoming connection: from xxx.xxx.xxx.70: reading first byte from connection failed: [11] Resource temporarily unavailable
1486:20250122:160920.708 failed to accept an incoming connection: from xxx.xxx.xxx.50: reading first byte from connection failed: [11] Resource temporarily unavailable
1486:20250122:160920.709 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1488:20250122:160920.739 failed to accept an incoming connection: from xxx.xxx.xxx.201: reading first byte from connection failed: [11] Resource temporarily unavailable
1485:20250122:160922.277 failed to accept an incoming connection: from xxx.xxx.xxx.213: reading first byte from connection failed: [11] Resource temporarily unavailable
1485:20250122:160922.629 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1489:20250122:160923.692 failed to accept an incoming connection: from xxx.xxx.xxx.50: reading first byte from connection failed: [11] Resource temporarily unavailable
1489:20250122:160923.692 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1489:20250122:160923.692 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1489:20250122:160923.693 failed to accept an incoming connection: from xxx.xxx.xxx.164: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1487:20250122:160924.192 failed to accept an incoming connection: from xxx.xxx.xxx.137: reading first byte from connection failed: [11] Resource temporarily unavailable
1486:20250122:160924.713 failed to accept an incoming connection: from xxx.xxx.xxx.201: reading first byte from connection failed: [11] Resource temporarily unavailable
1488:20250122:160924.744 failed to accept an incoming connection: from xxx.xxx.xxx.159: reading first byte from connection failed: [11] Resource temporarily unavailable
1485:20250122:160926.633 failed to accept an incoming connection: from xxx.xxx.xxx.137: reading first byte from connection failed: [11] Resource temporarily unavailable
1489:20250122:160927.697 failed to accept an incoming connection: from xxx.xxx.xxx.212: reading first byte from connection failed: [11] Resource temporarily unavailable
1487:20250122:160928.196 failed to accept an incoming connection: from xxx.xxx.xxx.50: reading first byte from connection failed: [11] Resource temporarily unavailable
1487:20250122:160928.197 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1487:20250122:160928.197 failed to accept an incoming connection: from xxx.xxx.xxx.241: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1487:20250122:160928.198 failed to accept an incoming connection: from xxx.xxx.xxx.239: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1487:20250122:160928.198 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1487:20250122:160928.199 failed to accept an incoming connection: from xxx.xxx.xxx.163: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1487:20250122:160928.200 failed to accept an incoming connection: from xxx.xxx.xxx.184: unspecified certificate verification error: TLS handshake set result code to 1: file ../ssl/record/rec_layer_s3.c line 316 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
1487:20250122:160928.200 failed to accept an incoming connection: connection rejected, getpeername() failed: [107] Transport endpoint is not connected
1486:20250122:160928.718 failed to accept an incoming connection: from xxx.xxx.xxx.213: reading first byte from connection failed: [11] Resource temporarily unavailable
1488:20250122:160928.748 failed to accept an incoming connection: from xxx.xxx.xxx.137: reading first byte from connection failed: [11] Resource temporarily unavailable
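To see which agents hit which of the three errors, the server log lines can be grouped by error type and source address. A minimal Python sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` layout in the server log above and matching on message substrings copied from it; the category labels and function names are my own, not anything Zabbix defines.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the "PID:YYYYMMDD:HHMMSS.mmm message" layout of zabbix_server.log lines.
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Source address as it appears after "incoming connection: from".
SOURCE_RE = re.compile(r"incoming connection: from ([^:]+):")

# Hypothetical labels for the three error types, keyed on text copied from the log.
CATEGORIES = (
    ("tls_handshake", "unspecified certificate verification error"),
    ("first_byte_timeout", "reading first byte from connection failed"),
    ("peer_gone", "getpeername() failed"),
)


def classify(message: str) -> Optional[str]:
    """Return the label of the first category whose text appears in the message."""
    for label, needle in CATEGORIES:
        if needle in message:
            return label
    return None


def summarise(lines: Iterable[str]) -> Counter:
    """Count (category, source) pairs; getpeername() errors carry no source address."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        message = match.group(4)
        label = classify(message)
        if label is None:
            continue
        source = SOURCE_RE.search(message)
        counts[(label, source.group(1) if source else "unknown")] += 1
    return counts
```

For example, `summarise(open("/var/log/zabbix/zabbix_server.log")).most_common()` (adjust the path to your LogFile setting) would show whether the TLS failures and the first-byte timeouts come from the same agents or from different groups of hosts.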
Additionally, this is from the log of an agent on a Windows VM on the same cluster:
Code:
2025/01/22 16:10:53.224917 [101] cannot connect to [zabbix.dc.local:10051]: read tcp xxx.xxx.xxx.76:28301->xxx.xxx.xxx.65:10051: i/o timeout
2025/01/22 16:10:53.224917 [101] history upload to [zabbix.dc.local:10051] [host] started to fail
2025/01/22 16:11:08.227014 [101] cannot connect to [zabbix.dc.local:10051]: read tcp xxx.xxx.xxx.76:28330->xxx.xxx.xxx.65:10051: i/o timeout
2025/01/22 16:11:08.228026 [101] sending of heartbeat message for [host] started to fail
2025/01/22 16:11:53.227295 [101] cannot connect to [zabbix.dc.local:10051]: read tcp xxx.xxx.xxx.76:28329->xxx.xxx.xxx.65:10051: i/o timeout
2025/01/22 16:11:53.227295 [101] history upload to [zabbix.dc.local:10051] [host] started to fail
2025/01/22 16:12:08.231388 [101] cannot connect to [zabbix.dc.local:10051]: read tcp xxx.xxx.xxx.76:28364->xxx.xxx.xxx.65:10051: i/o timeout
2025/01/22 16:12:08.232425 [101] active check configuration update from host [host] started to fail
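The agent errors read `read tcp ...: i/o timeout`, which, as far as I can tell from the Go error format, means the TCP connection opened but no reply arrived in time, rather than the connect itself failing. A minimal Python sketch to tell those cases apart from the agent's machine, assuming the Zabbix protocol header (`ZBXD`, a flags byte, then little-endian data length and reserved fields) and the classic `active checks` request. `hostname` is a placeholder for a host name configured in the front-end; the request is sent unencrypted, so a host that only accepts PSK will probably see the server drop the connection, which is still a useful data point.

```python
import json
import socket
import struct
from typing import Tuple

HEADER = b"ZBXD"
FLAG_PROTOCOL = 0x01  # plain Zabbix protocol; compressed replies (0x02) are not handled


def frame(payload: dict) -> bytes:
    """Wrap a JSON payload: ZBXD, flags byte, 4-byte data length, 4-byte reserved."""
    data = json.dumps(payload).encode("utf-8")
    return HEADER + bytes([FLAG_PROTOCOL]) + struct.pack("<II", len(data), 0) + data


def unframe(packet: bytes) -> Tuple[int, bytes]:
    """Split a reply into (flags, body); assumes an uncompressed reply."""
    if packet[:4] != HEADER:
        raise ValueError("missing ZBXD header")
    flags = packet[4]
    length, _reserved = struct.unpack("<II", packet[5:13])
    return flags, packet[13:13 + length]


def probe(host: str, port: int = 10051, hostname: str = "host", timeout: float = 5.0) -> str:
    """Report whether the TCP connect, the send or the read is the step that fails."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"connect failed: {exc}"
    with sock:
        try:
            sock.sendall(frame({"request": "active checks", "host": hostname}))
            # A single recv is a simplification; a large reply may arrive in pieces.
            reply = sock.recv(65536)
        except socket.timeout:
            return "connected, but no reply before timeout"
        except OSError as exc:
            return f"connected, then failed: {exc}"
    if not reply:
        return "connected, but server closed the connection"
    flags, body = unframe(reply)
    return f"reply (flags={flags:#x}): {body.decode('utf-8', 'replace')}"
```

Running `probe("zabbix.dc.local", hostname="...")` from the Windows VM, and again from a machine outside the cluster, would show whether the silent reads are specific to traffic crossing the Hyper-V virtual switch. Any reply at all, even a failure response, means the TCP path and the server's trapper are working.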
Has anyone got any troubleshooting ideas? I've searched as hard as I can and found nothing that gets me close to fixing this. The only thing I have left to try is running Zabbix on a physical host, to see whether the problem is related to the virtualised networking on the cluster.