Hello there,
I currently manage an environment with 4 Zabbix proxies, each running as a virtual machine with ample hardware resources. Unfortunately, one of these proxies repeatedly runs into trouble and often crashes completely, which forces us to restart the server manually to restore functionality.
In trying to resolve this, I have examined the logs. However, they are mostly filled with entries related to active checks (I know this is due to a mismatch between the hostnames in the agent configuration files and those configured in the frontend) and frequent reconnection attempts for items using system.run keys to execute commands on external servers. Despite watching the logs in the period leading up to the crashes, I haven't been able to find anything relevant.
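To illustrate the hostname mismatch I mean (the names below are placeholders, not our real hosts): on the agent side the configuration has something like

  # /etc/zabbix/zabbix_agentd.conf on the monitored host
  Hostname=srv-app-01

while the same host is registered in the frontend as "SRV-APP-01", so the proxy cannot match the agent's active check requests to a configured host. I treat these log entries as noise rather than the cause of the crashes.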
I've also checked the 'Required Performance (vps)' metric for the proxies. The proxy that crashes reports a workload of 277 hosts, 28580 items, and 197.88 vps, while another proxy (proxy B) handles 546 hosts, 65247 items, and 414.67 vps and remains stable.
Regarding the queue, the crashing proxy has a small one, around 10 items delayed by more than 10 minutes, while proxy B runs with an even higher queue, about 15 items over 10 minutes, and stays stable.
About the hardware, the CPU utilization of the problematic proxy seldom exceeds 40%, and memory consumption scarcely surpasses 15%. All proxies have been tuned to optimize their performance.
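For context on what "tuned" means here, these are the kinds of zabbix_proxy.conf parameters we adjusted (the values below are only illustrative, not our exact production settings):

  # /etc/zabbix/zabbix_proxy.conf (illustrative values)
  StartPollers=100
  StartTrappers=30
  StartPingers=10
  CacheSize=512M
  HistoryCacheSize=256M
  HistoryIndexCacheSize=64M
  Timeout=10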
Any suggestions on what I could try next? Your insights would be greatly appreciated.