Hello,
I have a Linux server that occasionally kernel panics without going down completely. The server still responds to ping, but does not accept TCP connections (ssh, zabbix agent, etc.). I am trying to design a way to monitor for this situation effectively. I would also like it to work from a template so I don't have to configure about 30 servers individually.
I have attempted
Code:
tcp,10050
as a simple check on a test host.
When I stop zabbix_agent on the host, the value sometimes changes to zero, but sometimes doesn't. Checking zabbix_server.log I see the following:
Code:
4961:20080515:092422 Host [hostname]: another network error, wait for 15 seconds
4961:20080515:092437 Get value from agent failed. Error: Cannot connect to [a.b.c.d:10050] [Connection refused]
4961:20080515:092437 Host [hostname] will be checked after 60 seconds
4961:20080515:092547 Timeout while answering request
4961:20080515:092547 Get value from agent failed. Error: Cannot connect to [a.b.c.d:10050] [Interrupted system call]
This means the host is not being checked during that interval, so my simple check item is not updated and the trigger never fires.
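To make the symptom concrete, here is a minimal Python sketch of a direct TCP connect test with a timeout, run independently of the Zabbix server. It is only an illustration of the "pings but refuses TCP" condition, not part of Zabbix; the function name `tcp_port_open` and the 3 second timeout are assumptions chosen for the example.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return 1 if a TCP connection to host:port succeeds within timeout, else 0.

    Mirrors the 1/0 value of the simple check: a refused connection,
    a timeout, or a name resolution failure all count as 0.
    """
    try:
        # create_connection performs the full TCP handshake, so a host
        # that still answers ping but no longer accepts connections
        # shows up as 0 here.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror.
        return 0
```

For example, `tcp_port_open("a.b.c.d", 10050)` would test the agent port, and port 22 could be used instead to test ssh, so the result does not depend on the agent process itself.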

Thoughts?
Thad