Trying to monitor a system that occasionally half crashes.

scalft

Junior Member

Joined: Apr 2008

Posts: 12
#1

15-05-2008, 17:19

Hello,

I have a Linux server that will occasionally kernel panic, but not take the system completely down. The server will still ping, but will not accept TCP connections (ssh, zabbix agent, etc.). I am trying to design a way to effectively monitor for this situation. I would also like it to work from a template so I don't have to go through about 30 servers to configure each individually.

I have attempted

Code:

tcp,10050

as a simple check on a test host.
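
For reference, a TCP simple check amounts to attempting a connection and recording 1 or 0. The following is only a minimal sketch of that idea in Python, not how Zabbix implements it; the function name `tcp_check` and the default timeout are assumptions made for illustration.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0.

    Hypothetical helper for illustration; not part of Zabbix.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return 0
```

A half-crashed host that still answers ping would be expected to return 0 here, either through a refused connection or through the timeout, which is why the timeout value matters.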

When I stop zabbix_agent on the host, the value sometimes changes to zero, but sometimes doesn't. Checking zabbix_server.log I see the following:

Code:

4961:20080515:092422 Host [hostname]: another network error, wait for 15 seconds
4961:20080515:092437 Get value from agent failed. Error: Cannot connect to [a.b.c.d:10050] [Connection refused]
4961:20080515:092437 Host [hostname] will be checked after 60 seconds
4961:20080515:092547 Timeout while answering request
4961:20080515:092547 Get value from agent failed. Error: Cannot connect to [a.b.c.d:10050] [Interrupted system call]

This means the host is not being checked during that interval, so my simple check item is not updated and the trigger never fires.

Thoughts?

Thad
nelsonab

Senior Member

Joined: Sep 2006

Posts: 1233
#2

15-05-2008, 17:50

You could trigger on nodata: key[item].nodata(time)
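
The point of nodata is that it fires when the item stops updating, which is the symptom described in the first post. As a rough sketch of that logic only (this is not Zabbix's implementation, and the function and parameter names are hypothetical):

```python
import time
from typing import Optional


def nodata(last_value_time: Optional[float], period: float,
           now: Optional[float] = None) -> int:
    """Return 1 if no value arrived within the last `period` seconds, else 0.

    Hypothetical illustration of the nodata idea; timestamps are Unix seconds.
    """
    if now is None:
        now = time.time()
    if last_value_time is None:
        # The item has never received a value.
        return 1
    return 1 if (now - last_value_time) > period else 0
```

The period should probably be longer than the item's update interval plus the 60 second back-off seen in the server log, otherwise a single missed check could raise the trigger.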

RHCE, author of zbxapi
Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM