Zabbix didn't page me when host died!

drose12
20-08-2007, 18:55
I'm running the 1.4.2 CVS server and the 1.4.1 agent.
This weekend one of our hosts died, but I didn't get any pages. Checking the Events shows that all my items were returning UNKNOWN, but none of my triggers are set for that.
Once I brought the host back online, I did get some pages, and with the host up, manually bringing down key services also gets me paged. What gives?

I have the following test scenario right now: a virtual machine runs the Zabbix agent and is monitored by the Zabbix server. If I just 'Pause' the VM, which basically drops it off the face of the earth, Zabbix does not tell me. Under Configuration -> Hosts, the Availability status shows:

Not available Cannot connect to [xxx.yyy:10050] [Interrupted system call]

But my pager is all quiet...

Do I need an agent.ping == unknown -> disaster trigger?

drose12
21-08-2007, 20:40
I just upgraded to 1.4.2, and I have the same issue.
Can anyone else reproduce this?

bbrendon
22-08-2007, 01:06
I don't think you found a bug. It sounds like you don't have a trigger set to monitor the availability of the host. I don't use active agents, so I'm not sure how to best get the availability for them. You should find something in the forums.

drose12
23-08-2007, 17:32
Ok, with a little reading and searching I figured out my problem.

The zabbix_server shows the status as UNKNOWN when it can’t get a value from the agent, but it doesn’t actually store anything in the tables, and our triggers only look at the values in the tables. The solution is pretty simple, yet amazingly it is neither obvious nor a default trigger.

Right now we have an item, Ping Agent, which does an agent.ping every 30 seconds and returns 1 if the agent is working.
What I needed was a trigger that says: if you get no data (UNKNOWN) for 2 minutes, put up a Disaster page. This is accomplished by:

Host Down {HOSTNAME}
{Unix_t:agent.ping.nodata(120)}

So now if a box just stops responding for 2 minutes straight we’ll get a page.
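
To make the difference concrete, here is a rough Python sketch (not Zabbix code) of why a trigger that compares stored values stays quiet while a nodata-style check fires. The function names, the simulation loop, and the exact boundary handling (strictly more than 120 seconds) are my own assumptions for illustration; only the 30 second interval and the 120 second window come from the setup above.

```python
def nodata(last_value_time, now, period):
    """Return 1 if no value was stored in the last `period` seconds, else 0.

    Hypothetical stand-in for the nodata(120) trigger function; the real
    boundary handling in Zabbix may differ.
    """
    return 1 if last_value_time is None or now - last_value_time > period else 0


def simulate(host_dies_at, end, interval=30, period=120):
    """Check a host every `interval` seconds; it stops answering at `host_dies_at`.

    Returns rows of (time, threshold_trigger_fires, nodata_trigger_fires).
    """
    last_value = None
    last_value_time = None
    rows = []
    for now in range(0, end + 1, interval):
        if now < host_dies_at:
            # agent.ping answered: the value 1 is stored
            last_value, last_value_time = 1, now
        # else: the server sees UNKNOWN and stores nothing at all

        # A trigger waiting for a stored 0 never sees one
        threshold_fires = last_value == 0
        # A nodata-style trigger looks at the age of the last stored value
        nodata_fires = nodata(last_value_time, now, period) == 1
        rows.append((now, threshold_fires, nodata_fires))
    return rows


if __name__ == "__main__":
    for row in simulate(host_dies_at=90, end=300):
        print(row)
```

In this sketch the last stored value stays at 1 after the host dies, so the value-based check never changes, while the nodata check flips once the last stored value is older than the window.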

bbrendon
23-08-2007, 21:16
That'll work. Very nice :)

alj
24-08-2007, 20:23
Will it create a storm of pages when you shut down your server for 3 minutes?

drose12
24-08-2007, 21:53
I don't think so ...
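
My reasoning, as a rough Python sketch: I'm assuming a page goes out only when the trigger changes state, not on every check where it is still true. That is an assumption about how the actions behave, not something I've verified in the source, and the function name and the sample states below are made up for illustration.

```python
def pages_for(states):
    """Return one notification per trigger state change.

    `states` is a list of (time, state) pairs, where 0 = OK and 1 = PROBLEM.
    Assumes the trigger starts out OK.
    """
    pages = []
    previous = 0
    for t, state in states:
        if state != previous:
            pages.append((t, "PROBLEM" if state else "OK"))
            previous = state
    return pages


if __name__ == "__main__":
    # Hypothetical trigger states for a host that is down for about 3 minutes,
    # checked every 30 seconds with a 120 second nodata window.
    states = [(0, 0), (30, 0), (60, 0), (90, 0), (120, 0), (150, 0),
              (180, 1), (210, 1), (240, 0), (270, 0)]
    print(pages_for(states))
```

Under that assumption a 3 minute outage gives one page when the trigger goes to PROBLEM and one when it recovers, not one page per check.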