Availability trigger for Active Checks
Hello!
I have all my hosts running with active checks.
Problem is, with active checks the item 'status' is not supported. And pinging the host is not an option either.
So do you people have any idea how to monitor server availability with active checks?
I've tried adding a trigger with the 'nodata' expression, but the trigger stops evaluating as soon as the data stops coming in, so it simply switches to UNKNOWN instead of "going off".
All help is highly appreciated!
regards,
Simon
James Wells
11-08-2005, 16:34
Greetings,
When dealing with active checks only, you create various triggers that alert when values don't change when they should, or when there are no updates when there should be. For example, assume you wanted to alert if an active check server failed to send in its data within a specified time frame.
First, you would create a user defined check, say:
UserParameter=active[timestamp],/bin/date +%s
Now, the returned value will change every second, so if you set your delay to, say, 30, then this value should never be the same across two passes. From there, you would create a trigger similar to the following:
({_Template_Linux_SVR:active[timestamp].last(0)}={_Template_Linux_SVR:active[timestamp].prev(0)})|{_Template_Linux_SVR:active[timestamp].nodata(35)}
What this means is that if the current returned value is the same as the previously returned value, then something is wrong. Additionally, if there is no recent data, it will alert as well. You want to be careful here that your active check periodicity is shorter than your server check or else you will get a lot of false positives.
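To make the logic of that trigger easier to follow, here is a small Python sketch of what it checks. It is only a model, not Zabbix code: the function name, the sample format and the exact boundary handling of the 35-second window are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# One received value: (time it arrived in seconds, value as text).
Sample = Tuple[float, str]


def heartbeat_trigger_fires(samples: Sequence[Sample], now: float,
                            nodata_window: float = 35.0) -> bool:
    """Model of: last(0) = prev(0)  OR  nodata(35).

    `samples` must be ordered oldest to newest. Hypothetical helper,
    not part of Zabbix.
    """
    if not samples:
        # Nothing ever received counts as "no data".
        return True
    last_time, last_value = samples[-1]
    # nodata(35): nothing arrived within the window.
    if now - last_time > nodata_window:
        return True
    # last(0) = prev(0): the timestamp value did not change between passes.
    if len(samples) >= 2 and samples[-2][1] == last_value:
        return True
    return False
```

With a 30-second delay, two fresh samples with different timestamps keep the trigger quiet; a repeated value or a gap longer than 35 seconds makes it fire.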
I thank you very much for your suggestion!
My big problem was that active checks weren't evaluated when Zabbix wasn't receiving any data. So no matter how I checked it, it would never trigger.
But in the latest alpha12 this has changed.
I now use a nodata(300) check on a cpu item. Then I do not need to add anything to the agent's conf file, and since all hosts have a cpu the check will always be usable, regardless of platform.
If anybody has come up with better ways to check availability of hosts that only utilize active checks, please share your methods!
James Wells
12-08-2005, 22:41
LOL!!! Tell that to my zabbix_agentd. For some strange reason none of my systems are returning CPU data... All of them return ZBX_NOTSUPPORTED. LOL!!! I am beginning to get a complex.
Hehe, well, on Linux cpu_util is not supported. Maybe you are using the wrong item?
bbrendon
31-07-2007, 05:03
Can we merge this thread with the following thread?
http://www.zabbix.com/forum/showthread.php?t=2425
Combination of agent.ping and nodata() should work perfectly for all ZABBIX agents regardless of underlying platform.
bbrendon
31-07-2007, 07:36
The problem I have is that if there are a lot of events being grabbed from the Windows eventlog, the agent doesn't do ANYTHING until it's done downloading the eventlogs. This could take 5, 10, 15 or 60 minutes. During that time, the server gets marked as DOWN in zabbix. [This happens with other items as well, but mostly the eventlog]
Is there a solution to this?
It would be GREAT if I could determine the availability of all the enabled items relating to a server. Is this somehow possible? I can't think of a way.
Alexei- I don't see how that solves the above described problem. Combining nodata with agent.ping seems to do the same as combining nodata with the agent grabbing an item value. Is there a way to do something like server.<all items>.nodata?
Why? You have items refreshed every 5 seconds, some items are refreshed every hour. How would you define the nodata() for ALL items?
My suggestion is very simple: use agent.ping on all systems with a reasonable refresh rate (say, 30 seconds), and define a trigger which would fire alerts if there is no data coming from agent.ping within 2 minutes.
Simple and efficient.
bbrendon
31-07-2007, 07:45
I completely agree, except what if there are MANY MANY events in the eventlog?
Zabbix will spend forever and ever trying to download the events and will do nothing else except download events. By that time, it has been 30 mins and the agent.ping item hasn't run, and thus the server generates a false positive action of being down.
What events are you talking about?! Calculation of nodata() is very efficient and it does not use events!
bbrendon
31-07-2007, 19:41
When using a Windows agent, there is an eventlog key (e.g. eventlog[Application]) that collects the eventlogs from the Windows server and dumps them into the zabbix database.
Utilizing this feature will cause the zabbix agent to appear to hang. Does that make it clear?
On another note...
I was actually hoping other people would chime in with their experience. I have also seen in the past where agents reporting to the zabbix server from the internet will sometimes not report on the item I have assigned to associate with nodata for determining the host availability. Other items do populate, but one or two don't for a short time. This is strange behavior, which I've never quite understood, but would be eliminated if there was a way to do nodata for all items associated with a host.
Solutions that would solve all mentioned quirks:
Don't monitor hosts over the internet & don't use eventlog key - not happening
Write a daemon (perl?) that creates/simulates a nodata function that applies to all items on a host. This daemon would run at the database level.
Alexei gets creative :)
Other suggestions...
Does anyone else experience this stuff?!?!
Alexei gets creative :)
I seem to be very creative when it comes to avoiding fixing stuff :D
Seriously if the agent hangs for whatever reason, the nodata() function will tell about this almost immediately. Calculation of nodata() related triggers does not depend on availability of agent.
bbrendon
01-08-2007, 03:09
The agent doesn't actually hang. It just spends all its time downloading eventlogs (appearing to hang at first), and the other items for the host don't get data, causing nodata to trigger! Does that make sense?
Do we have ideas on a great solution?
I would suggest opening a new thread to report and discuss this issue.
bbrendon
02-08-2007, 03:11
I'll leave it for the moment; no one seems to care but me.
Hopefully you can keep this in the back of your head and work something into future stuff.
So to sum up, a nodata function that works against all keys/items associated with an agent/host would be my preferred solution. It would solve this (eventlog key), the other issues I mentioned, and probably future issues.
My point is different. Instead of introducing a workaround (yes, I think that the new nodata() function is a workaround in this context), I would suggest fixing the original issue.
Unfortunately you did not provide enough details about the eventlog issue to start doing any serious work.
bbrendon
03-08-2007, 03:35
Well, I think both would be useful. Partially because agent communication and the busyness of a server sometimes seem to cause items to be collected only sporadically.
I'm not sure what detail to give you regarding the eventlogs. There aren't any errors. It's easy to reproduce. I've seen it on a dozen servers. Just put 10,000 events in the log and start the agent, and you'll reproduce it.
bbrendon
08-08-2007, 01:20
I was just thinking ...
There was a recent thread regarding all the connections made between an agent and a server. Apparently there is one connection made per item.
If this is the case, a busy internet connection with LOTS of stuff behind it may have issues making so many connections to a zabbix server on the internet. The experience I have on occasion which is random items not updating at random times may be the result of a busy internet connection coupled with agents creating one connection per item.
Just a thought...
evgeny elkin
10-08-2007, 08:57
The nodata() function is very useful, but how do I distinguish between 2 states of the host:
1) zabbix_agent daemon is stopped
2) host powered off
?
1. No data from active agent.ping
2. No data from active agent.ping AND no ICMP ping AND no reply to some simple passive TCP ping
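The two rules above combine three signals, which can be written down as a tiny decision function. This is only an illustrative Python sketch; the function name, parameter names and returned labels are made up here, and in Zabbix itself this would be expressed with triggers and trigger dependencies instead.

```python
def classify_host(agent_ping_fresh: bool, icmp_ok: bool, tcp_ok: bool) -> str:
    """Combine the three signals from the reply above.

    agent_ping_fresh: active agent.ping data arrived within the nodata window.
    icmp_ok:          the host answers an ICMP ping.
    tcp_ok:           a simple passive TCP check succeeds.
    All names and labels are hypothetical.
    """
    if agent_ping_fresh:
        return "agent reporting"
    if not icmp_ok and not tcp_ok:
        # Rule 2: nothing answers at all.
        return "host powered off or unreachable"
    # Rule 1: no agent data, but the host itself still answers.
    return "agent stopped, host reachable"
```

Note that from the server's side a powered-off host and a broken network path look the same, which is why the second label mentions both.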
Andreas Bollhalder
22-08-2007, 13:27
Hello all
I too have a trigger using the nodata() function:
{_template:agent.ping.nodata(240)}=1
Today I upgraded ZABBIX to version 1.4.2. After restarting the ZABBIX server, I got an OFF and then an ON message for every host implementing the trigger. So about 200 emails. Luckily, I had renamed the script for sending SMS :rolleyes:
It's clear to me that the ZABBIX server was down for more than 240s. Because of this, no data was collected within the last 240s and it sent the messages.
Now, how should I extend the trigger to prevent this? Using "system.uptime" from the ZABBIX server doesn't help when upgrading without restarting the whole server.
Any ideas ?
Andreas
bbrendon
22-08-2007, 18:41
Hm... I have this problem as well. I always have to disable all actions before restarting my zabbix server.
A few good ideas...
1. Add a mysql statement to "zabbix-server start" that disables actions BEFORE starting the zabbix-server binaries.
2. Monitor zabbix_server process start time. If it's < 10 minutes, don't alert. The problem you might have here is that the data may not get into the zabbix server fast enough to stop all of the alerts. The trigger might see data in the database, but it might be too old.
3. Alexei adds a "stabilization" parameter to zabbix_server.conf which allows you to enter the number of seconds to wait before actions start working.
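Ideas 2 and 3 both boil down to a grace period after the server process starts. A minimal Python sketch of that rule follows; the function name and the 600-second default (the "10 minutes" from idea 2) are assumptions, and no such setting is claimed to exist in zabbix_server.conf.

```python
def should_send_alert(trigger_fired: bool, server_started_at: float,
                      now: float, grace_seconds: float = 600.0) -> bool:
    """Suppress alerts while the server process is younger than the grace period.

    Times are in seconds on the same clock. Hypothetical helper illustrating
    the "stabilization" idea, not an existing Zabbix feature.
    """
    if not trigger_fired:
        return False
    server_age = now - server_started_at
    return server_age >= grace_seconds
```

This only hides alerts raised during the grace period; as noted in idea 2, it helps only if fresh data reaches the server before the period ends.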
I can think of a few possibly complex usages of nodata and zabbix_sender that might work as well, but now the trigger expressions are getting way too long in my head so I'm nixing the idea. KISS. :)
This must be considered as a bug! It will be fixed.
Andreas Bollhalder
23-08-2007, 07:59
Hallo Alexei
I'll be awaiting the solution. Thanks in advance.
Andreas
Thank you! This is fixed.
You may try the latest code from www.zabbix.com/developers.php
Andreas Bollhalder
04-09-2007, 09:59
Hello Sasha
Thank you! This is fixed.
Great news. Thank you for the effort.
Andreas
michaeltje
08-11-2007, 09:38
I'm experiencing the same problem with this. I'm currently using version 1.4.3, latest code from 15 Oct. Hope you have more information for me.
I think my question belongs to this topic. Using nodata I monitor a host to check if it's down. Every night however we create a copy of our MySQL database which locks the database for a while. During this time the data cannot be stored in the database (since it's locked) and the triggers are triggered. Is there a way to avoid this?
Andreas Bollhalder
09-11-2007, 11:04
Hello dreas
Either you set the timeout for the trigger higher than the time the DB is locked (not that good a solution), or you have to use InnoDB with the Hotcopy utility to do a backup without locking (which is expensive).
Andreas
Hi. Thanks for the response! Both are indeed not too satisfactory ;) I guess I could add a trigger monitoring the write ability of the database and adding a dependency. However one can argue that nodata should not be triggered when there is a problem storing the data locally (cause there IS data .. it's just not stored).
Andreas Bollhalder
09-11-2007, 11:18
Getting the locked state of the DB and depending on it would be a good idea. Unfortunately, I don't have a solution for this.
Andreas
Andreas Bollhalder
09-11-2007, 14:12
I think the problem is the following:
The backup starts and the DB gets locked. Then a trapper would receive from the agent that the DB is locked, but can't update the DB, because it's locked. Therefore, the trigger for "DB is locked" would never become true and your dependencies on this trigger would never work.
A possible solution could be to signal ZABBIX before the DB gets locked. In a backup script, create for example a lock file. Then wait longer than the trigger on this lock file needs to become true. Make the backup and delete the lock file.
Andreas
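The lock-file sequence described above (create the file, wait, back up, delete the file) could look roughly like this in Python. It is a sketch under assumptions: the function names, the lock file path and the settle time are placeholders, and the backup step is passed in as a callable rather than being a real dump command.

```python
import os
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def backup_lock(path: str, settle_seconds: float) -> Iterator[None]:
    """Create a lock file, give the monitoring side time to notice it,
    and always remove it afterwards. Hypothetical helper."""
    with open(path, "w") as handle:
        handle.write(str(os.getpid()))
    try:
        # Wait longer than the lock-file trigger needs to become true.
        time.sleep(settle_seconds)
        yield
    finally:
        os.remove(path)


def run_backup_with_lock(path: str, settle_seconds: float,
                         backup: Callable[[], None]) -> None:
    """Run `backup` only while the lock file is in place."""
    with backup_lock(path, settle_seconds):
        backup()
```

The `finally` clause matters here: if the dump fails, the lock file is still removed, so the dependency does not keep the availability triggers silenced indefinitely.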
Hmm true. I hadn't thought of that. How could I easily manage to trigger a trigger from my dump script before I start the dump?
Andreas Bollhalder
09-11-2007, 16:28
You can also set up an item for use with zabbix_sender. Use zabbix_sender in the script to send a value which sets the trigger to the needed state. I would still wait a short amount of time after using zabbix_sender to allow ZABBIX to process the value and change the trigger state.
Andreas
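As a rough illustration of the zabbix_sender suggestion, the dump script could wrap the call like this. This is a sketch with assumptions: the item key `backup.running` and the host and server names in the usage note are invented, and the `-z`, `-s`, `-k` and `-o` options are those of current zabbix_sender releases, so check them against the version you actually run.

```python
import subprocess
import time
from typing import List


def build_sender_command(server: str, host: str, key: str, value: str) -> List[str]:
    """Build the zabbix_sender argument list (server, host, key, value)."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", value]


def signal_backup_state(server: str, host: str, in_progress: bool,
                        key: str = "backup.running",
                        settle_seconds: float = 0.0) -> None:
    """Send 1 or 0 for a hypothetical trapper item, then pause briefly
    so the server can process the value and change the trigger state."""
    value = "1" if in_progress else "0"
    subprocess.run(build_sender_command(server, host, key, value), check=True)
    time.sleep(settle_seconds)
```

A dump script would call `signal_backup_state("zabbix.example.com", "dbhost", True, settle_seconds=30)` before starting the dump and the same call with `False` afterwards; the nodata() triggers would then depend on a trigger for that item.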
It seems the trigger state is changed instantly. Will try this. Thanks for the suggestion!
bbrendon
09-11-2007, 20:36
I think we should stop using this thread. It's a cluster F*** of 4 different topics, which are not very related.
Example of agent.ping with nodata function:
http://www.zabbix.com/forum/showthread.php?p=59540