Using nodata to delay action

  • cstackpole
    Senior Member
    Zabbix Certified Specialist
    • Oct 2006
    • 225

    #1

    Hello,
    I have a program running on a system that I am trying to monitor. It is a critical program and should not be down for more than 10 minutes. So I set up a trigger:
    {host: proc.num[program].last(0)}<1

    This works fine: when the program stops I get an email saying it's down, and when it is running again I get an email saying everything is OK. However, the first night my inbox was flooded with emails. It turns out that a certain action causes this program to reset itself. On every reset, I got an email saying the program was down and then another email a minute later saying it was back up. A little research tells me that if the program is down for more than 20-30 seconds, chances are very high that it isn't coming back up on its own.

    So I dug through the forums for a couple of hours, found a couple of good ideas, and I changed my trigger accordingly:
    ({host: proc.num[program].last(0)}<1)&({host: proc.num[program].nodata(30)})

    Now I get no emails at all regardless of the state. Can someone please help me figure out what I have done wrong?

    One thing to note: I am running mostly Debian boxes with a few CentOS systems, and I use the Zabbix package that happens to be in the repository for each system. We have a mixture of Debian Etch, Debian Lenny, and CentOS 4.4. Therefore I have a mixture of 1.1 (Zabbix agents) and 1.4 (Zabbix server and a few agents). From my observation this problem seems to occur regardless of the version though.

    Thanks, I appreciate it!

    [update edit] I try to test out updates before I apply them. I forgot that I had run updates to zabbix on my test systems. Debian Lenny (testing) is at 1.1.7-1 and Debian Etch (stable) is at 1.1.4-10. The CentOS machine is still at 1.1. I do not know if this helps at all, but I just wanted to clarify because I know that there are a lot more supported features in the newer versions and I have a crazy mixture.
    Last edited by cstackpole; 15-06-2007, 16:03.
  • JonB
    Member
    • Oct 2006
    • 63

    #2
    Why not use

    {host: proc.num[program].max(600)}<1

    That is, max(600) returns the highest process count seen in the last 600 seconds, so the trigger is true only if the count has stayed below 1 for the whole 10 minutes.
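
To see why this stops the flood of emails, here is a rough Python sketch that replays a series of proc.num samples through both conditions. It is only a simulation, not Zabbix code: the 60-second polling interval, the sample data, and the helper names (fires_last, fires_max, count_alerts) are all made up for illustration, and it assumes max(600) simply takes the maximum of the values collected in the last 600 seconds.

```python
from collections import deque

def fires_last(samples):
    """Trigger state per sample for the condition last(0) < 1."""
    return [value < 1 for value in samples]

def fires_max(samples, timestamps, window):
    """Trigger state per sample for the condition max(window) < 1.

    Assumes max(window) is the maximum of all values whose timestamp
    falls within the last `window` seconds.
    """
    states = []
    recent = deque()
    for now, value in zip(timestamps, samples):
        recent.append((now, value))
        # Drop samples that are older than the window.
        while recent[0][0] <= now - window:
            recent.popleft()
        states.append(max(v for _, v in recent) < 1)
    return states

def count_alerts(states):
    """Count OK -> PROBLEM transitions, i.e. 'program is down' emails."""
    alerts = 0
    previous = False
    for state in states:
        if state and not previous:
            alerts += 1
        previous = state
    return alerts

# Hypothetical data: one sample per minute. The program resets itself
# once (a single 0 at minute 5) and goes down for good at minute 15.
samples = [1] * 5 + [0] + [1] * 9 + [0] * 15
timestamps = [60 * i for i in range(len(samples))]

last_states = fires_last(samples)
max_states = fires_max(samples, timestamps, window=600)

print("alerts with last(0)<1: ", count_alerts(last_states))
print("alerts with max(600)<1:", count_alerts(max_states))
print("max(600)<1 first fires at t =", timestamps[max_states.index(True)], "s")
```

With this data the last(0) condition raises an alert for the short reset as well as for the real outage, while the max(600) condition ignores the reset and only fires once every sample in the 10-minute window is 0.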


    • cstackpole
      Senior Member
      Zabbix Certified Specialist
      • Oct 2006
      • 225

      #3
      Thank you very much! That works exactly the way I need it to!

      I guess I was trying to make things more complicated than they need to be. Thanks for simplifying it for me!
