Alert if process is down for more than 1 minute

  • apalacheno
    Junior Member
    • Jun 2015
    • 14

    #1

    Alert if process is down for more than 1 minute

    Howdy,

    monitoring if a process is running is easy with a trigger like:

    Code:
    {Template_Services:proc.num[ddclient - slee].last(0)}=0
    But how can I tell Zabbix to only fire the trigger if the process is down for more than a minute?

    Use case: Some processes get routinely restarted. That shouldn't fire a trigger each time. Similarly, if a routine update gets installed, the process is shut down for a few seconds during the update; that shouldn't lead to a notification either.

    I already tried with this:

    Code:
    {Template_Services:proc.num[ddclient - slee].min(120)}<1
    but this sets off an alarm as soon as the process is down.
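
A minimal sketch, in Python rather than Zabbix syntax, of why a min()-based check reacts to the first missed sample while a max()-based check waits until the whole window is empty. The sample values and the 30-second check interval are assumptions made up for illustration; `window` is a hypothetical helper, not a Zabbix function.

```python
# Hypothetical samples of proc.num, oldest first, one per 30 s (assumed interval).
# The process was running, then went down at the most recent check.
samples = [1, 1, 1, 0]

def window(values, n):
    """Return the n most recent values (stand-in for a time window)."""
    return values[-n:]

# min(...) < 1 is true as soon as a single 0 appears in the window.
min_fires = min(window(samples, 4)) < 1

# max(...) < 1 is true only when every value in the window is 0.
max_fires = max(window(samples, 4)) < 1

print(min_fires, max_fires)
```

With these samples the min() check is already true after one failed check, whereas the max() check stays false until all four samples in the window are 0.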

    Soo... how can I tell Zabbix to alert only if a process is down for more than 1 minute?

    Cheers,

    Robert
  • akbar415
    Senior Member
    • May 2015
    • 119

    #2
    Code:
    {Template_Services:proc.num[ddclient - slee].last(2m)}=0
    Or with hysteresis

    Code:
    ({TRIGGER.VALUE}=0 and
    {Template_Services:proc.num[ddclient - slee].max(2m)}=0)
    or
    ({TRIGGER.VALUE}=1 and
    {Template_Services:proc.num[ddclient - slee].min(1m)}<1)

    The trigger fires if the value was 0 (no process running) for the whole of the last 2 minutes, and it stays in the problem state until the value has been greater than 0 for at least 1 minute.
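
The hysteresis expression above can be sketched as a small state machine. This is only an illustration in Python, not how Zabbix evaluates triggers internally; the `evaluate` helper, the sample history, and the 30-second interval (so 2 minutes is 4 samples and 1 minute is 2 samples) are all assumptions.

```python
def evaluate(trigger_value, last_2m, last_1m):
    """Return the new trigger state (0 = OK, 1 = PROBLEM).

    last_2m / last_1m are lists of proc.num samples from the
    last 2 minutes and the last 1 minute (hypothetical inputs).
    """
    # Currently OK: fire only if no process was seen for the full 2 minutes.
    fire = trigger_value == 0 and max(last_2m) == 0
    # Currently PROBLEM: stay there while any sample in the last minute was 0.
    hold = trigger_value == 1 and min(last_1m) < 1
    return 1 if (fire or hold) else 0

# Assumed: one sample every 30 s, so 2 min = 4 samples and 1 min = 2 samples.
history = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
state = 0
states = []
for i in range(len(history)):
    seen = history[: i + 1]
    state = evaluate(state, seen[-4:], seen[-2:])
    states.append(state)

print(states)
```

In this run the state only switches to PROBLEM once four consecutive samples are 0, and it only returns to OK after two consecutive samples are above 0, so a brief flap during recovery does not clear the trigger.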


    Sorry for the bad English
    Last edited by akbar415; 04-08-2015, 21:54.


    • apalacheno
      Junior Member
      • Jun 2015
      • 14

      #3
      Hi akbar,

      thank you very much for your answer! It led me directly to the right configuration.

      Checking if a process is down for X minutes can be done in two ways:

      Code:
      {Template_Services:proc.num[ddclient - slee].max(Xm)}<1
      or

      Code:
      {Template_Services:proc.num[ddclient - slee].sum(Xm)}=0
      Both seem equivalent to me; for readability, however, I prefer the latter.
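
A quick sanity check of that equivalence, sketched in Python: since proc.num reports a process count, the values are non-negative integers, and for such values "max below 1" and "sum equal to 0" both mean "every sample is 0". The window size of 4 and the count range 0..2 are arbitrary assumptions chosen to keep the check small.

```python
from itertools import product

# proc.num returns a process count, so values are non-negative integers.
# Check every window of 4 samples with counts 0..2 (hypothetical range).
agree = all(
    (max(w) < 1) == (sum(w) == 0)
    for w in product(range(3), repeat=4)
)
print(agree)
```

The two conditions would only diverge if an item could return negative values, which a process count cannot.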

      The last(X) function cannot be used, since according to this sheet last() is always equal to last(#1). My own tests confirm this.

      Cheers,

      Robert


      • apalacheno
        Junior Member
        • Jun 2015
        • 14

        #4
        Just to follow up, I've now settled for the following trigger expression:

        Code:
        {Template_Services:proc.num[ddclient - slee].max(60)}=0
        Result: If the process 'ddclient - slee' is down for more than 60 seconds, the trigger fires.
