proc.num usage

  • krcourser
    Junior Member
    • Oct 2009
    • 8

    #1

    proc.num usage

    I would like to know how I can monitor a process so that I am only notified if the process is down for more than a minute. We have processes that restart automatically when some condition is reached, and we would like our watches to notify us only if the process doesn't come back up within 1 to 5 minutes, depending on the process. I found proc.num[].min( ), but that is used to monitor how long a process has been running. Would I put
    proc.num[my_process].min(-5)<0
    Any help would be appreciated.
  • Hans dmacron0
    Junior Member
    • Aug 2010
    • 10

    #2
    Hi,

    I'm just a beginner, so my answer might not be the best one.

    But if you already know how to monitor whether a process is running at all, I think you should enable escalation in the action and set a period of 60 seconds. Then, in the action operations, you can configure when to send messages (or execute remote actions) by setting the step fields.

    So if you want to send only one email after 5 minutes of downtime, set the step "from" field to 5 and the step "to" field to 5. (You might have to set both fields to 6 to get the alert message at 5 minutes, since I am not sure whether the count starts at 0 or 1.)
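The step arithmetic above can be sketched in a few lines of Python. This is only an illustration, and it rests on one assumption: that the first escalation step runs as soon as the problem is detected, so step n runs (n - 1) periods later. The function names are made up for this example and are not part of Zabbix.

```python
def step_delay_seconds(step: int, period: int) -> int:
    """Seconds between problem start and the given escalation step.

    Assumption: step 1 runs immediately, so step n runs after (n - 1) periods.
    """
    if step < 1:
        raise ValueError("escalation steps are numbered from 1")
    return (step - 1) * period


def first_step_at_or_after(target_seconds: int, period: int) -> int:
    """Smallest step number whose delay is at least target_seconds."""
    # Ceiling division without floats, then +1 because step 1 has zero delay.
    return -(-target_seconds // period) + 1


if __name__ == "__main__":
    period = 60  # escalation period from the post, in seconds
    for step in (5, 6):
        print(f"step {step} runs {step_delay_seconds(step, period)} s after the problem starts")
    print("step for a 5 minute delay:", first_step_at_or_after(300, period))
```

Under that assumption, a 60 second period puts step 5 at 4 minutes and step 6 at 5 minutes, which matches the guess that both fields may need to be 6.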

    The cool part is that you can configure remote actions to do things at different escalation steps.

    Anyway this setup worked for me.

    I needed Zabbix to send me a message when the load of MySQL is high for one hour, but I needed Zabbix to restart MySQL after 45 minutes. So I made an escalation period of 300 seconds. That gives me 2 (or 3, I am still not sure) attempts at a restart before I get a message that the MySQL server is in trouble.

    I hope this helps a bit.


    • krcourser
      Junior Member
      • Oct 2009
      • 8

      #3
      Monitoring processes

      I got my Trigger to work by using the following:
      {my_server:proc.num[my_process.exe].last(0)}<1 & {my_server:proc.num[my_process.exe].max(180)}<1
      It checks whether the process is running. If the process goes down, the trigger fires only when the process count has stayed below 1 for the full 3 minutes (180 seconds), and only then is an email sent. A process that comes back up within that window is ignored. Thanks for your help, and maybe this will help you in the future.
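To see why this expression ignores short outages, here is a small Python sketch that mimics the two conditions on a list of collected values. It is not Zabbix code: the sample layout, the 30 second polling interval, and the function name are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Each sample is (timestamp in seconds, number of matching processes).
Sample = Tuple[int, int]


def trigger_fires(samples: List[Sample], now: int, window: int = 180) -> bool:
    """Mimic last(0) < 1 and max(window) < 1 on collected samples."""
    seen = sorted(s for s in samples if s[0] <= now)
    if not seen:
        return False  # no data collected yet, nothing to evaluate
    last_value = seen[-1][1]
    in_window = [count for ts, count in seen if ts > now - window]
    if not in_window:
        return False  # no samples inside the window
    return last_value < 1 and max(in_window) < 1


if __name__ == "__main__":
    # Process disappears at t=120 and never returns.
    stays_down = [(t, 1 if t < 120 else 0) for t in range(0, 301, 30)]
    # Process disappears at t=120 and is restarted by t=210.
    restarted = [(t, 0 if 120 <= t < 210 else 1) for t in range(0, 301, 30)]

    print(trigger_fires(stays_down, now=180))  # window still holds a nonzero sample
    print(trigger_fires(stays_down, now=300))  # every sample in the window is zero
    print(trigger_fires(restarted, now=300))   # process came back inside the window
```

In this sketch the last-value check is implied by the max check whenever the newest sample falls inside the window, so the max condition does most of the work.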

