Ideas for process monitoring

  • hertell
    Member
    • Aug 2010
    • 31

    #1

    Hi,

    I would like to ask you guys for ideas and advice on whether I have done this correctly, and whether there is something I could improve.

    I'll briefly describe what I have:

    I monitor a process that occasionally crashes. For this I have created a trigger that gets fired if the process goes down. Sometimes I need to stop this process manually, and in this case I don't want it to be restarted. For this case I have modified the init script to create a file (manual.stop) when the process is shut down properly.

    For this setup I have two items:
    1) Process running (checks if process is running with proc.num)
    2) manual.stop status (checks with vfs.file.exists if a file named manual.stop exists)

    To fire the action that starts my process, I have one trigger with two conditions:
    1) Process is down
    AND
    2) manual.stop file does NOT exist

    I also have a second trigger that informs me if the process has been shut down manually:
    1) Process is down
    AND
    2) manual.stop file DOES exist

    The problem I usually get is false alarms when the process is shut down manually. I don't want any notification if the process is shut down correctly, but if the process has crashed, I do want one.

    I wonder if I need to use trigger dependencies. I'm not really familiar with the kinds of cases they should be used in.

    So any help/ideas or feedback on my setup are welcome :-)
  • qix
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • Oct 2006
    • 423

    #2
    Could you post the exact trigger definitions?
    It would also be handy to know the intervals that you have set up on your items.
    You might just be running into a timing issue where the absence of the process is noticed but the existence of the manual.stop file is not yet noticed.

    Another idea would be to use the Zabbix trapper in your init scripts to notify the zabbix server of correct startup and manual shutdown of your process.

    A startup script would do the following:

    Code:
    start process
    sleep for a while
    send zabbix trap (item process.startstate, value is 1)
    The stop script would then be:

    Code:
    send zabbix trap (item process.startstate, value is 0)
    sleep for a while
    stop the process
    If you then use a trigger that fires when the last number of processes is 0 while the last process.startstate is 1, you should be alright. (Zabbix is actively notified prior to stopping the process manually, so a manual stop leaves process.startstate at 0 and does not fire the trigger.)
    With kind regards,

    Raymond
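The start and stop steps described in the post above can be sketched as shell functions for an init script. This is only a sketch under stated assumptions: the server address `zabbix.example.com`, the `SETTLE_SECONDS` delay and the function names are hypothetical, and `process.startstate` is assumed to exist as a Zabbix trapper item on the host `MyHost`. The `-z`, `-s`, `-k` and `-o` options are the standard `zabbix_sender` options for server, host, item key and value.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to your environment.
ZABBIX_SERVER="${ZABBIX_SERVER:-zabbix.example.com}"  # assumed server address
ZABBIX_HOST="${ZABBIX_HOST:-MyHost}"                  # host name as configured in Zabbix
SENDER="${SENDER:-zabbix_sender}"                     # sender command, overridable for testing

# Report the intended state: 1 = should be running, 0 = stopped on purpose.
report_state() {
    "$SENDER" -z "$ZABBIX_SERVER" -s "$ZABBIX_HOST" -k process.startstate -o "$1"
}

# Start: run the given start command, wait, then tell Zabbix the process should be up.
start_monitored() {
    "$@" || return 1
    sleep "${SETTLE_SECONDS:-5}"
    report_state 1
}

# Stop: tell Zabbix first, wait, then run the given stop command.
# If the report fails, the stop is aborted so it is not mistaken for a crash.
stop_monitored() {
    report_state 0 || return 1
    sleep "${SETTLE_SECONDS:-5}"
    "$@"
}
```

Inside the init script, the real start and stop commands would be passed as arguments, for example `start_monitored /path/to/daemon` (a placeholder path). Aborting the stop when the report fails is a design choice; warning and continuing would be an equally valid variant.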


    • hertell
      Member
      • Aug 2010
      • 31

      #3
      Hi,

      Here is the trigger-definition for checking if the process has been shut down manually:
      {MyHost:vfs.file.exists[/path/to/manual.stop].last(0)}=1 & {MyHost:proc.num[,,,my_process_name].last(0)}=0

      And here again the trigger if the process has crashed:
      {MyHost:vfs.file.exists[/path/to/manual.stop].last(0)}=0 & {MyHost:proc.num[,,,my_process_name].last(0)}<1

      I have named the crash trigger my_process_name. This way I can use that name in the action script as
      {HOSTNAME}:sudo /etc/init.d/{TRIGGER.NAME} start

      The idea of using zabbix_sender is great! It should definitely reduce the false alarms to zero :-) It would also remove the checks for whether the manual.stop file exists, and things like file permissions would no longer be an issue :-)

      I'm using the idea of monitoring files for other purposes too. A couple of users needed shell access to our servers, and to keep them from fooling around, I put them into a root jail. To let them start/stop/restart their process, they just create a START, STOP or RESTART file in their home directory. Zabbix then handles the rest.
      Your idea of setting the trigger state with zabbix_sender makes things much easier :-) My only question is whether I can rely on zabbix_sender actually delivering the value to the Zabbix server. What if I happen to restart the Zabbix server at the same moment the sender tries to send such a message?
      Last edited by hertell; 28-12-2010, 14:27.


      • qix
        Senior Member
        Zabbix Certified SpecialistZabbix Certified Professional
        • Oct 2006
        • 423

        #4
        Unfortunately, in that case the signal from your script to the Zabbix server would be missed.
        However, the Zabbix sender does report whether the sending succeeded or not.

        With some extra work, you could make your script keep trying to send until it succeeds (while loop?).

        Good luck!
        With kind regards,

        Raymond
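The "try sending until it is a success" idea from the post above could look roughly like the following. It is a minimal sketch: the `retry` helper and its argument order are hypothetical, and it assumes the wrapped command signals failure through a non-zero exit status. It also gives up after a fixed number of attempts rather than looping forever, so a stop script cannot hang indefinitely while the server is down.

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

A call might look like `retry 5 10 zabbix_sender -z zabbix.example.com -s MyHost -k process.startstate -o 0`, where the server address is a placeholder. Whether the sender's exit status alone is reliable enough for this is worth checking against your Zabbix version.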


        • hertell
          Member
          • Aug 2010
          • 31

          #5
          Hi Raymond,

          I just tested the sender with a wrong port for Zabbix. It looks like the sender needs a response from the server before it can exit at all, so I guess I can rely on the sender really getting the message through to the server.


          • qix
            Senior Member
            Zabbix Certified SpecialistZabbix Certified Professional
            • Oct 2006
            • 423

            #6
            I wouldn't be too sure of that, you might hate yourself in the morning.

            Maybe you have seen that the sender prints a last line that states how many items were sent, how many succeeded and how many failed.

            I would parse those values in a script just to be sure.
            With kind regards,

            Raymond
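Parsing the sender's summary, as suggested above, might be sketched like this. The exact wording of the summary differs between Zabbix versions, so the patterns below are an assumption: they expect a line containing "processed" and "failed" counts, with or without colons, such as `Processed 1 Failed 0 Total 1`. The function name `sender_output_ok` is hypothetical. Check the output of your own `zabbix_sender` (for example with `-vv`) before relying on it.

```shell
# Read zabbix_sender output on stdin. Succeed only if the last summary line
# reports at least one processed item and zero failed items.
# Assumed format: "... Processed 1 Failed 0 ..." or "... processed: 1; failed: 0; ...".
sender_output_ok() {
    summary=$(grep -i 'processed' | tail -n 1)
    processed=$(printf '%s\n' "$summary" |
        sed -n 's/.*[Pp]rocessed:* *\([0-9][0-9]*\).*/\1/p')
    failed=$(printf '%s\n' "$summary" |
        sed -n 's/.*[Ff]ailed:* *\([0-9][0-9]*\).*/\1/p')
    # Missing counts are treated as a failure rather than silently accepted.
    [ -n "$processed" ] && [ -n "$failed" ] &&
        [ "$processed" -gt 0 ] && [ "$failed" -eq 0 ]
}
```

The sender's output would be piped into the function, and the result used to decide whether to retry or raise an error.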


            • hertell
              Member
              • Aug 2010
              • 31

              #7
              Yep, already done that :-)
