batch process monitoring

  • hannibal20
    Junior Member
    • Jan 2007
    • 22

    #1

    Hi all,

    I have been successfully using zabbix for monitoring of low level metrics (cpu, mem, disks) but I wonder if zabbix could help me a bit more in my daily job.

    I have lots of automated batch processes on Unix systems. My scripts do various things and email me when they reach (or fail to reach) certain defined checkpoints. For example:
    1. The service went offline, the backup process was started
    2. The backup process reached checkpoint1
    3. The backup process reached checkpoint2
    4. The backup process is waiting for something
    5. Hey, it's 6 am and the backup is still not complete, go take a look, because you have to go online by 7 am.
    6. The backup process completed successfully/has failed because of something.

    The problem is that my inbox fills up with automated (but very important) reports, and the method I am using now is inefficient (tons of email).

    I've been thinking of something like this:
    - for each host I have in zabbix I'll make a trapper item called HostLog
    - every batch script will log to the item when it has something to say (instead of sending me an email message); the script will include special tags in the message which, when found, will fire a trigger, and I get an email notification.
    - I want to have a generic solution (templating), so triggers won't be watching for strings like 'hey, the backup XYZ failed', but rather tags like '[Disaster]' on the HostLog item.
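
A minimal sketch of how a batch script might log to such a trapper item through `zabbix_sender` (the tool mentioned later in this thread). The flags `-z`, `-s`, `-k` and `-o` are standard `zabbix_sender` options; the tag set, the server name and the host name are assumptions made up for illustration. The function only builds the command line, so it can be checked without a Zabbix server.

```python
import shlex

# Hypothetical tag set; the thread only names '[Disaster]' explicitly.
SEVERITY_TAGS = ("[Info]", "[Warning]", "[Disaster]")


def build_hostlog_command(server, host, tag, text, key="HostLog"):
    """Return the zabbix_sender argv that logs one tagged line to the HostLog item."""
    if tag not in SEVERITY_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    message = f"{tag} {text}"
    # -z: Zabbix server, -s: host name as known to Zabbix,
    # -k: item key, -o: value to send
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", message]


if __name__ == "__main__":
    # Server and host names below are placeholders.
    argv = build_hostlog_command(
        "zabbix.example.com", "db01", "[Disaster]", "backup XYZ failed"
    )
    print(shlex.join(argv))
```

In a real script the returned list could be passed to `subprocess.run(argv, check=True)`; a generic trigger would then only need to look for the tag, not the message text.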

    I've been struggling to apply this approach in my setup, but there are many non-obvious problems, and the solutions create further problems. I don't want to go into all the details up front.

    The question is - has anyone successfully implemented a model of monitoring batch processes (just like the example above) for a big environment with zabbix, and if yes - how? Maybe someone can share their experience with this?
  • simonc
    Member
    • Jul 2009
    • 73

    #2
    Your solution seems perfect for what you want.
    Personally, I use zabbix_sender in my scripts and I must say it's really nice.
    As soon as there is a problem with one of my scripts, Zabbix warns me reliably.

    • hannibal20
      Junior Member
      • Jan 2007
      • 22

      #3
      Some of the problems here:

      - When one script fires a "warning" message, messages from other scripts that should fire the same trigger are ignored. I have redefined the trigger to 'multiple true events' and it's almost OK, except that I'm not notified of all the events that have occurred (the internal zabbix alerting loop runs once every 30s, which I checked against the source; a trigger that switches off -> on -> off within that time does not generate an action). It just lacks the reliability I demand.

      - I want to be notified that a process has not been started. How can I approach that?
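
One possible way to frame the "process has not been started" case is a watchdog that compares expected start deadlines with the last "started" message seen per job. The sketch below is plain Python, not Zabbix configuration; the job names, the deadlines and the rule that a start only counts if it happened on the current day are all assumptions. On the Zabbix side, the `nodata()` trigger function may cover a similar need.

```python
from datetime import datetime, time


def overdue_jobs(expected_start, last_started, now):
    """Return sorted names of jobs that should have started today but have not.

    expected_start: job name -> time of day by which the job must have started
    last_started:   job name -> datetime of its latest 'started' message
                    (job is absent if it never reported)
    """
    day_start = datetime.combine(now.date(), time.min)
    overdue = []
    for job, deadline in expected_start.items():
        if now < datetime.combine(now.date(), deadline):
            continue  # today's deadline has not passed yet
        started = last_started.get(job)
        # Assumption: only a start reported since midnight counts for today.
        if started is None or started < day_start:
            overdue.append(job)
    return sorted(overdue)


if __name__ == "__main__":
    expected = {"backup": time(1, 0), "report": time(5, 0)}  # hypothetical jobs
    seen = {"backup": datetime(2010, 1, 2, 0, 30)}
    print(overdue_jobs(expected, seen, datetime(2010, 1, 2, 6, 0)))
```

Jobs that legitimately start before midnight would need a different window than "since midnight"; the sketch ignores that case.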

      Any suggestions?

      • zabbix_zen
        Senior Member
        • Jul 2009
        • 426

        #4
        Hi hannibal.

        Why don't you monitor each one of those script results separately instead of monitoring them as a whole?
        "When one script fires a "warning" message, messages from other scripts that should fire the same trigger are ignored."
        Why not create separate triggers that check not only the severity tag you've added but also ( & ) the process or host name?
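
To illustrate the idea of matching on both the severity tag and the process name, here is a small sketch of the matching logic in Python. It is not Zabbix trigger syntax; the line format `[Severity] [process] text` and the process names are assumptions chosen for the example.

```python
import re

# Hypothetical log line format: "[Severity] [process_name] free text"
LINE_RE = re.compile(
    r"^\[(?P<severity>\w+)\]\s+\[(?P<process>[\w.-]+)\]\s+(?P<text>.*)$"
)


def parse_line(line):
    """Split a tagged log line into severity, process and text, or return None."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def fires(line, severity, process):
    """True only if the line carries BOTH the given severity and process name."""
    parsed = parse_line(line)
    return (
        parsed is not None
        and parsed["severity"] == severity
        and parsed["process"] == process
    )


if __name__ == "__main__":
    line = "[Warning] [backup_xyz] checkpoint2 is late"
    print(fires(line, "Warning", "backup_xyz"))
    print(fires(line, "Warning", "backup_abc"))
```

With one such condition per process, a warning from one script no longer hides a warning from another, at the cost of one trigger per process instead of a single generic one.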
