Hi all,
I have been using Zabbix successfully to monitor low-level metrics (CPU, memory, disks), but I wonder whether it could help me a bit more in my daily job.
I have lots of automated batch processes on various Unix systems. My scripts do various things and email me when they reach (or fail to reach) certain defined checkpoints. E.g.:
1. The service went offline, the backup process was started
2. The backup process reached checkpoint1
3. The backup process reached checkpoint2
4. The backup process is waiting for something
5. Hey, it's 6 am and the backup is still not complete, go take a look, because you have to go online by 7 am.
6. The backup process completed successfully / failed because of something.
The problem is that my inbox fills up with automated (but very important) reports, and the method I am using now is inefficient (tons of email).
I've been thinking of something like this:
- for each host I have in Zabbix I'll make a trapper item called HostLog
- every batch script will log to that item whenever it has something to say (instead of sending me an email); the script will include special tags in the message, and when a tag is found it will fire a trigger and I get an email notification.
- I want a generic solution (templating), so triggers won't watch for strings like 'hey, the backup XYZ failed', but rather for tags like '[Disaster]' on the HostLog item.
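To make the idea above concrete, here is a rough sketch of what the logging side might look like, assuming the scripts can call Python and that zabbix_sender is installed on each host. The tag names other than 'Disaster', the item key (assumed here to be the same as the item name, HostLog), and the server and host names are all made up for illustration; only the zabbix_sender options -z, -s, -k and -o are real.

```python
import subprocess

# Hypothetical severity tags; only "[Disaster]" is named in the post.
TAGS = ("Info", "Warning", "Disaster")


def format_message(tag, job, text):
    """Return a log line such as '[Disaster] backup: text'."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag: {tag}")
    return f"[{tag}] {job}: {text}"


def sender_command(server, host, message, key="HostLog"):
    """Build the zabbix_sender argument list for one trapper value.

    The key "HostLog" is an assumption: it must match the key of the
    trapper item configured on the host.
    """
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", message]


def log_to_zabbix(server, host, tag, job, text):
    """Send one tagged message; return True if zabbix_sender exited with 0."""
    cmd = sender_command(server, host, format_message(tag, job, text))
    return subprocess.run(cmd, check=False).returncode == 0
```

A batch script would then call something like log_to_zabbix("zabbix.example.com", "db01", "Disaster", "backup", "still not complete at 6 am") instead of sending mail. A template trigger would only have to match the tag in the item value, not the job-specific text.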
I've been struggling to apply this approach in my setup: there are many non-obvious problems, and the solutions create new problems of their own. I don't want to go into all the details right away.
The question is: has anyone successfully implemented this kind of batch-process monitoring (like the example above) in a large environment with Zabbix, and if so, how? Maybe someone can share their experience on the topic?