With many monitoring metrics, alerts become quite noisy, especially for metrics that hover near the alert threshold or simply flap.
To reduce the noise, two things would be very handy:
1. A configurable number of consecutive trigger failures required to generate an alert (for Nagios this is usually 3 by default). For the alert to recover, the trigger should then see the same number of consecutive successes. This shouldn't be too hard to implement, but it's a killer feature that would instantly make Zabbix a much better monitoring system overall.
2. Trigger flap detection. If a trigger fires X alerts within Y time, mark it as "FLAPPING" and do not generate any alerts until it has stayed in the OK state for Z time.
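A minimal sketch of item 1, assuming each check result arrives as a boolean (True meaning the trigger expression failed). The class name and the default of 3 are illustrative only, not an existing Zabbix API. One counter is used in both directions, so raising and recovering the alert are symmetric as proposed:

```python
class ConsecutiveTrigger:
    """Hypothetical wrapper: alert only after `threshold` consecutive failures,
    recover only after `threshold` consecutive successes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.alerting = False
        self._streak = 0  # consecutive results that disagree with the current state

    def update(self, failed: bool) -> bool:
        """Feed one check result; return True if the alert state changed."""
        if failed != self.alerting:
            self._streak += 1
            if self._streak >= self.threshold:
                self.alerting = failed
                self._streak = 0
                return True
        else:
            # A result agreeing with the current state breaks the streak.
            self._streak = 0
        return False
```

For example, the results failure, failure, success, failure, failure, failure raise the alert only on the sixth check, because the success in the middle resets the count.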
To avoid adding many extra parameters to every trigger, one could define configurable trigger profiles and assign them to triggers (or configure the parameters directly on each trigger, which is no big deal but less elegant).
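Flap detection driven by such a profile could look roughly like the sketch below. `TriggerProfile` and `FlapDetector` are hypothetical names; X, Y and Z map to `max_alerts`, `window` and `quiet_period`, and times are assumed to be plain seconds. It only decides whether a PROBLEM notification should go out; recovery notifications are left out for brevity.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerProfile:
    """Hypothetical reusable profile; the field names are illustrative only."""
    max_alerts: int      # X: alerts within the window that mark the trigger FLAPPING
    window: float        # Y: length of the sliding window, in seconds
    quiet_period: float  # Z: seconds of uninterrupted OK state needed to leave FLAPPING


class FlapDetector:
    """Suppresses PROBLEM notifications for one trigger while it is flapping."""

    def __init__(self, profile: TriggerProfile) -> None:
        self.profile = profile
        self.flapping = False
        self._alerts: deque[float] = deque()  # timestamps of recent PROBLEM transitions
        self._problem = False
        self._ok_since: float | None = None

    def observe(self, now: float, problem: bool) -> bool:
        """Feed the trigger state at time `now`; return True if a PROBLEM
        notification should be sent."""
        was_problem, self._problem = self._problem, problem
        if problem:
            self._ok_since = None
            if was_problem:
                return False  # still the same PROBLEM episode, nothing new to send
            self._alerts.append(now)
            # Forget alerts that have slid out of the Y-second window.
            while now - self._alerts[0] > self.profile.window:
                self._alerts.popleft()
            if len(self._alerts) >= self.profile.max_alerts:
                self.flapping = True
            return not self.flapping
        if was_problem:
            self._ok_since = now  # the OK period starts now
        # Leave FLAPPING only after Z seconds of continuous OK state.
        if (self.flapping and self._ok_since is not None
                and now - self._ok_since >= self.profile.quiet_period):
            self.flapping = False
            self._alerts.clear()
        return False
```

One design choice worth noting: the alert that pushes the count to X is itself suppressed, so a flapping trigger produces at most X - 1 notifications per window. Because one profile object can be shared, many triggers can reuse the same X/Y/Z settings.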
Another cool feature would be template autodetection.
I.e. you have a template with certain metrics, alerts and graphs. You also have a special "detection" condition: if that condition evaluates to true, the template is assigned to the host automatically and its graphs, triggers, etc. are created.
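A rough sketch of template autodetection, assuming the detection condition can be expressed as a predicate over some facts collected from the host (for instance a hypothetical set of open ports). All class, field and template names are illustrative, not Zabbix objects:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Template:
    name: str
    detect: Callable[[dict], bool]  # hypothetical "detection" condition over host facts
    items: list[str] = field(default_factory=list)
    triggers: list[str] = field(default_factory=list)
    graphs: list[str] = field(default_factory=list)


@dataclass
class Host:
    name: str
    facts: dict  # assumed: results of probes run against the host
    templates: list[str] = field(default_factory=list)
    items: list[str] = field(default_factory=list)
    triggers: list[str] = field(default_factory=list)
    graphs: list[str] = field(default_factory=list)


def autodetect(host: Host, templates: list[Template]) -> list[str]:
    """Link every template whose detection condition holds and that is not
    linked yet; copy its items, triggers and graphs; return names newly linked."""
    linked = []
    for tpl in templates:
        if tpl.name in host.templates or not tpl.detect(host.facts):
            continue
        host.templates.append(tpl.name)
        host.items.extend(tpl.items)
        host.triggers.extend(tpl.triggers)
        host.graphs.extend(tpl.graphs)
        linked.append(tpl.name)
    return linked
```

For example, a template whose condition is `3306 in facts["open_ports"]` would be linked to any host where that port answers, and running the detection again does not link it twice.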
Another cool feature would be wildcard graphs.
I.e. a stacked graph that plots a single metric for all hosts matching a glob or regexp, with a different color chosen automatically for each host.
Right now stacked graphs are a big management overhead: every time I add a node to the cluster, I have to go and update many stacked graphs.
I use Graphite just for this feature, because it lets me use wildcards in host names. But there is no reason why Zabbix couldn't do the same.
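Wildcard graphs could work roughly as sketched below: the host list is re-expanded from a glob (or regexp) each time the graph is built, and colors are generated from the number of matched hosts, so a new node shows up without editing the graph. The function names, the HSV palette and the example metric key are assumptions for illustration:

```python
import colorsys
import fnmatch
import re


def match_hosts(hosts, pattern, regex=False):
    """Return the sorted host names matching a glob (default) or a regexp."""
    if regex:
        rx = re.compile(pattern)
        return sorted(h for h in hosts if rx.fullmatch(h))
    return sorted(h for h in hosts if fnmatch.fnmatchcase(h, pattern))


def auto_colors(names):
    """Spread hues evenly around the color wheel, one per series."""
    n = max(len(names), 1)
    colors = {}
    for i, name in enumerate(names):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colors[name] = "#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors


def stacked_graph_series(hosts, pattern, metric, regex=False):
    """Build (host, metric, color) series for a stacked graph; hosts added
    later are picked up automatically the next time this runs."""
    names = match_hosts(hosts, pattern, regex)
    colors = auto_colors(names)
    return [(h, metric, colors[h]) for h in names]
```

Because hues are spread over the current number of hosts, existing series may shift color when a node is added; deriving the color from a hash of the host name would keep colors stable at the cost of less even spacing.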
Thanks!