Hello!
I have been using Zabbix for the last 6 years to monitor most of our production (public-facing) environments. Recently I was moved to a new department that has a separate Zabbix server, which monitors another set of internal systems.
I was surprised to find that this new team has eliminated ALL recovery messages, with the simple excuse that doing so cut the number of notifications in half. I agree that this approach reduces the number of outgoing notifications, but it also takes away functionality. The other reason they give is that it forces the on-call person to take action every time, which is certainly a great formula to INCREASE alert fatigue.
I personally find it very important to get recovery messages. It seems kind of obvious: if I get alerts about a system going into a bad state, I would also like to know whether the system was able to recover or someone fixed it, without having to go and look into Zabbix and the system every time.
Every organization is different, so I would like to hear your opinions on how relevant recovery messages are in your monitoring.
