Apologies if this has been asked before. I am looking into setting up alerting with Zabbix 6.0 LTS.
I will provide some background. Our deployment will have a very flat topology: the vast majority of our monitored endpoints will be one or two services at a site, across many sites spread over a large geographical area.
What I am trying to determine is how to do the following.
If a small number of sites or hosts go down, we would obviously like to get those individual alerts, either by email, forwarded to another system, or the like.
If we start to hit a threshold of, say, 15 or 20 hosts, we would like to suppress the 20 separate alerts and instead get a single, let's call it "parent", alert saying "You have a significant outage".
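To make the behaviour I am after concrete, here is a rough sketch of the logic in Python. This is not Zabbix configuration and does not use any Zabbix API; the function name `alerts_to_send`, the message wording, and the threshold of 15 are just placeholders I made up to illustrate the idea.

```python
from typing import Iterable, List

# Assumed threshold; in practice this would be somewhere around 15 to 20 hosts.
MASS_OUTAGE_THRESHOLD = 15


def alerts_to_send(down_hosts: Iterable[str],
                   threshold: int = MASS_OUTAGE_THRESHOLD) -> List[str]:
    """Return the alert messages to send for the hosts that are currently down.

    Below the threshold, each host gets its own alert.
    At or above the threshold, the individual alerts are suppressed
    and a single "parent" alert is returned instead.
    """
    hosts = sorted(set(down_hosts))  # de-duplicate and keep a stable order
    if len(hosts) >= threshold:
        return [f"Significant outage: {len(hosts)} hosts are down"]
    return [f"Host down: {host}" for host in hosts]
```

So two hosts down would give two separate "Host down" messages, while 20 hosts down would give only the one "Significant outage" message.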
I have had a look at triggers and trigger dependencies. These seem to work well for devices at a single geographic location where you have, say, 50 separate devices and the parent router goes offline: in that case you get one alert instead of 50.
In our case there is no obvious common parent.
So in short, I am currently looking at:
- Event Correlation
- Services
- Tags and Problem Tags
- Trigger Dependencies
I have had a look at services and will be investigating that further. But I would like to ask: how have people suppressed alerts or catered for mass outages when the topology of what they are monitoring is very flat?