I'm not sure how to formulate my exact question, so let me give you some details to help you understand my environment and what I'm trying to get done.
I have about 4000 devices spread out over 120 locations. Each location has a 6-char prefix. Each location has a 'Gateway' with a trigger that will fire after 15min of failed pings. All other devices have a 25min ping trigger.
All devices and items have an 'Importance' tag. Based on those tags we get alerts via email, Teams, tickets or even paging.
If a 'Gateway' fails we do not need to be alerted. That event will be picked up by the network team, or will be of no critical impact to that location.
If, for example, only a virtual host or a storage system fails, we need to know.
So how can we 'mute' the alerts for the other devices at a location when we see its gateway failing?
I could set up trigger dependencies per device per trigger, but that is just too much for 4000 devices over 120 locations. Devices are being replaced, added and decommissioned almost all of the time, so I need something configurable, preferably based on tags (or something similar), as I monitor the quality of the tag data via exports and reports that collect data from several systems (AD, DNS, SCCM, SEP and more).
Maybe something like:
Every device gets a calculated item that is the ping of the gateway at that location. The trigger for the device will need to include logic that also looks at that calculated item.
But I wouldn't want that. I would still want to see the problems per device, and maybe even get the 'low level' alerts via Teams and email. Just not the 'high level' alerts through tickets and paging.
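To make the behaviour I'm after concrete, here is a rough Python sketch of the routing logic. It is not Zabbix configuration, only an illustration. It assumes the 6-char location prefix is the first six characters of the host name, and the host names, channel names and the `Problem` class are all made up for the example.

```python
from dataclasses import dataclass

PREFIX_LEN = 6                      # assumption: location prefix = first 6 chars of host name
HIGH_LEVEL = {"ticket", "page"}     # hypothetical names for the 'high level' channels


@dataclass(frozen=True)
class Problem:
    host: str                # hypothetical host name, e.g. "AMS001-esx01"
    is_gateway: bool         # True if this host is the location's gateway
    channels: frozenset      # channels its 'Importance' tag would normally trigger


def location(host: str) -> str:
    """Return the location prefix of a host name."""
    return host[:PREFIX_LEN]


def route(problems):
    """Map each problem host to the channels that should still be notified."""
    # Locations whose gateway currently has an open problem.
    down = {location(p.host) for p in problems if p.is_gateway}
    routed = {}
    for p in problems:
        if p.is_gateway:
            # Gateway failures are left to the network team: no alert for us.
            routed[p.host] = set()
        elif location(p.host) in down:
            # Gateway is down: keep the problem and low-level alerts, drop tickets/paging.
            routed[p.host] = set(p.channels) - HIGH_LEVEL
        else:
            # Gateway is fine: alert as usual.
            routed[p.host] = set(p.channels)
    return routed


example = [
    Problem("AMS001-gw01", True, frozenset({"email"})),
    Problem("AMS001-esx01", False, frozenset({"email", "ticket", "page"})),
    Problem("NYC002-san01", False, frozenset({"email", "page"})),
]
routed = route(example)
```

In this example the virtual host behind the failed gateway would still alert by email but not open a ticket or page anyone, while the storage system at the other location would alert as usual.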
Ideas to steer me in the right direction?