Hello,
We monitor about 6k hosts in Zabbix (7.4.7), covering all sorts of devices. We don't manage these devices ourselves, but I have noticed that some of them repeatedly lose ICMP or SNMP access and the pattern gets missed.
I can see this happening on the graphs. I have a simple trigger like this to capture each occurrence:
Code:
max(/Diagnostics-boards/zabbix[host,snmp,available],#20)=0
So after 20 minutes (20 x 1m polls) we get an alert.
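For anyone unfamiliar with the expression, here is a rough Python sketch of how I understand the trigger logic: it only fires when every one of the last 20 collected values is 0. The function name and the "wait for a full window" behaviour are my own assumptions for illustration, not how Zabbix is implemented.

```python
def snmp_down_trigger(samples, window=20):
    """Mimic the idea of max(<item>,#20)=0 on a list of 0/1 availability values.

    samples: availability values, oldest first (1 = available, 0 = unavailable).
    Assumption of this sketch: no alert until a full window has been collected.
    """
    recent = samples[-window:]
    if len(recent) < window:
        return False
    # max() is 0 only when every value in the window is 0
    return max(recent) == 0
```

With 1-minute polling, a single successful poll anywhere in the window resets the 20-minute clock.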
What metric would you use to trigger if, for example, a device has repeatedly had issues over a 30-day period? I'd call these 'Trend' alerts or something.
What happens is that they fix the device in question and don't realise it's a repeating problem, so I'd like to capture this.
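To make the ask concrete, this is roughly the logic I'm after, sketched in Python rather than as a Zabbix expression. It counts how many separate times a device went from available to unavailable within the last 30 days and flags it once that count reaches a threshold. The function names, the sample format and the threshold of 3 are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def count_outages(samples, now, window=timedelta(days=30)):
    """Count separate outages (a 0 that does not follow another 0) inside the window.

    samples: list of (timestamp, value) tuples sorted by time,
             where value is 1 (available) or 0 (unavailable).
    """
    cutoff = now - window
    outages = 0
    previous = None
    for ts, value in samples:
        if ts < cutoff:
            # Remember the state before the window so an outage that was
            # already in progress at the cutoff is not counted as a new one.
            previous = value
            continue
        if value == 0 and previous != 0:
            outages += 1
        previous = value
    return outages


def is_repeat_offender(samples, now, threshold=3):
    """Flag a device with at least `threshold` separate outages in the window."""
    return count_outages(samples, now) >= threshold
```

For example, a device whose samples read 1, 0, 0, 1, 0, 1, 1, 0 within the window has had three separate outages, so it would be flagged with the threshold of 3 even though each individual outage was fixed.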
Any ideas would be great.