Hello All,
As the title says, how would one go about monitoring and alerting on packet loss for WAN circuits? I am on Zabbix v5.0.25 and currently using an average, but I am running into delayed alerts and delayed recoveries. Here is the expression I am currently using: {net.wan.mpls.icmp.tpl:icmppingloss[,5,,,1000].avg(1h)}>30. The way I understand it, every minute the proxy sends 5 pings to the host, and the trigger fires if the average loss over the last hour is above 30%.
A few issues I have with this are: 1) the average doesn't trigger when an issue first occurs, because there have not yet been enough bad samples to push the hourly average above the threshold; 2) similar to 1, recovery is delayed, so it still shows a problem after the issue has been corrected, because the bad samples need time to age out of the average; and 3) if the circuit goes down completely for any length of time it will trigger, because it looks at the last hour, including the downtime.
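To make issues 1 and 2 concrete, here is a minimal Python sketch that mimics a rolling one-hour average over one loss sample per minute. It is not Zabbix code; the function name `rolling_avg_states` and the sample data (60 clean minutes, 30 minutes at 100% loss, 90 clean minutes) are made up for illustration, and it assumes exactly one value arrives per minute.

```python
from collections import deque


def rolling_avg_states(samples, window=60, threshold=30.0):
    """Return the problem state (True/False) after each sample,
    using the average of the last `window` samples."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    states = []
    for loss in samples:
        recent.append(loss)
        states.append(sum(recent) / len(recent) > threshold)
    return states


# Hypothetical data, one packet-loss percentage per minute.
outage_start, outage_len = 60, 30
samples = [0.0] * outage_start + [100.0] * outage_len + [0.0] * 90

states = rolling_avg_states(samples)

first_problem = states.index(True)
# First minute after the problem period where the state is OK again.
first_ok_again = first_problem + states[first_problem:].index(False)

alert_delay = first_problem - outage_start
recovery_delay = first_ok_again - (outage_start + outage_len)

print(f"alert raised {alert_delay} min after loss began")
print(f"alert cleared {recovery_delay} min after loss stopped")
```

With this made-up data the problem state only appears once enough bad samples have accumulated in the window, and it clears well after loss has stopped, which matches the behavior described above. Changing `window` in the sketch shows how a shorter evaluation period shrinks both delays.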
Is there a better way to do this?
Thanks in advance