We have a single Zabbix server at our central site, running 3.4.12 on Ubuntu 18.04. I have a central office and 5 remote offices; each remote office has a router, behind that a primary switch, and behind that 2 phone systems. I have set up dependencies on my ICMP Ping triggers so that, in theory, the phone system doesn't alert when the router goes down at a given site. For example, the high Ping trigger (150 ms threshold over a 5-minute average) for the phone system depends on the switch AND the router, and the switch's trigger depends on the router's.
What usually happens, though, is that I get a problem alert for a dependent device, like the phone system. A few seconds later I get the problem alert for the router. Immediately after that, I get a resolved notification for the dependent device, the phone system. That is 3 e-mails minimum so far. Let's say the high ping occurs during a file transfer lasting 10 minutes. At the end of the 10 minutes I get the resolved alert for the router, then immediately I get problem alerts for all the dependent devices (phone systems, switch, etc.). They resolve quickly.
What appears to be happening is that there is a delay in trigger activation for the dependent devices, so their 5-minute average is still above the threshold just after their blocking dependency resolves. This isn't a bug; it seems to be doing exactly what the code defines.
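To illustrate what I think is going on, here is a rough Python simulation. It is not Zabbix code, and every number in it is an assumption I made up for the example: a 60-second polling interval, the phone being polled 30 seconds after the router, and the phone seeing somewhat higher latency than the router during the transfer.

```python
# Toy model of the timing issue; all values are illustrative assumptions.
THRESHOLD_MS = 150      # high-ping threshold from the trigger
WINDOW_S = 300          # 5-minute averaging window
INTERVAL_S = 60         # assumed polling interval
TRANSFER_END_S = 600    # the 10-minute file transfer ends here
HORIZON_S = 1200        # how far the simulation runs


def latency_ms(t, busy_ms, idle_ms):
    """Ping latency at time t: high during the transfer, low afterwards."""
    return busy_ms if 0 <= t < TRANSFER_END_S else idle_ms


def window_avg(t, offset, busy_ms, idle_ms):
    """Average of the samples collected in the window (t - WINDOW_S, t]."""
    polls = range(offset, t + 1, INTERVAL_S)
    samples = [latency_ms(s, busy_ms, idle_ms) for s in polls if s > t - WINDOW_S]
    return sum(samples) / len(samples)


def first_ok_after_transfer(offset, busy_ms, idle_ms):
    """First poll time after the transfer at which the average is back under the threshold."""
    for t in range(offset, HORIZON_S, INTERVAL_S):
        if t >= TRANSFER_END_S and window_avg(t, offset, busy_ms, idle_ms) <= THRESHOLD_MS:
            return t
    return None


# Router: polled at 0, 60, 120, ... with 200 ms during the transfer, 20 ms after.
router_ok = first_ok_after_transfer(0, 200, 20)
# Phone: polled at 30, 90, 150, ... with 260 ms during the transfer, 25 ms after.
phone_ok = first_ok_after_transfer(30, 260, 25)

# Phone evaluations that still see a high average after the router has recovered,
# i.e. while nothing is blocking the dependent trigger any more.
unblocked = [t for t in range(30, HORIZON_S, INTERVAL_S) if router_ok <= t < phone_ok]

print("router back under threshold at", router_ok)
print("phone back under threshold at", phone_ok)
print("phone evaluated above threshold with router OK at", unblocked)
```

With these made-up numbers the router's average recovers before the phone's, so the phone trigger gets at least one evaluation above the threshold while its dependency is already resolved. That matches the burst of alerts I see at the end of the transfer.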
Because the period is defined in the template's trigger definition, you can't easily override the time period with a macro (and I'm not sure that you should). I wish there were a dependency cooldown system for alerts, in which a dependent alert can't fire for x time after its dependency resolves.
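To make the cooldown idea concrete, this is roughly the rule I have in mind, sketched in Python. It is purely hypothetical: as far as I know nothing like `should_notify` or a cooldown setting exists in Zabbix, and the names and the 120-second value are my own.

```python
def should_notify(problem_time, parent_in_problem, parent_resolved_at, cooldown_s):
    """Decide whether a dependent trigger's problem should send an alert.

    problem_time:       when the dependent trigger went into PROBLEM (seconds)
    parent_in_problem:  True while the trigger it depends on is in PROBLEM
    parent_resolved_at: when the parent last resolved, or None if it never fired
    cooldown_s:         hypothetical quiet period after the parent resolves
    """
    if parent_in_problem:
        # Normal dependency behaviour: the parent problem blocks the alert.
        return False
    if parent_resolved_at is not None and problem_time - parent_resolved_at < cooldown_s:
        # The wished-for extra rule: stay quiet shortly after the parent resolves.
        return False
    return True


# Router resolved at t=660, phone goes into PROBLEM at t=690: suppressed.
print(should_notify(690, False, 660, 120))
# Phone goes into PROBLEM long after the router resolved: alert as usual.
print(should_notify(900, False, 660, 120))
```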
How do other people solve this problem?