Hello all,
I have configured Zabbix to check the SMTP TCP service, and we occasionally see the service flap: SMTP simply doesn't respond to a check (not sure why, possibly a network issue, but it is very intermittent).
A telnet to port 25 two seconds later gets a response, though.
Because of this, we changed the trigger to require two failed SMTP connection attempts within 120 seconds.
Trigger:
{Template_FreeBSD:net.tcp.service[smtp].count(120,0)}=2
Now, if the service is legitimately down, we keep getting repeated alerts and recovery messages.
We can stop SMTP so that port 25 simply returns connection refused, and Zabbix correctly sees the service as down.
However, every 2 minutes or so we get a new alert about it, even though our escalation is set for 60 minutes.
In the event history, the trigger goes back to Normal after 2 minutes, then 2 minutes later it goes to Problem again, causing a new alert.
But the service is not up, so why does the trigger go back to Normal and keep flapping?
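One possible explanation, offered as a guess rather than a confirmed diagnosis: the expression compares the count to exactly 2, so it would be false whenever 3 or more failed checks fall inside the 120-second window. The Python sketch below is a simplified model of that idea, not Zabbix code. The check times (roughly every 60 seconds with a second of jitter), the inclusive window boundary, and the names `count_zeros`, `trigger_states`, and `CHECK_TIMES` are all assumptions made for illustration; the actual item interval is not stated above.

```python
# Simplified model of a count(120,0)-style trigger; not Zabbix code.

def count_zeros(samples, now, window):
    """Count samples equal to 0 with a timestamp in [now - window, now]."""
    return sum(1 for ts, value in samples if now - window <= ts <= now and value == 0)


def trigger_states(timestamps, window=120, threshold=2, exact=True):
    """Evaluate the trigger after each failed check; return (ts, count, state) tuples."""
    samples = []
    states = []
    for ts in timestamps:
        samples.append((ts, 0))  # service is down, so every check stores 0
        n = count_zeros(samples, ts, window)
        # exact=True models "= 2"; exact=False models ">= 2" (assumed alternative)
        problem = (n == threshold) if exact else (n >= threshold)
        states.append((ts, n, "PROBLEM" if problem else "OK"))
    return states


# Hypothetical check times: roughly every 60 s, with a second of jitter.
CHECK_TIMES = [0, 60, 120, 181, 241, 300, 360, 421]

if __name__ == "__main__":
    for exact in (True, False):
        print("count = 2" if exact else "count >= 2")
        for ts, n, state in trigger_states(CHECK_TIMES, exact=exact):
            print(f"  t={ts:>3}s  zeros_in_window={n}  {state}")
```

In this model the exact comparison switches between PROBLEM and OK while the service stays down, whereas the `>=` comparison stays in PROBLEM after the second failure. Whether the real trigger behaves this way depends on the actual check interval and on how Zabbix treats the window boundary, neither of which I have verified.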
Any insight would be helpful!