Hi Zabbix forums:
First of all, thank you in advance for any help you can provide. I've been using Zabbix for years and have never had to post anything here, which speaks volumes about the quality of the documentation. So here's my problem.
Key info: Zabbix Server and agents 5.0.x
We have a fairly big log file (about 2 GB per day) that we want to monitor for a specific error. To do that, here's our item config:

We filter specifically for the lines that match the pattern/error we want to monitor. We do that so only those lines are stored (keeping the history_log tables small, since we are a bit size-constrained). That part works fine; we receive those lines:

We want to raise an alarm whenever we get more than n hits in a time period t. For this example, if we get more than 5 errors in a 5-minute interval, we want to alert. So here is our trigger config:

And here is our problem: either count is not working properly or we are doing something wrong (more than likely the latter). When it detects the five errors, the trigger goes to PROBLEM, but it never goes back to OK. No matter how long I wait, it never recovers, even if nothing is written to the file, and even if I write correct/non-matching lines to it. I've read and re-read the log item documentation and the count function documentation and can't find the error.
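To make the behaviour we expect explicit, here is a minimal Python sketch of the rule. This is not Zabbix syntax, just an illustration; the function name `should_alert` and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def should_alert(hit_times, now, window=timedelta(minutes=5), threshold=5):
    """Return True when more than `threshold` hits fall inside the last `window`."""
    # Keep only the matching log lines received within the window ending at `now`.
    recent = [t for t in hit_times if now - window < t <= now]
    return len(recent) > threshold


# Hypothetical data: six matching lines written within one minute.
start = datetime(2021, 1, 1, 12, 0, 0)
hits = [start + timedelta(seconds=10 * i) for i in range(6)]

# One minute in, all six hits are inside the 5-minute window, so we expect an alert.
print(should_alert(hits, start + timedelta(minutes=1)))

# Ten minutes in, with nothing new written, the window is empty and the alert should clear.
print(should_alert(hits, start + timedelta(minutes=10)))
```

The second call is the case that does not work for us: we expect the trigger to recover once the window no longer contains enough hits, but it stays in PROBLEM.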
What am I doing wrong?
Thanks in advance,
Iván Lago.