Dear Team,
About our environment:
--> Our Zabbix instance has around 600 hosts and processes ~600 values per second. Everything works properly until there is a mass outage in the landscape.
--> We use a customized alerting script for notifications.
--> We use Zabbix proxy-based monitoring.
When the Zabbix agents on multiple systems go down (because of network issues etc.), we see our alerting scripts running for issues that have already recovered, even though the alerts and escalations tables contain no rows.
Sometimes genuine alerts also get stuck in the alerts table.
Restarting the Zabbix server usually fixes this, so my feeling is that Zabbix consults some cache when alerting, and that cache gets cleared on restart.
We also checked the internal processes: the Zabbix alerter and housekeeper processes are 100% busy most of the time when this issue happens. I am working on table partitioning,
which should reduce some of the load on the housekeeping process. But could the alerter or housekeeping process be responsible for this sort of behavior?
Any help would be really appreciated.
Thanks,
Razal