Alerting issue during mass outages.

  • mohammedrazal
    Junior Member
    • Nov 2015
    • 4

    #1

    Dear Team,

    About our environment:
    --> Our Zabbix installation monitors around 600 hosts and processes ~600 values per second. Everything works properly until there is a mass outage in the landscape.
    --> We use a customized alerting script for notifications.
    --> We use Zabbix proxy-based monitoring.

    When the Zabbix agents are down on multiple systems (because of network issues, etc.), we see our alerting script running for issues that have already recovered, even though the alerts and escalations tables contain no rows. Sometimes genuine alerts also get stuck in the alerts table.
    A Zabbix server restart usually fixes this, so my feeling is that Zabbix refers to some cache when alerting, and that this cache is cleared when I restart.

    We also checked the internal processes: the Zabbix alerter and housekeeper processes are 100% busy most of the time while this issue is happening. I am working on table partitioning, which should reduce the load caused by the housekeeper process. But are the alerter and housekeeper processes responsible for this sort of behavior?
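
To show what "stuck in the alerts table" means here, below is a rough sketch of the kind of check that could be run against the database. It is only an illustration: the status codes (0 = not sent, 1 = sent, 2 = failed) are my understanding of the `alerts.status` column and should be verified for your Zabbix version, the function names are made up for this example, and an in-memory SQLite table stands in for the real Zabbix database.

```python
import sqlite3

# Assumed meaning of alerts.status; verify against your Zabbix version.
STATUS_NAMES = {0: "not sent", 1: "sent", 2: "failed"}


def count_alerts_by_status(conn):
    """Return {status name: row count} for the alerts table."""
    cur = conn.cursor()
    cur.execute("SELECT status, COUNT(*) FROM alerts GROUP BY status")
    return {
        STATUS_NAMES.get(status, "unknown(%s)" % status): count
        for status, count in cur.fetchall()
    }


def demo_connection():
    """Build an in-memory stand-in for the alerts table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts ("
        "alertid INTEGER PRIMARY KEY, clock INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO alerts (clock, status) VALUES (?, ?)",
        [(1447000000, 0), (1447000060, 0), (1447000120, 1), (1447000180, 2)],
    )
    return conn


if __name__ == "__main__":
    print(count_alerts_by_status(demo_connection()))
```

A "not sent" count that keeps growing during an outage while the alerter is 100% busy would suggest the alerter cannot keep up with the queue, rather than a cache problem.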

    Any help would be really appreciated.

    Thanks,
    Razal