alerter process - very high cpu utilization

  • stevem
    Junior Member
    • Mar 2012
    • 4

    #1

    We are monitoring over 500 hosts spread across several sites geographically (using zabbix proxies at each site).

    The Zabbix alerter process often spikes to 100% busy for a minute or two and then calms back down. It has gotten progressively worse, to the point where the process stayed at 100% busy for a couple of days at a time, causing Zabbix to fail to connect to monitored hosts and to throw many false alarms. PostgreSQL also has a very hard time when this happens.

    I have checked the logs and don't see anything that correlates with the spikes. When I attach strace to the pid, I see a lot of:

    Code:
    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    read(4, 0x289a9e3, 5)       = -1 EAGAIN (Resource temporarily unavailable)

    Is there a way to start multiple alerter processes to share the load? What do we need to do to prevent this from happening?
  • stevem
    Junior Member
    • Mar 2012
    • 4

    #2
    The problem was that Zabbix was opening and closing tickets in our trouble ticket system. Since there is only one alerter process, it had to wait for the ticketing system to finish each request before it was released and could handle the next record.

    The key to solving this issue was to use an outside queuing mechanism for the alerter. In other words, the action (script) that Zabbix now calls only saves the information to a queue; another script picks it up later and passes it on to the ticketing system. This lets the alerter process return much more quickly and alleviated the 100% busy problem.
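
    For anyone who wants to try the same approach, here is a rough sketch of the idea in Python. It is not our actual script: the spool directory layout, the file naming, and the function names `enqueue` and `drain` are made up for illustration, and it assumes the alert script receives a recipient, a subject, and a message from Zabbix. The alert script only calls `enqueue`, which writes a small file and returns; a separate job (cron, for example) calls `drain` and does the slow call to the ticketing system.

```python
import json
import os
import tempfile
import time
import uuid


def enqueue(spool_dir, sendto, subject, message):
    """Save one alert to the spool directory and return right away.

    This is the only work done while the alerter is waiting.
    """
    os.makedirs(spool_dir, exist_ok=True)
    record = {
        "sendto": sendto,
        "subject": subject,
        "message": message,
        "queued_at": time.time(),
    }
    # Write to a temporary file first, then rename it, so the consumer
    # never picks up a half-written record.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    # Hypothetical naming scheme: timestamp prefix so files sort by age.
    name = "%020d-%s.json" % (time.time_ns(), uuid.uuid4().hex)
    final_path = os.path.join(spool_dir, name)
    os.replace(tmp_path, final_path)
    return final_path


def drain(spool_dir, deliver):
    """Hand each queued alert to deliver(record) and delete it afterwards.

    deliver is whatever talks to the ticketing system. If it raises,
    the file is left in place so a later run can retry it.
    Returns the number of alerts delivered.
    """
    if not os.path.isdir(spool_dir):
        return 0
    delivered = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # skip temporary files still being written
        path = os.path.join(spool_dir, name)
        with open(path) as fh:
            record = json.load(fh)
        deliver(record)
        os.remove(path)
        delivered += 1
    return delivered
```

    The alert script would call something like `enqueue("/var/spool/zabbix-tickets", sendto, subject, message)` with its three arguments (the path is only an example), and the scheduled job would call `drain` with a function that creates or closes the ticket. The point is that the slow part no longer runs inside the alerter.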
