Zabbix stops sending actions during big outages when there are lots of triggers

  • Epicurean
    Junior Member
    • Mar 2013
    • 3

    #1

    Greetings fellow Zabbix admins,

    We're having a problem with our Zabbix installation that only occurs during large network outages when lots of hosts have triggers in PROBLEM state. When lots of triggers start firing (let's say during an entire datacenter outage), actions eventually stop happening and emails stop coming in. The dashboard does get updated and the server still processes new values, but no emails are sent, and the "Actions" column stays blank, as though it didn't even try to process any actions.

    I've found a workaround that has worked every time: stop Zabbix, go into MySQL and run "TRUNCATE TABLE actions" (which erases both of our actions), recreate the two actions manually in the GUI, then start Zabbix again. Emails then start coming in again.

    But it doesn't make sense: the only thing that changes is that the recreated actions get new, incremented action IDs. It doesn't seem to be related to server load; load averages are normal while this is occurring, and MySQL looks okay.

    Anyone else seeing the same problem, or have any suggestions?
    Thanks!
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    Look at the Zabbix server template items for alerter and escalator busy % during these events. I'd guess your queue is very high too. Not sure what to do about it, but dependencies might help: there's no use alerting on any systems in site B if the link to site B is down, that type of thing. Alert on the critical paths only.
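
A minimal sketch of the "alert on the critical paths only" idea, assuming a simple model where each trigger lists the triggers it depends on and a trigger is suppressed whenever anything upstream of it (directly or transitively) is also in PROBLEM. The trigger names and the function name are hypothetical, and this models the idea only; it is not Zabbix's own dependency code.

```python
from typing import Dict, List, Set


def triggers_to_alert(problems: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the PROBLEM triggers that should still notify.

    Hypothetical model: a trigger is suppressed if any trigger it depends on,
    directly or through a chain of dependencies, is also in PROBLEM state.
    """

    def has_failed_ancestor(trigger: str, seen: Set[str]) -> bool:
        for parent in depends_on.get(trigger, []):
            if parent in seen:
                continue  # guard against accidental dependency cycles
            seen.add(parent)
            if parent in problems or has_failed_ancestor(parent, seen):
                return True
        return False

    # Start each search with the trigger itself marked as seen, so a cycle
    # cannot make a trigger suppress itself.
    return {t for t in problems if not has_failed_ancestor(t, {t})}


if __name__ == "__main__":
    problems = {
        "site-b link down",
        "site-b web01 ping down",
        "site-b web01 HTTP down",
        "site-a db01 disk full",
    }
    depends_on = {
        "site-b web01 HTTP down": ["site-b web01 ping down"],
        "site-b web01 ping down": ["site-b link down"],
    }
    print(sorted(triggers_to_alert(problems, depends_on)))
```

With the site B link down, only the link trigger and the unrelated site A trigger notify; the ping and HTTP triggers behind the link stay quiet, which is exactly the reduction in alert volume being suggested.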

    • Epicurean
      Junior Member
      • Mar 2013
      • 3

      #3
      Thanks, that's a great suggestion and sounds like it will work. I have already set up some dependencies to cut down on the number of alerts. You're right, we don't need alerts for, e.g., HTTP being down when ping is down, and I think that will help immensely.

      We are also trying some of the suggestions in the stickied post in this forum on database optimization, just to see if they help, but we don't see many of the performance problems I would normally associate with DB slowness. The size alone is alarming, though: our DB has already grown to 84 GB (!) with only 500 hosts monitored. We're moving the DB to a separate host with SSD drives, and will be partitioning the database to keep the most active tables on different volumes.
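
To put 84 GB across 500 hosts in perspective, here is a back-of-envelope sketch. Only the 84 GB and 500 hosts come from this post; the item count, polling interval and retention period are hypothetical placeholders, and 1 GB is treated as 1024 MB. History row count grows as items × samples per day × retention days, which is why long retention at short intervals inflates the database quickly.

```python
# Back-of-envelope arithmetic only. Every input except the poster's
# 84 GB / 500 hosts is a hypothetical placeholder.


def avg_size_per_host_mb(db_size_gb: float, hosts: int) -> float:
    """Average database footprint per monitored host, treating 1 GB as 1024 MB."""
    return db_size_gb * 1024 / hosts


def estimated_history_rows(items: int, interval_s: int, retention_days: int) -> int:
    """Rows written over the retention period for items polled every interval_s seconds."""
    samples_per_day = 86_400 // interval_s
    return items * samples_per_day * retention_days


if __name__ == "__main__":
    print(f"{avg_size_per_host_mb(84, 500):.0f} MB per host")
    # Hypothetical: 100 items per host on 500 hosts, 60 s interval, 90 days kept.
    print(f"{estimated_history_rows(500 * 100, 60, 90):,} history rows")
```

Halving retention or doubling the interval halves the row estimate, so those two settings are usually the first levers to check before moving hardware.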

      • Epicurean
        Junior Member
        • Mar 2013
        • 3

        #4
        We haven't seen any problems since we set up trigger dependencies. We also moved some non-Zabbix services off the host to reduce the load.

        It looks like it was related to DB performance issues in the end, although I couldn't figure out which MySQL queries specifically, if any, were locking up.

        I'd set up an internal Zabbix check which showed that the escalator was going to 100% busy and staying stuck there permanently. That doesn't seem to happen anymore (it doesn't even go above 10%).
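
A minimal sketch of telling "stuck at 100%" apart from a brief spike, assuming you already collect escalator busy-% samples (in Zabbix 2.x the internal item key should be of the form zabbix[process,escalator,avg,busy]; check the internal items documentation for your version). The function name, threshold and window size are hypothetical choices: it flags the process only when every one of the last N samples is at or above the threshold, the same idea as a trigger on the minimum value over a period.

```python
from typing import Sequence


def process_stuck(busy_samples: Sequence[float], threshold: float = 95.0, window: int = 10) -> bool:
    """Return True if each of the last `window` busy-% samples is >= `threshold`.

    Hypothetical helper: a single spike does not count, only sustained
    saturation does. With fewer than `window` samples there is not enough
    data to judge, so it returns False.
    """
    if len(busy_samples) < window:
        return False
    return min(busy_samples[-window:]) >= threshold


if __name__ == "__main__":
    before_fix = [100.0] * 12          # escalator pinned at 100% busy
    after_fix = [8.0, 9.5, 10.0] * 4   # hovering around 10% busy
    print(process_stuck(before_fix), process_stuck(after_fix))
```

Using the minimum over the window rather than the average avoids firing on a short burst of escalations during an outage while still catching the permanent 100% state described above.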
