Periodic spikes in history syncer causing db deadlocks

  • sb195c@att.com
    Member
    • Oct 2017
    • 41

    #1

    Periodic spikes in history syncer causing db deadlocks

    I'm noticing an issue where I periodically get spikes in my history syncer processes, which cause slow queries/inserts and false alerts due to delayed data collection in the DB; even the Zabbix server itself gets reported as not reachable. What's also strange is that this seems to occur at the same time not only on the Zabbix server but on all my proxies. All monitored hosts are assigned to a proxy. It seems to occur every hour and causes the most trouble at night. Could an item or discovery rule be causing this? I'm at a loss.

    All of these systems are on fast hardware with plenty of RAM/CPU and show no OS-level issues. The DB has 12 CPUs, 64 GB of RAM, and a fast storage array. Tests on the network interfaces and DB iostat show no problems.

    The last occurrence yesterday got so bad that the system was up to 12 hours behind in writing history to the DB, and I had to hard-kill the Zabbix process (losing all the data in the cache). The only way I could get it to recover was to disable each proxy and bring them up one at a time until each queue was empty. The system is running fine now, but I did have another spike overnight, with a period of delayed queries/deadlocks in the Zabbix log.
    Also strange: one of these proxies doesn't even have any hosts assigned to it; it's a standby.


    Any ideas what to look for?
    Number of hosts (enabled/disabled/templates): 913 (822 / 2 / 89)
    Number of items (enabled/disabled/not supported): 355434 (355036 / 196 / 202)
    Number of triggers (enabled/disabled [problem/ok]): 159955 (159772 / 183 [158 / 159614])
    Number of users (online): 11 (2)
    Required server performance, new values per second: 1915.64

    I have those hosts split across 4 proxies.

    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=0
    PidFile=/var/run/zabbix/zabbix_server.pid
    SocketDir=/var/run/zabbix
    StartPollers = 10
    StartPollersUnreachable = 10
    StartPingers = 10
    StartDiscoverers = 10
    SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
    CacheSize = 2G
    HistoryCacheSize = 256M
    HistoryIndexCacheSize = 2G
    TrendCacheSize = 1G
    ValueCacheSize = 2G
    Timeout=4
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    FpingLocation=/usr/bin/fping
    Fping6Location=/usr/bin/fping6
    LogSlowQueries=10000
    StartProxyPollers=20
    ProxyConfigFrequency = 90
    ProxyDataFrequency = 1
    StartDBSyncers = 4
  • sb195c@att.com
    Member
    • Oct 2017
    • 41

    #2
    Seems like the housekeeper is causing it to go haywire. Any thoughts on why it's blowing up the history syncer and history cache?
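
    A quick way to see what the housekeeper is chewing on is to check the row counts and on-disk size of the biggest tables. Rough sketch below, assuming a stock Zabbix MySQL schema in a database named zabbix (table_rows is only an InnoDB estimate, but it's enough to spot a bloated events or problem table):

    -- Show the ten largest tables in the Zabbix schema by data size.
    -- Assumes the schema is named 'zabbix'; adjust table_schema if yours differs.
    SELECT table_name,
           table_rows,
           ROUND(data_length / 1024 / 1024)  AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
      FROM information_schema.tables
     WHERE table_schema = 'zabbix'
     ORDER BY data_length DESC
     LIMIT 10;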


    • sb195c@att.com
      Member
      • Oct 2017
      • 41

      #3
      I was able to find the boogeyman causing this.



      Unbeknownst to me, the events table had upwards of 12.5 million rows of polluted trigger data, along with 6 million rows of the same in the problem table. After a 35-minute delete statement in MySQL to get rid of it all, and after fixing the offending trigger, the housekeeper is no longer crushing my system.
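
      For anyone hitting the same thing, the cleanup was roughly of this shape (a sketch, not the exact statement; 12345 is a placeholder for the triggerid of the offending trigger, and on tables this size deleting in smaller batches is kinder to the history syncers than one huge transaction):

      -- Sanity-check how many rows the trigger is responsible for first.
      -- source = 0 / object = 0 restricts this to trigger-generated events.
      SELECT COUNT(*)
        FROM events
       WHERE source = 0 AND object = 0 AND objectid = 12345;

      -- Remove the problem rows, then the underlying events.
      DELETE FROM problem
       WHERE source = 0 AND object = 0 AND objectid = 12345;

      DELETE FROM events
       WHERE source = 0 AND object = 0 AND objectid = 12345;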

      For the benefit of people downloading the stock templates, please remove or fix this trigger in the Template Module Interfaces SNMPv2 template. It caused me data loss and quite a few hours of troubleshooting to find the culprit.
      Last edited by [email protected]; 03-04-2018, 15:36.
