Unstable Zabbix 2.4.6 CentOS 7, large DB

  • jamesNJ
    Senior Member
    • Jun 2015
    • 103

    #1

    Hello all, I am hoping someone can give me some advice/links that might help.

    Our Zabbix ran fine for a long time, but over the last week or so it has become increasingly temperamental. We are getting periodic warnings from Zabbix like these:
    Zabbix housekeeper processes more than 75% busy
    Zabbix history syncer processes more than 75% busy

    These are then followed by a slew of mostly false alerts (can't connect to host, etc.). The server is fairly large: 8 cores at 2.5 GHz, 32 GB RAM, and >20 TB of disk (mostly unused at this time). The database is the MariaDB (MySQL) packaged with CentOS 7.

    The first hints of a database issue come from the Zabbix server log. I'm seeing periodic messages about insert timeouts on the history and history_uint tables. Other timeouts exist, but the majority involve those 2 tables. Also, with casual use of MySQL's "SHOW PROCESSLIST" I can usually catch inserts to history and history_uint taking >60 sec to complete. I suspect the insert problems start when the Zabbix periodic maintenance (the housekeeper) tries to delete records from the history tables.

    Reviewing table sizes, I see the following row counts, which look interesting (simple SELECT COUNT(*) on each table):
    events 427665
    trends 5323059
    trends_uint 8991908
    history 14263420
    history_str 9033007
    history_uint 18659740

    I'm pretty sure I don't need that much data. I only have about 150 hosts total and I'm not doing anything complex or exotic. I'm mainly monitoring Linux and Windows hosts, fewer than 6 web scenarios, and about 70 SNMP devices, mostly UPS units.
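
As a rough sanity check on whether row counts like these are out of line, the expected size of a history table can be estimated from item count, polling interval, and retention. This is only a back-of-the-envelope sketch: the items-per-host figure and the 60-second interval below are hypothetical placeholders, not numbers from this setup.

```python
def expected_history_rows(num_items: int, interval_s: int, retention_days: int) -> int:
    """Rows retained if num_items are each polled every interval_s seconds
    and values are kept for retention_days days."""
    samples_per_day = 86400 // interval_s
    return num_items * samples_per_day * retention_days


if __name__ == "__main__":
    # Hypothetical workload: 150 hosts with 40 items each, polled once a minute.
    items = 150 * 40
    for days in (7, 60, 90):
        print(days, "days ->", expected_history_rows(items, 60, days), "rows")
```

Plugging in the real item count and intervals (both visible in the frontend) gives a baseline to compare against the SELECT COUNT(*) results.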

    My first thought is to update the "items" table and impose tighter limits on 'history' and 'trends' (I'm mostly at 7 days of history and 365 days of trends, but have a bunch of items with 60 and 90 day history that could be trimmed). I also have the typical huge MySQL "ibdata1" data file problem and plan to split InnoDB into separate per-table files with a dump/restore, etc.
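
A bulk retention change of this kind could be sketched as a pair of UPDATE statements. The snippet below only builds the SQL text and does not touch a database. It assumes that in this Zabbix version the "items" table stores retention in integer day columns named history and trends; verify that against the actual schema and take a backup before running anything like it.

```python
from typing import List


def retention_updates(max_history_days: int, max_trend_days: int) -> List[str]:
    """Build UPDATE statements that cap per-item retention.

    Assumes integer day columns `history` and `trends` on the `items` table.
    Items already at or below the cap are left untouched.
    """
    for value in (max_history_days, max_trend_days):
        if not isinstance(value, int) or value < 0:
            raise ValueError("retention must be a non-negative integer of days")
    return [
        f"UPDATE items SET history = {max_history_days} WHERE history > {max_history_days};",
        f"UPDATE items SET trends = {max_trend_days} WHERE trends > {max_trend_days};",
    ]


if __name__ == "__main__":
    # Cap history at 7 days and trends at 365 days.
    for statement in retention_updates(7, 365):
        print(statement)
```

Lowering the values only changes what the housekeeper is allowed to delete later; it does not remove existing rows by itself.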

    I found this article: http://machinenoise.org/2014/cleanin...-database.html
    which I think will help me clean up history_uint a little bit.

    Is it safe to trim the other large tables (history_str, history, trends_uint, trends) by the 'clock' field similar to the history_uint link above?
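
For reference, trimming by 'clock' could look roughly like the sketch below, which generates small batched DELETE statements rather than one huge delete. It is not taken from the linked article. It assumes each listed table has an integer Unix-time clock column, and the batch size and retention values are arbitrary examples. It only prints SQL; nothing is executed.

```python
import time
from typing import Optional

# Tables named in this thread; assumed to share a Unix-time `clock` column.
HISTORY_TABLES = ["history", "history_uint", "history_str"]
TREND_TABLES = ["trends", "trends_uint"]


def cutoff_clock(retention_days: int, now: Optional[int] = None) -> int:
    """Unix timestamp before which rows are considered expired."""
    current = int(time.time()) if now is None else now
    return current - retention_days * 86400


def batched_delete(table: str, retention_days: int,
                   batch_size: int = 10000, now: Optional[int] = None) -> str:
    """One DELETE that removes at most batch_size expired rows.

    Repeat the statement until it affects 0 rows, so each transaction
    stays small and concurrent inserts are blocked only briefly.
    """
    if table not in HISTORY_TABLES + TREND_TABLES:
        raise ValueError(f"unexpected table: {table}")
    cutoff = cutoff_clock(retention_days, now)
    return f"DELETE FROM {table} WHERE clock < {cutoff} LIMIT {batch_size};"


if __name__ == "__main__":
    for name in HISTORY_TABLES:
        print(batched_delete(name, retention_days=7))
    for name in TREND_TABLES:
        print(batched_delete(name, retention_days=365))
```

Deleting rows does not shrink a shared ibdata1 file, so this addresses row counts rather than disk usage.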

    Can someone give me any other pointers or advice on how to clean up and correct this ill-performing server? I think I would prefer to keep the housekeeper running and not split my DB history tables by time, but if that is the only real solution I may have to consider it.

    Thanks!
  • axlrod
    Junior Member
    • Dec 2015
    • 7

    #2
    It seems like you are running your database on the same host as Zabbix? This is considered bad practice and could be one of the issues.

    • LenR
      Senior Member
      • Sep 2009
      • 1005

      #3
      When we had problems like this, it was due to I/O performance of the db disks.

      Suggestions:

      Tune DB buffers to trade RAM use for disk reads
      Analyze the DB disk: can it support a good write rate? iSCSI was our problem
      Tune the Zabbix caches
      Use the partitioning technique so housekeeping can be disabled on history and trends.

      I don't see running the Zabbix server on the same machine as the DB engine as a problem, but we do the heavy data collection on proxies and have a separate web frontend. We are monitoring about 8000 servers and switches with the DB on the server at about 5000 NVPS. It runs under VMware, hosted on FC disk.

      • jamesNJ
        Senior Member
        • Jun 2015
        • 103

        #4
        Thanks for your help and consideration.

        I ended up doing a number of things including the following:
        - bulk adjust all items to smaller time frames and fix any obvious mistakes
        - trim the history tables for records not needed
        - set 1 file per table, dump, drop databases, clean up huge innodb files and restore
        - increase zabbix cache settings.

        I still think that I need to address storage performance in this case. I thought it was unusual that the Zabbix database dump took about 10 min to write the equivalent of 2 GB of data, and the subsequent restore took longer than 1 hr.
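
To put those timings in perspective, the implied throughput can be worked out as below. This is simple arithmetic on the figures quoted above, treating "2g" as 2 GiB (an assumption). Since the restore took "longer than 1hr", its figure is an upper bound.

```python
def throughput_mib_s(size_gib: float, seconds: float) -> float:
    """Average throughput in MiB/s for moving size_gib GiB in the given time."""
    return size_gib * 1024 / seconds


if __name__ == "__main__":
    dump_rate = throughput_mib_s(2, 10 * 60)     # about 10 minutes
    restore_rate = throughput_mib_s(2, 60 * 60)  # at least 1 hour
    print(f"dump:    {dump_rate:.1f} MiB/s")              # dump:    3.4 MiB/s
    print(f"restore: at most {restore_rate:.1f} MiB/s")   # restore: at most 0.6 MiB/s
```

A restore is expected to be slower than a dump because indexes are rebuilt as rows are inserted, so the restore number by itself does not prove a storage problem.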

        However, I think breaking up the huge InnoDB file into individual files per table was what was needed. With the adjustments listed above, my Zabbix feels faster than the first day I deployed it. Even with housekeeping fully enabled, I have yet to see that process stalled in the MySQL process list.
