Hello all, I am hoping someone can give me some advice/links that might help.
Our Zabbix server had been fine for a long time, but recently (the last week or so) it has become increasingly temperamental. We keep getting periodic warnings from Zabbix like these:
Zabbix housekeeper processes more than 75% busy
Zabbix history syncer processes more than 75% busy
These are then followed by a slew of mostly false alerts (can't connect to host, etc.). The server is fairly large: 8 cores at 2.5 GHz, 32 GB RAM, and >20 TB of disk (mostly unused at this time). It uses the MariaDB (MySQL) packaged with CentOS 7.
The first hints of a database issue come from the Zabbix server log. I'm seeing periodic messages about insert timeouts on the history and history_uint tables. Other timeouts exist, but the majority are on those 2 tables. Also, when I casually run "show processlist" in mysql, I can usually catch inserts to history and history_uint taking >60 sec to complete. I suspect that the insert problems start when the Zabbix housekeeper tries to delete records from the history tables.
Reviewing table sizes, I see the following row counts, which look interesting (a simple select count(*) on each table):
events 427665
trends 5323059
trends_uint 8991908
history 14263420
history_str 9033007
history_uint 18659740
I'm pretty sure I don't need that much data. I only have about 150 hosts total and I'm not doing anything complex or exotic. I'm mainly monitoring Linux and Windows hosts, no more than about 6 web scenarios, and about 70 or so SNMP devices, mostly UPS units.
My first thought is to update the "items" table and impose tighter limits on 'history' and 'trends' (I'm mostly at 7/365 but have a bunch of items with 60 and 90 day history that could be trimmed). I also have the typical huge MySQL "ibdata1" data file problem and plan to split InnoDB into separate per-table files with a dump/restore, etc.
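To make the first idea concrete, here is a minimal sketch of the retention change I have in mind. It runs against an in-memory SQLite table as a stand-in, not against the real Zabbix database. It assumes the 'history' and 'trends' columns in "items" hold plain integer day counts (I believe newer Zabbix versions store them as strings with a time suffix, in which case the comparison would need adjusting). The function name tighten_retention and the demo data are made up for illustration.

```python
import sqlite3


def tighten_retention(conn, max_history_days, max_trends_days):
    """Cap per-item retention; return (history_rows_changed, trends_rows_changed).

    Assumption: items.history and items.trends are integer day counts.
    """
    cur = conn.cursor()
    cur.execute(
        "UPDATE items SET history = ? WHERE history > ?",
        (max_history_days, max_history_days),
    )
    history_changed = cur.rowcount
    cur.execute(
        "UPDATE items SET trends = ? WHERE trends > ?",
        (max_trends_days, max_trends_days),
    )
    trends_changed = cur.rowcount
    conn.commit()
    return history_changed, trends_changed


def demo_retention():
    # Hypothetical stand-in for the real "items" table (only the columns used here).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (itemid INTEGER PRIMARY KEY, history INTEGER, trends INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items (history, trends) VALUES (?, ?)",
        [(7, 365), (60, 365), (90, 365), (7, 730)],
    )
    changed = tighten_retention(conn, max_history_days=30, max_trends_days=365)
    rows = conn.execute("SELECT history, trends FROM items ORDER BY itemid").fetchall()
    return changed, rows
```

Lowering these values only changes what the housekeeper is allowed to delete later; it does not shrink the tables by itself. Items inherited from templates may also need the change made on the template side.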
I found this article: http://machinenoise.org/2014/cleanin...-database.html
which I think will help me clean up history_uint a little bit.
Is it safe to trim the other large tables (history_str, history, trends_uint, trends) by the 'clock' field, in the same way the linked article does for history_uint?
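What I am picturing for the trim is roughly the sketch below: delete rows older than a cutoff in small batches, committing after each one, so a single huge DELETE does not hold locks while the history syncers are trying to insert. This is only an illustration run against SQLite in memory. The function name trim_by_clock, the batch size, and the demo data are my own assumptions. On MariaDB I would expect to use "DELETE ... WHERE clock < ? LIMIT n" directly instead of the rowid subquery, which is SQLite-specific.

```python
import sqlite3
import time

# Tables from the question that carry a 'clock' column (assumed allowlist).
TRIMMABLE = {"history", "history_uint", "history_str", "trends", "trends_uint"}


def trim_by_clock(conn, table, cutoff, batch_size=10000, pause=0.0):
    """Delete rows with clock < cutoff in batches; return total rows deleted."""
    if table not in TRIMMABLE:
        raise ValueError(f"unexpected table: {table}")
    total = 0
    while True:
        # The table name comes from the allowlist above, so formatting it in is safe.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE clock < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # short transactions so inserts are not blocked for long
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause)  # optional breathing room between batches
    return total


def demo_trim():
    # Hypothetical miniature "history" table with clock values 0..99.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO history (itemid, clock, value) VALUES (1, ?, 0.0)",
        [(c,) for c in range(100)],
    )
    deleted = trim_by_clock(conn, "history", cutoff=60, batch_size=25)
    remaining, oldest = conn.execute(
        "SELECT COUNT(*), MIN(clock) FROM history"
    ).fetchone()
    return deleted, remaining, oldest
```

In real use the cutoff would be a Unix timestamp such as int(time.time()) - 30 * 86400, since 'clock' stores seconds since the epoch.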
Can someone give me any other pointers or advice on how to clean up and correct this ill-performing server? I think I would prefer to keep the housekeeper running and not split my DB history tables by time, but if that is the only real solution I may have to consider it.
Thanks!