Hello,
I need some advice: I can't figure out what happened to my Zabbix server.
HW specs (everything is installed on one server):
Code:
1 CPU Xeon E5504 2 GHz, 4 cores
32 GB RAM
RAID 10, 4x HDD, 130 GB 10k RPM SAS disks

Zabbix v2.2.1, NVPS ~70 (but I have a lot of trapper checks)
One day the Zabbix caches filled up (history write and text write), so I raised the cache sizes, and everything worked OK for about a week. Last Friday it happened again.
Current settings for cache sizes:
Code:
CacheSize=32M
HistoryCacheSize=512M
TrendCacheSize=128M
HistoryTextCacheSize=128M
ValueCacheSize=128M

I tried to tune MySQL, but it didn't help. Current MySQL settings:
Code:
[mysqld]
interactive_timeout = 86400
wait_timeout = 86400
max_allowed_packet = 32M
query_cache_type = 0
query_cache_size = 0
query_cache_limit = 4M
thread_cache_size = 96
innodb_flush_log_at_trx_commit = 2
join_buffer_size = 256k
read_buffer_size = 256k
max_connections = 1024
innodb_thread_concurrency = 0
read_rnd_buffer_size = 512k
sort_buffer_size = 512K
innodb_lock_wait_timeout = 250
thread_concurrency = 4
performance_schema = on
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
user = mysql
symbolic-links = 0
table_open_cache = 2048
myisam_sort_buffer_size = 16M
innodb_buffer_pool_size = 26G
innodb_flush_method = O_DIRECT
innodb_additional_mem_pool_size = 16M
innodb_data_file_path=ibdata1:10M:autoextend
innodb_log_file_size = 512M
innodb_log_buffer_size = 8M
innodb_fast_shutdown=0
innodb_file_per_table
skip-networking
tmpdir=/temp
innodb_file_format=Barracuda

What I'm seeing now is that the free % of the history write and text write caches is almost 0, and the history syncer processes are almost 100% busy.
If I restart the Zabbix server, everything is OK for 10-15 minutes, and then I get the same situation. I guess MySQL can't save all the data from Zabbix quickly enough, but I can't understand why that happened, i.e. we didn't add 1000k hosts last Friday, nor any new templates/checks to Zabbix.
Today I implemented partitioning for the history and trends tables, but that didn't help. I'm not a DBA and I'm kind of stuck right now; I just don't know what may be wrong here. I've disabled a lot of my trapper checks in Zabbix and removed custom templates from all hosts. Still nothing...
There is a scary-looking process in top like this:
Code:
history syncer #6 [synced 289 items in 36.652641 sec]

which is very slow, as far as I understand.
LA (load average) is around 1.5, with no active swap usage.
I also see a lot of failed queries in the Zabbix server log:
Code:
4364:20140113:115159.992 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update autoreg_host set listen_ip='192.168.243.154',listen_dns='host.domain.local',listen_port=10050,host_metadata='' where autoreg_hostid=6]

Any hints or advice on how to troubleshoot MySQL performance? Thanks.
[Graphs: cache usage, internal processes, CPU utilization]
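To put numbers on the two symptoms quoted above, here is a minimal Python sketch. It assumes the history syncer status line and the [Z3005] log line always have exactly the format shown in this post; `syncer_rate` and `failed_query` are hypothetical helper names, not part of Zabbix. It only does arithmetic on the quoted lines and does not diagnose the cause.

```python
import re

# Process title as shown in top, e.g.
# "history syncer #6 [synced 289 items in 36.652641 sec]"
SYNCER_RE = re.compile(
    r"history syncer #(?P<num>\d+) \[synced (?P<items>\d+) items in (?P<secs>[\d.]+) sec"
)

# Failed-query line from the server log, e.g.
# "... [Z3005] query failed: [1205] <message> [<sql>]"
FAILED_RE = re.compile(
    r"\[Z3005\] query failed: \[(?P<code>\d+)\] .*? \[(?P<sql>.*)\]\s*$"
)
TABLE_RE = re.compile(r"^(?:update|insert into|delete from)\s+(\w+)", re.IGNORECASE)


def syncer_rate(status_line):
    """Return (syncer number, values written per second), or None if no match."""
    m = SYNCER_RE.search(status_line)
    if m is None:
        return None
    secs = float(m.group("secs"))
    if secs == 0:
        return None
    return int(m.group("num")), int(m.group("items")) / secs


def failed_query(log_line):
    """Return (MySQL error code, table name or None), or None if no match."""
    m = FAILED_RE.search(log_line)
    if m is None:
        return None
    t = TABLE_RE.match(m.group("sql"))
    return int(m.group("code")), (t.group(1) if t else None)


if __name__ == "__main__":
    status = "history syncer #6 [synced 289 items in 36.652641 sec]"
    num, rate = syncer_rate(status)
    incoming_nvps = 70  # approximate NVPS quoted in the post
    print(f"syncer #{num}: {rate:.1f} values/s written")
    print(f"incoming rate is about {incoming_nvps / rate:.1f}x what this syncer wrote")

    log = ("4364:20140113:115159.992 [Z3005] query failed: [1205] Lock wait timeout "
           "exceeded; try restarting transaction [update autoreg_host set "
           "listen_port=10050 where autoreg_hostid=6]")
    print(failed_query(log))
```

For the quoted status line this works out to roughly 8 values per second for that one syncer, against about 70 new values per second coming in, which is consistent with the write caches filling up a few minutes after a restart. Running `failed_query` over the whole server log and counting the results per table would show whether the lock wait timeouts are limited to `autoreg_host` or also hit other tables.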
[...], now LA is ~0.1 and it looks like everything is OK. I hope it is stable now and will not fail again.