Hello all,
I would like to get your help and opinions on the following problem:
Our Zabbix server suddenly started critically failing, and since then, it has become a regular occurrence. Initially, the GUI wouldn't load and was stuck on "Loading." After a while, the browser displayed the message:
Database error: No such file or directory.
Another time, the error message was:
Database error: SQL statement execution has failed
"UPDATE sessions SET lastaccess='....' WHERE sessionid='.....'"
From the server logs, we see errors like these:
We tried increasing the parameters and improving the performance of our MariaDB instance (see updated config below).
However, we cannot find much useful information in the logs. One common point is that before the service starts failing, we see a lot of slow query logs related to the history_uint table, such as:
But we generally can't pinpoint the root cause.
There is plenty of free disk space, RAM, and CPU available. The issue seems database-related, but it originates from Zabbix. When it gets stuck, only kill -9 works (and not always).
When the Zabbix process is stuck, we can still access the database, but it does not react to any Zabbix-related queries (e.g., SELECT statements). However, it does respond to internal database commands like SHOW PROCESSLIST;.
Environment:
OS: Debian Linux 12.9
Zabbix Server Version: 7.0.9
MariaDB Version: 10.11.6
Hosts: 164
Items: 29,397
Triggers: 13,524
New Values per Second: 444
MariaDB Configuration:
Zabbix Server Configuration:
Does anyone have any insights or ideas on what might be causing these failures?
Any advice would be greatly appreciated!
I would like to get your help and opinions on the following problem:
Our Zabbix server suddenly started critically failing, and since then, it has become a regular occurrence. Initially, the GUI wouldn't load and was stuck on "Loading." After a while, the browser displayed the message:
Database error: No such file or directory.
Another time, the error message was:
Database error: SQL statement execution has failed
"UPDATE sessions SET lastaccess='....' WHERE sessionid='.....'"
From the server logs, we see errors like these:
Code:
4138:20250116:144301.926 slow query: 609.183770 sec, "update ha_node set lastaccess=unix_timestamp() where ha_nodeid='......'" 4138:20250116:144301.926 [Z3005] query failed: [2006] Server has gone away [rollback;] 4212:20250116:144301.927 [Z3001] connection to database 'zabbix' failed: [2013] Lost connection to server at 'handshake: reading initial communication packet', system error: 104 4212:20250116:144301.927 database is down: reconnecting in 10 seconds 4151:20250116:144301.927 slow query: 1010.602710 sec, "delete from history where itemid=116116 and clock<1736399235"
However, we cannot find much useful information in the logs. One common point is that before the service starts failing, we see a lot of slow query logs related to the history_uint table, such as:
Code:
657173:20250128:182651.191 slow query: 10905.365435 sec, "insert into history_uint (itemid,clock,ns,value) values (42245,1738074305,815327355,8323534848),(43805,173 8074305,8163... ... ...
There is plenty of free disk space, RAM, and CPU available. The issue seems database-related, but it originates from Zabbix. When it gets stuck, only kill -9 works (and not always).
When the Zabbix process is stuck, we can still access the database, but it does not react to any Zabbix-related queries (e.g., SELECT statements). However, it does respond to internal database commands like SHOW PROCESSLIST;.
Environment:
OS: Debian Linux 12.9
Zabbix Server Version: 7.0.9
MariaDB Version: 10.11.6
Hosts: 164
Items: 29,397
Triggers: 13,524
New Values per Second: 444
MariaDB Configuration:
Code:
[mysqld] pid-file = /run/mysqld/mysqld.pid basedir = /usr bind-address = 127.0.0.1 key_buffer_size = 10M max_allowed_packet = 256M myisam_recover_options = BACKUP max_connections = 600 query_cache_limit = 1M query_cache_size = 256M tmp_table_size = 256M max_heap_table_size = 256M performance_schema = OFF log_error = /var/log/mysql/error.log log_slow_query_file = /var/log/mysql/mariadb-slow.log log_slow_query_time = 10 character-set-server = utf8mb4 collation-server = utf8mb4_general_ci innodb_buffer_pool_size = 9G innodb_read_io_threads = 8 innodb_write_io_threads = 8 innodb_io_capacity = 2000 innodb_file_per_table = ON innodb_default_row_format = DYNAMIC innodb_lock_wait_timeout = 100 innodb_flush_method = O_DIRECT innodb_flush_log_at_trx_commit = 0
Code:
LogFile=/var/log/zabbix/zabbix_server.log LogFileSize=0 DebugLevel=3 PidFile=/run/zabbix/zabbix_server.pid SocketDir=/run/zabbix DBHost=localhost DBName=xxxxxx DBUser=xxxxxx DBPassword=xxxxxxxx DBSocket=/run/mysqld/mysqld.sock SNMPTrapperFile=/var/log/snmptrap/snmptrap.log HousekeepingFrequency=1 MaxHousekeeperDelete=50000 CacheSize=512M CacheUpdateFrequency=10 HistoryCacheSize=1024M HistoryIndexCacheSize=512M TrendCacheSize=64M ValueCacheSize=256M Timeout=4 FpingLocation=/usr/bin/fping Fping6Location=/usr/bin/fping6 LogSlowQueries=3000 StartLLDProcessors=1 StatsAllowedIP=127.0.0.1 EnableGlobalScripts=0
Any advice would be greatly appreciated!
Comment