Hello,
Recently I started experiencing an issue with my Zabbix 7.2 HA cluster: it freezes and my two-node cluster ceases to function completely for a few minutes (sometimes a manual restart is required).
Looking at the logs, it seems to be related to the HA manager process when DB housekeeping tasks are running. I upgraded to 7.4 in the hope that it would help, but unfortunately the situation is the same.
My current setup is Zabbix 7.4 with two nodes in HA mode; each node also runs MariaDB 10.5 + Galera, and a third node acts as the Galera witness.
For now I have disabled HA mode, since the HA manager service was causing Zabbix downtime a few times a day. I would like to either resolve the HA mode issue or look into using Pacemaker/Corosync instead of the built-in HA mode.
--
Code:
1227296:20250724:053525.954 starting HA manager
1227296:20250724:053525.954 HA manager started in standby mode
1214541:20250724:060031.956 "zbx-node1" node is working in "standby" mode
1227296:20250724:060937.644 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]
1227296:20250724:060937.644 slow query: 7.687342 sec, "commit;"
1227296:20250724:060937.644 ERROR: rollback without transaction. Please report it to Zabbix Team.
1227296:20250724:060937.644 === Backtrace: ===
1227296:20250724:060937.645 10: /usr/sbin/zabbix_server: ha manager(zbx_backtrace+0x41) [0x56152ee4df11]
1227296:20250724:060937.645 9: /usr/sbin/zabbix_server: ha manager(zbx_dbconn_rollback+0x10b) [0x56152ee39acb]
1227296:20250724:060937.645 8: /usr/sbin/zabbix_server: ha manager(+0x2741c4) [0x56152ec771c4]
1227296:20250724:060937.645 7: /usr/sbin/zabbix_server: ha manager(ha_manager_thread+0x42a) [0x56152ec795ba]
1227296:20250724:060937.645 6: /usr/sbin/zabbix_server: ha manager(zbx_ha_start+0x6d) [0x56152ec7afbd]
1227296:20250724:060937.646 5: /usr/sbin/zabbix_server: ha manager(MAIN_ZABBIX_ENTRY+0x1089) [0x56152eadeed9]
1227296:20250724:060937.646 4: /usr/sbin/zabbix_server: ha manager(zbx_daemon_start+0x145) [0x56152ee52405]
1227296:20250724:060937.646 3: /usr/sbin/zabbix_server: ha manager(main+0x3f5) [0x56152ead2bf5]
1227296:20250724:060937.646 2: /lib64/libc.so.6(+0x295d0) [0x7f2cbee295d0]
1227296:20250724:060937.646 1: /lib64/libc.so.6(__libc_start_main+0x80) [0x7f2cbee29680]
1227296:20250724:060937.646 0: /usr/sbin/zabbix_server: ha manager(_start+0x25) [0x56152ead9f95]
zabbix_server: ha manager: dbconn.c:1039: dbconn_rollback: Assertion `0' failed.
1214541:20250724:060938.697 failed to wait on child processes: [10] No child processes
1214541:20250724:060949.007 cannot pause HA manager: Cannot connect to service "haservice": [111] Connection refused.
1214541:20250724:060949.007 Zabbix Server stopped. Zabbix 7.4.0 (revision 372a4e93c48).
1240082:20250724:060959.210 Starting Zabbix Server. Zabbix 7.4.0 (revision 372a4e93c48).
1240082:20250724:060959.210 ****** Enabled features ******
1240082:20250724:060959.210 SNMP monitoring: YES
1240082:20250724:060959.210 IPMI monitoring: YES
1240082:20250724:060959.210 Web monitoring: YES
1240082:20250724:060959.210 VMware monitoring: YES
1240082:20250724:060959.210 SMTP authentication: YES
1240082:20250724:060959.210 ODBC: YES
1240082:20250724:060959.210 SSH support: YES
1240082:20250724:060959.210 IPv6 support: YES
1240082:20250724:060959.210 TLS support: YES
1240082:20250724:060959.210 ******************************
1240082:20250724:060959.210 using configuration file: /etc/zabbix/zabbix_server.conf
1240082:20250724:060959.215 current database version (mandatory/optional): 07040000/07040000
1240082:20250724:060959.215 required mandatory version: 07040000
1240082:20250724:060959.216 database could be upgraded to use primary keys in history tables
1240083:20250724:060959.217 starting HA manager
1240082:20250724:061009.227 "zbx-node1" node started in "standby" mode
1240082:20250724:061009.227 cannot write to IPC socket: Broken pipe
1240082:20250724:061009.227 cannot write to IPC socket: Broken pipe
1240082:20250724:061009.227 cannot write to IPC socket: Broken pipe
1240082:20250724:061009.227 cannot write to IPC socket: Broken pipe
1240083:20250724:061023.705 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]
1240083:20250724:061023.705 slow query: 24.486364 sec, "commit;"
1240083:20250724:061023.705 ERROR: rollback without transaction. Please report it to Zabbix Team.
1240083:20250724:061023.705 === Backtrace: ===
1240083:20250724:061023.706 11: /usr/sbin/zabbix_server: ha manager(zbx_backtrace+0x41) [0x562460077f11]
1240083:20250724:061023.706 10: /usr/sbin/zabbix_server: ha manager(zbx_dbconn_rollback+0x10b) [0x562460063acb]
1240083:20250724:061023.706 9: /usr/sbin/zabbix_server: ha manager(+0x2741c4) [0x56245fea11c4]
1240083:20250724:061023.706 8: /usr/sbin/zabbix_server: ha manager(+0x2752ad) [0x56245fea22ad]
1240083:20250724:061023.706 7: /usr/sbin/zabbix_server: ha manager(ha_manager_thread+0x189e) [0x56245fea4a2e]
1240083:20250724:061023.706 6: /usr/sbin/zabbix_server: ha manager(zbx_ha_start+0x6d) [0x56245fea4fbd]
1240083:20250724:061023.707 5: /usr/sbin/zabbix_server: ha manager(MAIN_ZABBIX_ENTRY+0x9e1) [0x56245fd08831]
1240083:20250724:061023.707 4: /usr/sbin/zabbix_server: ha manager(zbx_daemon_start+0x145) [0x56246007c405]
1240083:20250724:061023.707 3: /usr/sbin/zabbix_server: ha manager(main+0x3f5) [0x56245fcfcbf5]
1240083:20250724:061023.707 2: /lib64/libc.so.6(+0x295d0) [0x7f2abac295d0]
1240083:20250724:061023.707 1: /lib64/libc.so.6(__libc_start_main+0x80) [0x7f2abac29680]
1240083:20250724:061023.707 0: /usr/sbin/zabbix_server: ha manager(_start+0x25) [0x56245fd03f95]
zabbix_server: ha manager: dbconn.c:1039: dbconn_rollback: Assertion `0' failed.
1240082:20250724:061024.761 failed to wait on child processes: [10] No child processes
1240082:20250724:061035.076 cannot pause HA manager: Cannot connect to service "haservice": [111] Connection refused.
1240082:20250724:061035.076 Zabbix Server stopped. Zabbix 7.4.0 (revision 372a4e93c48).
1240258:20250724:061045.210 Starting Zabbix Server. Zabbix 7.4.0 (revision 372a4e93c48).
1240258:20250724:061045.210 ****** Enabled features ******
1240258:20250724:061045.210 SNMP monitoring: YES
1240258:20250724:061045.211 IPMI monitoring: YES
1240258:20250724:061045.211 Web monitoring: YES
1240258:20250724:061045.211 VMware monitoring: YES
1240258:20250724:061045.211 SMTP authentication: YES
1240258:20250724:061045.211 ODBC: YES
1240258:20250724:061045.211 SSH support: YES
1240258:20250724:061045.211 IPv6 support: YES
1240258:20250724:061045.211 TLS support: YES
1240258:20250724:061045.211 ******************************
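Every crash in this excerpt follows the same sequence: the HA manager gets a `[1213]` deadlock on `commit;`, hits the `dbconn_rollback` assertion, and "Zabbix Server stopped" is logged about 11 seconds later. To compare these events with the housekeeping schedule, they can be pulled out of the full server log. Below is a minimal Python sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` line layout shown above; `parse_log` and `ha_crash_events` are hypothetical helper names, not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed zabbix_server.log layout, as in the excerpt above:
#   "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


def parse_log(lines):
    """Yield (pid, timestamp, message) for every line with the standard prefix."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            # Unprefixed lines, such as the "Assertion `0' failed." message, are skipped.
            continue
        pid, day, clock, message = m.groups()
        ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")
        yield int(pid), ts, message


def ha_crash_events(lines):
    """Return (pid, deadlock_time, seconds_until_next_server_stop) per [1213] deadlock."""
    deadlocks, stops = [], []
    for pid, ts, message in parse_log(lines):
        if "[1213] Deadlock found" in message:
            deadlocks.append((pid, ts))
        elif message.startswith("Zabbix Server stopped"):
            stops.append(ts)
    events = []
    for pid, ts in deadlocks:
        later = [s for s in stops if s >= ts]
        # None means no server stop followed this deadlock within the given lines.
        delay = (min(later) - ts).total_seconds() if later else None
        events.append((pid, ts, delay))
    return events


# Lines copied from the log excerpt above.
sample = [
    "1227296:20250724:060937.644 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]",
    "zabbix_server: ha manager: dbconn.c:1039: dbconn_rollback: Assertion `0' failed.",
    "1214541:20250724:060949.007 Zabbix Server stopped. Zabbix 7.4.0 (revision 372a4e93c48).",
    "1240083:20250724:061023.705 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]",
    "1240082:20250724:061035.076 Zabbix Server stopped. Zabbix 7.4.0 (revision 372a4e93c48).",
]

for pid, ts, delay in ha_crash_events(sample):
    print(pid, ts.time(), delay)
```

Passing an open file handle for the real server log (whatever `LogFile` points to in zabbix_server.conf) instead of `sample` lists every HA manager deadlock. Those timestamps can then be compared with when the housekeeper runs.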
Code:
[mysqld]
sql_mode = STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION

# Data and logs
datadir = /var/lib/mysql
tmpdir = /tmp
log_error = /var/log/mysql/mariadb.log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 1

# InnoDB tuning
innodb_buffer_pool_size = 10G # ~60–65% of RAM
innodb_buffer_pool_instances = 4
innodb_log_file_size = 512M # Enough for high insert rate
innodb_log_buffer_size = 512M
innodb_flush_log_at_trx_commit = 2 # Better performance (risk of minimal data loss in crash)
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_lock_wait_timeout = 30
innodb_deadlock_detect = ON
innodb_print_all_deadlocks = 1

# Thread and connection management
max_connections = 300
thread_cache_size = 64
table_open_cache = 2048
table_definition_cache = 1024
open_files_limit = 65535

# Query cache (off for performance)
query_cache_type = 0
query_cache_size = 0

# Temp tables
tmp_table_size = 128M
max_heap_table_size = 128M

# Sort and join buffers
sort_buffer_size = 2M
join_buffer_size = 2M

# MyISAM (used rarely)
key_buffer_size = 16M

# Performance schema
performance_schema = ON