Zabbix HA manager not responding in standby mode

  • pepek
    Junior Member
    • Nov 2022
    • 9

    #1

    Hello all,

    I have an issue with my Zabbix HA cluster: a 3-node cluster running Zabbix 6.0.16.
    My architecture looks like this: Zabbix server runs on all 3 nodes, with nginx on top acting as a load balancer for the VIP address, and I'm using Percona DB with a Galera cluster across all nodes, fronted by ProxySQL.
    All 3 nodes have the same settings, but only one node has this issue:

    Once or twice per day I get this error on node 01:
    2995288:20230606:051406.862 HA manager is not responding in standby mode, restarting it.
    3394211:20230606:051406.868 starting HA manager

    It restarts only the HA manager inside the zabbix-server process, not the entire zabbix-server process.
    I can't see anything else in the Zabbix logs.
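To track how often this happens, here is a minimal Python sketch that scans Zabbix server log lines for the restart message quoted above and counts occurrences per day. It assumes the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen in those two log lines; the function names are hypothetical, and where the log file lives depends on your installation.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout: "PID:YYYYMMDD:HHMMSS.mmm message"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")
RESTART_MSG = "HA manager is not responding in standby mode"


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line doesn't match."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    pid, day, hms, millis, message = m.groups()
    ts = datetime.strptime(day + hms, "%Y%m%d%H%M%S").replace(
        microsecond=int(millis) * 1000)
    return int(pid), ts, message


def restarts_per_day(lines):
    """Count 'HA manager is not responding' events per calendar day."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].startswith(RESTART_MSG):
            counts[parsed[1].date()] += 1
    return counts


sample = [
    "2995288:20230606:051406.862 HA manager is not responding in standby mode, restarting it.",
    "3394211:20230606:051406.868 starting HA manager",
]
print(restarts_per_day(sample))
# Counter({datetime.date(2023, 6, 6): 1})
```

Run against each node's log, this makes it easy to line up node 01's restart times with database or system events from the same minutes.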

    Has anyone had experience with a similar problem?
  • Sarahbridges
    Junior Member
    • Jun 2023
    • 1

    #2
    Hello,

    If I understand correctly, on one of the nodes (node 01) the HA manager stops responding in standby mode and is then restarted.

    Here are a few suggestions to troubleshoot and resolve the problem:

    Verify network connectivity: Ensure that there are no network connectivity issues between the nodes. Check for any network interruptions or packet loss that could be causing the HA manager to become unresponsive.
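One rough way to spot intermittent connectivity problems is to repeatedly time TCP connections from the affected node to its peers' database ports. The Python sketch below checks only that a TCP handshake succeeds (not a MySQL login or Galera health); the helper names are hypothetical.

```python
import socket
import time


def tcp_connect_latency(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        # Only the handshake is tested; the connection is closed immediately.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # refused, unreachable, DNS failure or timeout
        return None
    return (time.monotonic() - start) * 1000.0


def probe_all(targets, attempts=5, timeout=2.0):
    """Probe each (host, port) several times; report failure count and worst latency."""
    report = {}
    for host, port in targets:
        samples = [tcp_connect_latency(host, port, timeout) for _ in range(attempts)]
        ok = [s for s in samples if s is not None]
        report[(host, port)] = {
            "failures": attempts - len(ok),
            "max_ms": max(ok) if ok else None,
        }
    return report
```

For example, running `probe_all([("node02", 3306), ("node02", 4567), ("node03", 3306), ("node03", 4567)])` periodically on node 01 (host names are placeholders; 3306 and 4567 are the default MySQL and Galera replication ports) and comparing the results with the same run on a healthy node may show whether node 01's network path behaves differently.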

    Check resource utilization: Monitor the resource utilization on the problematic node, especially CPU and memory usage. Make sure that the node has enough resources available to handle the load imposed by the Zabbix server and HA manager processes.

    Examine system logs: Look into system logs (e.g., syslog or messages) on the affected node for any relevant error messages or warnings that could indicate the cause of the issue. Check for any kernel-level issues or conflicts.

    Review Zabbix configuration: Double-check the Zabbix server configuration file (zabbix_server.conf, which also holds the HA settings) on each node to ensure consistency. Pay close attention to any settings on the problematic node that differ from the others.
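A quick way to do this comparison is to parse each node's zabbix_server.conf into key/value pairs and diff them, skipping HANodeName and NodeAddress, which are expected to differ per node in a Zabbix 6.0 HA setup. This minimal Python sketch assumes simple `Key=Value` lines; repeated keys (for example several Include lines) keep only their last value here, so treat it as a first pass rather than a complete parser. Function names and the sample values are made up.

```python
# Settings expected to differ between HA nodes; ignored in the comparison.
PER_NODE_KEYS = {"HANodeName", "NodeAddress"}


def parse_conf(text):
    """Parse 'Key=Value' lines, skipping blanks and '#' comments (last value wins)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def conf_differences(reference_text, other_text, ignore=PER_NODE_KEYS):
    """Return {key: (reference_value, other_value)} for settings that differ."""
    ref, other = parse_conf(reference_text), parse_conf(other_text)
    diffs = {}
    for key in sorted(set(ref) | set(other)):
        if key in ignore:
            continue
        if ref.get(key) != other.get(key):
            diffs[key] = (ref.get(key), other.get(key))
    return diffs


# Made-up file contents for illustration.
node02 = "DBHost=localhost\nHANodeName=node02\nTimeout=4\n"
node01 = "DBHost=localhost\nHANodeName=node01\nTimeout=30\n"
print(conf_differences(node02, node01))
# {'Timeout': ('4', '30')}
```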

    Verify database synchronization: Since you're using Percona DB with Galera Cluster, ensure that database synchronization is functioning correctly across all nodes. Check the Galera Cluster status and investigate any replication or consistency issues that could be impacting the HA manager's operation.
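The Galera side can be checked by running `SHOW GLOBAL STATUS LIKE 'wsrep_%';` on every node and evaluating a few standard status variables. The Python sketch below assumes that output has already been turned into a dict of variable name to string value; the expected cluster size of 3 comes from this thread, and the function name and sample values are hypothetical. The certification-failure and brute-force-abort counters are cumulative since server start, so for those it is more meaningful to pass in the increase between two samples than the raw totals.

```python
# Values a healthy, fully joined Galera node normally reports.
EXPECTED_STATE = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
}

# Counters that grow when writes arriving via different nodes conflict.
# Cumulative since server start: pass in the increase between two samples.
CONFLICT_COUNTERS = ("wsrep_local_cert_failures", "wsrep_local_bf_aborts")


def galera_problems(status, expected_size=3):
    """Return human-readable problems found in a {variable: value} wsrep status dict."""
    problems = []
    for name, want in EXPECTED_STATE.items():
        got = status.get(name)
        if got != want:
            problems.append(f"{name}={got!r}, expected {want!r}")
    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_size:
        problems.append(f"wsrep_cluster_size={size}, expected {expected_size}")
    for counter in CONFLICT_COUNTERS:
        value = int(status.get(counter, 0))
        if value > 0:
            problems.append(f"{counter}={value} (conflicting writes between nodes)")
    return problems


# Made-up values for illustration.
sample = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_cluster_size": "3",
    "wsrep_local_cert_failures": "4",
    "wsrep_local_bf_aborts": "0",
}
print(galera_problems(sample))
# ['wsrep_local_cert_failures=4 (conflicting writes between nodes)']
```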

    Update Zabbix version: Consider upgrading your Zabbix installation to the latest stable version (if available). Newer versions often include bug fixes and improvements that could address the issue you're encountering.

    I hope this helps you.





    • pepek
      Junior Member
      • Nov 2022
      • 9

      #3

      Hello Sarahbridges,

      Thank you for your answer. I checked all your suggestions and found nothing, but I was able to fix the issue anyway.
      The problem was that all Zabbix nodes were constantly writing HA status to the DB, and each node was writing to its local Galera host. I updated ProxySQL to allow writes to only one Galera node, and that fixed the issue: no more deadlocks, no more HA manager restarts on node01.
      It's still interesting, though, that only node01 had this issue.
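This fix is consistent with how Galera handles writes that arrive at several nodes at once. Each node executes a transaction locally; at commit time its write-set is replicated in a global order and certified, and if another node committed a change to the same rows after this transaction started, it fails certification and the client gets a deadlock error. The Python sketch below is a deliberately simplified model of that rule, not Galera's actual implementation; the server names, row key and snapshot numbers are hypothetical. It shows why three servers writing HA status through three different Galera nodes can deadlock, while sending all writes through one node (as done here with ProxySQL) lets them wait on row locks instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    server: str          # Zabbix server that issued the transaction (hypothetical name)
    keys: frozenset      # rows touched (hypothetical key names)
    snapshot_seqno: int  # last global seqno visible on the executing node when the txn started


def certify(write_sets):
    """Simplified Galera-style certification of write-sets given in global order.

    A write-set fails ('deadlock') if a write-set that passed certification
    after its snapshot touched any of the same keys.
    """
    committed = []  # (seqno, keys) of write-sets that passed certification
    results = []
    for seqno, ws in enumerate(write_sets, start=1):
        conflict = any(
            c_seqno > ws.snapshot_seqno and bool(c_keys & ws.keys)
            for c_seqno, c_keys in committed
        )
        if conflict:
            results.append((ws.server, "deadlock"))
        else:
            committed.append((seqno, ws.keys))
            results.append((ws.server, "commit"))
    return results


row = frozenset({"ha_status_row"})

# Multi-writer: each server writes via its local Galera node from the same snapshot.
multi = [WriteSet("zbx1", row, 0), WriteSet("zbx2", row, 0), WriteSet("zbx3", row, 0)]
print(certify(multi))
# [('zbx1', 'commit'), ('zbx2', 'deadlock'), ('zbx3', 'deadlock')]

# Single writer: row locks on one node serialize the transactions,
# so each starts after the previous commit is visible.
single = [WriteSet("zbx1", row, 0), WriteSet("zbx2", row, 1), WriteSet("zbx3", row, 2)]
print(certify(single))
# [('zbx1', 'commit'), ('zbx2', 'commit'), ('zbx3', 'commit')]
```

The model does not explain why only node01 was affected; it only illustrates why conflicting writes disappear once a single Galera node receives them all.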



