HA behavior when the DB connection is lost?

  • madis
    Junior Member
    • Nov 2021
    • 9

    #1

    In the Zabbix native HA solution, every node uses the same database. According to the manual, the active node monitors its own database connectivity: if the connection is lost for longer than the failover delay, it must stop all processing and switch to standby mode. Each standby node, in turn, monitors the last access time of the active node. If the active node's last access time is more than 'failover delay' seconds old, the standby node switches itself to active and assigns the 'unavailable' status to the previously active node.

    Imagine a situation where node-01 (active) is in the same location as the database, and that site loses its internet connection. node-02 is at a remote site, and since the active node becomes unavailable, it switches to active. But as the DB is also unavailable, what will it do? Will it be able to send a notification?
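
    The active node's self-check described in the manual can be modeled as a small watchdog. This is a minimal sketch, not Zabbix code: the class name, the tick() method and the 60-second delay in the usage are assumptions for illustration, and it models only the step-down, not how a node later becomes active again.

```python
from typing import Optional


class ActiveNodeWatchdog:
    """Hypothetical model of the active node's DB self-check (not Zabbix code)."""

    def __init__(self, failover_delay: float):
        self.failover_delay = failover_delay
        self.db_lost_at: Optional[float] = None  # when the DB was first seen as lost
        self.role = "active"

    def tick(self, now: float, db_ok: bool) -> str:
        if db_ok:
            self.db_lost_at = None   # connection is fine: reset the timer
        elif self.db_lost_at is None:
            self.db_lost_at = now    # first failed check: start the timer
        elif now - self.db_lost_at > self.failover_delay:
            self.role = "standby"    # lost for too long: stop processing
        return self.role


w = ActiveNodeWatchdog(failover_delay=60)
w.tick(0, db_ok=False)   # DB lost at t=0
w.tick(61, db_ok=False)  # 61 s > 60 s, so the node steps down to "standby"
```

    Note that in the scenario above node-01 may still reach the DB locally, so this check alone would not make it step down.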

  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2
    Well... node-02 cannot check the active node's last access time because the DB is gone, and it cannot update its own access time either, because the DB is gone. So basically it cannot do anything. I don't really know whether it will try to become active, but even if it does, all its processes will complain about the lost DB connection and nothing will happen. It may even crash.

    • madis
      Junior Member
      • Nov 2021
      • 9

      #3
      Thank you, cyber! What would be the best-practice solution for such a scenario, then: 2+n connected sites, each with its own services, all monitored by Zabbix? This is a real-life case, where a firewall crashed and the Zabbix behind it was unable to send a notification about it.

      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4806

        #4
        Sorry, I am no architect... I don't know a good solution for you. But this scenario, where you still have the DB in one location and the servers in separate ones, does not really give you full HA anyway.
        And it is always a question of who watches the watchers.

        • Semiadmin
          Senior Member
          • Oct 2014
          • 1625

          #5
          A standby node gets the active node's last access time from the DB, not from the active node directly. If the standby node loses access to the DB, it can't decide to become the active node.
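
          To make the point concrete, here is a minimal sketch of the standby node's decision, assuming the only source of the active node's last access time is a read from the shared DB. The function name, the read_active_lastaccess callback and the "undecided" result are hypothetical, not Zabbix internals.

```python
from typing import Callable, Optional


def standby_decision(now: float, failover_delay: float,
                     read_active_lastaccess: Callable[[], Optional[float]]) -> str:
    """Hypothetical standby logic.

    read_active_lastaccess returns the active node's last access time as stored
    in the shared DB, or None when the DB cannot be reached.
    """
    lastaccess = read_active_lastaccess()
    if lastaccess is None:
        # No DB, no information: the standby cannot promote itself.
        return "undecided"
    if now - lastaccess > failover_delay:
        # Active node looks dead: take over (and mark it 'unavailable').
        return "active"
    return "standby"


# node-02 in the scenario: the DB is behind the dead firewall.
standby_decision(100, 60, lambda: None)  # "undecided"
```

          The key input comes from the same DB that has just disappeared, so losing the DB and losing the active node look identical from the standby's point of view.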

          • madis
            Junior Member
            • Nov 2021
            • 9

            #6
            Is the only (and best) solution, then, to have one Zabbix instance monitoring everything and a second instance, in a different location, monitoring the "main Zabbix"?
            I believe another option would be adding HA for the database?

            • Semiadmin
              Senior Member
              • Oct 2014
              • 1625

              #7
              I don't quite understand the problem. Do you need to see that the standby node has lost its connection to the DB?
              Just monitor the cluster with the internal check zabbix[cluster,discovery,nodes], and you will see this node change status from standby to unavailable.
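
              A rough sketch of spotting that status change from two samples of the item. Assumptions: the item returns a JSON array of node objects with "name" and a numeric "status", and the codes are 0 standby, 1 stopped, 2 unavailable, 3 active; please verify both against the documentation for your Zabbix version. The function names and node names are hypothetical. Inside Zabbix itself this would more naturally be a dependent item with JSONPath preprocessing plus a trigger.

```python
import json
from typing import Dict, List

# Assumed meaning of the numeric "status" field (verify for your version).
STATUS = {0: "standby", 1: "stopped", 2: "unavailable", 3: "active"}


def node_states(raw: str) -> Dict[str, str]:
    """Map node name -> status text for one zabbix[cluster,discovery,nodes] value."""
    return {n["name"]: STATUS.get(n["status"], "unknown") for n in json.loads(raw)}


def changed_to_unavailable(before: str, after: str) -> List[str]:
    """Names of nodes that became 'unavailable' between two samples."""
    old, new = node_states(before), node_states(after)
    return sorted(name for name, state in new.items()
                  if state == "unavailable" and old.get(name) != "unavailable")


before = '[{"name": "node-01", "status": 3}, {"name": "node-02", "status": 0}]'
after = '[{"name": "node-01", "status": 3}, {"name": "node-02", "status": 2}]'
changed_to_unavailable(before, after)  # ["node-02"]
```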

              • cyber
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Dec 2006
                • 4806

                #8
                I guess the reason for this question was that if the main Zabbix instance, together with the DB, is stuck behind the crashed FW, the other node is not able to start up and notify about the situation... which is kind of natural in this state, as the DB is gone and no actions can be taken. Either you need some small all-in-one Zabbix appliance that monitors from outside, or you create real DB HA there so your second host can actually start up... or you set up some kind of third, heartbeat-based system that is able to scream at you when some piece has not responded for a while.
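
                The third, heartbeat-based option is essentially a dead man's switch: the monitored pieces report in periodically, and a watcher at a different site alerts when one goes silent. A minimal sketch, assuming something at the main site sends beats to it; the class, method and component names are hypothetical, and real delivery of beats and alerts is left out.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Hypothetical dead man's switch: silence longer than `timeout` raises an alert."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}
        self.alerted: Set[str] = set()

    def beat(self, component: str) -> None:
        """Record a heartbeat; a component that beats again is re-armed."""
        self.last_seen[component] = self.clock()
        self.alerted.discard(component)

    def check(self) -> List[str]:
        """Return components that just went silent (each reported once)."""
        now = self.clock()
        silent = []
        for name, seen in self.last_seen.items():
            if now - seen > self.timeout and name not in self.alerted:
                self.alerted.add(name)
                silent.append(name)
        return sorted(silent)
```

                The important design choice is placement: the watcher must live outside the failure domain (not behind the same firewall), otherwise it dies together with what it watches.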

                • Semiadmin
                  Senior Member
                  • Oct 2014
                  • 1625

                  #9
                  And a properly working active node is not able to send an alert?
                  P.S. This does not mean that I am against monitoring the cluster and database servers with another Zabbix server. It just seems to me that in this situation, the internal self-monitoring capabilities of the cluster have not yet been exhausted.
                  Last edited by Semiadmin; 09-11-2023, 20:13.

                  • cyber
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional
                    • Dec 2006
                    • 4806

                    #10
                    Originally posted by Semiadmin
                    And a properly working active node is not able to send an alert?
                    When it's behind a completely dead FW and feels very lonely in the corner of the datacenter... I guess it cannot, according to the topic starter.
