Database HA cluster necessary for critical production?

  • quackduck
    Member
    • Feb 2025
    • 36

    #1

    I have a couple of Zabbix servers running. Each server monitors a couple of thousand hosts.

    All the Zabbix servers share an InnoDB Cluster.

    There's plenty of work involved in keeping a cluster updated, and it can be complicated to back up and restore, and complex, risky and slow to fix when it breaks.

    For simpler deployment, maintainability and isolation, and to make backups easier and faster via my VM manager (which can take snapshots), I'm considering running each Zabbix server in its own VM with a local database on the same VM (alternatively, a separate database VM per Zabbix server). A clone can be updated (Zabbix, OS, database and database software), tested and then swapped in.

    A separate VM for the database is recommended; I do understand that.

    But do I really need a database cluster? In my experience, the odds are extremely small that an InnoDB database just crashes. And if it does, systemd just restarts it. A VM rarely fails, since I already have a robust VM cluster that balances hardware, shifts RAM if a RAM module breaks and moves data between disks when disks start to fail.

    Backups are handled - and databases are made to survive crashes, so a snapshot of the disk should work as a solid backup and be easy to restore.

    Is maintaining a HA database cluster just adding a risk of something more complex breaking or being hard to fix?

    This is a production environment, which speaks for HA. Seems like a no-brainer. Yet I have a pretty solid VM environment that's quite stable.

    1-5h downtime per year due to an actual crash is acceptable.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2
    Creating snapshots of VMs without freezing them might lead to an unusable snapshot. Freezing a VM is an interruption to work... Doing it without freezing might not succeed at all, due to size for example... It might not finish before the data on disk has already changed..
    But I am not a virtual infra expert; you might have better knowledge on that..

    HA is not just "crash safety".. You may have network interruptions killing one side, but you can still go on working if the DB survives... But again, it all depends on your infra.. If your server is a single instance, it still might not work out.

    Downtime per year and downtime per day/week/month are also very different numbers... 5h yearly is ~99.94% uptime... 1h per year is ~99.9886%.. Casually allowing such high numbers.. https://uptime.is/99.9886
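
For anyone who wants to check those percentages, here is a quick sketch of the arithmetic. It assumes a 365-day year (8760 hours) and ignores leap years; the function name `uptime_percent` is made up for illustration and has nothing to do with Zabbix.

```python
# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24  # 8760


def uptime_percent(downtime_hours_per_year: float) -> float:
    """Convert yearly downtime in hours to an uptime percentage."""
    return 100.0 * (1.0 - downtime_hours_per_year / HOURS_PER_YEAR)


if __name__ == "__main__":
    # The 1-5h/year budget mentioned in post #1.
    for hours in (1, 5):
        print(f"{hours}h/year -> {uptime_percent(hours):.4f}% uptime")
```

The same function can be turned around to see what a given uptime target allows per month or week, which is usually where the numbers start to feel different.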
