I'm looking to implement some high availability/failover for a standalone Zabbix 5.0 box across two geographically separate datacentres. This is for a small-medium business with several hundred servers.
A full clustered setup is probably a little too complex for our business requirements (although not out of the question if the benefits justify it), so I think we're potentially looking at two main options:
- Primary Zabbix box in the prod DC, with an external replication mechanism (likely Veeam) copying it to the pre-prod DC for use in the event of issues.
- Simply running two instances of the Zabbix server in tandem, one at each DC, with the one in pre-prod in maintenance mode, with agents reporting in to both.
It seems to me that option 2 might be the simplest in terms of failover (as well as maintaining availability of historic data), as the web interfaces would be load balanced/available at either location, although it would generate more traffic across our network links in both directions (the Zabbix box monitors servers in both locations).
Ideally, however, we would want the other server to come out of maintenance mode automatically for alerting purposes if the main box went down.
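To make the automatic switch-over idea concrete, here is a minimal sketch of a watchdog that could run at the pre-prod DC. It is only an illustration under assumptions: `primary_reachable` simply tests a TCP connection to the primary on port 10051 (the default Zabbix server port), and the `activate`/`deactivate` callbacks are hypothetical placeholders for whatever would actually take the standby out of, or put it back into, maintenance mode. The threshold of three consecutive failures is likewise an arbitrary choice to avoid reacting to a single dropped probe.

```python
import socket
from typing import Callable


def primary_reachable(host: str, port: int = 10051, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the primary Zabbix server succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


class FailoverWatchdog:
    """Activate the standby after `threshold` consecutive failed probes.

    `probe` returns True while the primary looks healthy. `activate` and
    `deactivate` are hypothetical hooks; what they do (for example, ending a
    maintenance period on the standby) is left to the caller.
    """

    def __init__(
        self,
        probe: Callable[[], bool],
        activate: Callable[[], None],
        deactivate: Callable[[], None],
        threshold: int = 3,
    ) -> None:
        self.probe = probe
        self.activate = activate
        self.deactivate = deactivate
        self.threshold = threshold
        self.failures = 0      # consecutive failed probes so far
        self.active = False    # True while the standby is alerting

    def tick(self) -> bool:
        """Run one probe and return whether the standby is currently active."""
        if self.probe():
            self.failures = 0
            if self.active:
                # Primary is back: hand alerting back to it.
                self.deactivate()
                self.active = False
        else:
            self.failures += 1
            if self.failures >= self.threshold and not self.active:
                self.activate()
                self.active = True
        return self.active
```

In use, `tick()` would be called on a timer (say, once a minute from cron or a small loop) with `probe=lambda: primary_reachable("zabbix-prod.example.com")`, where the hostname is a placeholder. Note that a probe made from the pre-prod DC cannot distinguish between the primary being down and the inter-DC link being down, so both servers could end up alerting at once during a link outage. The automatic fail-back on the first successful probe is also a design choice that some may prefer to make manual.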
Can anybody suggest any pros and cons to either of the above approaches, or suggest any better alternatives I may not have considered?
Thanks.