Ad Widget

**timbo** · 13-11-2014, 05:44

I believe this gentleman has done something similar:

Scalable Zabbix - Lessons on hitting 9400 NVPS - Zabbix Blog

http://blog.zabbix.com/scalable-zabbix-lessons-on-hitting-9400-nvps/2615/

One of the questions for those of us that use Zabbix on a large scale is “Just how much data can Zabbix ingest before it blows up spectacularly?” Some of the work I’ve been doing lately revolves around that question. I have an extremely large environment (around 32000+ devices) that could potentially be monitored entirely […]

I think the gist was you cannot have a "slave" service/server. The "slave" Zabbix service needs to be stopped, then started when the "primary" Zabbix service/server is down (no heartbeat). I think he developed some scripts to automate this.

It has been a while since I've visited the IRC channel, but I'm pretty sure I saw him in there a couple of times (though his IRC handle escapes me at the moment).

Hope that gives you a litte more info to work off of.

I would love to hear the progress you make on this, please keep us posted.

-Timbo

**innovot** · 21-12-2014, 16:23

Very interested to follow this thread as we have a similar requirement. We have two r420 at disparate locations with iSCSI storage that is capable of being replicated between sites. If the DB and configuration files were synchronized between the two then would both be able to run ? As we run an IPSEC tunnel between the two we would want to have site A monitoring site A, site B and monitoring B, but Site A monitoring key site B systems and vice versa. Then we would know if either site were to fail but still retain a single set of configuration and data. Is that possible ?

**mmester** · 05-01-2015, 19:55

I am also interested in doing this. We have a very large environment with multiple datacenters. Putting a critical system in production without HA is not an option. We currently partition each monitoring install per datacenter location. We lose the centralization capability there, but also limit our exposure to a failure.

It would be nice to centralize the system as a whole and use proxies at the remote datacenters. To do that we need HA for the central servers though.

-Mike

**syndeysider** · 13-02-2015, 02:36

Hi Guys

So I've finally completed the setup of a Redhat 7 (pacemaker/corosync) cluster.

Active

mysqld - master
zabbix-server - master
dbmaster-vip
symlinked cronjobs confs etc. from git repo

Passive

mysql - slave
dbreader-vip

I am about to start the migration later this month and will write-up a "how to" blog post on the issues I encountered etc. It wasn't easy to start off with as I used pcs instead of crm and there are not as many tutorials out there for the new commands structure etc.

I also ran into some database structure issues e.g. I got two different partitions of which I wanted the history/trends etc. tables to run off the SSD's and the rest of the DB to run of the SAS disks. I had to create new tables and migrate data to the new partition structure etc. Had to also update how I managed table partitioning.

I've tested fail over (30 second outage) and "large load of queries" and my master/slave handle fine.

I'll try complete this by end of February and let you know how I go.

**timbo** · 16-02-2015, 02:40

Thanks for updating us on your progress.

I look forward to seeing a few more details on this project when you have the time (as I may be in the same boat before too long).

-Timbo

**wdijkerman** · 31-03-2015, 20:09

I don't know if this still applies, but the setup you mentioned is also used in an complete chapter in this book: Mastering Zabbix (PacktPub).

(This post is not used for spam purpose.) I bought this book a while ago and what I can remember, this is written very clear and understandable. Maybe this will help you configure your environment.

**mushero** · 15-11-2015, 15:33

We are building this but in two locations, in Shanghai and Beijing.

First step is get all monitoring on proxies as then our field hosts (in hundreds of locations) don't need to be touched for the public IPs they talk to (they are locked in iptables and the agent config, most years ago).

Second, build a failover site with a full stack of Zabbix. The Web & Apps must not be running, but startable, and the DB is in slave mode to the master.

Third, we fail manually for now by starting the DR Web/App, breaking replication, and then manually changing all the proxies to the RD app server. Not pretty but easy to understand and do.

Of course all of this can be automated, but this system has run 7 years without real issues so not a very common occurrence and we can do it in the day time in an hour once the real issues are confirmed - of course could automate to a few minutes or less if you wanted.

Key is not auto failing that DB faster than you can confirm you've really lost the master or else you have a 1TB DB to sync back up over the internet from backups, which is not that much fun.

Steve

Ad Widget

Zabbix Multi Site HA

Zabbix Multi Site HA

Comment

Comment

Comment

Comment

Comment

Comment

Comment