Now that 1.6 is out and things are looking really good so far (congrats Zabbix Team!!), I have been asked to research Distributed Monitoring.
We did some work with someone whose primary office was in Houston, Texas, and it is now gone (and I do mean gone; Hurricane Ike left them nothing usable behind). That got us thinking. We don't wish that kind of disaster on anyone, but we need to have some kind of plan in case it does happen to us. So I am trying to plan out what happens in that kind of emergency.
Currently we have 1 Zabbix server for development, 1 for testing/staging (new systems that we have some doubt about), and 1 server for all of our production systems. This works, but it is not ideal because we have 2 off-site locations that report back to the production server. When we have a connection failure, many triggers fire. In an emergency where the primary Zabbix servers are permanently offline, I would have to build a new server and redirect all of the systems from the other locations to it.
Looking at the Distributed Monitoring solution, it appears that I can have multiple Zabbix servers in a very nice layout. So I could have a Master Zabbix server in city F, with Node Zabbix servers monitoring systems in city D, city F, and city T.
If I understand correctly, Node_D and Node_T will still gather information and fire triggers for the systems they monitor even if the link between them and the Master is broken. But what happens if the Master goes away (as in not coming back because of an emergency, or it caught fire, or got hit with a hammer, etc.)?
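To make the scenario concrete, here is a rough sketch of the layout I have in mind and what is left if city F disappears. This is plain Python, not Zabbix code: the `Site` class and the `lose_city` and `orphaned` helpers are names I made up for illustration, and it says nothing about how Zabbix itself behaves.

```python
# Hypothetical model of the planned layout. Plain Python, not Zabbix code;
# Site, lose_city() and orphaned() are made-up names for illustration only.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    name: str                     # e.g. "Node_D"
    city: str                     # city the server physically lives in
    parent: Optional[str] = None  # Master this node reports to, if any


def planned_layout() -> Dict[str, Site]:
    """Master in city F, with one node each in cities D, F and T."""
    sites = [
        Site("Master_F", "F"),
        Site("Node_D", "D", parent="Master_F"),
        Site("Node_F", "F", parent="Master_F"),
        Site("Node_T", "T", parent="Master_F"),
    ]
    return {s.name: s for s in sites}


def lose_city(layout: Dict[str, Site], city: str) -> Dict[str, Site]:
    """Return the servers left standing after every server in `city` is gone."""
    return {name: s for name, s in layout.items() if s.city != city}


def orphaned(survivors: Dict[str, Site]) -> List[str]:
    """Sorted names of surviving nodes whose parent is not among the survivors."""
    return sorted(
        s.name
        for s in survivors.values()
        if s.parent is not None and s.parent not in survivors
    )


if __name__ == "__main__":
    left = lose_city(planned_layout(), "F")
    print("still standing:", sorted(left))
    print("without a Master:", orphaned(left))
```

Losing city F takes out both Master_F and Node_F, so Node_D and Node_T are the two servers I would be left with, and neither has a Master to report to.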
From what I understand, the Nodes will all continue to act as Zabbix servers, but how difficult would it be to make one of the Nodes a Master? Will there still be any communication between the Nodes?
If the worst-case scenario is that the building is gone, I obviously no longer care so much about the historical data of a server's hard drive usage.
What I would need to be able to do is quickly access the remaining Nodes, turn off any triggers complaining about the Master being gone, and get an exact idea of the state my remaining servers are in. The backup scripts, for example, already report to Zabbix, and that is prime information I would be interested in: 'When was the last successful backup of serverX?' If there truly is an emergency, I am sure there will be plenty of other things to panic about, so I don't want to be bothered with alerts from Zabbix on topics I already know about.
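Roughly, the triage I am after looks like the sketch below. Again this is plain Python and not anything Zabbix provides: the alert dictionaries, the backup records, and the `actionable_alerts` and `last_successful_backup` helpers are all hypothetical stand-ins for whatever the surviving Nodes can actually give me.

```python
# Hypothetical triage helpers. The data shapes and function names are
# assumptions for illustration; they do not correspond to any Zabbix API.
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set, Tuple

Alert = Dict[str, str]                     # {"host": ..., "trigger": ...}
BackupRecord = Tuple[str, datetime, bool]  # (server, finished_at, succeeded)


def actionable_alerts(alerts: Iterable[Alert], known_lost: Set[str]) -> List[Alert]:
    """Drop alerts about hosts I already know are gone."""
    return [a for a in alerts if a["host"] not in known_lost]


def last_successful_backup(
    records: Iterable[BackupRecord], server: str
) -> Optional[datetime]:
    """Most recent successful backup time for `server`, or None if there is none."""
    ok_times = [when for name, when, ok in records if name == server and ok]
    return max(ok_times) if ok_times else None


if __name__ == "__main__":
    alerts = [
        {"host": "Master_F", "trigger": "server unreachable"},
        {"host": "serverX", "trigger": "disk space low"},
    ]
    records = [
        ("serverX", datetime(2008, 9, 10, 2, 0), True),
        ("serverX", datetime(2008, 9, 11, 2, 0), True),
        ("serverX", datetime(2008, 9, 12, 2, 0), False),
    ]
    for alert in actionable_alerts(alerts, known_lost={"Master_F"}):
        print(alert["host"], "-", alert["trigger"])
    print("last good backup of serverX:", last_successful_backup(records, "serverX"))
```

The point is only that alerts about the lost Master get filtered out up front, and the backup question is answered from the latest successful run rather than the latest attempt.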

Of course, there is the possibility that each city has its own Zabbix server and they are not linked via the Master-Node relationship, but that cuts out a lot of the cool reporting/management tools that the Master-Node relationship provides.
I know there are a lot of people who appear to be using Distributed Monitoring, so I ask you: how do you deal with a disaster event in which a section of your servers is no longer reachable?
Thanks for any input!
~S~
Code:
         - Node_D
Master_F - Node_F
         - Node_T