Hi,
I have Zabbix 1.8.10 running on servers at a couple of locations. I am very happy that Zabbix is monitoring hosts and connectivity at those locations and generating great stats and alerts. My concern is that I lose all that security if, for any reason, the Zabbix service or the host OS locks up.
What methods are others using to get alerts if Zabbix itself dies? E.g. hard drive full on the host server, critical hardware failure on the Zabbix host server, etc.
Ideas occurring to me are:
1) Run a cron job on another server to ssh onto the Zabbix server and check the timestamp of the zabbix_server log.
2) Get two Zabbix servers to watch each other: e.g. I have Zabbix server A at location 1 and Zabbix server B at location 2. Set Server= in zabbix_agentd.conf on A to point at B (location 2), and on B to point at A (location 1), so that each server monitors the other's agent.
3) something clever with distributed monitoring
The first option seems a bit flaky. I am not sure whether option 2 will disrupt any of the Zabbix servers' internal status monitoring. And option 3 (distributed monitoring) is hierarchical, so the problem just moves up the tree: who monitors the top-level Zabbix server?
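To make option 1 concrete, here is a rough sketch of the timestamp check, not a tested solution. It assumes the script is run on the Zabbix host itself (for example invoked over ssh from the other server's cron job), and that the log lives at /var/log/zabbix/zabbix_server.log; both the path and the 10-minute threshold are guesses to be adjusted to the LogFile setting in zabbix_server.conf and to how often the server normally writes to its log. The function names are made up for illustration.

```python
import os
import time

# Assumed log location; check the LogFile setting in zabbix_server.conf.
LOG_PATH = "/var/log/zabbix/zabbix_server.log"
# Hypothetical threshold: treat the server as suspect after 10 quiet minutes.
MAX_AGE_SECONDS = 600


def log_age_seconds(path, now=None):
    """Return the number of seconds since the file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds, now=None):
    """Return True if the file is missing, unreadable, or too old."""
    try:
        return log_age_seconds(path, now) > max_age_seconds
    except OSError:
        # A missing or inaccessible log is treated as a failure too.
        return True


if __name__ == "__main__":
    # Stay silent when all is well; print only when the log looks stale.
    if is_stale(LOG_PATH, MAX_AGE_SECONDS):
        print("zabbix_server log looks stale: %s" % LOG_PATH)
```

Because the script prints only on failure, a cron job that runs it can rely on cron mailing any output to the job's owner, provided mail delivery is set up on that machine. A quiet but healthy server would also trip the check, which is part of why this approach feels flaky.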
Any comments please?