Hi,
I would like to replace our Nagios-, MRTG- and OTRS-based solutions with Zabbix.
At the moment we have three datacenters with a gigabit backbone and 350 servers each, growing by about 25% per year. The pre-production environment is connected with a gigabit link and a 155 Mbit leased line.
To provide 24x7 monitoring without interruptions, I need to size the solution properly.
First I will need a Linux cluster. This may be Heartbeat, Red Hat Cluster Suite, SteelEye LifeKeeper or HP MC/ServiceGuard. Two possible strategies:
- Place the cluster in the pre-production environment, so I can use shared storage (SCSI- or FC-based) and store the database on it.
- Distribute the cluster across the datacenters and keep local copies of the database in each one.
The second approach would be the best one to avoid monitoring problems during network outages. But I would need a Zabbix frontend that can collect data from three or more separate SQL databases and aggregate the information.
Placing the database on shared storage may also be difficult: if the primary node crashes, dirty buffers (filesystem buffers) are lost. So MySQL may not be the first choice, as all dirty buffers would be cached by the operating system. I thought about tuning the filesystem parameters and mounting the filesystem in "sync" mode, but that is not reliable enough. So would PostgreSQL or Oracle using raw devices and logs be the better choice?
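As an aside on the dirty-buffer concern: durability usually hinges on whether a write reaches stable storage before a commit is acknowledged, and databases enforce this with an explicit fsync() on their log files rather than relying on filesystem mount options. A minimal sketch of the difference (the log file path here is purely illustrative):

```python
import os
import tempfile

# Hypothetical write-ahead-log file, for illustration only.
path = os.path.join(tempfile.gettempdir(), "wal.log")

with open(path, "ab") as f:
    # A buffered write can sit in OS dirty buffers and vanish if the
    # node crashes before the kernel flushes them to disk.
    f.write(b"commit record\n")
    f.flush()             # flush Python's userspace buffer to the kernel
    os.fsync(f.fileno())  # force the kernel to push dirty pages to stable storage

# Only after fsync() returns is it safe to acknowledge the commit.
```

This is why engines that fsync their redo/WAL at commit time (InnoDB, PostgreSQL) can survive a node crash even on a normally mounted filesystem; raw devices sidestep the page cache entirely, at the cost of administration effort.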
How should I size the hardware to store the most detailed values for at least three months for 1000 servers? Do I need 2-CPU or 4-CPU servers? 300 GB in RAID10 for the database?
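For a rough capacity estimate, the storage question can be worked as a back-of-envelope calculation; the per-host item count, polling interval, and bytes-per-row figure below are assumptions to plug your own numbers into, not measured Zabbix values:

```python
# Back-of-envelope sizing for full-resolution history storage.
hosts = 1000
items_per_host = 60    # assumed number of monitored items per server
interval_s = 30        # assumed polling interval in seconds
bytes_per_value = 50   # assumed stored row size incl. index overhead
retention_days = 90    # "at least three months"

values_per_second = hosts * items_per_host / interval_s
total_values = values_per_second * 86400 * retention_days
total_bytes = total_values * bytes_per_value

print(f"{values_per_second:.0f} new values per second")
print(f"{total_bytes / 1e9:.0f} GB for {retention_days} days of history")
```

Under these assumed numbers the result is roughly 2000 new values per second and on the order of 750 GB of history, so 300 GB in RAID10 would be tight; halving the item count or doubling the interval halves the requirement.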
Does anyone have experience with environments this large?
Is Zabbix able to schedule and run all items and triggers within 30 seconds?
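Whether everything fits inside a 30-second cycle depends mostly on the required checks per second versus available poller concurrency. A quick feasibility sketch (the total item count and average per-check latency are assumed figures):

```python
import math

# Can all items be polled within one interval?
items_total = 60_000      # assumed: 1000 hosts x 60 items
interval_s = 30
check_latency_s = 0.05    # assumed average time per passive check (50 ms)

checks_per_second = items_total / interval_s  # required throughput
pollers_needed = math.ceil(checks_per_second * check_latency_s)

print(f"{checks_per_second:.0f} checks/s required")
print(f"{pollers_needed} concurrent pollers needed at "
      f"{check_latency_s * 1000:.0f} ms/check")
```

With these assumptions, about 100 concurrent poller processes would be needed; slower checks (timeouts, WAN latency to the other datacenters) raise that number linearly, which argues for placing pollers or proxies close to each datacenter.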
Thanks, OLiver
Replication drift is one thing to be aware of, and even if the master dies, data may be lost.