Hello,
I am discovering Zabbix today, and I have some questions that the website could not answer.
First of all, some background: the network to monitor is large (200 servers and growing, with 10 to 30 metrics to check on each; we will have to develop custom agents for some of those metrics). It is organized in nested clusters, themselves made of sub-clusters, and so on down to the final hosts, which are usually Xen virtual machines.
1/ Can a Zabbix probe handle flaps? For example: the CPU load of a system is steady at around 4, it rises rapidly to 15, then goes back to 4. Will Zabbix raise an error immediately, send an SMS and wake me up at 4 a.m.? Or will Zabbix probe the CPU load again to check whether it stays high before treating it as a problem?
Flaps may also come from a failure to read the probe value at one precise instant, with everything back to normal a minute later...
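To make the behaviour I am hoping for concrete, here is a minimal Python sketch. It is not Zabbix code or Zabbix configuration; the function name, the threshold of 10 and the count of 3 consecutive samples are all hypothetical values chosen for illustration. The idea is simply that an alert is raised only once the value has stayed above the threshold for several probes in a row:

```python
from typing import Iterable, List


def sustained_breach(samples: Iterable[float], threshold: float, required: int) -> List[bool]:
    """Return, for each sample, whether an alert would be active.

    An alert is active only when the last `required` consecutive samples
    (including the current one) were all above `threshold`.
    Hypothetical helper for illustration, not part of Zabbix.
    """
    streak = 0  # number of consecutive samples above the threshold so far
    alerts = []
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        alerts.append(streak >= required)
    return alerts


# A single spike to 15 is ignored; three high readings in a row raise an alert.
loads = [4, 4, 15, 4, 4, 12, 13, 14, 4]
print(sustained_breach(loads, threshold=10, required=3))
```

With this logic the isolated spike never triggers anything, and the alert clears as soon as the load drops back under the threshold.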
2/ Can Zabbix be configured to aggregate hosts and groups of hosts on several levels? For example: I have a web server in a group of web servers, the groups of web servers are in a cluster, and the clusters are in a country.
I want to be able to glance at a single page to check that everything is OK in a country and, if a problem occurs, drill down to exactly where the problem is.
I read on the website that aggregation is possible, but I am not sure whether it works on several levels.
3/ Is it possible to tell Zabbix to stay silent during the first n minutes of a system's life? At startup, disk usage and CPU load are usually high, so can Zabbix be told to ignore problems during the first n minutes after system startup?
4/ Is it possible to switch off an indicator? For example: a disk in a RAID array is broken, but there is still a spare disk in the array. I have scheduled its replacement for next week, when I have maintenance planned at the datacenter.
Until that maintenance, I would like to switch this indicator off so that I keep a high-level view of my network and can quickly see whether a new incident has occurred, without having to drill down because an alert is still present on this disk drive...
Thanks in advance for any information you can give me!
LT
