Problem Detection
As soon as data is collected by any of the methods available in Zabbix, evaluation of that data begins. Data evaluation rules, called trigger expressions in Zabbix, provide logical definitions of a problem state for data received from monitored hosts. When a trigger threshold is reached, the trigger changes its state from OK to PROBLEM, and it changes back to OK when the data falls below the threshold again.
Prediction
While thresholds are useful for detecting problem situations, it is even better to be able to predict problems. Zabbix provides predictive functions for that purpose: it analyzes the trend of incoming data and constructs a forecast of how things are likely to develop, giving users the ability to act proactively.
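To make the idea of trend-based forecasting concrete, here is a minimal Python sketch that fits a straight line to recent samples and extrapolates it. It is an illustration only, not Zabbix's implementation: the function name linear_forecast, the sample data and the 95% limit are all assumptions made for this example.

```python
from typing import Sequence


def linear_forecast(times: Sequence[float], values: Sequence[float], horizon: float) -> float:
    """Fit a least-squares line to (time, value) samples and extrapolate
    `horizon` time units past the last sample."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (times[-1] + horizon)


# Hypothetical samples: disk usage in percent, one reading per hour.
hours = [0, 1, 2, 3, 4]
used_pct = [70.0, 72.0, 74.0, 76.0, 78.0]

# Forecast usage 10 hours after the last reading and compare with a limit.
predicted = linear_forecast(hours, used_pct, horizon=10)
will_fill_up = predicted >= 95.0
```

A forecast like this lets a problem be raised while usage is still at 78%, on the basis of where the trend is heading rather than where it is now.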
Extremely flexible threshold definitions
Zabbix provides its users with very flexible, intelligent threshold definition options. While a trigger threshold may be as simple as "bigger than x", it is also possible to use arithmetic, comparison and logical operators, such as division, multiplication, not equal, and logical AND and OR.
Referencing one or multiple items or hosts
Use many different items obtained from different hosts to build a trigger expression. This makes it possible to build very complex, intelligent thresholds that minimize false positives and thus let administrators concentrate on real issues.
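The following Python sketch shows the kind of logic such a multi-item, multi-host condition expresses. It is not Zabbix trigger syntax. The host names (web01, db01), item names and limits are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Tuple

Key = Tuple[str, str]  # (host, item); all names below are hypothetical


def shop_degraded(latest: Mapping[Key, float]) -> bool:
    """Combine items from several hosts into one problem condition."""
    # Arithmetic on two items from the same host: fraction of disk in use.
    disk_used = latest[("db01", "disk_used_bytes")] / latest[("db01", "disk_total_bytes")]
    # Comparisons on items from different hosts.
    web_slow = latest[("web01", "response_seconds")] > 2.0
    db_busy = latest[("db01", "cpu_load")] > 5.0
    # Logical AND / OR: a slow frontend alone is not reported; it must
    # coincide with a busy or nearly full database host.
    return web_slow and (db_busy or disk_used > 0.9)


# Latest values per (host, item), invented for this example.
latest = {
    ("web01", "response_seconds"): 3.5,
    ("db01", "cpu_load"): 1.2,
    ("db01", "disk_used_bytes"): 95e9,
    ("db01", "disk_total_bytes"): 100e9,
}
fires = shop_degraded(latest)
```

Requiring several conditions to hold at once is what reduces false positives: a single slow response with a healthy database would not raise this problem.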
History data analysis
Check the current data against data obtained some time ago. It is possible to compare similar periods of time, for example this Monday with the previous Monday, or this afternoon with the afternoon two weeks ago. This is extremely handy when the load on an environment is not uniform and comparing Monday morning to Tuesday afternoon does not produce any valuable information.
Compare with the norm, where the norm is the system state in the past. For example: the average CPU load for the last hour is twice as high as the CPU load for the same period a week ago.
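The "twice the norm" comparison above can be sketched in a few lines of Python. This is an illustrative sketch, not how Zabbix evaluates it internally. The function name above_norm and the sample values are assumptions.

```python
from statistics import mean
from typing import Sequence


def above_norm(current: Sequence[float], baseline: Sequence[float], factor: float = 2.0) -> bool:
    """True when the average of the current window exceeds `factor` times
    the average of the baseline window (the same period in the past)."""
    if not current or not baseline:
        raise ValueError("both windows need at least one sample")
    return mean(current) > factor * mean(baseline)


# Hypothetical CPU load samples for the last hour and the same hour a week ago.
last_hour = [4.1, 4.4, 4.0, 4.3]
same_hour_last_week = [1.9, 2.0, 2.1, 2.0]

is_problem = above_norm(last_hour, same_hour_last_week)
```

Because the baseline is taken from the same period a week earlier, a busy Monday morning is judged against other Monday mornings rather than against a quiet period.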
Hysteresis
Hysteresis makes it possible to avoid flapping, which might occur when incoming data fluctuates around a simple threshold. Hysteresis has upper and lower limits: the trigger is put in a problem state when the upper limit is reached, and it returns to a normal state only when the data obtained falls below the lower limit.
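A small Python state machine illustrates how two limits suppress flapping. It is a sketch of the general technique, not Zabbix code. The class name HysteresisTrigger and the limits 5.0 and 4.0 are assumptions for this example.

```python
class HysteresisTrigger:
    """Enters PROBLEM at or above `upper`; recovers only below `lower`."""

    def __init__(self, upper: float, lower: float) -> None:
        if lower >= upper:
            raise ValueError("lower limit must be below upper limit")
        self.upper = upper
        self.lower = lower
        self.problem = False

    def update(self, value: float) -> bool:
        """Feed one new value and return True while in PROBLEM state."""
        if not self.problem and value >= self.upper:
            self.problem = True
        elif self.problem and value < self.lower:
            self.problem = False
        return self.problem


# Hypothetical load values fluctuating around the upper limit.
trigger = HysteresisTrigger(upper=5.0, lower=4.0)
samples = [3.0, 5.2, 4.8, 4.5, 5.1, 3.9, 4.6]
states = [trigger.update(v) for v in samples]
```

With a single threshold of 5.0, the same samples would switch state four times. With the two limits, the trigger changes state only twice: once on entering the problem and once on recovery.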
Dependencies
In any IT environment there are plenty of dependencies, where a failure in one node causes failures in many other parts of the environment. Dependencies may span multiple levels: a lack of disk space can cause the failure of the OS on which a database is running. At that point users of CRM, CMS, BPMS and many other business applications are unable to perform their tasks. A monitoring system without dependencies configured would produce tens or hundreds of notifications and send hundreds or thousands of e-mails reporting that all these systems are down. With dependencies configured wisely, there is only one notification, about the lack of disk space, and all the other notifications are hidden.
Real problem: Disk is full
- Investigates the real cause of multiple problems
- Skips dependent notifications
- Hides dependent triggers in the frontend
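The disk-full scenario can be sketched in Python as a filter over a dependency map. This is an illustration under assumptions, not Zabbix's implementation. The trigger names are hypothetical, and following dependencies transitively is a choice made for this sketch.

```python
from typing import Dict, List, Set


def notifiable(problems: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the problems worth notifying about: those with no direct or
    indirect dependency that is itself currently in a problem state."""

    def blocked(trigger: str, seen: Set[str]) -> bool:
        for parent in depends_on.get(trigger, []):
            if parent in seen:  # guard against dependency cycles
                continue
            seen.add(parent)
            if parent in problems or blocked(parent, seen):
                return True
        return False

    return {t for t in problems if not blocked(t, {t})}


# Hypothetical multi-level dependency chain: applications -> database -> OS -> disk.
depends_on = {
    "os_down": ["disk_full"],
    "db_down": ["os_down"],
    "crm_down": ["db_down"],
    "cms_down": ["db_down"],
}
problems = {"disk_full", "os_down", "db_down", "crm_down", "cms_down"}

to_notify = notifiable(problems, depends_on)
```

Here five triggers are in a problem state, but only the disk-full trigger would produce a notification. If only the CRM were down while everything it depends on was healthy, the CRM problem itself would be reported.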
Severity levels
Define trigger severity levels based on importance. Since not all triggers carry the same level of importance, one of six severity levels can be assigned to a trigger. The severity level is then applied to the visual representation of triggers, and can be used to fine-tune the reaction to problem events.
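As a sketch of how severity can drive the reaction to a problem, the Python example below models the six levels as an ordered enumeration and filters notifications by a minimum level. The level names follow those commonly shown in Zabbix (Not classified through Disaster). The should_notify helper and the chosen minimum are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Six trigger severity levels, ordered from least to most severe."""
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


def should_notify(severity: Severity, minimum: Severity = Severity.AVERAGE) -> bool:
    """Hypothetical policy: notify only at or above a minimum severity."""
    return severity >= minimum


page_on_call = should_notify(Severity.DISASTER)
skip_info = should_notify(Severity.INFORMATION)
```

Using an ordered type keeps policies such as "e-mail from Warning upward, page only for High and Disaster" down to a single comparison.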