I can't figure out why I keep getting trigger flapping on a CPU trigger, even after implementing hysteresis. We have some firewall devices that are known to run at high CPU utilization frequently. I have watched and verified this myself, and I tried to build the triggers for the team so that they only create an event on legitimate, sustained high CPU. In most cases, though, when the firewalls spike and then drop, I still get events and alerts. Here is how I have things configured (these are three separate triggers, by the way):
({TRIGGER.VALUE}=0 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(15)}>75) or ({TRIGGER.VALUE}=1 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(20)}>60)
({TRIGGER.VALUE}=0 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(15)}>90) or ({TRIGGER.VALUE}=1 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(20)}>75)
({TRIGGER.VALUE}=0 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(15)}>95) or ({TRIGGER.VALUE}=1 and {Template SNMP PAN Firewalls:hrProcessorLoad1.min(20)}>90)
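To reason about what these hysteresis expressions should do, here is a minimal Python simulation of one of them. It assumes both windows are counted in samples. If I read the legacy trigger syntax correctly, that corresponds to `min(#15)`, whereas a bare `min(15)` is a period of 15 seconds. The function and parameter names are hypothetical, and the sketch ignores details such as how Zabbix treats a window that holds too few values.

```python
from collections import deque
from typing import Iterable, List


def simulate_hysteresis(
    samples: Iterable[float],
    fire_window: int,
    fire_threshold: float,
    hold_window: int,
    hold_threshold: float,
) -> List[int]:
    """Return the trigger state (0 = OK, 1 = PROBLEM) after each sample.

    Models the shape of
      ({TRIGGER.VALUE}=0 and min(#fire_window) > fire_threshold) or
      ({TRIGGER.VALUE}=1 and min(#hold_window) > hold_threshold)
    with both windows counted in samples (an assumption), re-evaluated on every new value.
    """
    history = deque(maxlen=max(fire_window, hold_window))
    state = 0
    states = []
    for value in samples:
        history.append(value)
        recent = list(history)
        if state == 0:
            window = recent[-fire_window:]
            # Simplification: do not fire until the window holds fire_window samples.
            if len(window) == fire_window and min(window) > fire_threshold:
                state = 1
        else:
            window = recent[-hold_window:]
            # Recover as soon as any sample in the hold window is at or below the lower threshold.
            if min(window) <= hold_threshold:
                state = 0
        states.append(state)
    return states


# A three-sample window ignores a two-sample spike...
print(simulate_hysteresis([20, 20, 95, 97, 30, 20], 3, 75, 3, 60))
# -> [0, 0, 0, 0, 0, 0]
# ...while a one-sample window (roughly last()) flaps with it.
print(simulate_hysteresis([20, 20, 95, 97, 30, 20], 1, 75, 1, 60))
# -> [0, 0, 1, 1, 0, 0]
```

The point of the comparison is that hysteresis only helps when the window actually spans several polls; a window that covers a single sample behaves like `last()` and follows every spike.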
The following screenshot shows the CPU utilization of all firewall devices in one host group over three hours:

I have tried various trigger functions (last, avg, min) and various numbers of values to include in the evaluation (min(5), avg(10), etc.), and none of them seem to do the trick. Adding dependencies quiets the events down somewhat, but only in that we get one alert per host instead of three.
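To compare how last, avg, and min react to the same burst, here is a small Python sketch. The window is counted in samples (assumed to correspond to the `#N` form of the trigger functions), and `last`, `FUNCTIONS`, and `crossings` are hypothetical stand-ins, not Zabbix code.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def last(window: Sequence[float]) -> float:
    """Most recent sample in the window, like last()."""
    return window[-1]


# Hypothetical stand-ins for the trigger functions, each applied to a window of samples.
FUNCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "last": last,
    "avg": mean,
    "min": min,
}


def crossings(samples: Sequence[float], window_size: int, threshold: float) -> Dict[str, List[int]]:
    """For each function, list the sample indexes where f(previous window_size samples) > threshold."""
    result: Dict[str, List[int]] = {name: [] for name in FUNCTIONS}
    for i in range(window_size - 1, len(samples)):
        window = samples[i - window_size + 1 : i + 1]
        for name, fn in FUNCTIONS.items():
            if fn(window) > threshold:
                result[name].append(i)
    return result


# A three-sample burst to ~98% between quieter readings.
print(crossings([30, 40, 98, 99, 97, 35, 30], window_size=3, threshold=75))
# -> {'last': [2, 3, 4], 'avg': [3, 4, 5], 'min': [4]}
```

In this toy series, min over three samples still crosses once, because the burst lasted exactly as long as the window; a burst shorter than the window would be ignored. avg, by contrast, stays above the threshold for one sample after the burst has already ended.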
One final note: I have also verified the type of data coming from the hosts. It arrives as a whole integer (14 = 14%), so no preprocessing is needed on the incoming data.