Hi,
I played a bit with triggers to give Zabbix some kind of flap detection. Basically it works, but there is one flea (a small bug) that I don't understand.
How it works
Let's suppose the host flap has an item called flap[test] with the following trigger:
({flap:flap[test].prev(0)}+{flap:flap[test].abschange(0)})>74
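To make the arithmetic concrete, here is a minimal shell sketch of what the expression adds up for one pair of samples. It assumes prev(0) is the previous value and abschange(0) is the absolute difference between the latest and the previous value. The helper name trigger_value is hypothetical and not part of Zabbix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): evaluates prev + |last - prev|
# for one pair of samples, mirroring the trigger expression above.
trigger_value() (
    prev=$1
    last=$2
    diff=$((last - prev))
    # abschange is the absolute difference between the two samples
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo $((prev + diff))
)

trigger_value 67 77   # rising value: 67 + 10
trigger_value 77 46   # falling value: 77 + 31
```

For a rising value the result is simply the latest value. For a falling value it is the previous value plus the size of the drop, so the sum is never smaller than the previous value.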
Values for flap[test] are being delivered through zabbix_sender for testing with the following script:
Code:
for rst in 67 77 46 79 64 77 57 76 61 76 64 78 73 76; do sleep 28 ; zabbix_sender server 10001 'flap:flap[test]' "$rst" ; done
The values represent the percentage CPU usage on flap, and I want to be emailed if 75% or more of the CPU is in use.
What it does
After a value above 74 arrives, the trigger stays on for at least the following value as well, because prev(0) alone already exceeds the threshold; it can only switch off at the next but one value. So we give the service some time to stabilize before assuming everything is okay. You can tighten this by appending '& {flap:flap[test].last(0)}>70' to the trigger, so that it only counts as flapping while the latest value is above 70.
In our example, the original trigger stays off until the first 77 and then stays on for the rest of the test series, because it assumes the service is flapping.
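The behaviour described here can be checked offline with a small shell sketch that replays the test values and evaluates both trigger variants. It is only a local simulation under assumptions: the first sample is skipped because it has no previous value, the appended condition is read as (sum > 74) AND (last > 70), and trigger_states is a hypothetical name. Nothing here talks to Zabbix.

```shell
#!/bin/sh
# Sketch only: replays samples and prints the state of both trigger variants.
# trigger_states is a hypothetical helper, not a Zabbix command.
trigger_states() (
    prev=""
    for last in "$@"; do
        if [ -n "$prev" ]; then
            diff=$((last - prev))
            if [ "$diff" -lt 0 ]; then diff=$((0 - diff)); fi
            sum=$((prev + diff))          # prev(0) + abschange(0)
            base=off
            strict=off
            if [ "$sum" -gt 74 ]; then
                base=on
                # variant with '& last(0)>70' appended (assumed precedence)
                if [ "$last" -gt 70 ]; then strict=on; fi
            fi
            echo "$last base=$base strict=$strict"
        fi
        prev=$last
    done
)

trigger_states 67 77 46 79 64 77 57 76 61 76 64 78 73 76
```

With this series, the base variant reports on for every value after the initial 67, while the stricter variant drops back to off on each low value (46, 64, 57, 61, 64).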
Fleas
As I mentioned before, the trigger basically works, but every time I follow the test via latestalarms.php, the trigger status changes to unknown from time to time, and I can't figure out why. My environment is Zabbix 1.0 on Solaris 10.
Regards,
Frank.