I have an HTTP Agent item monitoring a URL, with pre-processing configured. The item returns the header and body as character data. The pre-processing uses a regular expression to reduce the response to OK, with a custom on-fail value of ERROR. This is working as expected: the history for this item contains mostly OK values with an occasional ERROR value. There is some flakiness in the back-end systems which causes an occasional error. Since I know that there will occasionally be a single random failure, I have configured a trigger to count the number of error responses over the last three collections and evaluate to true only if 2 or more of those three values equal ERROR.
This is what my trigger expression looks like: {zabbix.sys.courtnet.org:vm-esig-appp2_HSM_Pool_Health.count(#3,"OK",ne)}>1
I have also tried: {zabbix.sys.courtnet.org:vm-esig-appp2_HSM_Pool_Health.count(#3,"ERROR",eq)}>1
What happens is that this trigger seems to fire every time the item gets an ERROR value.
Here are the values collected from this item (from the text view of the latest data for this item):
2020-04-21 11:45:18 1587483918 "OK"
2020-04-21 11:44:18 1587483858 "OK"
2020-04-21 11:43:18 1587483798 "ERROR"
2020-04-21 11:42:18 1587483738 "OK"
2020-04-21 11:41:18 1587483678 "OK"
2020-04-21 11:40:18 1587483618 "OK"
I received the alert emails at 11:44, within a couple of seconds of each other:
Problem started at 11:44:14 on 2020.04.21
Problem has been resolved at 11:44:16 on 2020.04.21
If I'm reading the documentation correctly, the trigger should not have fired, because there was only ever one ERROR status within any three consecutive collections of the item data.
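To make the expectation concrete, here is a minimal Python sketch of the behaviour I assumed from the documentation: after each new value, count the ERROR values among the last three collections and raise a problem only when that count is greater than 1. This is not how Zabbix evaluates the trigger internally. The function name `error_counts` is made up for illustration, and the sketch assumes the quotes shown in the text view are display-only and not part of the stored value.

```python
from collections import deque

# Values from the post, oldest first.
# Assumption: the quotes in the text view are not part of the stored value.
history = [
    ("11:40:18", "OK"),
    ("11:41:18", "OK"),
    ("11:42:18", "OK"),
    ("11:43:18", "ERROR"),
    ("11:44:18", "OK"),
    ("11:45:18", "OK"),
]


def error_counts(samples, window=3, needle="ERROR"):
    """Yield (clock, matches in the last `window` values) after each new value."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for clock, value in samples:
        recent.append(value)
        yield clock, sum(1 for v in recent if v == needle)


counts = list(error_counts(history))
# Hypothetical stand-in for the trigger condition count(#3,"ERROR",eq) > 1.
fired = [clock for clock, n in counts if n > 1]

for clock, n in counts:
    print(clock, n, "PROBLEM" if n > 1 else "OK")
```

Under these assumptions the count never exceeds 1 for the values above, so `fired` stays empty, which is why I did not expect a problem event.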
What am I missing here?
Thanks,
Cliff