I've got a SQL query that shows all the failed orders in our system. It's wrapped in a Python scriptlet that returns the value to Zabbix.
I'm trying to figure out the best pattern for representing the events and alerting for them.
I could have the query count all the failed orders, but that means a) a long-running SQL query that has to scan all the orders, and b) variability in that value when we purge data out of the table.
Because of this, I was going to have the query report a trailing count over a fixed amount of time. I couldn't figure out whether to have the sensor poll every 5 minutes while the query scans only the last hour. That would let me set an alert any time the count of failed orders increases, but the problem is that if I get a new spike while an older spike is leaving the window, the count barely changes and the issue may get hidden. It's similar if I use 'the most recent 24 hours'.
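To make the masking problem concrete, here is a minimal Python sketch. The timestamps, the one-hour window, and the `trailing_count` helper are all made up for illustration; this is not the actual query or scriptlet.

```python
from datetime import datetime, timedelta

# Assumed trailing window; the real query's window may differ.
WINDOW = timedelta(hours=1)

def trailing_count(failure_times, now, window=WINDOW):
    """Count failures whose timestamp falls in (now - window, now]."""
    return sum(1 for t in failure_times if now - window < t <= now)

# Hypothetical data: an old spike of 3 failures and a new spike of 3 failures.
day = datetime(2024, 1, 1)
old_spike = [day.replace(hour=10, minute=2)] * 3
new_spike = [day.replace(hour=11, minute=1)] * 3
failures = old_spike + new_spike

# Two consecutive polls, 5 minutes apart.
before = trailing_count(failures, day.replace(hour=11, minute=0))
after = trailing_count(failures, day.replace(hour=11, minute=5))
print(before, after)  # prints "3 3"
```

Both polls report 3, so a trigger on "the count increased" would not fire even though three new failures arrived between the polls.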
What's a good way to approach this? I'd like to be able to graph the number of errors in a time period to see when they occurred, as well as set alerts when the count increases.
Thanks!