We have been running Zabbix for 6+ years and it has been immeasurably useful and flexible. Unfortunately, we are in a country with load shedding, where frequent power disruptions make it hard to separate genuine link outages from outages at sites where clients do not have appropriate backup power generation or UPS systems.
I was hoping to find a way to periodically, say at 10pm, run through a list of problems and correlate each one with the router's SNMP-polled uptime at the time recovery occurred. If the uptime was less than 10 minutes when the device's unreachable state cleared, the device had evidently just rebooted, so I would simply delete the table entry for that downtime event.
Any tips on which table I should start digging around in? I am hoping there is one that records the device ID and start time of events relating to the ICMP unreachable test, so that I can remove events where the downtime was caused by the device being unavailable due to environmental factors within the client's control.
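To make the rule concrete, here is a minimal sketch of the filtering step in Python. It assumes the problem events and the uptime value polled at recovery have already been fetched from Zabbix by some other means. The `Outage` record, its field names and the helper functions are hypothetical and do not correspond to actual Zabbix tables or columns, and the uptime is assumed to already be in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold from the description above: 10 minutes, expressed in seconds.
UPTIME_THRESHOLD_SECONDS = 10 * 60


@dataclass
class Outage:
    """Hypothetical record for one ICMP-unreachable problem (not a Zabbix schema)."""
    event_id: int
    host_id: int
    start_clock: int       # Unix time the problem started
    recovery_clock: int    # Unix time the problem cleared
    uptime_at_recovery: Optional[int]  # seconds; None if no SNMP sample was available


def is_power_related(outage: Outage,
                     threshold: int = UPTIME_THRESHOLD_SECONDS) -> bool:
    """True when the device had been up for less than `threshold` at recovery."""
    if outage.uptime_at_recovery is None:
        # Without an uptime sample we cannot tell, so keep the event.
        return False
    return outage.uptime_at_recovery < threshold


def select_power_related(outages: List[Outage],
                         threshold: int = UPTIME_THRESHOLD_SECONDS) -> List[int]:
    """Return the event IDs of outages that look like a reboot after power loss."""
    return [o.event_id for o in outages if is_power_related(o, threshold)]
```

The returned event IDs would be the candidates for the nightly cleanup. Events with no uptime sample are deliberately kept, since there is nothing to correlate them against.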