Hi there,
I have two Zabbix 2.0.4 servers in a master/child node configuration, currently monitoring about 11 web hosts.
I have set up an SLA for each of these web hosts on the master node's Zabbix server; each SLA checks for "agent unreachable" from both the master and child nodes.
I've had this up and running for a few months. However, I'm now finding that servers are starting to display 100% problem time even when they have no problems and their status is reported as OK.
Here's a screen capture of the 100% problem time, but with status OK:

And here's a screen capture of where it went wrong:

Over the last couple of weeks I've tried all sorts of things in the Zabbix interface, but nothing short of deleting the SLA and setting it up again gets it working.
We want to use this for business planning, but if this is going to happen at random, irregular intervals, we can't rely on the data.
Has anyone else seen this?