Evening all.
I've got a Zabbix 7.0.4 instance running via the official Docker images on an Ubuntu 22 VM host with 16 GB RAM and 8 cores of an EPYC system, monitoring circa 30 hosts.
We've now had 2 instances where, after running for a while, Zabbix suddenly stops polling some of the HTTP hosts (across a series of network addresses) all at the same time, but not others in the same network ranges. Initially I thought this was a network-level problem, or a host problem, not a Zabbix problem, because of the apparent randomness of it. In this case, it resulted in an outage of over 4 days.
Using the 'Test' button on the item that grabs the data worked fine, with results coming in quickly, while the pollers didn't. Hitting the 'Execute now' button didn't result in any action being taken, even if I waited 15 minutes (I was watching the network traffic in Wireshark).
In the end I just rebooted the full Docker stack and it all came back OK.
I've not been able to get into the Zabbix logs today and, tbh, can't recall if they will have survived the stack restart.
I can't find any reported issues against this: has anyone else seen similar?
PS, is there any way to monitor the state of the HTTP pollers within a Zabbix dashboard so that if they die I can get an alert?
Ta
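To make clear what kind of check I'm after, here is a rough sketch in Python: flag any item whose last value is older than some threshold (15 minutes here, matching how long I waited). It assumes item records shaped like what the Zabbix API `item.get` call returns, with `lastclock` as a Unix timestamp string. The function name `find_stale_items` and the sample data are made up for illustration, and this isn't a claim about how Zabbix itself should be configured to do it.

```python
import time


def find_stale_items(items, now=None, max_age_seconds=900):
    """Return the items whose last value is older than max_age_seconds.

    `items` is assumed to be a list of dicts carrying a `lastclock` field
    (Unix timestamp, as a string or int). Items with no `lastclock` are
    treated as never polled and therefore stale.
    """
    if now is None:
        now = int(time.time())
    stale = []
    for item in items:
        last = int(item.get("lastclock", 0))
        if now - last > max_age_seconds:
            stale.append(item)
    return stale


# Hypothetical sample data, not taken from my instance.
sample = [
    {"itemid": "1001", "name": "HTTP check A", "lastclock": "1700000000"},
    {"itemid": "1002", "name": "HTTP check B", "lastclock": "1699990000"},
]

stale = find_stale_items(sample, now=1700000300, max_age_seconds=900)
print([item["name"] for item in stale])
```

With the sample above, only the second item is more than 15 minutes old relative to `now`, so it is the only one reported.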