I have a very small Zabbix setup with just 11 hosts being monitored at the moment. One of those hosts does double duty, running as both the server and an agent. All the machines are running Ubuntu 12.04 with Zabbix installed from the Ubuntu packages from zabbix.com. The machines all have plenty of CPU, RAM and network capacity. For example, the Zabbix server has 8 x 3.4 GHz CPU and 32 GB of RAM, and its load average rarely gets above 0.1.
Until a week ago everything was running Zabbix 2.0.9, both agents and server. A week ago I upgraded just the server to 2.2.0, using the packages from zabbix.com.
Since then in the logs I get bursts for up to a couple of hours of:
Zabbix agent item [...] on host [...] failed: first network error, wait for 15 seconds
usually a few of these then
temporarily disabling Zabbix agent checks on host [..]: host unavailable
enabling Zabbix agent checks on host []: host became available
The items vary and it seems to affect all the hosts I monitor.
Sometimes I'll get emails saying "PROBLEM: Zabbix agent on ... is unreachable for 5 minutes" and then after 10 minutes I'll get the OK.
Unfortunately, as this seems to affect all my hosts, I'll get one or more PROBLEM then OK emails for each server.
While I was getting a burst of these emails I changed zabbix_server.conf as follows and restarted the server. (The rest of the fields are the defaults, apart from the Zabbix MySQL details.)
StartPollers=20
Timeout=15
UnreachablePeriod=90
UnreachableDelay=10
This at least stopped the emails and seemed to help with collecting data. However I still get a burst of emails.
In the logs I get these:
Zabbix agent item [...] on host [...] failed: first network error, wait for 10 seconds
resuming Zabbix agent checks on host [...]: connection restored
then 10 seconds later
Zabbix agent item [...] on host [...] failed: first network error, wait for 10 seconds
Zabbix agent item [...] on host [...] failed: another network error, wait for 10 seconds
resuming Zabbix agent checks on host [...]: connection restored
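To see whether the bursts really hit every host at the same time, the error lines could be tallied per host. This is only a minimal sketch in Python: the function name `count_network_errors` is made up for illustration, and the regular expression assumes the host name appears in double quotes or square brackets right after "on host", as in the messages quoted above. Adjust it if the real log lines look different.

```python
import re
from collections import Counter

# Assumed line shape, based on the messages quoted above: the host name is
# taken to sit in double quotes or square brackets right after "on host".
ERROR_RE = re.compile(
    r'on host ["\[](?P<host>[^"\]]*)["\]] failed: '
    r'(?P<kind>first|another) network error'
)


def count_network_errors(lines):
    """Return a Counter mapping host name -> number of network-error lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

Passing an open server log file to `count_network_errors` and printing `most_common()` on the result would show which hosts log the most errors. The "resuming" and "enabling" lines are deliberately not counted.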
I've not found anything similar in the forums. I've checked the Zabbix item queue and it's all zeros. I have no problems with network connectivity while Zabbix is reporting the network errors, and since one of the hosts being checked is the server itself, it's unlikely to be a network issue. There are no problems in the Zabbix agent logs.
The Zabbix data gathering process busy graph is basically flat except for the poller processes, which are at 20%. The Zabbix internal process busy graph has no value over 0.2.
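For anyone wanting to repeat the connectivity check independently of Zabbix, a plain TCP connect to the agent port can be timed while the errors are being logged. The sketch below is an assumption-laden example, not what Zabbix itself does: `time_agent_connect` is a hypothetical helper, 10050 is the default agent port (change it if `ListenPort` was altered), and it needs Python 3.3 or later for `time.monotonic`. It only shows that the port accepts connections, not that the agent answers item requests.

```python
import socket
import time


def time_agent_connect(host, port=10050, timeout=3.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Only TCP reachability is tested; no Zabbix protocol data is exchanged.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Calling this in a loop against each monitored host during a burst would show whether connects are slow or failing at the TCP level, or whether the problem sits higher up.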
I'd really appreciate any help as to what might be causing this.
Thanks