Hi
I am starting to roll out my first Zabbix installation, but I have run into an annoying issue even though I have only 5-6 agents.
The agents can be running fine until the server suddenly decides a host is unreachable and trips the "no data from agent.ping for 5 minutes" trigger. Once that happens I cannot bring the host back, even after days. I have tried restarting the agent and disabling/enabling the host on the server, neither of which worked.
The only recourse is to delete and recreate the hosts.
Zabbix server: 2.0.3 on RHEL 6 with MySQL
Zabbix server hardware: Dual socket 6-core Xeon, not taxed at all
New values per second: <10
Zabbix agents (5x): 2.0.3 on RHEL 5
Agent checks: only passive. Only using the standard "Template OS Linux" which comes out of the box.
In the logs I see "<some key> failed- First network error, will try again after 15 seconds" but that is it.
I have tried enabling the debug log on the server and I see some statements like "Failed to evaluate function in expression agent.ping[{HOST.NAME}].nodata(5m)...." and without debug logging
Tonight I raised the Timeout setting to 30 seconds, but I am not sure whether that will fix it.
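To separate an agent-side problem from a server-side one, a passive check can be sent to the agent by hand from the server host. The following is only a minimal sketch: it assumes the agent listens on the default port 10050, that the request is the plain item key followed by a newline, and that the reply is framed with the "ZBXD" header (signature, version byte, 8-byte little-endian length). The function names `probe_agent` and `parse_zbx_response` are made up for this illustration and are not part of Zabbix.

```python
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus version byte (assumed framing)
HEADER_LEN = len(ZBX_HEADER) + 8  # header is followed by an 8-byte data length


def parse_zbx_response(raw: bytes) -> str:
    """Return the value carried in an agent reply as text."""
    if not raw.startswith(ZBX_HEADER):
        # Assumption: a reply without the header is treated as plain text.
        return raw.decode("utf-8", errors="replace").strip()
    if len(raw) < HEADER_LEN:
        raise ValueError("truncated reply: header is incomplete")
    # The data length is stored as an unsigned 64-bit little-endian integer.
    (length,) = struct.unpack("<Q", raw[len(ZBX_HEADER):HEADER_LEN])
    return raw[HEADER_LEN:HEADER_LEN + length].decode("utf-8", errors="replace")


def probe_agent(host: str, port: int = 10050, key: str = "agent.ping",
                timeout: float = 5.0) -> str:
    """Send one passive check to the agent and return its answer."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode("utf-8") + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closes the connection after replying
                break
            chunks.append(chunk)
    return parse_zbx_response(b"".join(chunks))
```

Calling `probe_agent("agent-host.example")` (a placeholder hostname) from the server should return `1` for `agent.ping` if the agent is reachable and accepts connections from that address. A timeout or an empty answer would point at the network path, a firewall, or the agent's `Server=` setting rather than at the server's own bookkeeping.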
I have tried googling the subject, but all the similar problems I found were with the 1.8.x series and looked quite old.
How can I debug or fix this?