Hi,
I am running a Zabbix proxy in passive mode with a couple of agents connected to it (the agent checks are all passive too). Everything seems to be working fine, except that occasionally some checks get skipped and sit in the queue until their next check time. The only related thing I can find in the logs is this line in zabbix_proxy.log:
Zabbix Host [hostname]: first network error, wait for 15 seconds
(and occasionally the same line with "another network error").
I tried running the proxy with Debug=4 and managed to catch one of the errors (quite hard, since they happen infrequently and the log rotates every few minutes with Debug=4 enabled), but none of the surrounding debug output seems particularly enlightening:
7857:20110707:015422.521 sleeping for 1 seconds
7853:20110707:015422.521 In get_values()
7853:20110707:015422.522 In DCinit_nextchecks()
7853:20110707:015422.522 In DCconfig_get_poller_items() poller_type:0
7853:20110707:015422.522 End of DCconfig_get_poller_items():0
7853:20110707:015422.522 In DCflush_nextchecks()
7853:20110707:015422.522 End of get_values()
7853:20110707:015422.522 poller #1 spent 0.000246 seconds while updating 0 values
7853:20110707:015422.522 In DCconfig_get_poller_nextcheck() poller_type:0
7853:20110707:015422.522 End of DCconfig_get_poller_nextcheck():1310021663
7853:20110707:015422.522 sleeping for 1 seconds
7856:20110707:015422.561 Zabbix Host [hostname]: first network error, wait for 15 seconds
7856:20110707:015422.561 In zabbix_log()
7856:20110707:015422.561 In DCconfig_get_items() hostid:0 key:'zabbix[log]'
7856:20110707:015422.561 End of DCconfig_get_items():0
7856:20110707:015422.562 End of zabbix_log()
7856:20110707:015422.562 In substitute_simple_macros() data:'vfs.fs.size[/,used]'
7856:20110707:015422.562 Zabbix Host 10047 is unreachable. Skipping [vfs.fs.size[/,used]]
7856:20110707:015422.562 In DCflush_nextchecks()
7856:20110707:015422.562 End of get_values()
7856:20110707:015422.562 poller #4 spent 3.078697 seconds while updating 6 values
7856:20110707:015422.562 In DCconfig_get_poller_nextcheck() poller_type:0
7856:20110707:015422.562 End of DCconfig_get_poller_nextcheck():1310021663
7856:20110707:015422.562 sleeping for 1 seconds
This error is happening even for the agent running on the same host as the proxy, so it seems very strange to me that there would be a network error to localhost.
Any suggestions or advice would be appreciated.
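Since the log rotates every few minutes at Debug=4, one way to keep the relevant lines is to filter each log file for the "network error" messages and save the surrounding lines from the same poller process. Below is a minimal sketch, assuming the line layout seen in the excerpt above (PID:YYYYMMDD:HHMMSS.mmm message). The function name network_error_context and the before/after window sizes are hypothetical and not part of Zabbix.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(?P<pid>\d+):\d{8}:\d{6}\.\d{3}\s")
# Matches both "first network error" and "another network error"
ERROR_RE = re.compile(r"(first|another) network error")


def network_error_context(lines, before=5, after=8):
    """Return a list of (pid, context_lines) for every network error line.

    Lines are grouped by PID first, so the context only contains output
    from the poller process that logged the error, not from the other
    pollers interleaved with it in the file.
    """
    by_pid = defaultdict(list)
    for raw in lines:
        match = LINE_RE.match(raw)
        if match:
            by_pid[match.group("pid")].append(raw.rstrip("\n"))

    hits = []
    for pid, entries in by_pid.items():
        for i, entry in enumerate(entries):
            if ERROR_RE.search(entry):
                start = max(0, i - before)
                hits.append((pid, entries[start:i + after + 1]))
    return hits


def scan_file(path):
    """Print the per-process context of each network error in one log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for pid, context in network_error_context(handle):
            print(f"--- poller process {pid} ---")
            print("\n".join(context))
```

Calling scan_file() on a copy of zabbix_proxy.log (for example from cron, before the next rotation) would keep just the blocks around each error, which are much smaller than the full debug log.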