Our organization uses Zabbix as its main monitoring solution. The Zabbix server monitors about 100 hosts directly, and some of the boxes (~15-20) are monitored through zabbix-proxy.
We are seeing very strange behavior from zabbix-proxy. When we start the proxy, the main server begins to receive current data from the hosts behind it. Everything goes well for some time, but a few hours later the "last check" time for the proxied hosts begins to lag behind the actual time. Right after proxy start the lag is 1-2 minutes (which is fine), but within 24 hours it grows to 20-25 minutes and keeps growing. Restarting the proxy solves the problem for a while...
I didn't find a solution to this exact problem on this forum, so I decided to start my own thread. The changelog for the newer Zabbix 1.8 doesn't mention such a bugfix.
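To put a rough number on the drift described above, here is a minimal Python sketch. It assumes the lag grows linearly between proxy start and the 24-hour mark, which is only an assumption; the function name is made up for illustration.

```python
def drift_rate_per_hour(start_lag_min, end_lag_min, hours):
    """Average growth of the 'last check' lag, in minutes per hour.

    Assumes linear growth, which the post does not confirm.
    """
    return (end_lag_min - start_lag_min) / hours


# Figures from the description: 1-2 min at proxy start, 20-25 min after 24 h.
low = drift_rate_per_hour(2, 20, 24)   # most optimistic reading
high = drift_rate_per_hour(1, 25, 24)  # most pessimistic reading
print(f"lag grows by roughly {low:.2f}-{high:.2f} min per hour")
```

In other words, the proxy seems to fall behind by something like 45-60 seconds for every hour it runs.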
Zabbix-server version is 1.6.6
Zabbix-proxy version is 1.6.5
Zabbix agent versions are 1.4.6 (CentOS distro) and 1.6.8 (Windows)
Notice: DNS doesn't work on the proxied hosts (they can only resolve their own hostname from /etc/hosts), but it works on the host running zabbix-proxy.
I know the proxy log shows a lot of incorrect keys and triggers, but I don't think they are the source of the problem.
The Zabbix agents on CentOS also keep writing "Getting list of active checks failed. Will retry after 60 seconds" to their log, even though the parameters are monitored. I don't know how to stop this message.
The configuration files and logs follow.
Zabbix-server version is 1.6.6
/etc/zabbix/zabbix_server.conf
############ GENERAL PARAMETERS #################
#NodeID=0
StartPollers=32
StartIPMIPollers=10
StartPollersUnreachable=3
StartTrappers=32
StartPingers=20
#StartDiscoverers=1
#StartHTTPPollers=1
#ListenPort=10051
#ListenIP=127.0.0.1
#HousekeepingFrequency=1
SenderFrequency=30
#DisableHousekeeping=1
DebugLevel=3
Timeout=10
#TrapperTimeout=5
#UnreachablePeriod=45
#UnavailableDelay=15
#UnavailableDelay=60
PidFile=/var/run/zabbix-server/zabbix_server.pid
LogFile=/var/log/zabbix-server/zabbix_server.log
#LogFileSize=1
AlertScriptsPath=/etc/zabbix/alert.d/
#FpingLocation=/usr/sbin/fping
#PingerFrequency=60
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=passwd
#DBSocket=/tmp/mysql.sock
Zabbix-proxy version 1.6.5
############ GENERAL PARAMETERS #################
Server=/correct and checked server name/
ServerPort=10051
Hostname=/correct proxy hostname/
StartPollers=5
#StartIPMIPollers=0
StartPollersUnreachable=5
#StartTrappers=5
StartPingers=5
#StartDiscoverers=1
#StartHTTPPollers=1
#ListenPort=10051
#SourceIP=
#ListenIP=127.0.0.1
#HeartbeatFrequency=60
ConfigFrequency=180
HousekeepingFrequency=1
#SenderFrequency=30
#ProxyLocalBuffer=2
ProxyOfflineBuffer=2
DebugLevel=3
Timeout=5
#TrapperTimeout=5
#UnreachablePeriod=45
#UnavailableDelay=15
#UnavailableDelay=60
PidFile=/var/run/zabbix-proxy/zabbix_proxy.pid
LogFile=/var/log/zabbix-proxy/zabbix_proxy.log
#LogFileSize=1
AlertScriptsPath=/home/zabbix/bin/
#ExternalScripts=/etc/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
#Fping6Location=/usr/sbin/fping6
#TmpDir=/tmp
#PingerFrequency=60
DBHost=localhost
# DBPassword are ignored.
DBName=zabbix_proxy
DBUser=zabbix_proxy
DBPassword=passwd
#DBSocket=/tmp/mysql.sock
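Since most lines in these files are commented out, it can help to list only the settings that are actually in effect. A small Python sketch, assuming the usual `Key=Value` layout with `#` comments; `active_settings` is a hypothetical helper, not part of Zabbix, and the sample is a few lines copied from the proxy config above.

```python
def active_settings(conf_text):
    """Return the uncommented Key=Value pairs from a Zabbix-style config."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and stray lines that are not Key=Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = """\
StartPollers=5
#StartIPMIPollers=0
StartPollersUnreachable=5
#StartTrappers=5
StartPingers=5
Timeout=5
"""
for key, value in active_settings(sample).items():
    print(f"{key} = {value}")
```

For the proxy this makes it easy to see that only 5 pollers with a 5-second timeout serve all the proxied hosts.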
CentOS agent conf (agent 1.4.6)
Server=::ffff:192.168.125.178,192.168.125.178
ServerPort=10051
Hostname=/correct hostname/
ListenPort=10050
StartAgents=5
DebugLevel=3
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
Timeout=3
Proxy log after start
29657:20100209:112535 Starting zabbix_proxy. ZABBIX 1.6.5 (revision 7442).
29657:20100209:112535 **** Enabled features ****
29657:20100209:112535 SNMP monitoring: YES
29657:20100209:112535 WEB monitoring: YES
29657:20100209:112535 ODBC: YES
29657:20100209:112535 IPv6 support: YES
29657:20100209:112535 **************************
29659:20100209:112535 server #1 started [Configuration syncer]
29660:20100209:112535 server #2 started [Datasender]
29663:20100209:112535 server #3 started [Poller. SNMP:YES]
29670:20100209:112535 server #9 started [Trapper]
29671:20100209:112535 server #10 started [Trapper]
29672:20100209:112535 server #11 started [Trapper]
29677:20100209:112535 server #15 started [ICMP pinger]
29674:20100209:112535 server #13 started [ICMP pinger]
29673:20100209:112535 server #12 started [Trapper]
29678:20100209:112535 server #16 started [ICMP pinger]
29679:20100209:112535 server #17 started [ICMP pinger]
29666:20100209:112535 server #5 started [Poller. SNMP:YES]
29687:20100209:112535 server #18 started [Housekeeper]
29687:20100209:112535 Executing housekeeper
29667:20100209:112535 server #6 started [Poller. SNMP:YES]
29665:20100209:112535 server #4 started [Poller. SNMP:YES]
29669:20100209:112535 server #8 started [Trapper]
29690:20100209:112535 server #21 started [Poller for unreachable hosts. SNMP:YES]
29675:20100209:112535 server #14 started [ICMP pinger]
29691:20100209:112535 server #22 started [Poller for unreachable hosts. SNMP:YES]
29692:20100209:112535 server #23 started [Poller for unreachable hosts. SNMP:YES]
29657:20100209:112535 server #0 started [Heartbeat sender]
29689:20100209:112535 server #20 started [Poller for unreachable hosts. SNMP:YES]
29700:20100209:112535 server #24 started [HTTP Poller]
29668:20100209:112535 server #7 started [Poller. SNMP:YES]
29701:20100209:112535 server #25 started [Discoverer. SNMP:YES]
29688:20100209:112535 server #19 started [Poller for unreachable hosts. SNMP:YES]
29687:20100209:112536 Deleted 30191 records from history [0.731471 seconds]
29668:20100209:112537 Item [b137.organization.com:vfs.dev.write[sda,,avg1]] error: Not supported by ZABBIX agent
29668:20100209:112537 Parameter [vfs.dev.write[sda,,avg1]] is not supported by agent on host [b137.organization.com] Old status [0]
29668:20100209:112537 Item [b137.organization.com:vfs.file.time[/var/run/puppet/puppetd.stamp]] error: Not supported by ZABBIX agent
29668:20100209:112537 Parameter [vfs.file.time[/var/run/puppet/puppetd.stamp]] is not supported by agent on host [b137.organization.com] Old status [0]
29666:20100209:112538 Item [b134.organization.com:perf_counter[\Physical Disk(_Total)\Avg. Disk Read Queue Length]] error: Not supported by ZABBIX agent
29666:20100209:112538 Parameter [perf_counter[\Physical Disk(_Total)\Avg. Disk Read Queue Length]] is not supported by agent on host [b134.organization.com] Old status [0]
29665:20100209:112548 Item [b127.organization.com:proc.num[sshd]] error: Get value from agent failed: ZBX_TCP_READ() failed [Interrupted system call]
29665:20100209:112548 Host [b127.norganization.com]: first network error, wait for 15 seconds
29665:20100209:112548 Parameter [proc.num[sshd]] will be checked after 240 seconds on host [b127.organization.com]
29666:20100209:112548 Item [b127.organization.com:proc.num[httpd]] error: Get value from agent failed: ZBX_TCP_READ() failed [Interrupted system call]
29666:20100209:112548 Host [b127.organization.com]: first network error, wait for 15 seconds
29666:20100209:112548 Parameter [proc.num[httpd]] will be checked after 240 seconds on host [b127.organization.com]
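To see which hosts produce most of the unsupported-item and network errors, the log can be tallied per host. This is only a sketch in Python: the two regular expressions are written against the message wording in the excerpt above (other Zabbix versions may word these lines differently), and `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Patterns based on the message wording seen in the proxy log excerpt.
UNSUPPORTED = re.compile(r"is not supported by agent on host \[([^\]]+)\]")
NETWORK_ERR = re.compile(r"Host \[([^\]]+)\]: first network error")


def summarize(log_lines):
    """Count unsupported-parameter and network-error messages per host."""
    unsupported, network = Counter(), Counter()
    for line in log_lines:
        match = UNSUPPORTED.search(line)
        if match:
            unsupported[match.group(1)] += 1
            continue
        match = NETWORK_ERR.search(line)
        if match:
            network[match.group(1)] += 1
    return unsupported, network


sample = [
    "29668:20100209:112537 Parameter [vfs.dev.write[sda,,avg1]] is not supported by agent on host [b137.organization.com] Old status [0]",
    "29668:20100209:112537 Parameter [vfs.file.time[/var/run/puppet/puppetd.stamp]] is not supported by agent on host [b137.organization.com] Old status [0]",
    "29666:20100209:112548 Host [b127.organization.com]: first network error, wait for 15 seconds",
]
unsupported, network = summarize(sample)
print("unsupported:", dict(unsupported))
print("network errors:", dict(network))
```

Running it over the whole proxy log, for example with `summarize(open(path))`, should show whether a few hosts account for most of the time the pollers spend waiting on errors.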
The Zabbix agents on CentOS write this line to their log:
20456:20100209:111421 Getting list of active checks failed. Will retry after 60 seconds
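One thing that may be worth checking for this message is where the agent tries to fetch its active checks from. As far as I understand from the comments in the stock agent config, the first entry of `Server` is the one used for active checks, together with `ServerPort`; treat that as an assumption. The Python sketch below extracts that target from an agent config and offers a plain TCP reachability check. Both helper names are hypothetical.

```python
import socket


def active_check_target(conf_text, default_port=10051):
    """Return (host, port) the agent presumably contacts for active checks.

    Assumption: the first entry of Server is used, with ServerPort.
    """
    server, port = None, default_port
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "Server":
            server = value.split(",")[0].strip()
        elif key.strip() == "ServerPort":
            port = int(value)
    return server, port


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; True if it succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


sample = """\
Server=::ffff:192.168.125.178,192.168.125.178
ServerPort=10051
"""
host, port = active_check_target(sample)
print(host, port)
# On a proxied host one could then run: print(can_connect(host, port))
```

With the CentOS config above, the first entry is the IPv6-mapped form of the address, so it may be worth testing whether that form is reachable from the proxied hosts.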
Please help me solve these problems.
