get_value_agent is not called for all hosts in 1.1.1 ?


limo
06-09-2006, 00:23
Hello all,

Zabbix 1.1.1 is not getting values for all hosts, only for some of them. Some servers still show old data, even though the status of the host and its items is "active". I cannot find any clue why some servers work and others do not. I read in this forum that other people have the same problem. Does anybody have a hint? This is very bad, our monitoring is entirely broken :( I will probably downgrade to 1.1 right away :(

When I capture traffic, I do not see any requests to the agent port on the broken servers, but I do see traffic to the working ones. Simple items, ICMP and SNMP work.

Any suggestions??

Alexei
06-09-2006, 07:58
I do not remember any changes in 1.1.1 that might potentially affect this functionality. Are you saying that some hosts are not monitored at all?

limo
06-09-2006, 09:02
I only see host status 0; ICMP replies are OK, but the agent values are old. When I look at the configuration, all servers are active and monitored. But zabbix_server never sends anything to port 10050 on some of them (I checked with tcpdump).
The items are enabled and active as well.

It does not seem to be a problem with Zabbix 1.1.1 after all: I downgraded to Zabbix 1.1 and everything is the same :( Maybe we have some problem in the db? But why does the frontend say everything is OK while zabbix_server does not check those hosts?
Thank you for any advice..

Alexei
06-09-2006, 09:06
Anything in the ZABBIX server log file? Perhaps you set the system time back?

limo
06-09-2006, 09:20
This is in server.log. I tried increasing the debug level, but with no success: I did not find any relevant information about the missing hosts or get_value_agent. Only checks against the working servers showed up.
What does "started [Poller. SNMP:ON]" mean, please? Is it a poller for SNMP only, not for agent checks? So am I missing some process?


019877:20060906:092957 Starting zabbix_server. ZABBIX 1.1.1.
019891:20060906:092959 server #1 started [Alerter]
019892:20060906:092959 server #2 started [Timer]
019893:20060906:092959 server #3 started [ICMP pinger]
019894:20060906:092959 server #4 started [Poller for unreachable hosts. SNMP:ON]
019894:20060906:092959 Cannot connect to [gw-limonet] [Connection refused]
019894:20060906:092959 Host [gw-limonet] will be checked after 60 seconds
019900:20060906:093000 server #7 started [Poller. SNMP:ON]
019895:20060906:093000 server #5 started [Poller. SNMP:ON]
019904:20060906:093000 server #11 started [Trapper]
019905:20060906:093000 server #12 started [Trapper]
019906:20060906:093000 server #13 started [Trapper]
019907:20060906:093000 server #14 started [Trapper]
019896:20060906:093000 server #6 started [Poller. SNMP:ON]
019908:20060906:093000 server #15 started [Trapper]
019909:20060906:093000 server #16 started [Trapper]
019877:20060906:093000 server #0 started [Housekeeper]
019877:20060906:093000 ZABBIX server is up.
019901:20060906:093000 server #8 started [Poller. SNMP:ON]
019903:20060906:093000 server #10 started [Poller. SNMP:ON]
019902:20060906:093000 server #9 started [Poller. SNMP:ON]
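
For anyone reading a startup log like the one above, a quick tally of how many processes of each kind came up can help spot an unexpectedly small poller count. The following is a minimal Python sketch, not part of Zabbix; the function name and the regular expression are assumptions based only on the line format shown in this post.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
#   019900:20060906:093000 server #7 started [Poller. SNMP:ON]
STARTED = re.compile(r"server #\d+ started \[([^\]]+)\]")


def count_started_processes(log_lines):
    """Return a Counter mapping each process role to how many were started."""
    roles = Counter()
    for line in log_lines:
        match = STARTED.search(line)
        if match:
            roles[match.group(1)] += 1
    return roles


# A few lines copied from the log above, used as sample input.
sample = [
    "019877:20060906:092957 Starting zabbix_server. ZABBIX 1.1.1.",
    "019894:20060906:092959 server #4 started [Poller for unreachable hosts. SNMP:ON]",
    "019900:20060906:093000 server #7 started [Poller. SNMP:ON]",
    "019895:20060906:093000 server #5 started [Poller. SNMP:ON]",
    "019904:20060906:093000 server #11 started [Trapper]",
]

if __name__ == "__main__":
    for role, count in count_started_processes(sample).items():
        print(f"{count:3d}  {role}")
```

Feeding it the whole server.log instead of the sample list gives the real counts for a given installation.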

limo
09-09-2006, 20:06
Hi all,

I just solved my problem. The cause was that the StartSuckers option in zabbix_server.conf has been renamed to StartPollers. Only 5 pollers are started by default, and that was too few to check all servers.
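
A check like the following could catch a leftover StartSuckers line before it causes trouble. This is only a sketch in Python, assuming the usual `Key=Value` layout of zabbix_server.conf with `#` comments; the helper names `parse_conf` and `check_poller_settings` are made up for illustration, and the renaming itself is taken from this thread rather than verified against the server source.

```python
def parse_conf(text):
    """Parse Key=Value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_poller_settings(settings):
    """Return human-readable notes about poller-related options."""
    notes = []
    if "StartSuckers" in settings:
        # Per this thread, the old name no longer has the intended effect.
        notes.append("StartSuckers is set; it was renamed to StartPollers")
    if "StartPollers" not in settings:
        notes.append("StartPollers is not set; the server default applies")
    return notes


# Hypothetical config fragment carried over from an older installation.
example = """
# number of polling processes (old option name)
StartSuckers=20
"""

if __name__ == "__main__":
    for note in check_poller_settings(parse_conf(example)):
        print("note:", note)
```

To run it against a real installation, read the file contents and pass them to `parse_conf`; the path of the config file depends on how Zabbix was installed.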

I know it is probably my own fault, because I did not read the documentation. But...

My BIG SUGGESTION is: please, Alex, make logging more verbose. When I set the log level to 2, I only see messages about process starts and unreachable hosts. When I increase the level, I see many SQL statements, but I did not see the real problem there.

I think it would be good to at least write the config parameters (StartPollers, StartTrappers, ...) to the log file at startup. Next, there should be some logic to report that the number of pollers or trappers is too small (when all of them are busy). It would be enough to write "warning - all pollers threads busy".

It was very hard to find the source of my problem, and it took too much time.
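
To make the second suggestion concrete, here is a minimal sketch of the kind of logic meant. It is written in Python for brevity and is purely hypothetical: `PollerPool` is not a Zabbix structure, and the real server uses separate processes, not this class.

```python
import logging

log = logging.getLogger("poller_pool")


class PollerPool:
    """Tracks how many pollers are busy and warns when none is free."""

    def __init__(self, size):
        self.size = size
        self.busy = 0
        # Startup message with the configured value, as suggested above.
        log.info("StartPollers=%d", size)

    def acquire(self):
        """Reserve one poller; return False and warn if all are busy."""
        if self.busy >= self.size:
            log.warning("all pollers busy (%d of %d)", self.busy, self.size)
            return False
        self.busy += 1
        return True

    def release(self):
        """Mark one poller as free again."""
        if self.busy > 0:
            self.busy -= 1


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    pool = PollerPool(5)
    for host in range(7):
        if not pool.acquire():
            print(f"host {host} has to wait")
```

With a pool of 5 and 7 hosts to check at once, the last two requests trigger the warning, which is exactly the hint that was missing from the log.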

Thanx,
Lukas