Update intervals failing for one host only


marcusfriedman
03-11-2009, 17:42
Hi, I'm having the following problem.

One of the hosts that I'm monitoring sends data at seemingly random intervals instead of on schedule. This particular host has 40 items defined, most of them with a 30-second interval and a few with a 15-second interval.

When I go to the Latest data view and query the values, I can see that they come every 2 or more minutes. If I grep the zabbix log in the host system, I can see that the times when items were queried and sent match what the Zabbix console shows (so no network issues there).

I've tried several things, and none of them seems to work:
- Changing check item types to "Zabbix agent (active)" and back to "Zabbix agent"
- Increasing the number of pre-forked instances of zabbix_agentd
- Increasing the Timeout in the Zabbix agent
- Disabling and re-enabling all the items

This problem leads to lots of gaps in graphs, since there isn't enough data to plot them properly.

If I go to the Queue monitor, I can see that I have several items in the 5 minutes column, and some in the "More than 5 minutes". And the Queue details always shows a lot of items belonging to this specific host, with dates slightly in the past (between a few seconds and a few minutes from the current system time).

I'm not sure if this is a performance issue with the Zabbix server. CPU load is around 0.01, and there's at least 50% of free RAM and disk space. Since this problem happens with only one host amongst ~40 being monitored, I guess that there's nothing wrong with the database server either.

What can I do in order to diagnose and fix this issue?


Thanks in advance,
Marcus

P.S.: the server is a Debian 5.0 system running Zabbix 1.4.6 (installed from the Debian repositories). The host is also running Debian 5 with Zabbix agent 1.4.6.

marcusfriedman
03-11-2009, 18:00
If I look at the Zabbix agent log (with DebugLevel = 4), I can see that it is processing the items a lot more slowly than it needs to.

For example, given 30 items with a 30-second update interval, there should be 60 item queries performed each minute (each of the 30 items queried twice per minute). So in this case the Zabbix agent should be processing roughly 1 query per second.

However, the log shows that the agent handles only between 26 and 36 queries per minute, which is roughly half the required rate.
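The arithmetic above can be checked with a short sketch. The function name `expected_queries_per_minute` is made up for illustration, and the item counts are the ones from the worked example (30 items at 30 seconds), not the host's full 40-item configuration.

```python
from fractions import Fraction


def expected_queries_per_minute(items):
    """Sum the per-minute query rate over (item_count, interval_seconds) pairs."""
    return sum(Fraction(count * 60, interval) for count, interval in items)


# Worked example from the post: 30 items polled every 30 seconds.
rate = expected_queries_per_minute([(30, 30)])
print(f"expected: {float(rate):.0f} queries/min, {float(rate) / 60:.2f} queries/s")

# Observed range reported from the agent log.
for observed in (26, 36):
    print(f"observed {observed}/min is {observed / float(rate):.0%} of expected")
```

Adding the 15-second items as a second pair, for example `[(30, 30), (10, 15)]`, would raise the expected rate further, so the real shortfall on this host is probably larger than the example suggests.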

richlv
03-11-2009, 21:19
if you try to get data with zabbix_get (from the zabbix server), is there a noticeable delay or some other problem ?
any errors logged in the server logfile regarding that host ?

marcusfriedman
04-11-2009, 00:22
If I try to fetch the data manually with zabbix_get, I get a timeout every time. For example:

zabbix_get -s xx.xx.xx.xx -k sensors[8]
zabbix_get [7009]: Timeout while executing operation.

Maybe I'm not using the proper syntax for zabbix_get? I don't know how it handles timeouts: it drops the connection with an error message in less than 5 seconds, even though I have Timeout=15 in my zabbix_server.conf.

However, and this is quite interesting, if I try to query the remote agent through telnet, I can get the values without a problem. For example:

telnet xx.xx.xx.xx 10050
Trying xx.xx.xx.xx...
Connected to xx.xx.xx.xx.
Escape character is '^]'.
sensors[8]
ZBXD14.8Connection closed by foreign host.
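The "ZBXD" in front of the value in the telnet output above looks like the agent's reply header. A minimal sketch of decoding such a reply, assuming the framing documented for the Zabbix protocol (the literal "ZBXD", one 0x01 byte, an 8-byte little-endian payload length, then the payload); this has not been checked against agent 1.4.6 specifically, and `parse_agent_response` is a name made up for illustration:

```python
import struct

ZBX_HEADER = b"ZBXD\x01"  # assumed header: signature plus protocol version byte


def parse_agent_response(raw: bytes) -> str:
    """Return the payload of an agent reply, stripping the header if present."""
    if not raw.startswith(ZBX_HEADER):
        # Assumption: a reply without the header is plain text.
        return raw.decode("utf-8", errors="replace").strip()
    if len(raw) < 13:
        raise ValueError("reply too short to contain the 8-byte length field")
    (length,) = struct.unpack("<Q", raw[5:13])
    payload = raw[13:13 + length]
    if len(payload) != length:
        raise ValueError(f"expected {length} payload bytes, got {len(payload)}")
    return payload.decode("utf-8", errors="replace")


# A reply built by hand to match the value seen in the telnet session.
sample = ZBX_HEADER + struct.pack("<Q", 4) + b"14.8"
print(parse_agent_response(sample))
```

Under that assumption, telnet shows only "ZBXD14.8" because the version byte and the length bytes are not printable.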

Another thing that I noticed is that when using zabbix_get, the query shows up in the agent's log about 7-8 seconds after I issue the command from the server.

20180:20091103:201108 Requested [sensors[8]]
20180:20091103:201108 Before
20180:20091103:201108 Run remote command [/usr/local/sbin/sensors 8] Result [5] [14.8]...
20180:20091103:201108 Sending back [14.8]
20182:20091103:201110 XML before sending [...]
20182:20091103:201111 OK

It seems that the answer never gets back to zabbix_get because it drops the connection before the Zabbix agent has a chance to respond.
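To measure this delay rather than eyeball it, the log lines above can be parsed. This is a rough sketch that assumes the `PID:YYYYMMDD:HHMMSS message` layout seen in the excerpt and synchronized clocks on both machines. The helper names and the `issued_at` value are hypothetical; the post only says the delay was about 7-8 seconds.

```python
import re
from collections import Counter
from datetime import datetime

LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")

SAMPLE_LOG = [
    "20180:20091103:201108 Requested [sensors[8]]",
    "20180:20091103:201108 Before",
    "20180:20091103:201108 Sending back [14.8]",
    "20182:20091103:201111 OK",
]


def parse_line(line):
    """Split a log line into (pid, timestamp, message), or None if it does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, date, time, msg = m.groups()
    return int(pid), datetime.strptime(date + time, "%Y%m%d%H%M%S"), msg.strip()


def requests_per_minute(lines):
    """Count 'Requested [...]' lines per calendar minute."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].startswith("Requested ["):
            counts[parsed[1].replace(second=0)] += 1
    return counts


def delay_seconds(issued_at, lines, key):
    """Seconds from issued_at to the first 'Requested [key]' line at or after it."""
    needle = f"Requested [{key}]"
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2] == needle and parsed[1] >= issued_at:
            return (parsed[1] - issued_at).total_seconds()
    return None


# Hypothetical time at which zabbix_get was started on the server.
issued_at = datetime(2009, 11, 3, 20, 11, 0)
print(delay_seconds(issued_at, SAMPLE_LOG, "sensors[8]"))
print(dict(requests_per_minute(SAMPLE_LOG)))
```

Running `requests_per_minute` over a longer stretch of the log would also give the per-minute query counts to compare against the configured update intervals.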

The same thing happens when using telnet. That is, the query shows up several seconds later on the remote host. The only difference is that with telnet the connection doesn't get dropped, and I do get an answer.

richlv
18-11-2009, 13:32
just to be sure, running telnet and zabbix_get from the same machine ?
check syslog on the monitored machine - any entries about smi bus delays ?
if you do some 10 queries with telnet and 10 with zabbix_get, do telnet ones always return immediately and _get ones timeout ?
which version of zabbix agent ?