More than a year ago, I posted about a problem with gaps in graphs. Since there was no way to properly diagnose and fix that issue, I gave up and ended up with a nasty workaround (a bit long to explain) that got rid of the problem. For a while...
But lately I've started experiencing the same issue. This time not with a single host, but with almost all of them. My main concern right now isn't the graphs (I think that's just a symptom of an underlying problem). The problem is that data for different items stops being collected at seemingly random intervals. Sometimes it's a few minutes, sometimes it can be more than an hour.
So, after finding out about the Zabbix trapper, I decided to give it a try and see what happened if I just sent data manually (from a script, using zabbix_sender), bypassing the Zabbix agent and the Zabbix queue.
The result of this test is something that I can't understand:
Sometimes I get this:
Warning: Incorrect answer from server []
And sometimes I also get this:
Warning: Timeout while executing operation
Searching Google for the first error message returns a single page (unfortunately in a foreign language that I cannot understand).
If I look up the second error message, I can only find a single thread (which I'm afraid is not helpful in my case).
As you can imagine, I'm quite lost at this point. What are these error messages supposed to mean? How can I diagnose (and hopefully, fix) this problem?
Thanks in advance,
Marcus Friedman
----------- Some more details -----------------
Monitored hosts are based on Debian 5, using agent 1.4.6 (from the official repository, stable branch). Zabbix server is running version 1.4.6, also on Debian 5. DB backend is MySQL 5.0.51a.
CPU usage on the server side normally averages 0.25 or less. RAM and disk have at least 50% free space.
There are no known network issues as far as I can tell.
The test script that I've mentioned (which uses zabbix_sender) sends 3 data items every 2 minutes. Half of the time it fails to send one of them and gives one of the error messages that I've posted above.
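To make the failures easier to correlate with server-side logs, the test could be wrapped so that each send is classified by which warning it produced. What follows is only a rough sketch, not the actual script: it assumes Python 3.7+ is available, that zabbix_sender is on the PATH and accepts the usual -z/-s/-k/-o options, and the server, host and key names in the usage comment are placeholders.

```python
import subprocess
from collections import Counter

# Warning strings copied from the two messages quoted in this post.
KNOWN_WARNINGS = {
    "Incorrect answer from server": "incorrect_answer",
    "Timeout while executing operation": "timeout",
}


def classify(output):
    """Map zabbix_sender output text to a short outcome label."""
    for needle, label in KNOWN_WARNINGS.items():
        if needle in output:
            return label
    # No known warning found; this may or may not mean success.
    return "no_known_warning"


def send(server, host, key, value, sender="zabbix_sender"):
    """Run zabbix_sender once and return (exit_code, outcome_label).

    Assumes the sender binary supports -z, -s, -k and -o.
    """
    proc = subprocess.run(
        [sender, "-z", server, "-s", host, "-k", key, "-o", str(value)],
        capture_output=True,
        text=True,
    )
    return proc.returncode, classify(proc.stdout + proc.stderr)


def tally(outputs):
    """Count outcome labels over a list of captured output strings."""
    return Counter(classify(text) for text in outputs)


# Hypothetical usage (server, host and key are placeholders):
#   code, outcome = send("zabbix.example.com", "web01", "test.item", 42)
#   print(code, outcome)
```

Logging the label together with a timestamp for every send would show whether the timeouts and the empty answers cluster at particular times, which could then be compared against the server and database logs.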