Data stops being collected at random intervals

  • marcusfriedman
    Junior Member
    • Apr 2009
    • 12

    #1

    Data stops being collected at random intervals

    More than a year ago, I posted about a problem with gaps in graphs. Since there was no way to properly diagnose and fix that issue, I gave up and ended up with a nasty workaround (a bit long to explain) that got rid of the problem. For a while...

    But lately I've started experiencing the same issue. This time not with a single host, but with almost all of them. My main concern right now isn't the graphs (I think that's just a symptom of an underlying problem). The problem is that data for different items stops being collected at seemingly random intervals. Sometimes it's a few minutes, sometimes it can be more than an hour.

    So, after finding out about the Zabbix trapper, I decided to give it a try and see what happened if I just sent data manually (from a script, using zabbix_sender), bypassing the zabbix agent and the zabbix queue.

    The result of this test is something that I can't understand:

    Sometimes I get this:

    Warning: Incorrect answer from server []

    And sometimes I also get this:

    Warning: Timeout while executing operation

    Querying Google with the first error message shows a single page (unfortunately in a foreign language that I cannot understand)

    If I look up the second error message, I can only find a single thread (which I'm afraid is not helpful in my case).

    As you can imagine, I'm quite lost at this point. What are these error messages supposed to mean? How can I diagnose (and hopefully, fix) this problem?


    Thanks in advance,
    Marcus Friedman

    ----------- Some more details -----------------

    Monitored hosts are based on Debian 5, using agent 1.4.6 (from the official repository, stable branch). Zabbix server is running version 1.4.6, also on Debian 5. DB backend is MySQL 5.0.51a.

    CPU usage on the server side normally averages 0.25 or less. RAM and disk have at least 50% free space.

    There are no known network issues as far as I can tell.

    The test script that I've mentioned (which uses zabbix_sender) sends 3 data items every 2 minutes. About half the time it fails to send one of them and gives one of the error messages that I've posted above.
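
    For reference, the test can be sketched roughly as follows. This is a minimal Python sketch, not the actual script: the readings and the host name are placeholders, `build_input_lines`, `build_command` and `send` are hypothetical helper names, and the `<hostname> <key> <value>` input-file layout and the `-z`, `-p` and `-i` options are assumptions about zabbix_sender that should be checked against `zabbix_sender --help` for the installed version.

```python
import subprocess
import tempfile


def build_input_lines(host, readings):
    """Return one '<hostname> <key> <value>' line per reading (assumed file layout)."""
    return [f"{host} {key} {value}" for key, value in readings.items()]


def build_command(server, port, input_path):
    """Build the zabbix_sender command line; -z, -p and -i are assumed options."""
    return ["zabbix_sender", "-z", server, "-p", str(port), "-i", input_path]


def send(server, port, host, readings):
    """Write the readings to a temporary file and hand it to zabbix_sender."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("\n".join(build_input_lines(host, readings)) + "\n")
    # The return code of zabbix_sender is passed back to the caller unchanged.
    return subprocess.run(build_command(server, port, handle.name)).returncode


# Placeholder readings and host name, for illustration only.
readings = {"temp.cpu": 50, "temp.mb": 44, "temp.HD": 34}
lines = build_input_lines("test", readings)
```

    Calling `send("xx.xx.xx.xx", 10051, "test", readings)` from cron every 2 minutes would reproduce the test; nothing is sent until `send` is called.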
    Last edited by marcusfriedman; 05-06-2010, 01:14.
  • marcusfriedman
    Junior Member
    • Apr 2009
    • 12

    #2
    This is a typical example of the output that I get when running my test script. This script collects cpu, mb and hd temperatures, writes them to a file, and then calls zabbix_sender with the -i option.

    Code:
    Fri Jun  4 21:18:51 ART 2010
    Collecting...
    50
    44
    34
    Sending...
    zabbix_sender [15364]: DEBUG: Send to: 'xx.xx.xx.xx:10051' As: 'test' Key: 'temp.cpu' Value: '50'
    zabbix_sender [15364]: Warning: Timeout while executing operation
    zabbix_sender [15423]: DEBUG: Send to: 'xx.xx.xx.xx:10051' As: 'test' Key: 'temp.mb' Value: '44'
    zabbix_sender [15423]: DEBUG: Send data: '<req><host>xxx</host><key>dGVtcC5tYg==</key><data>NDQ=</data></req>'
    zabbix_sender [15424]: DEBUG: Send to: 'xx.xx.xx.xx:10051' As: 'test' Key: 'temp.HD' Value: '34'
    zabbix_sender [15424]: DEBUG: Send data: '<req><host>xxx</host><key>dGVtcC5IRA==</key><data>MzQ=</data></req>'
    zabbix_sender [15424]: Warning: Incorrect answer from server []
    sent: 1; failed: 2; total: 3

    From time to time I also get this error message:

    Send value error: ZBX_TCP_READ() failed [Connection reset by peer]

    although this last one seems to happen less frequently.
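
    The key and data fields in the "Send data" debug lines above look Base64-encoded. A minimal Python sketch for decoding them can help confirm that what goes on the wire matches what was collected. `decode_request` is a hypothetical helper, and treating the fields as Base64 is an inference from the log, not something taken from the protocol documentation.

```python
import base64
import re


def decode_request(payload):
    """Extract host, key and value from a '<req>...</req>' debug payload."""
    # Collect the text inside each <host>, <key> and <data> element.
    fields = dict(re.findall(r"<(host|key|data)>(.*?)</\1>", payload))
    return {
        "host": fields["host"],  # the host appears in plain text in the log
        "key": base64.b64decode(fields["key"]).decode("ascii"),
        "value": base64.b64decode(fields["data"]).decode("ascii"),
    }


payload = "<req><host>xxx</host><key>dGVtcC5tYg==</key><data>NDQ=</data></req>"
print(decode_request(payload))
```

    For the payload shown, this decodes the key to temp.mb and the value to 44, which matches the preceding "Send to" line.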

    BTW, is it actually necessary for zabbix_sender to spawn a new process for every item sent, even when all of them are being sent to the same server and port? I guess that could be necessary under some circumstances, but perhaps there should be an option to override that behaviour.
    Last edited by marcusfriedman; 05-06-2010, 02:11.


    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      zabbix 1.4.6 is oooooold.
      anyway maybe increasing trapper count helps somewhat.
      personally, i'd suggest waiting for 1.8.3 and upgrading, as it should also provide notably better performance.
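
      a rough sketch for checking the current trapper count, assuming it is controlled by a `StartTrappers` line in zabbix_server.conf (verify the parameter name against the config shipped with your version). `read_setting` is a hypothetical helper and the sample config text is made up for illustration.

```python
def read_setting(conf_text, name):
    """Return the last uncommented 'name=value' entry in conf_text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Made-up sample; in practice, read the text of the real server config file.
sample = """
# Number of trapper processes (assumed parameter name)
# StartTrappers=5
StartTrappers=10
"""
print(read_setting(sample, "StartTrappers"))
```

      after changing the value, the server would need a restart for it to take effect.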
      Zabbix 3.0 Network Monitoring book


      • marcusfriedman
        Junior Member
        • Apr 2009
        • 12

        #4
        Hi rich, and thanks for your reply.

        I understand that 1.4.x releases are old, and I will try to run some tests with version 1.8.2, which I understand is what the developers now consider "stable".

        I'm also interested in giving 1.8.3 a try when it becomes available, especially since you've mentioned that it will provide a substantial performance improvement.


        Best regards,
        Marcus


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          to clarify : 1.8 provides performance improvements over 1.4, and 1.8.3 provides further performance gains.

          why my suggestion about 1.8.3 - it should be out real soon now (tm) and it has tons of bugfixes and great new features (you can read about 1.8.3 alone at http://www.zabbix.com/documentation/...at_s_new_1.8.3 - and there are also pages for 1.8, 1.8.1 and 1.8.2 )