Update intervals failing for one host only

  • marcusfriedman
    Junior Member
    • Apr 2009
    • 12

    #1

    Hi, I'm having the following problem.

    One of the hosts that I'm monitoring sends data at seemingly random intervals rather than on schedule. This particular host has 40 items defined, most of them with a 30-second interval and a few with a 15-second interval.

    When I go to the Latest data view and query the values, I can see that they arrive every 2 minutes or more. If I grep the Zabbix log on the host system, I can see that the times when items were queried and sent match what the Zabbix console shows (so no network issues there).

    I've tried several things, and none of them seems to work:
    - Changing check item types to "Zabbix agent (active)" and back to "Zabbix agent"
    - Increasing the number of pre-forked instances of zabbix_agentd
    - Increasing the Timeout in the Zabbix agent
    - Disabling and re-enabling all the items

    This problem leads to lots of gaps in graphs, since there isn't enough data to plot them properly.

    If I go to the Queue monitor, I can see that I have several items in the "5 minutes" column, and some in the "More than 5 minutes" column. The Queue details view always shows a lot of items belonging to this specific host, with dates slightly in the past (between a few seconds and a few minutes before the current system time).

    I'm not sure if this is a performance issue with the Zabbix server. CPU load is around 0.01, and there's at least 50% of free RAM and disk space. Since this problem happens with only one host amongst ~40 being monitored, I guess that there's nothing wrong with the database server either.

    What can I do in order to diagnose and fix this issue?


    Thanks in advance,
    Marcus

    P.S.: the server is a Debian 5.0 system running Zabbix 1.4.6 (installed from Debian's repositories). The host is also running Debian 5 with Zabbix agent 1.4.6.
    Last edited by marcusfriedman; 03-11-2009, 20:10.
  • marcusfriedman
    Junior Member
    • Apr 2009
    • 12

    #2
    If I look at the Zabbix agent log (with DebugLevel = 4), I can see that it is processing the items a lot more slowly than needed.

    For example, given 30 items with a 30-second update interval, there should be 60 item queries performed each minute (each of the 30 items queried twice per minute). So in this case the Zabbix agent should be processing roughly 1 query per second.

    However, the log shows that the agent queries between 26 and 36 items per minute, which doesn't seem fast enough.
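The arithmetic above can be cross-checked with a short script. This is only a minimal sketch: it computes the expected query rate from (item count, interval) pairs and counts "Requested [...]" lines per minute in the agent log. The line layout it assumes (PID:YYYYMMDD:HHMMSS followed by the message) is inferred from the DebugLevel 4 excerpt quoted later in this thread, and the function names are made up for illustration.

```python
import re
from collections import Counter


def expected_queries_per_minute(items):
    """items: iterable of (item_count, interval_seconds) pairs."""
    return sum(count * 60 / interval for count, interval in items)


# Assumed log layout: PID:YYYYMMDD:HHMMSS Requested [key]
REQUESTED = re.compile(r"^\d+:(\d{8}):(\d{2})(\d{2})\d{2} Requested \[")


def requests_per_minute(log_lines):
    """Count 'Requested' lines per (date, 'HH:MM') bucket."""
    counts = Counter()
    for line in log_lines:
        match = REQUESTED.match(line.strip())
        if match:
            date, hour, minute = match.groups()
            counts[(date, f"{hour}:{minute}")] += 1
    return counts
```

Calling `expected_queries_per_minute([(30, 30)])` gives the 60 queries per minute worked out above, and `requests_per_minute(open(path))` on the agent log (path depends on the installation) gives the observed figure to compare against it.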


    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      if you try to get data with zabbix_get (from the zabbix server), is there a noticeable delay or some other problem ?
      any errors logged in the server logfile regarding that host ?
      Zabbix 3.0 Network Monitoring book


      • marcusfriedman
        Junior Member
        • Apr 2009
        • 12

        #4
        If I try to fetch the data manually with zabbix_get, I get a timeout every time. For example:

        Code:
        zabbix_get -s xx.xx.xx.xx -k sensors[8]
        zabbix_get [7009]: Timeout while executing operation.

        Maybe I'm not using the proper syntax for zabbix_get? I don't know how it handles timeouts: it drops the connection with an error message in under 5 seconds, even though I have Timeout=15 in my zabbix_server.conf.

        However, and this is quite interesting, if I try to query the remote agent through telnet, I can get the values without a problem. For example:

        Code:
        telnet xx.xx.xx.xx 10050
        Trying xx.xx.xx.xx...
        Connected to xx.xx.xx.xx.
        Escape character is '^]'.
        sensors[8]
        ZBXD14.8Connection closed by foreign host.

        Another thing that I noticed is that while using zabbix_get, the query shows up in the agent's log about 7-8 seconds after I issue the command from the server.

        Code:
        20180:20091103:201108 Requested [sensors[8]]
        20180:20091103:201108 Before
        20180:20091103:201108 Run remote command [/usr/local/sbin/sensors 8] Result [5] [14.8]...
        20180:20091103:201108 Sending back [14.8]
        20182:20091103:201110 XML before sending [...]
        20182:20091103:201111 OK

        It seems that the answer never gets back to zabbix_get because it drops the connection before giving the Zabbix agent a chance.

        The same thing happens when using telnet. That is, the query shows up several seconds later on the remote host. The only difference is that with telnet the connection doesn't get dropped, and I do get an answer.
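To put numbers on the telnet test, the same plain-text query can be scripted and timed. The sketch below is only that, a sketch: it sends the key followed by a newline, as in the telnet session, reads until the agent closes the connection, and reports the elapsed time. It assumes the reply starts with the "ZBXD" header followed by a 0x01 byte and an 8-byte little-endian length (which would explain the "ZBXD14.8" seen in telnet, with the binary bytes invisible); that layout has not been checked against agent 1.4.6, so the parser falls back to the raw text. The function names are hypothetical.

```python
import socket
import struct
import time

HEADER = b"ZBXD\x01"  # assumed header: signature plus protocol byte


def parse_response(raw):
    """Return the payload as text, stripping the ZBXD header if present."""
    if raw.startswith(HEADER) and len(raw) >= 13:
        (length,) = struct.unpack("<Q", raw[5:13])  # 8-byte LE data length
        return raw[13:13 + length].decode("utf-8", "replace")
    return raw.decode("utf-8", "replace").strip()


def timed_query(host, key, port=10050, timeout=15.0):
    """Send a plain-text key and return (value, seconds_elapsed)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # agent closed the connection
                break
            chunks.append(data)
    return parse_response(b"".join(chunks)), time.monotonic() - start
```

For example, `timed_query("xx.xx.xx.xx", "sensors[8]")` with the real address filled in. If the elapsed time is consistently 7-8 seconds, the delay sits in front of the agent's own processing, since the log excerpt shows the item itself being answered within the same second.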
        Last edited by marcusfriedman; 04-11-2009, 01:25.


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          just to be sure, running telnet and zabbix_get from the same machine ?
          check syslog on the monitored machine - any entries about smi bus delays ?
          if you do some 10 queries with telnet and 10 with zabbix_get, do telnet ones always return immediately and _get ones timeout ?
          which version of zabbix agent ?
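One way to run the "10 queries" comparison suggested above is to loop a command and record how long each run takes and how it exits. This is a minimal sketch with a hypothetical helper name; it only wraps an external command such as zabbix_get with the `-s` and `-k` options already used in this thread, so the interactive telnet test would need a separate script.

```python
import subprocess
import time


def time_command(argv, runs=10, timeout=30.0):
    """Run argv `runs` times; return a list of (seconds, exit_code).

    exit_code is None when the run exceeded `timeout` and was killed.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            code = subprocess.run(
                argv, capture_output=True, timeout=timeout
            ).returncode
        except subprocess.TimeoutExpired:
            code = None
        results.append((time.monotonic() - start, code))
    return results
```

For example, `time_command(["zabbix_get", "-s", "xx.xx.xx.xx", "-k", "sensors[8]"])` with the real address filled in. Passing the arguments as a list also keeps the shell from interpreting the square brackets in the key.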
          Zabbix 3.0 Network Monitoring book
