Random SNMP agents hanging

  • smo
    Junior Member
    • Oct 2007
    • 17

    #1

    Random SNMP agents hanging

    Hi,

    It seems like some of my SNMPv2 agent items are hanging. Some go through and some hang (which items hang is random). I can see them sitting in the queue for more than 10 minutes.

    I'm running the very latest Zabbix snapshot (the problem occurs with both the 1.6 release and the snapshot). I set the log level to debug, but unfortunately it didn't yield any useful information: no timeouts or anything else relevant in the log. All SNMP queries work fine when done using snmpwalk from the Zabbix host, though.
  • smo
    Junior Member
    • Oct 2007
    • 17

    #2
    Is there any way to get better logs out of pollers? Currently if I jack up the Zabbix server log level to debug, I mostly get debug output of the SQL statements being run.


    • smo
      Junior Member
      • Oct 2007
      • 17

      #3
      Hi again - I investigated this problem further. I can't find anything related to the "hanging agents" in the Zabbix server log. In fact, I think the "in queue" view is based on the "nextcheck" value in the "items" table, and the items are not getting picked up properly for polling.

      I believe this is the query that retrieves the items to check:

      Code:
      select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns,i.params,i.trends,h.useipmi,h.ipmi_port,h.ipmi_authtype,h.ipmi_privilege,h.ipmi_username,h.ipmi_password,i.ipmi_sensor from hosts h, items i where i.nextcheck<=1225136336 and i.status in (0,3) and i.type not in (2,7,9,12) and h.status=0 and h.disable_until<=1225136331 and h.errors_from=0 and h.hostid=i.hostid and (h.proxy_hostid=0 or i.type in (5)) and mod(i.itemid,10)=1 and i.key_ not in ('status','icmpping','icmppingsec','zabbix[log]') and h.hostid between 000000000000000 and 099999999999999 order by i.nextcheck
      The interesting part is mod(i.itemid,10). I believe I had 10 pollers started at that point, and this apparently load-balances the items to be checked among the pollers. However, there are some values I don't see at all, such as "2" or "8":

      Code:
      ]# for i in `seq 10` ; do echo -n "mod $i: " ; cat /var/log/zabbix/zabbix_server.log | grep "mod.i.itemid,10)=$i" |wc -l ; done
      mod 1: 14
      mod 2: 0
      mod 3: 0
      mod 4: 0
      mod 5: 0
      mod 6: 14
      mod 7: 0
      mod 8: 0
      mod 9: 15
      mod 10: 0
      Last edited by smo; 27-10-2008, 21:36.
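
A note on the count above: x mod 10 is always in the range 0 to 9, so a loop over 1 to 10 never checks residue 0, and its "mod 10" line is 0 by construction. The following is a minimal Python sketch of the same count over the correct range. It assumes, as the post guesses, that each poller filters on one residue, and that the logged query contains the filter in the literal form shown in the query above. The function name count_poller_queries is hypothetical.

```python
import re
from collections import Counter

# Matches the poller filter as it appears in the logged query,
# e.g. "mod(i.itemid,10)=1". Captures the divisor and the residue.
POLLER_FILTER = re.compile(r"mod\(i\.itemid,(\d+)\)=(\d+)")


def count_poller_queries(lines):
    """Count logged item-selection queries per residue.

    Assumes all matching lines use the same divisor (the number of
    pollers); if they differ, the last divisor seen sets the range.
    """
    counts = Counter()
    divisor = None
    for line in lines:
        match = POLLER_FILTER.search(line)
        if match is None:
            continue
        divisor = int(match.group(1))
        counts[int(match.group(2))] += 1
    if divisor is None:
        return {}
    # x mod n is always in 0..n-1, so report every residue in that
    # range, including the ones that never show up in the log.
    return {residue: counts.get(residue, 0) for residue in range(divisor)}
```

To run it against the log from the post, pass the open file, for example count_poller_queries(open("/var/log/zabbix/zabbix_server.log", errors="replace")), and print each residue with its count. A residue that stays at zero would suggest that no poller is issuing queries for that slice of item IDs, which would be consistent with those items remaining in the queue.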


      • smo
        Junior Member
        • Oct 2007
        • 17

        #4
        I wiped my database clean (it was originally upgraded from 1.4) and started fresh with 1.6. It didn't fix any of my problems. I still have SNMP items in the queue for hours (the time is now 9:33 and the top item in my queue is an SNMP poll from 8:39). The collected SNMP data shows huge gaps (sometimes hours are missing). I can get the same SNMP values on the Zabbix server using snmpwalk in a fraction of a second.


        • kehall
          Member
          • Sep 2008
          • 30

          #5
          I too am seeing SNMP items (and occasionally other agent items) queuing up. They are eventually processed, but by then it is too late to act on them and data is missing from the graphs.

          It's quite frustrating, despite how good Zabbix has been so far, as a reliable monitoring system is pretty crucial, and this issue, together with the lack of more conditions in actions, is making it hard work! The occasional spurious 'Zabbix database is down' message is odd too! Anyway, not to take this thread off topic, I hope the underlying cause of this particular issue gets resolved.

          K


          • smo
            Junior Member
            • Oct 2007
            • 17

            #6
            I'm getting the random database down messages too (and sometimes triggers fire even though I'm 100% positive nothing is down), though I believe it's not a Zabbix issue. You're probably getting "error 32: broken pipe" messages from MySQL when that happens (very strange, since I'm using the same MySQL RPM on loads of hosts and the errors only happen on certain hosts, with any kind of MySQL app).


            • smo
              Junior Member
              • Oct 2007
              • 17

              #7
              Updated to 1.6.1. The issues still exist: neither normal agent items nor SNMP items are getting properly picked up from the queue, or they hang while processing (I'm guessing the first option though).
