excessive snmp polling and anomalies

  • xxiii
    Junior Member
    • Jun 2013
    • 28

    #1

    excessive snmp polling and anomalies

    zabbix version 2.4.5:

    While investigating why the below is happening:

    2015-08-25 09:05:34 1440515134 87096
    2015-08-25 08:55:35 1440514535 87560
    2015-08-25 08:35:35 1440513335 8164424457647408
    2015-08-25 08:25:35 1440512735 82664
    2015-08-25 08:15:35 1440512135 81096

    2015-08-21 21:45:34 1440215134 176608
    2015-08-21 21:35:35 1440214535 169496
    2015-08-21 21:15:35 1440213335 8163386583496584
    2015-08-21 21:05:34 1440212734 156304
    2015-08-21 20:55:34 1440212134 152608


    [Please ignore the rest of the below, but see the following posts.]

    I set up a packet capture and discovered that Zabbix is polling the relevant value every 15 seconds even though its interval is 600 (10 minutes). It's only recording it every 10 minutes (occasionally incorrectly, as shown above; it's a speed delta).

    The host isn't duplicated in zabbix, and the host isn't returning values that should result in the above anomaly.

    I've looked in the hosts, interfaces, and items tables to verify; the shortest interval any item on this host has is 60 seconds.

    From the packet capture, it looks like there are two specific interfaces whose ifHCInOctets and ifHCOutOctets are being polled every 15 seconds. Everything else seems to be on the correct schedule. Did Zabbix switch to a 15 second retry at some point and never turn it back off? And why is it getting the math wrong occasionally?

    (The log has some 15 second retry messages here and there, but not continuously, and not generally for these OIDs).
    Last edited by xxiii; 26-08-2015, 00:10. Reason: false alarm on the timing, but the bad data issue remains, change title (was excessive snmp polling and anomalies)
  • xxiii
    Junior Member
    • Jun 2013
    • 28

    #2
    snmp anomaly (timing issue resolved)

    I note also that whenever it has one of these anomalous values, it seems to lose the following value. In the posted examples, it's missing a value for 21:25 and for 08:45.
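The missing samples can be spotted mechanically from the epoch column of the history in the first post. This is a minimal sketch, assuming a 600-second interval and a few seconds of tolerance for jitter; `find_missing_samples` and its parameters are hypothetical names, not anything from Zabbix:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Zabbix): scan timestamps sorted in
 * ascending order and report where a sample appears to be missing.
 * For every expected slot that lies more than `tolerance` seconds before
 * the next real sample, the expected timestamp is written to `missing`
 * (up to `max_missing` entries). Returns the total number of missing slots.
 */
size_t find_missing_samples(const int64_t *ts, size_t n, int64_t interval,
                            int64_t tolerance, int64_t *missing,
                            size_t max_missing)
{
    size_t found = 0;

    assert(interval > 0 && tolerance >= 0);

    for (size_t i = 1; i < n; i++) {
        /* first slot we expected to see after the previous sample */
        int64_t expected = ts[i - 1] + interval;

        /* every expected slot well before the next sample was skipped */
        while (ts[i] - expected > tolerance) {
            if (found < max_missing)
                missing[found] = expected;
            found++;
            expected += interval;
        }
    }
    return found;
}
```

Fed the five 2015-08-25 epochs in ascending order with an interval of 600 and a tolerance of 5, this reports a single missing slot at 1440513935, which is 08:45:35. The 2015-08-21 rows give 1440213935, which is 21:25:35.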

    Sorry, false alarm on the 15 second polling. I had a separate application polling the value to see if the device was returning bad data before I switched to packet capture; I'd set it up several weeks ago and forgot that it was still running.

    I see in the response packet that corresponds to the 08:45 value, it contains 47 variable bindings, and the one right after the value that should be used is a No Such Instance. I wonder if this is tripping it up somehow.

    Before we get to the value we care about there are several other No Such Instance and No Such Object responses. Is Zabbix stopping at the first one of these it hits, not taking into account that this packet contains many responses? But then it generates the incorrect value. Did it try to subtract the prior value from 0, instead of from 101353846429, when calculating the delta?
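The "subtracted from 0" guess can be illustrated with unsigned 64-bit arithmetic. This is only a sketch of the general failure mode, not Zabbix's actual delta code, and it is not claimed to reproduce the exact figures in this thread. The function names and the previous counter value used in the note below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Naive per-second rate between two Counter64 samples. There is no
 * sanity check, so a current value smaller than the previous one
 * wraps around modulo 2^64 and yields an enormous rate.
 */
uint64_t naive_rate(uint64_t prev, uint64_t curr, uint64_t seconds)
{
    assert(seconds > 0);
    return (curr - prev) / seconds;
}

/*
 * Guarded variant: refuses to produce a rate when no time has elapsed
 * or the counter appears to have gone backwards.
 * Returns 1 and stores the rate on success, 0 otherwise.
 */
int guarded_rate(uint64_t prev, uint64_t curr, uint64_t seconds,
                 uint64_t *rate)
{
    if (seconds == 0 || curr < prev)
        return 0;
    *rate = (curr - prev) / seconds;
    return 1;
}
```

With a made-up previous value of 101353000000 and the real current value 101353846429 over 600 seconds, both functions give a small, plausible rate. If the current value is taken as 0 instead, `naive_rate` returns a number in the 10^16 range while `guarded_rate` rejects the sample.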

    (A bunch of other values, then the one we care about (for this issue))

    IF-MIB::ifHCOutOctets.83886080 (1.3.6.1.2.1.31.1.1.1.10.83886080): 101353846429

    (the next value)

    SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.3 (1.3.6.1.4.1.9.9.109.1.1.1.1.7.3): noSuchInstance

    (and a bunch more values)

    I'm going to try disabling snmp bulk requests for this host and see if the issue goes away.


    • xxiii
      Junior Member
      • Jun 2013
      • 28

      #3
      Looking through the code, zabbix_server/poller/checks_snmp.c, it appears to stop processing the entire response at the first NOSUCHOBJECT or NOSUCHINSTANCE encountered (the "break" at the end below causes the for loop to terminate):

      (disclaimer, I haven't extensively analyzed this or surrounding code, but this appears to be a problem):

      Code:
                      /* process response */
                      for (num_vars = 0, var = response->variables; NULL != var; num_vars++, var = var->next_variable)
                      {
      [...]
                              else if (SNMP_NOSUCHOBJECT != var->type && SNMP_NOSUCHINSTANCE != var->type)
      [...]
                              else
                              {
                                      /* an exception value, so stop */
      
                                      SET_MSG_RESULT(result, zbx_get_snmp_type_error(var->type));
                                      ret = NOTSUPPORTED;
                                      running = 0;
      
                                      /* (my comment) TODO: must process the rest of the response,
                                         which may (and probably does) still contain valid data */
      
                                      break;
                              }
      There are a couple of other "break"s earlier in the loop that look suspicious in this context as well.
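A minimal sketch of the per-variable handling the TODO above asks for. The types here are hypothetical stand-ins invented for this example; they are not the Net-SNMP or Zabbix definitions, and this is not a patch for checks_snmp.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch only. */
enum vb_type { VB_COUNTER64, VB_NOSUCHOBJECT, VB_NOSUCHINSTANCE };

struct varbind {
    enum vb_type type;
    unsigned long long value;   /* meaningful only for VB_COUNTER64 */
    struct varbind *next;
};

enum item_status { ITEM_OK, ITEM_NOTSUPPORTED };

struct item_result {
    enum item_status status;
    unsigned long long value;
};

/*
 * Walk the whole response and record one result per variable.
 * An exception value marks only its own item as not supported;
 * the loop continues, so later valid values are still collected.
 * Returns the number of variables processed (at most max_results).
 */
size_t process_response(const struct varbind *vars,
                        struct item_result *results, size_t max_results)
{
    size_t i = 0;

    assert(results != NULL || max_results == 0);

    for (const struct varbind *var = vars;
         var != NULL && i < max_results; var = var->next, i++) {
        if (var->type == VB_NOSUCHOBJECT || var->type == VB_NOSUCHINSTANCE) {
            results[i].status = ITEM_NOTSUPPORTED;
            results[i].value = 0;
            continue;   /* keep going instead of break */
        }
        results[i].status = ITEM_OK;
        results[i].value = var->value;
    }
    return i;
}
```

The design point is that an error is attached to the one item whose OID returned the exception, instead of ending processing for every variable that follows it in the same packet.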

      Also, I'm still not sure where it's generating the bad value from. My guess is that whatever calculates deltas thinks it got a good value, but it's 0 (or some uninitialized value) because this for loop didn't actually complete and didn't return an error for THIS OID.
      Last edited by xxiii; 26-08-2015, 00:22. Reason: add guess to last sentence
