Dynamic SNMP indexes timeout

  • nvitaly007
    Junior Member
    • Jan 2013
    • 11

    #1

    Dynamic SNMP indexes timeout

    Hello,

    I am getting random errors while monitoring a remote switch with lots of interfaces. For some interfaces I randomly see errors like this:


    5166:20131205:211714.476 End of snmp_close_session()
    5166:20131205:211714.477 End of get_value_snmp():NETWORK_ERROR
    5166:20131205:211714.477 Item [MPT-VSS-6504E:ifOperStatus[GigabitEthernet2/4/1]] error: Cannot find index [IF-MIB::ifDescr] of the OID [IF-MIB::ifOperStatus["index","IF-MIB::ifDescr","GigabitEthernet2/4/1"]]: Timeout while connecting to "X.X.X.X161"
    5166:20131205:211714.477 End of get_value():NETWORK_ERROR

    This happens quite often and leads to:

    SNMP agent item [ifOperStatus[GigabitEthernet2/4/1]] on host [XXXXXX] failed: first network error, wait for 60 seconds

    The reason is probably that this particular switch is not on the local network, and the timeout for the index lookup is too short.

    Is there a way to tune it? (I already increased the SNMP timeout in the config to 6 seconds, but it looks like this is a different timeout.)

    The problem was present in 2.0 and also exists in 2.2.
  • jix
    Member
    • Feb 2011
    • 73

    #2
    !!

    Same Problem here
    SNMP agent item [ifInOctets[ether3]] on host [Swich2] failed: first network error, wait for 15 seconds

    Zabbix 2.2

    and my graphs are drawn with gaps


    • nvitaly007
      Junior Member
      • Jan 2013
      • 11

      #3
      Do you also use dynamic indexes (something like IF-MIB::ifOperStatus["index","IF-MIB::ifDescr","GigabitEthernet2/4/1"])?

      What is the ping time between the proxy/server and the device?


      • the_doc
        Junior Member
        • Jan 2014
        • 3

        #4
        Dynamic SNMP indexes timeout

        Same problem here with timeouts on dynamic SNMP items.

        It looks like it depends on the network the machine is connected to.
        • We have 16 machines at different locations on one company's fiber network that never miss a value.
        • The same type of machine, with the same configuration, use and template, in a datacenter gives timeouts.
        • The same type of machine, with the same configuration, use and template, in a different datacenter never gives a timeout.
        • A machine that is about 10 times slower, on a VDSL line, never gives a timeout.
        • A machine on a coax connection gives many timeouts. I cannot see anything strange in the ping statistics.


        The strange thing is that a normal SNMP item always works. I changed a dynamic OID to a static one and have not missed a single data point since.

        I did increase the general timeout, but that did not help at all.

        In a different thread there is the suggestion that the dynamic index cache is not working. https://www.zabbix.com/forum/showthread.php?t=43901

        Could it be that a dynamic SNMP query does a complete walk, and that a single error in the data results in a timeout?

        We got this idea because the problem seems to depend on the network, but not on the network's response time or the machine's CPU power. Networks with higher response times work fine while networks with low response times give errors, so timing does not seem to be the problem.

        The dynamic SNMP query, used for traffic measurements on a bridge, is
        Code:
         ifInOctets["index","ifDescr","bridgeL1"]

        Is it really a timeout, or just an error in the data that gets reported as a timeout?
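
To make the question above concrete, here is a minimal Python sketch of what a walk-based index lookup could look like if no index cache is used. It is not Zabbix code: the in-memory tables, the function names and the request counting are all made up for illustration, and the only thing taken from the thread is the `bridgeL1` description.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a device's SNMP tables (not real data).
IF_DESCR: Dict[int, str] = {1: "ether1", 2: "ether2", 3: "bridgeL1"}
IF_IN_OCTETS: Dict[int, int] = {1: 1000, 2: 2000, 3: 3000}


def resolve_dynamic(descr: str) -> Tuple[int, int]:
    """Walk the index table until descr matches; return (index, requests used)."""
    requests = 0
    for index in sorted(IF_DESCR):
        requests += 1  # assume one GETNEXT-style request per row walked
        if IF_DESCR[index] == descr:
            return index, requests
    raise LookupError(f"Cannot find index for {descr!r}")


def poll_dynamic(descr: str) -> Tuple[int, int]:
    """Resolve the index by walking, then fetch the value with one more request."""
    index, requests = resolve_dynamic(descr)
    return IF_IN_OCTETS[index], requests + 1


def poll_static(index: int) -> Tuple[int, int]:
    """A static OID needs a single request."""
    return IF_IN_OCTETS[index], 1
```

In this toy model `poll_dynamic("bridgeL1")` needs four requests where `poll_static(3)` needs one, so on a lossy link the dynamic item has more chances to hit a lost packet. Whether the real poller behaves this way depends on how the index cache works, which is exactly what is in doubt here.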


        • Linwood
          Senior Member
          • Dec 2013
          • 398

          #5
          Originally posted by the_doc
          Same type of machine, same configuration and same use, same template in a datacenter giving time out.
          Is it possible you have some sort of intrusion prevention or flood prevention turned on in the failing devices, or an IPS in between, and this is evasion behavior, i.e. it's not Zabbix at all, but that it's getting so many polls and thinks it is an attack?


          • the_doc
            Junior Member
            • Jan 2014
            • 3

            #6
            There is no flood prevention on the devices.

            The SNMP is v1, but for SNMPv2 there seems to be a bug that causes much more traffic.

            I think the same thing happens with SNMPv1 queries.

            On one device I polled 6 dynamic SNMP values and had many problems.
            I changed 4 of them to direct OIDs and those never missed a value.
            The problems with the dynamic items were absent from 01:00 till 08:00.
            I presume the errors occur when the ISP's network gets busy.
            I cannot be certain that dynamic SNMP causes much more traffic, but it looks like it does. When the device was not in use, the internet traffic was about 3 times higher with 6 dynamic SNMP items than with only 2.

            Is there a way to test the SNMP walk that a dynamic SNMP query does?
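
One way to approximate that test from outside Zabbix is to walk the index table by hand with the net-snmp `snmpwalk` tool and see how long it takes and whether it times out. The Python sketch below only builds the command line and parses the text output. The host, the community string and the default timeout and retry values are placeholders, and the parser assumes the usual `OID.index = STRING: value` output lines.

```python
import re
from typing import Dict, List


def build_snmpwalk_cmd(host: str, community: str, oid: str = "IF-MIB::ifDescr",
                       timeout_s: int = 6, retries: int = 1) -> List[str]:
    """Build a net-snmp snmpwalk command line for an SNMPv1 walk of the index table."""
    return ["snmpwalk", "-v1", "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


# Matches lines such as: IF-MIB::ifDescr.3 = STRING: bridgeL1
LINE_RE = re.compile(r"^(?P<oid>\S+)\.(?P<index>\d+) = STRING: (?P<value>.*)$")


def parse_walk(output: str) -> Dict[str, int]:
    """Map each description value to its numeric index from snmpwalk text output."""
    result: Dict[str, int] = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # Strip quotes, which appear when the MIB is not loaded.
            result[m.group("value").strip('"')] = int(m.group("index"))
    return result
```

The output can be captured with `subprocess.run(build_snmpwalk_cmd(host, community), capture_output=True, text=True).stdout` and passed to `parse_walk`. Running it repeatedly during the hours when the errors occur should show whether the walk itself times out on that link.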


            • LenR
              Senior Member
              • Sep 2009
              • 1005

              #7
              It appears that dynamic items are handled differently than the classic items. If this happens on a classic item, doesn't it go "unsupported" instead of throwing a network error? The network error seems to stop polling for all items, causing even more data loss.

              There are many devices that don't consistently respond to SNMP. Some vendors want to say that SNMP has a lower priority, but the devices fail even when not that heavily loaded.


              • LuizMeier
                Junior Member
                • Sep 2012
                • 13

                #8
                Hello!

                I'm having the same problem here. I updated from 2.0.9 to 2.2.1 and the problem began.

                The issue [#ZBX-7690] says that the problem would be solved in 2.2.2. I have now updated to 2.2.2 and the issue persists.

                Any ideas? Anyone still having the same problem?


                • wrocha
                  Junior Member
                  • Aug 2013
                  • 27

                  #9
                  Same here

                  I'm happy and disappointed at the same time!! I've been trying to solve this problem for over a month. I opened two threads here (one in English and the other in Portuguese) saying that this could be a bug in this version, and nobody answered. I thought it might be some problem with my server, but it looks like it was not.

                  My problem started after the upgrade to version 2.2.1. Last week I upgraded to version 2.2.2 and the problem persists. Some dynamic OIDs are working and others are not.

                  Hope they will fix it in the next version.
                  Last edited by wrocha; 10-03-2014, 23:36.


                  • LuizMeier
                    Junior Member
                    • Sep 2012
                    • 13

                    #10
                    Hi, City Friend!

                    Same case here. The problem started with the upgrade to version 2.2.1.

                    The nice thing is that there is no thread or bug report for this issue except 7690, which is considered closed. ;/

                    My problem is just with IfInOctets and IfOutOctets.

                    The strange thing is that the problem does not affect all my SNMP hosts, just a few ones. I already checked the network and everything is fine.

                    Hope someone sees this thread and helps us.


                    • rekeds
                      Junior Member
                      • Feb 2014
                      • 21

                      #11
                      yep, pisses me off.

                      host went for a software upgrade, was down for about 5mins.

                      disabling, re-enabling items doesn't help.
                      switching the host between monitored and not monitored doesn't help.
                      adding a ping template doesn't help. (the icmp template works).

                      just says that it's a timeout connecting on :161.
                      I can snmpget the host without problems :<

                      only thing that helps is to restart zbx. fuuuh


                      • LuizMeier
                        Junior Member
                        • Sep 2012
                        • 13

                        #12
                        After the upgrade to 2.2.3 it seems to work fine now.

                        In case of a configuration error it now shows me the reason, like "I cannot find interface XYZ in ifDescr".
