SNMP stops working ... partially


    #16
    Hi there,

    I upgraded one of the routers to the latest firmware, but without success. After the reboot it complains about authentication errors.

    My next step is to trace the traffic between zabbix-server and the upgraded router; something must happen after the restart. I'm fairly sure it has something to do with a password hash, or something similar that is generated only once but needs to be regenerated after a restart.

    Will keep you posted.

    --Michael



      #17
      Hi there,

      OK, here is some information from my packet trace. I can't interpret it; maybe it's a bug. I'm kind of lost.

      before router restart:
      ------------
      SNMP get-request (bad UDP checksum, no variable-bindings, no username)
      SNMP report 1.3.6.1.6.3.15.1.1.4.0

      which means:
      usmStatsUnknownEngineIDs
      This is reported when the snmpEngineID specified in the request message does not match that of the agent.
      The agent reports this error with its first varbind containing the OID ".1.3.6.1.6.3.15.1.1.4.0"; the value is a Counter giving the number of packets that have been dropped because of this error.
      ------------
      SNMP get-request 1.3.6.1.2.1.2.2.1.10.10001 (authenticated, username)
      SNMP get-response 1.3.6.1.2.1.2.2.1.10.10001

      and then the same sequence repeats from the top ...
      --------------------------------------------------------------

      After the router restarts the log changes:
      ------------
      13x SNMP get-request (bad UDP checksum, no variable-bindings, no username)
      SNMP report 1.3.6.1.6.3.15.1.1.4.0
      SNMP get-request 1.3.6.1.2.1.2.2.1.10.10001 (authenticated, username)
      SNMP report 1.3.6.1.6.3.15.1.1.2.0

      which means:
      usmStatsNotInTimeWindows
      This is reported when the engineTime specified is not within the timeWindow of the agent. The engineTime is considered outside the timeWindow if any of the following is true:
      The agent's snmpEngineBoots value is equal to 2147483647.
      The request's snmpEngineBoots value differs from that of the agent.
      The difference between the request's snmpEngineTime and that of the agent is greater than 150 seconds.
      The agent reports this error with its first varbind containing the OID ".1.3.6.1.6.3.15.1.1.2.0"; the value is a Counter giving the number of packets that have been dropped because of this error.

      and this repeats indefinitely until zabbix-server is restarted.
      ------------
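
For reading traces like the two above, the report OIDs can be decoded with a small lookup table. This is only a sketch, assuming the usmStats counter assignments from SNMP-USER-BASED-SM-MIB (RFC 3414); the function name `decode_usm_report` is made up for illustration and is not part of Zabbix or net-snmp.

```python
# usmStats counters under 1.3.6.1.6.3.15.1.1 (SNMP-USER-BASED-SM-MIB, RFC 3414).
USM_STATS = {
    "1.3.6.1.6.3.15.1.1.1.0": "usmStatsUnsupportedSecLevels",
    "1.3.6.1.6.3.15.1.1.2.0": "usmStatsNotInTimeWindows",
    "1.3.6.1.6.3.15.1.1.3.0": "usmStatsUnknownUserNames",
    "1.3.6.1.6.3.15.1.1.4.0": "usmStatsUnknownEngineIDs",
    "1.3.6.1.6.3.15.1.1.5.0": "usmStatsWrongDigests",
    "1.3.6.1.6.3.15.1.1.6.0": "usmStatsDecryptionErrors",
}


def decode_usm_report(oid):
    """Return the counter name for a report OID, or None if it is not a usmStats OID."""
    # Accept both ".1.3.6..." and "1.3.6..." spellings.
    return USM_STATS.get(oid.lstrip("."))


# The two OIDs seen in the trace.
for oid in (".1.3.6.1.6.3.15.1.1.4.0", ".1.3.6.1.6.3.15.1.1.2.0"):
    print(oid, "->", decode_usm_report(oid))
```

An unauthenticated request answered by a usmStatsUnknownEngineIDs report, followed by an authenticated request, is consistent with ordinary SNMPv3 engine discovery, so the change to look for is the usmStatsNotInTimeWindows report.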

      EDIT: I found that when it's not working, the msgAuthoritativeEngineTime in the request is 44158, while in the report it is 431. While the SNMP check works, the two values are roughly the same.
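
To make the numbers concrete, here is a minimal sketch of the time-window rule quoted above, applied to the two engine times from the capture. It assumes the report carries the agent's current engine time. The function name and the boots value of 2 are placeholders, since the trace does not show snmpEngineBoots.

```python
# Time-window rule as quoted in the post (USM, RFC 3414).
MAX_ENGINE_BOOTS = 2147483647  # at this value the agent rejects every authenticated request
TIME_WINDOW_SECONDS = 150


def in_time_window(agent_boots, agent_time, msg_boots, msg_time):
    """Return True if the agent would accept the request's boots/time values."""
    if agent_boots == MAX_ENGINE_BOOTS:
        return False
    if msg_boots != agent_boots:
        return False
    return abs(msg_time - agent_time) <= TIME_WINDOW_SECONDS


# Engine times from the capture: request 44158, report (agent) 431.
# The boots value 2 is a placeholder; the post does not report it.
print(in_time_window(agent_boots=2, agent_time=431, msg_boots=2, msg_time=44158))  # prints False
```

The gap of 43727 seconds is far outside the 150-second window, which matches the usmStatsNotInTimeWindows report. After a reboot the agent's boots counter normally increases as well, so the second condition would probably fail on its own too.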

      What's next? Alexei, any word on this?

      Best Regards

      --Michael
      Last edited by MichaelM; 17-03-2009, 16:19.



        #18
        Hi there,

        Does anyone know how I can create a ticket for this?

        This is a showstopper for us, because we need to monitor Funkwerk routers, and it really looks like a bug.

        Alexei?

        Help highly appreciated.

        --Michael



          #19
          Reported as ZBX-832

          Hi there,

          Zabbix 1.6.3 did not fix this issue, so I created a bug report. Hopefully this sheds some light on it.

          --Michael



            #20
            Thank you, MichaelM. You did a good investigation.
            I have similar problems with some Cisco switches. Only a regular Zabbix restart gets SNMPv3 items back to work from the "Not supported" state.
            Unfortunately, Zabbix doesn't show the real reason for the SNMP request failure.



              #21
              Zabbix does not cache anything, so it is very strange that SNMP recovers after a restart of the device. However, it looks (?) like a problem on the Zabbix side. We need some time to reproduce this. We have no SNMPv3 devices ready for testing at the moment.
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga



                #22
                Hello Alexei,

                If you need it, I can give you access to an SNMPv3 device. Please contact me privately and I'll send you the details.

                --Michael



                  #23
                  Hi,

                  Same problem in 1.6.4 ... I'm still willing to support you with remote access to an SNMPv3 device.

                  --Michael



                    #24
                    Please see https://support.zabbix.com/browse/ZBX-2152

                    At least in one case it has been confirmed that misconfigured SNMP devices were to blame.



                      #25
                      SNMP stops working ... 1.8.2

                      Hi there,

                      I'm still in the exact same spot. Zabbix 1.8.2 on Ubuntu 10.04 is giving me the same "Timeout while connecting to " errors for Funkwerk routers monitored via SNMPv3.

                      I checked with snmpwalk, and each router has its own unique EngineID (.1.3.6.1.6.3.10.2.1.1.0), so this shouldn't be the issue here?

                      This is really frustrating, because it prevents me from rolling out Zabbix at the customer's site.

                      Help highly appreciated.

                      --Michael



                        #26
                        SNMPv3 stops working ... after a while

                        Hi,

                        This is what tcpdump captures while it's working and after it stops:

                        WORKING:

                        4128 30540.471396 SNMP report SNMP-USER-BASED-SM-MIB::usmStatsUnknownEngineIDs.0
                        4129 30540.471746 SNMP get-request IF-MIB::ifInOctets.10001
                        4130 30540.496987 SNMP get-response IF-MIB::ifInOctets.10001
                        4131 30542.713568 SNMP get-request
                        4132 30542.738017 SNMP report SNMP-USER-BASED-SM-MIB::usmStatsUnknownEngineIDs.0
                        4133 30542.738346 SNMP get-request IF-MIB::ifOutOctets.10001
                        4134 30542.763166 SNMP get-response IF-MIB::ifOutOctets.10001
                        4135 30600.536825 SNMP get-request
                        4136 30600.561639 SNMP report SNMP-USER-BASED-SM-MIB::usmStatsUnknownEngineIDs.0

                        ------------------------------------------------------------------

                        AND IT STOPPED:

                        4137 30600.561976 SNMP get-request IF-MIB::ifInOctets.10001
                        4138 30600.587664 SNMP report SNMP-USER-BASED-SM-MIB::usmStatsNotInTimeWindows.0
                        4139 30601.569005 SNMP get-request IF-MIB::ifInOctets.10001
                        4140 30601.594592 SNMP report SNMP-USER-BASED-SM-MIB::usmStatsNotInTimeWindows.0
                        4141 30601.841184 SNMP get-request


                        I don't know if this is of any help.

                        Best regards.

                        --Michael



                          #27
                          related??

                          Hi:

                          I'm facing a similar situation:
                          After a long time working without problems, some interfaces on a router (Cisco 7200, SNMP v2) went down (cable unplugged) and came up again (cable plugged back in).

                          After this, Zabbix no longer gets some SNMP items from the router. The strange thing is that on some interfaces Zabbix gets the "in traffic" but not the "out traffic", while on other interfaces it gets the out traffic but not the in traffic...

                          In the Zabbix web interface the items are marked as "not supported", but from the Zabbix machine I can run snmpget on those items and get the values. I tried disabling and re-enabling the items from the web interface, but without success.

                          I found nothing in the logs. Restarting zabbix_server fixes the problem, but this is not a nice situation, because I have no way of knowing when Zabbix gets into this state and needs a restart...

                          Zabbix 1.8.2. compiled against net-snmp 5.3.2.2 (x64)
                          Cisco 7200 Software (C7200-IK8S-M), Version 12.2(46a)

                          Any advice on this??



                            #28
                            Originally posted by richlv:
                            Please see https://support.zabbix.com/browse/ZBX-2152

                            At least in one case it has been confirmed that misconfigured SNMP devices were to blame.

                            After in-depth investigation over the last few days, I can say for sure that this discussion is indeed all about https://support.zabbix.com/browse/ZBX-2152

                            There are no other causes.
                            Make sure that no two of your monitored SNMPv3 devices share the same EngineID.
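
One way to check for this is to collect the EngineID of every monitored device (for example by querying .1.3.6.1.6.3.10.2.1.1.0, as mentioned earlier in the thread) and look for values that occur more than once. A rough sketch, assuming the IDs have already been gathered into a dictionary; the host names, EngineID values, and function name below are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_engine_ids(engine_ids):
    """Map each EngineID used by more than one host to the sorted list of those hosts."""
    hosts_by_id = defaultdict(list)
    for host, engine_id in engine_ids.items():
        # Normalize formatting differences such as "80 00 1F" vs "80:00:1f".
        normalized = engine_id.replace(" ", "").replace(":", "").lower()
        hosts_by_id[normalized].append(host)
    return {eid: sorted(hosts) for eid, hosts in hosts_by_id.items() if len(hosts) > 1}


# Hypothetical inventory: host names and EngineID values are made up.
inventory = {
    "router-a": "80 00 1F 88 80 01",
    "router-b": "80:00:1f:88:80:01",
    "switch-c": "80 00 1F 88 80 02",
}
print(find_duplicate_engine_ids(inventory))
```

With this made-up inventory, router-a and router-b are reported as sharing one EngineID, and switch-c is not listed.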
