SNMP v3 Issues

  • steveroebuck
    Junior Member
    • Jan 2018
    • 19

    #1

    We are unable to poll SNMP v3 devices when more than one poller is enabled on either the server or the proxy. With a single poller we do get some results back, but the poller process is overworked. (Reported in Zabbix)

    We also get gappy data from our SNMP v3 devices, but we do not see this with agent-based host data.

    Reducing the pollers to 1 has been suggested throughout the community, but that is not sustainable for the size of our estate.

    Scaling the proxies up to 2 causes authentication failures on the server and massive loss of incoming data streams.

    I have captured the outgoing traffic from the server with tcpdump, once with 1 poller and once with 5 pollers. With a single poller the SNMP information is sent out correctly, but with 5 pollers it is sent incorrectly (I have attached both traces).

    The switches are working fine: I can snmpwalk from the server/proxy directly and retrieve data without fail.

    We use authPriv with AES/SHA.

    Logfile below @ debug level 1

    20769:20180130:085653.978 SNMP agent item "ifAdminStatus[Ten-GigabitEthernet1/0/1]" on host "" failed: first network error, wait for 15 seconds
    20772:20180130:085708.628 resuming SNMP agent checks on host "": connection restored
    20764:20180130:085738.655 item ":ifAdminStatus[Ten-GigabitEthernet1/0/1]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20764:20180130:085738.656 item ":ifAdminStatus[Ten-GigabitEthernet2/0/9]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20769:20180130:085812.051 SNMP agent item "ifOutOctets[Ten-GigabitEthernet2/0/29]" on host "" failed: first network error, wait for 15 seconds
    20773:20180130:085827.659 resuming SNMP agent checks on host "": connection restored
    20762:20180130:085857.889 item ":ifAdminStatus[Bridge-Aggregation1]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20770:20180130:085927.970 SNMP agent item "ifOperStatus[LoopBack0]" on host " -PP" failed: first network error, wait for 15 seconds
    20771:20180130:085942.765 resuming SNMP agent checks on host " -PP": connection restored
    20764:20180130:085942.979 item " -PP:ifNumber" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20763:20180130:090012.982 item " -PP:ifOperStatus[LoopBack0]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20770:20180130:090043.500 SNMP agent item "ifOutOctets[Ten-GigabitEthernet2/0/22]" on host " -PP" failed: first network error, wait for 15 seconds
    20773:20180130:090058.763 resuming SNMP agent checks on host " -PP": connection restored
    20764:20180130:090129.297 item " -PP:ifOperStatus[FortyGigE2/0/50]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20764:20180130:090129.297 item " -PP:ifOutOctets[Ten-GigabitEthernet2/0/22]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20770:20180130:090159.233 SNMP agent item "ifOutOctets[Ten-GigabitEthernet1/0/21]" on host " -PP" failed: first network error, wait for 15 seconds
    20772:20180130:090214.830 resuming SNMP agent checks on host " -PP": connection restored
    20762:20180130:090215.442 item " -PP:ifNumber" became supported
    20762:20180130:090245.464 item " -PP:ifOperStatus[Bridge-Aggregation12]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20769:20180130:090319.641 SNMP agent item "ifOutErrors[Ten-GigabitEthernet2/0/45]" on host " -PP" failed: first network error, wait for 15 seconds
    20774:20180130:090334.903 resuming SNMP agent checks on host " -PP": connection restored
    20762:20180130:090335.544 item " -PP:ifNumber" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20761:20180130:090405.550 item " -PP:ifInOctets[Ten-GigabitEthernet2/0/13]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20770:20180130:090435.158 SNMP agent item "ifOperStatus[Ten-GigabitEthernet2/0/42]" on host " -PP" failed: first network error, wait for 15 seconds
    20773:20180130:090450.963 resuming SNMP agent checks on host " -PP": connection restored
    20763:20180130:090451.672 item " -PP:ifNumber" became supported
    20761:20180130:090521.670 item " -PP:ifOperStatus[Ten-GigabitEthernet2/0/42]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20761:20180130:090521.670 item " -PP:ifOutOctets[Ten-GigabitEthernet2/0/14]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).
    20770:20180130:090551.567 SNMP agent item "ifOperStatus[InLoopBack0]" on host " -PP" failed: first network error, wait for 15 seconds
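
For anyone who wants to summarise a log like the excerpt above, here is a minimal Python sketch that counts "Authentication failure" lines per process ID (the first field of each line). It assumes only the `PID:YYYYMMDD:HHMMSS.mmm message` layout visible in the excerpt; the function name `count_auth_failures` and the `sample` list are illustrative, and the excerpt does not say which process type each ID belongs to.

```python
import re
from collections import Counter

# Prefix layout taken from the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)$"
)

def count_auth_failures(lines):
    """Return a Counter mapping process ID -> number of authentication failures."""
    failures = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Authentication failure" in match.group("msg"):
            failures[match.group("pid")] += 1
    return failures

# A few lines copied from the excerpt above.
sample = [
    '20769:20180130:085653.978 SNMP agent item "ifAdminStatus[Ten-GigabitEthernet1/0/1]" on host "" failed: first network error, wait for 15 seconds',
    '20764:20180130:085738.655 item ":ifAdminStatus[Ten-GigabitEthernet1/0/1]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).',
    '20764:20180130:085738.656 item ":ifAdminStatus[Ten-GigabitEthernet2/0/9]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).',
    '20762:20180130:085857.889 item ":ifAdminStatus[Bridge-Aggregation1]" became not supported: Cannot connect to "10.108.1.32:161": Authentication failure (incorrect password, community or key).',
]

failures = count_auth_failures(sample)
for pid, count in sorted(failures.items()):
    print(pid, count)
```

Feeding it the full log file line by line (for example `count_auth_failures(open(path))`, with `path` pointing at your own log) gives a quick view of whether the failures are spread across several processes.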

    The debug level 4 log is too large to attach.

    The problems exist in both the dockerised version and the standalone install.
  • kaspars.mednis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2017
    • 349

    #2
    Hi,

    This is the first time I have heard of the idea of using only one poller on a Zabbix server. It is really strange: people usually use hundreds of pollers without any issues, and Zabbix is designed to use multiple concurrent pollers. I have polled multiple SNMPv3 devices without any issues.

    Your problem may be the Timeout setting in zabbix_server.conf; the default timeout of 3 seconds may not be enough for SNMPv3. My suggestions, if you have the default timeout:

    - set the poller count back to 5
    - increase the Timeout setting to 10 and restart the Zabbix server

    Regards,
    Kaspars

    • steveroebuck
      Junior Member
      • Jan 2018
      • 19

      #3
      Hi Kaspars,

      Thanks for your reply. The issue occurs when we use SHA/AES authentication; it does not occur if we lower our security to MD5/DES. This can be seen in the packet traces: when using multiple pollers, none of the pollers send out the information encrypted with SHA/AES; they send it out using MD5/DES.

      A single poller sends out SHA/AES just fine.

      • kaspars.mednis
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Oct 2017
        • 349

        #4
        Thanks for your feedback. That sounds like a bug; I will check it.

        Regards,
        Kaspars

        • kaspars.mednis
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Oct 2017
          • 349

          #5
          Just a few questions:

          - What is your exact Zabbix server version?
          - Are your snmpEngineIDs unique?

          If monitoring SNMPv3 devices, make sure that msgAuthoritativeEngineID (also known as snmpEngineID or “Engine ID”) is never shared by two devices. According to RFC 2571 (section 3.1.1.1) it must be unique for each device.
          Kaspars

          • sfl
            Junior Member
            • Jun 2016
            • 26

            #6
            Hi all,

            I'm experiencing the same issue with SNMP v3. Setting StartPollers to 1 seems to solve it. I am running the Zabbix 3.4.6 package on Ubuntu Xenial.

            I checked the snmpEngineIDs; all are different.

            I did not think it could come from the SHA or AES cipher.

            @steveroebuck, do you use the packaged or a compiled Zabbix version?

            Sfl

            • steveroebuck
              Junior Member
              • Jan 2018
              • 19

              #7
              Hi Guys

              Sorry for the delay in replying; I have been on leave.

              @Kaspars

              Yes, all devices have unique engine IDs and, as previously stated, the problem vanishes if you either use a single poller or swap the auth type to DES/MD5.

              We have had the issue with 3.4.2 and now 3.4.5.

              @sfl We are using the dockerised deployment, but I have also stood up and deployed on standalone VMs and had the same issue with 3.4.2 and 3.4.5. Official bug reports have led to many a wild goose chase. If you are experiencing the same, it might be worth looking at some packet traces in Wireshark with your AES/SHA credentials supplied, to see whether you are getting the same auth failures and malformed packets out of Zabbix that we are.

              • steveroebuck
                Junior Member
                • Jan 2018
                • 19

                #8
                Has anyone got any suggestions, advice or a fix for this issue? Zabbix themselves claim it's not a bug, and I am having no luck getting a fix from them directly.

                • steveroebuck
                  Junior Member
                  • Jan 2018
                  • 19

                  #9
                  We are still having issues with this, exclusively when using AES/SHA on SNMP v3 devices. Zabbix insist it's not a bug, but all my investigations point to it being a Zabbix issue.

                  • Viks
                    Junior Member
                    • Mar 2018
                    • 24

                    #10
                    This is not a Zabbix bug but an issue with your devices.
                    It is a violation of RFC 2571 (SNMP).

                    Please configure your devices according to the RFC and you will not have issues.

                    • steveroebuck
                      Junior Member
                      • Jan 2018
                      • 19

                      #11
                      If our "devices" are non-standard (Trend Tipping Point, Checkpoint FW, F5 GTM and HPE 5900 series switches), can you explain why I can snmpwalk them all fully? They have been working without issue in Solarwinds, and they can all be added without issue to LibreNMS. We only see the authentication issues once we add multiple devices. If I monitor a single HPE 5900 switch via a Zabbix proxy I get no issues, and I can sometimes manage to monitor 2 devices, but as soon as I go beyond this I start to get issues with ports going to "not supported". If my template or credentials were incorrect, all ports would fail, not just some seemingly at random.

                      If we are still going with the idea that we are not in compliance with RFC 2571, can you please explain how a single device works fine, and how everything also works fine if I use a single poller?

                      If you look at the original packet captures you will see that, when multiple pollers are used, Zabbix does not send out AES/SHA as requested; it starts sending out packets with MD5/DES.

                      • Viks
                        Junior Member
                        • Mar 2018
                        • 24

                        #12
                        Can you specify which equipment / system / software (OS, model, version) has issues,
                        and list each of them with its full engineID?

                        • steveroebuck
                          Junior Member
                          • Jan 2018
                          • 19

                          #13
                          As we have over 50 switches in our environment, I have listed just a sample below.

                          HPE 5900AF-48XG-4QSFP+ Switch - HPE Comware Software, Version 7.1.045 Release 2432P02 - Engine ID = 800063A280CC3E5F74B18300000001

                          HPE 6125XLG Blade Switch - HPE Comware Software, Version 7.1.045, Release 2432P01 - Engine ID = 800063A280CC3E5F95ED6400000001

                          HPE FF 5930-2Slot+2QSFP+Switch - HPE Comware Software, Version 7.1.045, Release 2432P02 - Engine ID = 800063A2802C233A3EFA7700000001

                          HPE FF 5900CP-48XG-4QSFP+ - HPE Comware Software, Version 7.1.045, Feature 2427 - Engine ID = 800063A280443192353CAB00000001

                          HPE 6127XLG Ethernet Blade Switch - HPE Comware Software, Version 7.1.045, Release 2432P01 - Engine ID = 800063A280CC3E5FA06ED300000001

                          F5 GTM - BIG-IP 11.6.0 Build 6.43.442 Engineering Hotfix HF6 - Engine ID = 0x80001f88802c40a363219c8858

                          We have multiple of each of the above, and we experience the issues with them sporadically.
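
As a reading aid for the engine IDs listed above, here is a rough Python sketch that splits an engine ID into the fields described by the SnmpEngineID textual convention in RFC 3411 (the successor to RFC 2571): the first four octets carry the enterprise number with the top bit set, and the fifth octet indicates the format of the rest. The function name `decode_engine_id` and the returned dictionary keys are illustrative assumptions, not anything Zabbix provides.

```python
def decode_engine_id(text):
    """Split a hex engine ID string into its RFC 3411 SnmpEngineID fields."""
    hex_digits = text.strip().lower()
    if hex_digits.startswith("0x"):
        hex_digits = hex_digits[2:]          # accept both "0x8000..." and "8000..."
    raw = bytes.fromhex(hex_digits)
    if len(raw) < 5:
        raise ValueError("engine ID too short to decode")
    first_word = int.from_bytes(raw[:4], "big")
    return {
        # Top bit set means the variable-length format of RFC 3411 is in use.
        "rfc3411_format": bool(first_word & 0x80000000),
        # Remaining 31 bits are the IANA private enterprise number.
        "enterprise": first_word & 0x7FFFFFFF,
        # Fifth octet: how the remaining octets are to be interpreted.
        "format_octet": raw[4],
        "remainder": raw[5:].hex().upper(),
    }

# Two of the engine IDs quoted in this post.
for engine_id in ("800063A280CC3E5F74B18300000001", "0x80001f88802c40a363219c8858"):
    print(engine_id, decode_engine_id(engine_id))
```

Format octet values of 128 and above are enterprise-specific, so the remainder can only be compared between devices, not interpreted further without vendor documentation.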

                          • Viks
                            Junior Member
                            • Mar 2018
                            • 24

                            #14
                            Simply collect the EngineID values from all your installations
                            and check whether they are all really unique and not repeated.

                            This can easily be done by creating an item for:
                            SNMP-FRAMEWORK-MIB::snmpEngineID.0
                            or
                            .1.3.6.1.6.3.10.2.1.1.0
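
Once the values are collected, the uniqueness check itself can be sketched in a few lines of Python. This is only an illustration: the host labels (`switch-a` and so on) are made up, `find_duplicates` is a hypothetical helper, and the engine IDs are the sample values posted earlier in this thread.

```python
from collections import defaultdict

def normalize(engine_id):
    """Drop an optional 0x prefix and upper-case the hex digits for comparison."""
    value = engine_id.strip()
    if value.lower().startswith("0x"):
        value = value[2:]
    return value.upper()

def find_duplicates(engine_ids_by_host):
    """Return {engine_id: [hosts]} for every engine ID used by more than one host."""
    hosts_by_id = defaultdict(list)
    for host, engine_id in engine_ids_by_host.items():
        hosts_by_id[normalize(engine_id)].append(host)
    return {eid: hosts for eid, hosts in hosts_by_id.items() if len(hosts) > 1}

# Hypothetical host labels; engine IDs copied from post #13.
collected = {
    "switch-a": "800063A280CC3E5F74B18300000001",
    "switch-b": "800063A280CC3E5F95ED6400000001",
    "switch-c": "800063A2802C233A3EFA7700000001",
    "switch-d": "800063A280443192353CAB00000001",
    "switch-e": "800063A280CC3E5FA06ED300000001",
    "gtm-a": "0x80001f88802c40a363219c8858",
}

duplicates = find_duplicates(collected)
print(duplicates if duplicates else "all engine IDs are unique")
```

Normalising the prefix and case first matters because different tools print the same engine ID in different styles.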

                            • steveroebuck
                              Junior Member
                              • Jan 2018
                              • 19

                              #15
                              As we are still evaluating Zabbix as a solution, I have scaled back what is being monitored to the following devices, and I am experiencing the same issues, with some ports being discovered correctly and others being "not supported" due to authentication issues.

                              HPE 5900AF-48XG-4QSFP+ Switch - HPE Comware Software, Version 7.1.045 Release 2432P02 - Engine ID = 800063A280CC3E5F74B18300000001
                              HPE 6125XLG Blade Switch - HPE Comware Software, Version 7.1.045, Release 2432P01 - Engine ID = 800063A280CC3E5F95ED6400000001
                              HPE FF 5930-2Slot+2QSFP+Switch - HPE Comware Software, Version 7.1.045, Release 2432P02 - Engine ID = 800063A2802C233A3EFA7700000001
                              HPE 6127XLG Ethernet Blade Switch - HPE Comware Software, Version 7.1.045, Release 2432P01 - Engine ID = 800063A280CC3E5FA06ED300000001
                              HPE 5900AF-48XG-4QSFP+ Switch - HPE Comware Software, Version 7.1.045, Feature 2427 - Engine ID = 800063A280CC3E5F75A94F00000001
                              HPE 6125XLG Blade Switch - HPE Comware Software, Version 7.1.045, Release 2432P01 - Engine ID = 800063A280CC3E5F6C66DC00000001
                              F5 GTM - BIG-IP 11.6.0 Build 6.43.442 Engineering Hotfix HF6 - Engine ID = 0x80001f88802c40a363219c8858

                              These are the only devices currently being monitored via our Zabbix installation; all are using authPriv with AES/SHA.
