SNMP agent item "....[540]" on host "xxxx" failed: first network error, wait for 15

  • Mechanix
    Member
    • Jan 2017
    • 92

    #1

    Hi,

    my environment:

    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=16.04
    DISTRIB_CODENAME=xenial
    DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
    Zabbix: 3.2
    VM: 3xCPU ; 4GB RAM; 50GB HDD
    Every few seconds I get the following errors in the log:

    12692:20170120:084514.747 SNMP agent item "1.3.6.1.2.1.2.2.1.20.[515]" on host "XXXXXX" failed: first network error, wait for 15 seconds
    12837:20170120:084529.042 resuming SNMP agent checks on host "XXXXX": connection restored
    I have found the interface on the device with:

    show interfaces snmp-index 515
    but the graphs do not show any gaps or irregularities. Does anybody know what these errors are about?

    Thank you.
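
To see how often the error really occurs and which items it hits, the log lines above can be tallied per host and item. This is a minimal sketch: the regular expression is an assumption derived only from the two sample lines in this post, not from a documented log format, and `count_first_network_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines in this post (an assumption,
# not a documented format): "PID:YYYYMMDD:HHMMSS.mmm message".
ERROR_RE = re.compile(
    r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.\d{3} '
    r'SNMP agent item "(?P<item>[^"]*)" on host "(?P<host>[^"]*)" '
    r'failed: first network error'
)


def count_first_network_errors(lines):
    """Count "first network error" log lines per (host, item) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("item"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '12692:20170120:084514.747 SNMP agent item "1.3.6.1.2.1.2.2.1.20.[515]" '
        'on host "XXXXXX" failed: first network error, wait for 15 seconds',
        '12837:20170120:084529.042 resuming SNMP agent checks on host "XXXXX": '
        'connection restored',
    ]
    for (host, item), n in count_first_network_errors(sample).most_common():
        print(f"{n:5d}  {host}  {item}")
```

Feeding it the real server log (for example via `open(path)`, with the path depending on the installation) would show whether the errors cluster on a few interfaces or are spread across all items.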
  • Pada
    Senior Member
    • Apr 2012
    • 236

    #2
    Ensure that the SNMP version, community/credentials and port are correctly configured in Zabbix for the item, and that you have also added an SNMP interface to the host.

    I'd also suggest that you first try to obtain the values for that OID using snmpget or snmpwalk.
    e.g.
    Code:
    snmpwalk -v <SNMP version> -c <community> <ip>:<port> <oid>
    Code:
    snmpwalk -v 2c -c public 192.168.2.3:161 1.3.6.1.2.1.2.2.1.20
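
If the item uses SNMPv3 (as later posts in this thread do), the same check needs security parameters instead of a community string. The sketch below only builds the command line with the standard net-snmp options (`-l`, `-u`, `-a`, `-A`, `-x`, `-X`); the user name, passphrases, protocols and the `build_snmpv3_walk` helper are hypothetical placeholders, not values from this thread.

```python
import shlex


def build_snmpv3_walk(host, oid, user, auth_pass, priv_pass,
                      auth_proto="SHA", priv_proto="AES", port=161):
    """Return the argv list for an authPriv SNMPv3 snmpwalk call."""
    return [
        "snmpwalk",
        "-v", "3",
        "-l", "authPriv",      # security level: authentication and privacy
        "-u", user,            # security name (placeholder)
        "-a", auth_proto,      # authentication protocol
        "-A", auth_pass,       # authentication passphrase (placeholder)
        "-x", priv_proto,      # privacy protocol
        "-X", priv_pass,       # privacy passphrase (placeholder)
        f"{host}:{port}",
        oid,
    ]


if __name__ == "__main__":
    # All credentials here are made-up placeholders.
    argv = build_snmpv3_walk("192.168.2.3", "1.3.6.1.2.1.2.2.1.20",
                             "monitor", "authsecret", "privsecret")
    print(shlex.join(argv))
```

The protocols and security level have to match what is configured on the device and in the Zabbix item, otherwise the command-line test and the Zabbix check are not comparable.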

    • Mechanix
      Member
      • Jan 2017
      • 92

      #3
      Hi,

      Yes, I can query the device from the CLI and it gives me the correct interface. I'm using SNMPv3.

      • Pada
        Senior Member
        • Apr 2012
        • 236

        #4
        I unfortunately haven't worked with SNMP v3 yet.

        Perhaps try to disable all the other items on that host and then see if you can at least obtain values for that particular OID.
        For example, we had issues with JMX where one item with incorrect credentials caused many other items to go into an unsupported state from time to time.

        If this doesn't solve your issue, then someone else would need to help you.

        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          A common cause of these SNMP failures is insufficient processing power on the SNMP agent side.
          Many devices with an embedded SNMP agent have such a weak CPU that answering a multi-value SNMP query takes long enough for the whole query to time out.
          The SNMP agent software is usually based on net-snmp's snmpd, and I think agents built on that code are generally far from the maximum possible speed, both in sampling data internally and in sending it back over SNMP.

          What I can suggest is to prepare an snmpget query for multiple OIDs, or an snmpwalk query, and run it in a loop, timing 100-1000 such queries to check how the query times are distributed. In theory, the spread of the query times should be quite narrow.
          If it is not narrow, the test and its results should IMO be good enough to raise a support ticket about SNMP agent performance with the team that supports the device software.
          Sometimes the monitored device provides full terminal access to the embedded system over telnet/ssh (NetApp devices, for example), where it is even possible to run strace/truss against the running snmpd process to see exactly what is slowing down the responses to the SNMP client.
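
The timing loop suggested above could look roughly like this. It is a sketch under assumptions: `time_command` and `summarize` are hypothetical helper names, the 30-second timeout is arbitrary, and the 95th percentile uses a simple nearest-rank rule.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=100, timeout=30.0):
    """Run argv `runs` times and return wall-clock durations in seconds.

    Runs that fail, time out, or cannot be started are recorded as None.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            subprocess.run(argv, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL,
                           timeout=timeout, check=True)
            durations.append(time.perf_counter() - start)
        except (subprocess.SubprocessError, OSError):
            durations.append(None)
    return durations


def summarize(durations):
    """Summarize the spread of successful query times."""
    ok = sorted(d for d in durations if d is not None)
    summary = {"runs": len(durations), "failed": len(durations) - len(ok)}
    if ok:
        # Nearest-rank 95th percentile, computed with integer arithmetic.
        p95_index = (95 * len(ok) + 99) // 100 - 1
        summary.update(
            min=ok[0],
            median=statistics.median(ok),
            p95=ok[p95_index],
            max=ok[-1],
        )
    return summary
```

For example, `summarize(time_command(["snmpwalk", "-v", "2c", "-c", "public", "192.168.2.3:161", "1.3.6.1.2.1.2.2.1.20"], runs=200))` reuses the command from post #2. A median far below the p95 or max, or a non-zero `failed` count, would point to the uneven response times described above.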
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates

          • Mechanix
            Member
            • Jan 2017
            • 92

            #6
            The overall CPU utilization of the devices is around 20-40%.
            If I disable snmpbulk on the devices, the CPU utilization goes up to 80%, and it didn't get rid of the error messages in the logs.

            These are the server settings right now:

            StartPollers=25
            StartPollersUnreachable=5
            StartPingers=15
            StartDiscoverers=15
            CacheSize=32M
            Timeout=30
            But I also tried different settings (80 pollers, etc.).

            • Mechanix
              Member
              • Jan 2017
              • 92

              #7
              Any hints as to what could cause this issue? Thanks.

              • sfl
                Junior Member
                • Jun 2016
                • 26

                #8
                I have the same issue with Zabbix 3.2.4 and SNMPv3 on Ubuntu Xenial.

                Unchecking bulk requests solved the unsupported-item behaviour, but I still get "first network error, wait for 15 seconds".

                The graphs seem correct (no missing values).

                • Mechanix
                  Member
                  • Jan 2017
                  • 92

                  #9
                  Originally posted by sfl
                  I have the same issue with Zabbix 3.2.4 and SNMPv3 on Ubuntu Xenial.

                  Unchecking bulk requests solved the unsupported-item behaviour, but I still get "first network error, wait for 15 seconds".

                  The graphs seem correct (no missing values).
                  That is the exact behaviour I'm facing.

                  • neo32
                    Senior Member
                    • Nov 2013
                    • 149

                    #10
                    Hello! I'm using Zabbix 3.2.6 and I have a similar problem.
                    SNMPv2 is working fine, but SNMPv3 gets the error above.
                    Does anyone know what the issue is?

                    • neo32
                      Senior Member
                      • Nov 2013
                      • 149

                      #11
                      Has no one else run into a similar problem?

                      • woo
                        Junior Member
                        • May 2017
                        • 3

                        #12
                        I'm getting hundreds of "first network error" messages on hosts where the network etc. is completely fine, but mine are not related to SNMP; they come from regular agent checks. There seems to be something broken in the connection management since Zabbix 3.2.

                        • Nagainos
                          Member
                          • Oct 2016
                          • 46

                          #13
                          Originally posted by woo
                          I'm getting hundreds of "first network error" messages on hosts where the network etc. is completely fine, but mine are not related to SNMP; they come from regular agent checks. There seems to be something broken in the connection management since Zabbix 3.2.
                          I think you're right. On a large installation (2200 devices, 1900 NVPS) I see randomly broken charts on some devices. Enabling or disabling bulk mode makes no difference. But Nagios installed on a neighbouring VM shows no trouble with the data channel or CPU utilization for the "troubled" devices.

                          I tried to debug the issue, but I don't understand how to work with GDB on multithreaded apps.

                          Maybe this will get resolved in Zabbix 4.0. =)

                          • metal
                            Member
                            • Nov 2019
                            • 42

                            #14
                            I am having the same issue as well; changing pollers, discoverers, etc. never solved it for me, even on v4.4.3.
                            It seems Zabbix is struggling with SNMPv3; I tried Nagios and I'm not facing these issues there. I also checked NTP settings on all devices, credentials, etc. The problem seems to come and go across all devices...
                            Last edited by metal; 29-11-2019, 17:43.

                            • glardz95
                              Junior Member
                              • Oct 2019
                              • 14

                              #15
                              I have the same issue.
