First network error, wait for 15 seconds (SNMP)

  • TRNX
    Member
    • Oct 2019
    • 54

    #1

    First network error, wait for 15 seconds (SNMP)

    Hello.
    I have a small problem with SNMP. A few days ago I created my own template for MikroTik devices. The template is very simple (only 19 static items, with some preprocessing in 2 of them). I configured SNMPv3 on the MikroTik devices and added them to Zabbix. I thought everything was working, because I received data from all of these devices.
    But when I looked at zabbix_server.log, I found many errors like these:
    Code:
    SNMP agent item "used.hdd" on host "HOST A" failed: first network error, wait for 15 seconds
    resuming SNMP agent checks on host "HOST A": connection restored
    SNMP agent item "ros.time" on host "HOST B" failed: first network error, wait for 15 seconds
    resuming SNMP agent checks on host "HOST B": connection restored
    This problem occurs on more devices than just the MikroTik ones (for example, switches from different vendors). So I disabled all SNMP devices except the MikroTik routers, and I want to resolve these devices first.
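To see which hosts are affected most, the messages above can be tallied per host. The following is a minimal Python sketch and not something from the thread: the function name `count_first_network_errors` and the regular expression are assumptions based only on the message text shown above. Real log lines also carry a PID and timestamp prefix, which the pattern simply skips over.

```python
import re
from collections import Counter

# Matches the message text shown in the log excerpt above.
# The pattern is an assumption derived from those four lines only.
FIRST_ERROR = re.compile(
    r'SNMP agent item "(?P<item>[^"]+)" on host "(?P<host>[^"]+)" failed: '
    r"first network error"
)


def count_first_network_errors(lines):
    """Return a Counter mapping host name -> number of 'first network error' lines."""
    counts = Counter()
    for line in lines:
        match = FIRST_ERROR.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # The four messages quoted in the post, used as sample input.
    sample = [
        'SNMP agent item "used.hdd" on host "HOST A" failed: first network error, wait for 15 seconds',
        'resuming SNMP agent checks on host "HOST A": connection restored',
        'SNMP agent item "ros.time" on host "HOST B" failed: first network error, wait for 15 seconds',
        'resuming SNMP agent checks on host "HOST B": connection restored',
    ]
    for host, n in count_first_network_errors(sample).most_common():
        print(f"{host}: {n}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.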

    I searched the internet and found a few recommendations for this problem:
    - increase Timeout in zabbix_server.conf (I set the value to 20.)
    - enable bulk requests (I already had them enabled. I didn't try disabling them.)
    - check whether the Zabbix pollers are overloaded (I checked, and all pollers look fine.)

    Next, I found this useful presentation: https://assets.zabbix.com/files/even...bix_SNMPv3.pdf
    Slide 17 has information about EngineID, EngineBoots and EngineTime:
    The SNMPv3 device needs to return the following values in accordance with RFC specification:
    Some of the MikroTik routers return an EngineID when I send an SNMP request with the OID SNMP-FRAMEWORK-MIB::snmpEngineID.0, but most of the routers don't know this OID.
    The OIDs for EngineBoots and EngineTime don't work for me on any MikroTik.
    Does this mean that MikroTik devices with SNMPv3 aren't suitable for use with Zabbix? But as I said above, everything looks OK and I receive all items. There isn't a single unsupported item.

    All of these MikroTik devices (and the other SNMP devices I have in Zabbix) are reached over the internet. As I said, I have this problem with more devices than just MikroTik, but I want to resolve the MikroTik devices first.

    I uploaded a screenshot from zabbix_server.log and the pollers graph as attachments.
    Thanks for any help!
    Attached Files
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    Are the clocks on the devices in sync? For SNMPv3 this is important, as requests can be treated as invalid if the time difference is big... I think 150 sec... (RFC 3414, sections 2.2.3 and 2.3)
    Even if your device does not answer your request for the engineID etc. (try the numeric OID as well: .1.3.6.1.6.3.10.2.1.1.0), it must use those values in its answers to Zabbix requests, otherwise it is not even a proper SNMPv3 implementation. They could not even claim SNMPv3 capability without this. You should be able to see all of them in tcpdump.
    Verify all the credentials you are using; it is crucial to have no mistakes there. When changing credentials, restart your proxy/server, whichever polls those devices. I think later versions also have the snmp_cache_reload runtime option...
    Add more of both "pollers" and "unreachable pollers" and see if it helps...
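The 150-second window mentioned above can be written down as a small check. This is a sketch of the timeliness test as I read it in RFC 3414 (section 3.2, the case where the receiving engine is authoritative); the function and parameter names are made up for illustration. Note that the comparison is on the engine's own boots counter and engine time rather than directly on wall-clock time, and the sketch does not model how Zabbix or Net-SNMP track those values internally.

```python
# Constants taken from RFC 3414; everything else here is an illustrative assumption.
MAX_ENGINE_BOOTS = 2147483647   # once reached, the engine treats every message as not timely
TIME_WINDOW_SECONDS = 150       # allowed difference in snmpEngineTime


def outside_time_window(local_boots, local_time, msg_boots, msg_time):
    """Return True if an authoritative engine would reject the message as not timely.

    local_boots / local_time: the receiving engine's snmpEngineBoots and snmpEngineTime.
    msg_boots / msg_time: msgAuthoritativeEngineBoots and msgAuthoritativeEngineTime
    carried in the incoming message.
    """
    return (
        local_boots == MAX_ENGINE_BOOTS
        or msg_boots != local_boots
        or abs(msg_time - local_time) > TIME_WINDOW_SECONDS
    )


if __name__ == "__main__":
    print(outside_time_window(5, 1000, 5, 1030))   # 30 s apart, same boots
    print(outside_time_window(5, 1000, 5, 1200))   # 200 s apart
    print(outside_time_window(5, 1000, 4, 1000))   # boots counter differs
```

A 30-second difference like the one reported later in the thread would fall inside this window, so on its own it should not cause rejections under this check.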


    • TRNX
      Member
      • Oct 2019
      • 54

      #3
      Hello.
      1. Good point. The clocks on my MikroTik routers are not synced to a specific NTP server. I use the default clock function in MikroTik. But all routers have a time close to that of the Zabbix server (the difference is only a few seconds, 30 s at most). I will try setting the same NTP server on these devices.
      2. When I tried your numeric OID, it worked. I swear I tried it yesterday and it didn't work, but I must have used a wrong OID or made another mistake. So it should be OK; every device has its own unique EngineID.
      3. The credentials are OK. As I said, I receive data normally; only the queue sometimes grows (for a few seconds) and the log contains some of the errors from my first post. I am also able to do an snmpwalk with the same credentials.
      4. I increased StartPollers and UnreachablePollers from their default values a few days ago, to 50 (StartPollers) and 10 (UnreachablePollers).

      I am running Zabbix server 6.2.1.

      Thank you for your help


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        Try increasing the logging level for the pollers, maybe something surfaces...


        • TRNX
          Member
          • Oct 2019
          • 54

          #5
          Hello. Last Friday I synchronized the time on all MikroTik routers and some switches. After the weekend, I don't have any errors from the switches (maybe a coincidence), but there are still a few errors from the MikroTik routers.
          Sorry, what is the logging level for pollers? Do you mean DebugLevel?


          • cyber
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Dec 2006
            • 4807

            #6
            You can increase the debug level for specific processes / process types at runtime. Setting DebugLevel in the config file will increase it for all processes, but you can also do it for a single poller, or for all pollers at once: https://www.zabbix.com/documentation...server-process
            The same works for proxy processes too...
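For reference, the runtime control option described above takes the form `zabbix_server -R log_level_increase=<target>`, where the target can be a process type, a process type with a number, or a PID. The helper below is only a sketch that assembles that command line; `log_level_command` is a hypothetical name, and the script prints the command rather than running it, so nothing changes on a live server.

```python
import shlex


def log_level_command(direction="increase", target="poller", binary="zabbix_server"):
    """Build the argv list for a Zabbix runtime-control log level change.

    target may be a process type ("poller"), "type,N" for a single process of
    that type (for example "poller,2"), or a PID given as a string.
    Use binary="zabbix_proxy" when the devices are polled by a proxy.
    """
    if direction not in ("increase", "decrease"):
        raise ValueError("direction must be 'increase' or 'decrease'")
    return [binary, "-R", f"log_level_{direction}={target}"]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(log_level_command()))                    # all pollers
    print(shlex.join(log_level_command(target="poller,2")))   # second poller only
    print(shlex.join(log_level_command("decrease")))          # revert afterwards
```

Each call raises or lowers the level by one step, so it is worth running the matching decrease once the investigation is done, since the higher levels make the log grow quickly.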


            • TRNX
              Member
              • Oct 2019
              • 54

              #7
              Originally posted by TRNX
              After the weekend, I don't have any errors from the switches (maybe a coincidence)
              That is no longer true; I now have switch errors in the log too.


              Anyway, thanks for the explanation. I successfully increased the logging level for all poller processes (to level 4).
              I see a little more information now, but I am not sure I understand it. There are 2 images. In the first, I ran snmpget and received a value. In the second, there is an OID parsing error, but it is for the same item key and the same host as in the first image.

              I can upload and send you the full log, but I can't publish it. It contains sensitive data (customer names).
              Attached Files


              • TRNX
                Member
                • Oct 2019
                • 54

                #8
                Originally posted by TRNX
                There are 2 images. In the first, I ran snmpget and received a value. In the second, there is an OID parsing error, but it is for the same item key and the same host as in the first image.
                When I replaced the "string OIDs" with "numeric" ones in the Zabbix items, they started to work. So I resolved all the not supported items from SNMP devices. I thought that might be the cause of the SNMP errors in the log, but no. I still receive some errors.
                I don't know what more I can do about it.

                Thank you for your help.
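A quick way to spot items that still use "string OIDs" is to test each configured OID against a purely numeric pattern. This is a minimal sketch under my own assumptions: `is_numeric_oid` and the regular expression are illustrative and not part of Zabbix, and the two sample OIDs are the ones quoted earlier in this thread.

```python
import re

# A numeric OID is a dot-separated run of integers, optionally with a leading dot.
NUMERIC_OID = re.compile(r"^\.?\d+(\.\d+)+$")


def is_numeric_oid(oid):
    """Return True if the OID consists only of dot-separated numbers."""
    return bool(NUMERIC_OID.match(oid.strip()))


if __name__ == "__main__":
    for oid in (".1.3.6.1.6.3.10.2.1.1.0", "SNMP-FRAMEWORK-MIB::snmpEngineID.0"):
        kind = "numeric" if is_numeric_oid(oid) else "needs MIB files to resolve"
        print(f"{oid}: {kind}")
```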


                • cyber
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional
                  • Dec 2006
                  • 4807

                  #9
                  I'd suggest always using numeric OIDs, otherwise you are always bound to install and update MIB files etc... https://www.zabbix.com/documentation...ypes/snmp/mibs
                  A manual snmpget is usually successful anyway; it only helps you check whether the connection works at all. Zabbix internals work in their own way, and a plain snmpget may not reflect the whole situation...
                  To better understand which item fails and what error is thrown, you can grep out all lines for specific pollers based on the PID, which is the first thing on each log line (in this case 1029, 970, 1044)... The 4 lines in your second picture are actually from 3 different pollers, and only the first 2 are connected to each other; the 3rd and 4th are not related to the first 2.
                  You can also increase the default Timeout in the proxy/server config (whichever you are using here), and you can also try disabling bulk requests...
                  Last edited by cyber; 09-08-2022, 10:19.
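The "grep by PID" idea from the post above can be sketched in Python as follows. It assumes the usual Zabbix log prefix of `PID:YYYYMMDD:HHMMSS.mmm` at the start of each line; `lines_by_pid` is a hypothetical helper name. The PIDs in the sample are the ones mentioned in the post, the messages are the ones quoted in the first post, and the timestamps are placeholders invented for the example.

```python
import re
from collections import defaultdict

# Assumed log prefix: "PID:YYYYMMDD:HHMMSS.mmm " followed by the message.
LOG_PREFIX = re.compile(r"^\s*(?P<pid>\d+):\d{8}:\d{6}\.\d{3}\s+(?P<msg>.*)$")


def lines_by_pid(lines, pids=None):
    """Group log messages by the PID of the process that wrote them.

    If pids is given, only those PIDs are kept; lines without the expected
    prefix are ignored.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = LOG_PREFIX.match(line.rstrip("\n"))
        if not match:
            continue
        pid = int(match.group("pid"))
        if pids is None or pid in pids:
            grouped[pid].append(match.group("msg"))
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder timestamps; PIDs and messages are taken from the thread.
    sample = [
        '  1029:20220908:101900.000 SNMP agent item "used.hdd" on host "HOST A" failed: first network error, wait for 15 seconds',
        '   970:20220908:101901.000 SNMP agent item "ros.time" on host "HOST B" failed: first network error, wait for 15 seconds',
        '  1044:20220908:101915.000 resuming SNMP agent checks on host "HOST A": connection restored',
    ]
    for pid, messages in lines_by_pid(sample, pids={1029, 1044}).items():
        for message in messages:
            print(pid, message)
```

Reading one poller's lines in sequence makes it easier to see which request actually timed out, instead of mixing output from several pollers that happen to log at the same moment.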


                  • TRNX
                    Member
                    • Oct 2019
                    • 54

                    #10
                    Thank you very much.
                    In the log, even with the increased logging level, I don't see any specific error. There are only some timeouts, but no reason for them.

                    However, I disabled bulk requests on the NetGear switches (the switches generated more "first network errors" in the log than the MikroTik routers) and it got much worse. As you can see in the picture, the red rectangle marks the period after I disabled bulk requests. So more errors are generated without bulk requests. I enabled bulk requests again.
                    My Timeout was set to 20, but I increased it to 30 (the maximum).

                    I must point out that the switches which generate errors are outside my LAN. The switches in our LAN are OK.
                    Attached Files


                    • cyber
                      Senior Member
                      Zabbix Certified Specialist, Zabbix Certified Professional
                      • Dec 2006
                      • 4807

                      #11
                      If it's out of your reach, then it can easily be something in the FWs etc... But that should be investigated together with your friendly network admins. Nothing more I can add here.


                      • TRNX
                        Member
                        • Oct 2019
                        • 54

                        #12
                        OK, thank you very much for your time and the useful information. If I resolve it, I will write what the problem was.

