SNMP timeout while connecting

  • lobiDA
    Junior Member
    • Jul 2016
    • 19

    #1

    SNMP timeout while connecting

    Ubuntu trusty on ec2
    Zabbix server running 3.1.0

    Have two Juniper network devices; both configured the same for snmp.

    The Juniper labeled megaman shows red ("timeout while connecting"); the one labeled pacman shows green. Verified that the configuration is also the same on Zabbix's end. Snmpwalk works.

    Changed timeout value to 30 and disabled bulk update on zabbix server and I still couldn't get the megaman Juniper to show green for SNMP, so I decided to delete the host and just re-create it.

    Still nothing.

    Config is 100% the same on both ends. Works for pacman, does not work for megaman.

    Logs don't show anything useful. As a matter of fact, the logs actually show a constant stream of reconnections to pacman, but absolutely nothing to megaman:

    Code:
      2532:20160721:221912.003 resuming SNMP agent checks on host "pacman": connection restored
      2532:20160721:221920.011 SNMP agent item "1.3.6.1.2.1.2.2.1.14.[589]" on host "pacman" failed: first network error, wait for 15 seconds
      2536:20160721:221934.203 cannot send list of active checks to "172.31.9.5": host [zabbix] not found
      2532:20160721:221935.062 resuming SNMP agent checks on host "pacman": connection restored
      2532:20160721:221943.073 SNMP agent item ".1.3.6.1.2.1.2.2.1.13.[564]" on host "pacman" failed: first network error, wait for 15 seconds
      2532:20160721:222006.096 SNMP agent item "1.3.6.1.2.1.2.2.1.19.[557]" on host "pacman" failed: another network error, wait for 15 seconds
      2532:20160721:222021.129 resuming SNMP agent checks on host "pacman": connection restored
    Edit: Completely forgot to state that I verified layer 3 connectivity via ping. Also, when I do an snmpwalk, the megaman Juniper DOES in fact send out details of itself, so that communication part is good. What is interesting is the following:

    Code:
    root@ip-172-31-9-5:~# telnet 10.1.15.2 161
    Trying 10.1.15.2...
    telnet: Unable to connect to remote host: Connection refused
    root@ip-172-31-9-5:~# telnet 10.1.15.1 161
    Trying 10.1.15.1...
    ^C
    root@ip-172-31-9-5:~#
    It shows connection refused for pacman (10.1.15.2) even though the SNMP connection shows green (and I am getting metrics).
    When I try it with megaman (10.1.15.1), the connection attempt just hangs until I cancel it.
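A note on the telnet test above: telnet opens a TCP connection, while SNMP agents normally listen on UDP port 161, so neither telnet result says much about whether SNMP is reachable. A closer check is to send a real SNMP GetRequest over UDP. The following is a minimal sketch, assuming SNMPv2c, the standard sysDescr.0 OID, and a community string short enough for single-byte BER lengths; the function names are illustrative and not part of Zabbix or any library.

```python
import socket


def ber(tag: int, payload: bytes) -> bytes:
    """Encode one BER tag-length-value; short-form lengths only (< 128 bytes)."""
    if len(payload) >= 128:
        raise ValueError("payload too long for this sketch")
    return bytes([tag, len(payload)]) + payload


def encode_oid(oid: str) -> bytes:
    """Encode a dotted OID such as 1.3.6.1.2.1.1.1.0 as a BER OBJECT IDENTIFIER."""
    parts = [int(p) for p in oid.strip(".").split(".")]
    body = bytearray([40 * parts[0] + parts[1]])
    for n in parts[2:]:
        chunk = [n & 0x7F]          # last byte has the high bit clear
        n >>= 7
        while n:
            chunk.append((n & 0x7F) | 0x80)  # earlier bytes have it set
            n >>= 7
        body.extend(reversed(chunk))
    return ber(0x06, bytes(body))


def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv2c GetRequest for a single OID."""
    if not 0 <= request_id < 128:
        raise ValueError("request_id must fit in one byte for this sketch")
    varbind = ber(0x30, encode_oid(oid) + ber(0x05, b""))  # OID + NULL value
    pdu = ber(
        0xA0,                                 # GetRequest PDU
        ber(0x02, bytes([request_id]))        # request-id
        + ber(0x02, b"\x00")                  # error-status
        + ber(0x02, b"\x00")                  # error-index
        + ber(0x30, varbind),                 # varbind list
    )
    # version field 1 means SNMPv2c
    return ber(0x30, ber(0x02, b"\x01") + ber(0x04, community.encode()) + pdu)


def snmp_probe(host: str, community: str, oid: str = "1.3.6.1.2.1.1.1.0",
               timeout: float = 2.0):
    """Send one GetRequest over UDP; return the raw reply bytes or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_get_request(community, oid), (host, 161))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes the timeout case
            return None
        return data
```

Calling `snmp_probe("10.1.15.1", "public")` from the Zabbix server would return bytes if the device answers that community and `None` if nothing comes back within the timeout. A device that silently drops requests with the wrong community looks exactly like a timeout here.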
    Last edited by lobiDA; 22-07-2016, 22:06.
  • lobiDA
    Junior Member
    • Jul 2016
    • 19

    #2
    Confirmed 100% that snmpwalk is working by verifying that I get responses back, via tcpdump (had a suspicion that port 161 was being blocked on my Juniper megaman device).

    Also, noticed that the SNMP marker was showing green for a little while, but now it is back to red (timeout). Again tried to change the timeout value to 30 (from 4) in the Zabbix server config, but no help there.

    I'm starting to think the red/green markers are not reliable due to a bug.

    • lobiDA
      Junior Member
      • Jul 2016
      • 19

      #3
      Now I'm starting to think even the pacman Juniper device is actually not being monitored (and the data I'm looking at is either old or partial):

      See this in the logs:

      Code:
        7030:20160725:193123.496 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:193223.544 SNMP agent item "RedAlarm" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:193423.650 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:193523.759 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:193623.801 SNMP agent item "RedAlarm" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:193639.334 resuming SNMP agent checks on host "pacman": connection restored
        7028:20160725:193706.778 SNMP agent item "1.3.6.1.2.1.2.2.1.14.[520]" on host "megaman" failed: first network error, wait for 15 seconds
        7030:20160725:193739.401 SNMP agent item "1.3.6.1.2.1.2.2.1.16.[534]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:193839.463 temporarily disabling SNMP agent checks on host "megaman": host unavailable
        7030:20160725:193939.513 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:194039.580 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:194139.618 SNMP agent item "1.3.6.1.2.1.2.2.1.16.[591]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:194339.703 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:194439.788 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:194539.845 SNMP agent item "1.3.6.1.2.1.2.2.1.16.[574]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:194739.949 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:194840.024 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:194940.087 SNMP agent item "1.3.6.1.2.1.2.2.1.19.[541]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:195140.176 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:195240.267 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:195340.330 SNMP agent item "1.3.6.1.2.1.2.2.1.19.[507]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:195540.424 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:195640.470 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:195740.533 SNMP agent item "1.3.6.1.2.1.2.2.1.14.[525]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:195940.620 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:200040.704 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:200140.761 SNMP agent item ".1.3.6.1.2.1.2.2.1.13.[548]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:200340.857 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:200440.953 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:200541.002 SNMP agent item "1.3.6.1.2.1.2.2.1.20.[508]" on host "pacman" failed: first network error, wait for 15 seconds
        7030:20160725:200741.128 temporarily disabling SNMP agent checks on host "pacman": host unavailable
        7030:20160725:200841.205 enabling SNMP agent checks on host "pacman": host became available
        7030:20160725:200941.251 SNMP agent item ".1.3.6.1.2.1.2.2.1.13.[594]" on host "pacman" failed: first network error, wait for 15 seconds
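To judge how badly each host is flapping, the log events can be tallied per host. The following is a rough sketch that assumes the log line layout shown above (PID, date, time, then the message); the event labels and function names are made up for illustration.

```python
import re
from collections import Counter

# PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r'^\s*\d+:\d{8}:\d{6}\.\d{3}\s+(?P<msg>.*)$')
HOST_RE = re.compile(r'on host "(?P<host>[^"]+)"')


def classify(msg: str):
    """Map a log message to a coarse event label, or None if unrelated."""
    if msg.startswith("enabling SNMP agent checks"):
        return "enabled"
    if msg.startswith("resuming SNMP agent checks"):
        return "resumed"
    if msg.startswith("temporarily disabling SNMP agent checks"):
        return "disabled"
    if "network error" in msg:
        return "network_error"
    return None


def tally(lines):
    """Count (host, event) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        msg = match.group("msg")
        host = HOST_RE.search(msg)
        event = classify(msg)
        if host and event:
            counts[(host.group("host"), event)] += 1
    return counts
```

Feeding it the server log, for example `tally(open("/var/log/zabbix/zabbix_server.log"))` (the path is an assumption and depends on the install), gives one counter per host and event. A host that alternates between "enabled" and "disabled" every few minutes is not being polled reliably even while its marker is green.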
      Zabbix works fine on my servers with the ZBX agent, but it is not working for my Juniper devices. Unfortunately, the main reason for choosing Zabbix was visibility into my network infrastructure.

      Worried that I might have to scrap Zabbix and find something else.

      Please help!
      Last edited by lobiDA; 25-07-2016, 22:16. Reason: typo

      • lobiDA
        Junior Member
        • Jul 2016
        • 19

        #4
        Figured out part of the problem. Some SNMP get-requests are using the proper community, but some are using the default "public". Changing the community globally via macro doesn't work, so instead I just changed the community string to public on my Juniper devices.

        Now, the megaman Juniper is responding with a GetResponse to the server's GetRequest, but data is still not being populated. The pacman Juniper isn't responding at all.
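To see which community string a given request actually carries, it can be read straight out of a captured SNMPv1/v2c packet. The following is a minimal sketch, assuming the raw UDP payload is already available as bytes (for example exported from a tcpdump capture); the function names are illustrative only.

```python
def read_tlv(buf: bytes, pos: int):
    """Read one BER tag-length-value at pos; return (tag, value, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:                      # long form: low 7 bits = byte count
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length


def snmp_community(packet: bytes) -> str:
    """Return the community string of an SNMPv1/v2c message."""
    tag, message, _ = read_tlv(packet, 0)
    if tag != 0x30:
        raise ValueError("not a BER SEQUENCE")
    tag, _version, pos = read_tlv(message, 0)
    if tag != 0x02:
        raise ValueError("expected INTEGER version field")
    tag, community, _ = read_tlv(message, pos)
    if tag != 0x04:
        raise ValueError("expected OCTET STRING community field")
    return community.decode("ascii", errors="replace")
```

Running this over the requests leaving the Zabbix server would show at a glance which items go out with "public" and which with the configured community.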

        This thread has been around for a few days now. I would really appreciate it if someone could respond with some advice.
        Last edited by lobiDA; 27-07-2016, 23:48.

        • lobiDA
          Junior Member
          • Jul 2016
          • 19

          #5
          Confirmed via tcpdump that the Juniper device is responding to snmpwalk: both the GetRequest and the GetResponse show up. That is proof that the port is accessible, which rules out the Juniper device not having it open.

          Looks like the problem here is definitely Zabbix. Tried deleting and re-adding hosts, still no help.

          Over 100 views on this thread and no one has responded...is there really no one here in the community that can help?

          • lobiDA
            Junior Member
            • Jul 2016
            • 19

            #6
            Confirmed the problem was wrong SNMP community strings coming from the template (both for traps from my Juniper devices and for requests from Zabbix).

            General fix: Set every community string to public.

            Specific fix for traps: Configure the Juniper trap-group for public.
            Specific fix for Zabbix SNMP GetRequests: Go into the template's Macros section and explicitly set the value of "{$SNMP_COMMUNITY}" to public. Also go to Administration -> General, select Macros in the dropdown box, and change the global "{$SNMP_COMMUNITY}" macro to public.

            Note that since I only plan to use SNMP for my two Juniper devices, both using templates from the same person, I can change this macro globally. I imagine that if you were using a more varied base of SNMP devices/templates, this would not work for you.

            Hope this helps someone else.

            • knickknackpadillac
              Junior Member
              • Sep 2019
              • 1

              #7
              Bro, thank you for responding to this thread even though no-one replied! I was struggling for days trying to figure out why my SNMP traps weren't working with my Juniper gear.

              Your thread helped me to get it working finally! I had the snmp community set to something other than 'public'.

              I'm using the Juniper SNMP v2 templates from Zabbix.

              Here are the settings from my SRX which worked for me:

              set snmp description srx320
              set snmp community public authorization read-only
              set snmp trap-options source-address <ip address>
              set snmp trap-group public version v2
              set snmp trap-group public categories authentication
              set snmp trap-group public categories chassis
              set snmp trap-group public categories link
              set snmp trap-group public categories startup
              set snmp trap-group public categories configuration
              set snmp trap-group public targets <ip address of zabbix server>

              • acatic1
                Member
                • Oct 2019
                • 38

                #8
                For me it was that I had multiple templates applied to the switch, and apparently the 'global' community macro wasn't applying itself to each template. I had to manually edit it in each template. Once that was done, the timeout went away immediately.
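This behaviour is consistent with Zabbix's documented user macro precedence: a host-level macro wins over a template-level one, and a template-level one wins over a global one, so a "{$SNMP_COMMUNITY}" defined inside a template hides the global value. The following sketch only models that lookup order; it is not Zabbix code, the function and argument names are hypothetical, and the order in which linked templates are checked is simplified to list order.

```python
from typing import Mapping, Optional, Sequence


def resolve_macro(
    name: str,
    host_macros: Mapping[str, str],
    template_macros: Sequence[Mapping[str, str]],
    global_macros: Mapping[str, str],
) -> Optional[str]:
    """Return the effective macro value: host first, then templates, then global."""
    if name in host_macros:
        return host_macros[name]
    for macros in template_macros:        # simplified: first linked template wins
        if name in macros:
            return macros[name]
    return global_macros.get(name)


# Example values are invented for illustration.
global_macros = {"{$SNMP_COMMUNITY}": "public"}
juniper_template = {"{$SNMP_COMMUNITY}": "some-other-community"}

# The template value hides the global one, so the global change has no effect.
effective = resolve_macro("{$SNMP_COMMUNITY}", {}, [juniper_template], global_macros)
```

With this model, changing only the global macro leaves `effective` at the template's value, which is why each template had to be edited (or its own macro removed) before the global setting took effect.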
