Random SNMP network error after new iDrac update

  • g3ph4z
    Junior Member
    • May 2018
    • 7

    #1


    Hello Guys!

    First of all, my English is not very good, so sorry for any mistakes.

    I have 2 Dell servers with iDRAC 8. Three weeks ago I updated both to firmware 2.52.52.52.
    Since then, Zabbix has been losing its SNMP connection to them for a few minutes at a time.

    log:

      6254:20180601:170611.182 SNMP agent item "NetStatus.[4]" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180601:170641.972 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6718:20180602:003901.117 SNMP agent item "MemoryEnum" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:003941.591 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6626:20180602:051611.871 SNMP agent item "GlobalSystemStatus" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:051641.750 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6806:20180602:121441.981 SNMP agent item "TempCritLowLimit.[2]" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:121511.022 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6302:20180602:202111.625 SNMP agent item "PowerUsageMinIdle" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:202211.769 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6849:20180602:202512.270 SNMP agent item "VoltageStatus.[5]" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:202541.324 resuming SNMP agent checks on host "THESERVERNAME": connection restored
      6653:20180602:231711.268 SNMP agent item "TempCritLowLimit.[1]" on host "THESERVERNAME" failed: first network error, wait for 15 seconds
      6849:20180602:231741.216 resuming SNMP agent checks on host "THESERVERNAME": connection restored
    But when Zabbix alerted me with "NO IDRAC DATA > 5 MINUTES", I checked manually with snmpwalk and it worked fine.
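To see how long each outage in a log like the one above actually lasts, the failure and recovery lines can be paired up per host. The following is only a minimal sketch: the function names `parse_line` and `outages` are made up for illustration, and the line format is inferred from the excerpt in this post, not from any Zabbix specification.

```python
import re
from datetime import datetime

# Prefix as seen in the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
HOST_RE = re.compile(r'on host "([^"]+)"')


def parse_line(line):
    """Return (timestamp, host, kind) or None; kind is 'fail' or 'restore'."""
    m = LINE_RE.match(line)
    if not m:
        return None
    _pid, date, clock, millis, msg = m.groups()
    host = HOST_RE.search(msg)
    if not host:
        return None
    if "first network error" in msg:
        kind = "fail"
    elif "connection restored" in msg:
        kind = "restore"
    else:
        return None
    ts = datetime.strptime(f"{date} {clock}.{millis}", "%Y%m%d %H%M%S.%f")
    return ts, host.group(1), kind


def outages(lines):
    """Pair each first failure with the next restore on the same host."""
    open_since = {}  # host -> timestamp of the first unrecovered failure
    result = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, host, kind = parsed
        if kind == "fail":
            open_since.setdefault(host, ts)
        elif host in open_since:
            start = open_since.pop(host)
            result.append((host, start, (ts - start).total_seconds()))
    return result
```

Feeding it the server log (for example `outages(open(path))`, where `path` is wherever your zabbix_server.log lives) gives one `(host, start, seconds)` tuple per outage, which can then be compared with the 5-minute "no data" alert.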

    Zabbix version: 3.4.9
    OS: CentOS
    DB: PostgreSQL

    UPDATE:
    I got a call from Dell Support, and they told me that since we use snmpwalk, I have to ask Zabbix support...

    UPDATE2:
    I upgraded snmpwalk (net-snmp) from 5.7.2 to 5.7.3, built from source. It seems to be better, but not fully fixed: I still get "first network error", but it no longer waits for 5-10 minutes, and now I don't get any other error.

    What should I try?
    Last edited by g3ph4z; 04-06-2018, 15:11.
  • Colttt
    Senior Member
    Zabbix Certified Specialist
    • Mar 2009
    • 878

    #2
    Hi,
    try turning off bulk SNMP requests.

    I have also found some iDRAC bugs in the past. For example, if I queried OID X and then OID Y, I got wrong values. I wrote a bug report to Dell, and after a while they fixed it.
    Debian-User

    Sorry for my bad english


    • g3ph4z
      Junior Member
      • May 2018
      • 7

      #3
      Originally posted by Colttt
      Hi,
      try turning off bulk SNMP requests.

      I have also found some iDRAC bugs in the past. For example, if I queried OID X and then OID Y, I got wrong values. I wrote a bug report to Dell, and after a while they fixed it.
      Hi,

      Thank you for the idea. I have now turned off bulk requests; I hope that solves the issue.


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        It is possible to reproduce those SNMP timeout errors using plain net-snmp commands.
        The problem has been discussed many times on the Zabbix forums, and the issue is with the net-snmp code running on the SNMP agent side.
        You need to start chasing Dell support. They need to start investigating the causes of these issues.
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates


        • g3ph4z
          Junior Member
          • May 2018
          • 7

          #5
          Originally posted by kloczek
          It is possible to reproduce those SNMP timeout errors using plain net-snmp commands.
          The problem has been discussed many times on the Zabbix forums, and the issue is with the net-snmp code running on the SNMP agent side.
          You need to start chasing Dell support. They need to start investigating the causes of these issues.
          I sent an email to Dell Support; I hope they can help us.
          BUT
          I also called our national Dell Support, and they told me I have to downgrade.

          So my last chance is the regional Dell Support team.

          Thank you for help.

          UPDATE:
          I got a call from Dell Support, and they told me that since we use snmpwalk, I have to ask Zabbix support...
          Last edited by g3ph4z; 04-06-2018, 14:50.


          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #6
            When you stop the SNMP queries (for example, by disabling the host), wait a while (e.g. 15-30 minutes) and then enable it again, does it work for the first few minutes? If so, do it again and run tcpdump until the error happens. Then stop the capture and look at which OID was the last one queried successfully and which was the first one that failed. Then query that OID yourself with snmpget; if the failure happens again, it is a bug in the Dell firmware.
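The last step of that procedure (replaying the OIDs by hand with snmpget) could be scripted roughly like this. It is a sketch under assumptions, not a tested tool: it assumes SNMP v2c, that the net-snmp `snmpget` binary is on the PATH, and that a timeout makes `snmpget` exit with a non-zero status. The helper names, the community string and the default timeout are placeholders.

```python
import subprocess


def build_snmpget_cmd(host, oid, community="public", timeout_s=2, retries=0):
    """Build a net-snmp snmpget command line (SNMP v2c assumed)."""
    return ["snmpget", "-v2c", "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def run_snmpget(host, oid, **kwargs):
    """Return True if snmpget exits with status 0 (assumed to mean a reply)."""
    result = subprocess.run(build_snmpget_cmd(host, oid, **kwargs),
                            capture_output=True, text=True)
    return result.returncode == 0


def first_failing_oid(host, oids, probe=run_snmpget):
    """Query OIDs in order; return (last_ok, first_failed), either may be None."""
    last_ok = None
    for oid in oids:
        if not probe(host, oid):
            return last_ok, oid
        last_ok = oid
    return last_ok, None
```

Calling `first_failing_oid("your-idrac-host", oids)` with the OID list taken from the tcpdump capture, in the order Zabbix requested them, would return the same "last good / first bad" pair described above. Retries are set to 0 here so that a single slow reply is not hidden.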


            • g3ph4z
              g3ph4z commented
              Sorry for the long delay. I will check this next week. Thank you!
          • kloczek
            Senior Member
            • Jun 2006
            • 1771

            #7
            You know, you can reproduce a slow SNMP agent reply even with snmpd running on localhost.
            Really, this issue has nothing to do with what can be observed at the network layer.
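One way to check for slow agent replies is to time the same request many times and count how often it takes longer than the poller's timeout. This is a rough sketch only: `time_command` and `summarize` are illustrative names, and the timeout you compare against is whatever your Zabbix server is configured with (assumed here to be the `Timeout` value in zabbix_server.conf).

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd once and return (elapsed_seconds, exit_status)."""
    start = time.perf_counter()
    status = subprocess.run(cmd, capture_output=True).returncode
    return time.perf_counter() - start, status


def summarize(latencies, timeout_s):
    """Summarize response times against a poller timeout."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "max": ordered[-1] if ordered else None,
        # Upper median for even-length input; good enough for a quick look.
        "median": ordered[len(ordered) // 2] if ordered else None,
        "over_timeout": sum(1 for t in ordered if t > timeout_s),
    }
```

For example, collect `time_command([...your snmpget command...])[0]` a few hundred times and pass the list to `summarize` together with the timeout in seconds. A non-zero `over_timeout` against localhost would support the point that the delay is in the agent and not on the network.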


            • g3ph4z
              g3ph4z commented
              I guess so, but can't I increase the timeout? (And thank you too.)
          • glardz95
            Junior Member
            • Oct 2019
            • 14

            #8
            Hello,

            did you resolve the issue? I have the same issue.


            • glardz95
              Junior Member
              • Oct 2019
              • 14

              #9
              How did you update snmpwalk from 5.7.2 to 5.7.3?
