Giving up on Zabbix (unfortunately when SNMP is involved)

  • innovot
    Junior Member
    • Nov 2013
    • 15

    #1

    I'm working on a site where I am only able to use SNMP for monitoring, and I constantly get timeouts and failed discovery from Zabbix. Of the 200 nodes to be monitored, only 80 were discovered, and those then showed huge breaks in their graphs.

    This was run in parallel with OpenNMS Horizon, which has not skipped a beat and is reporting statistics as expected.

    What have I missed in the Zabbix configuration that would cause such a difference?

    Timeout=30
    StartPollers=200
    StartPollersUnreachable=50
    StartPingers=20
    StartDiscoverers=50
    CacheSize=2G
    StartDBSyncers=20
    HistoryCacheSize=32M

    Any thoughts, please?
  • xxiii
    Junior Member
    • Jun 2013
    • 28

    #2
    Have you tried disabling "use bulk requests" (in the host configuration)?

    I think I've found a problem with bulk requests and I'm curious if this solves your problem (at least for existing hosts with gaps in their graphs).

    Also, try decreasing the timeout, especially if snmpwalks always return promptly. Look at your "Zabbix server" graphs and see how busy your pollers (and other Zabbix items) are. With a shorter timeout you can get by with far fewer pollers, unless you have devices that really take 30 seconds to respond.
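
To put rough numbers on the timeout advice, here is a back-of-envelope sketch. It assumes a simplified model in which every check holds one poller until it answers or times out; the rate of 100 checks per second, the 5% failure share, and the 0.05-second reply time are made-up illustration values, not measurements from this thread. Real Zabbix behaviour (retries, unreachable pollers, bulk requests) is more involved, so treat this as an estimate only.

```python
import math


def pollers_needed(checks_per_second, ok_fraction, ok_seconds, timeout_seconds):
    """Estimate busy pollers as check rate times the mean time a check holds a poller."""
    # A check either answers quickly (ok_seconds) or blocks for the full timeout.
    mean_hold = ok_fraction * ok_seconds + (1.0 - ok_fraction) * timeout_seconds
    return math.ceil(checks_per_second * mean_hold)


# Hypothetical load: 100 checks/s, 95% answer in 0.05 s, 5% never answer.
with_long_timeout = pollers_needed(100, 0.95, 0.05, 30)
with_short_timeout = pollers_needed(100, 0.95, 0.05, 3)
print(with_long_timeout, with_short_timeout)  # prints "155 20"
```

Under these assumed numbers, the small share of checks that time out dominates poller usage, which is why cutting Timeout can matter more than raising StartPollers.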

    • dirckcopeland
      Member
      • Oct 2013
      • 50

      #3
      innovot,
      You may also want to look at the interval on the items. If they are set to 30 seconds, and each of the 200 hosts has lots of SNMP items being queried every 30 seconds, the load may be overwhelming the pollers. Back off to a 60-second interval on all the items and see if that helps.
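
The effect of the item interval can be estimated with simple arithmetic. In this sketch, the figure of 100 SNMP items per host is an assumption chosen for illustration (the thread only gives the host count), and the function name is hypothetical.

```python
def values_per_second(hosts, items_per_host, interval_seconds):
    """Average number of SNMP values the server must collect each second."""
    return hosts * items_per_host / interval_seconds


# 200 hosts from the thread; 100 items per host is an assumed figure.
at_30s = values_per_second(200, 100, 30)
at_60s = values_per_second(200, 100, 60)
print(round(at_30s, 1), round(at_60s, 1))  # prints "666.7 333.3"
```

Doubling the interval halves the required collection rate, whatever the real item count turns out to be.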

      • Linwood
        Senior Member
        • Dec 2013
        • 398

        #4
        I just had a pile of ASAs that would not work reliably. I dug in and found that the template I was using had "public" hard-coded in a few (but not all) places instead of using the {$SNMP_COMMUNITY} macro. Since most devices respond to a bad community string with a timeout, this can be confusing. And since it affected only some triggers, not all, the hosts did not fail discovery or stop working outright; they just failed intermittently.
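
One way to hunt for this kind of problem is to scan an exported template for community strings that are not the macro. The sketch below assumes the export is XML and stores each item's community in an `snmp_community` element with a sibling `name` element; that layout matches older Zabbix exports as far as I know, but check it against your own export file. The sample XML and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

MACRO = "{$SNMP_COMMUNITY}"


def hardcoded_communities(xml_text, macro=MACRO):
    """Return (item name, community) pairs whose community is set but is not the macro."""
    root = ET.fromstring(xml_text)
    findings = []
    # Visit every element and look for a direct <snmp_community> child.
    for parent in root.iter():
        community = parent.find("snmp_community")
        if community is None:
            continue
        value = (community.text or "").strip()
        # Empty values belong to non-SNMP items and are skipped.
        if value and value != macro:
            findings.append((parent.findtext("name", default="?"), value))
    return findings


# Invented sample, shaped like the assumed export layout.
SAMPLE = """
<zabbix_export><templates><template><items>
  <item><name>ifInOctets</name><snmp_community>{$SNMP_COMMUNITY}</snmp_community></item>
  <item><name>cpu load</name><snmp_community>public</snmp_community></item>
  <item><name>agent ping</name><snmp_community/></item>
</items></template></templates></zabbix_export>
"""

print(hardcoded_communities(SAMPLE))  # prints "[('cpu load', 'public')]"
```

Any pair it reports is a candidate for replacement with {$SNMP_COMMUNITY} in the template.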
