ZABBIX Forums  
#1
16-03-2012, 20:16
Minotaur, Junior Member (Join Date: Mar 2012; Posts: 2)
Temporarily disabled SNMP checks

Hi,
I'm using Zabbix 1.8.10.
I have a problem with SNMP: one of my hosts was unavailable and Zabbix stopped polling it:
Mar 16 19:55:01 noc Zabbix Server[1071]: SNMP item [upsInputVoltage.1] on host [UPS Emerson Liebert EX] failed: first network error, wait for 15 seconds
Mar 16 19:55:02 noc Zabbix Server[1073]: SNMP item [upsInputVoltage.2] on host [UPS Emerson Liebert EX] failed: another network error, wait for 15 seconds
Mar 16 19:55:03 noc Zabbix Server[1072]: SNMP item [upsInputVoltage.3] on host [UPS Emerson Liebert EX] failed: another network error, wait for 15 seconds
Mar 16 19:55:18 noc Zabbix Server[1075]: SNMP item [upsInputVoltage.1] on host [UPS Emerson Liebert EX] failed: another network error, wait for 15 seconds
Mar 16 19:55:34 noc Zabbix Server[1075]: SNMP item [upsInputVoltage.2] on host [UPS Emerson Liebert EX] failed: another network error, wait for 15 seconds
Mar 16 19:55:49 noc Zabbix Server[1075]: temporarily disabling SNMP checks on host [UPS Emerson Liebert EX]: host unavailable

More than an hour after the host became available again, Zabbix still had not resumed polling it.
Disabling and re-enabling the host via the web interface does not help.

Any ideas? Thanks in advance!
#2
16-03-2012, 20:52
Minotaur, Junior Member (Join Date: Mar 2012; Posts: 2)

Sorry, it was my fault.

P.S. I couldn't find a way to delete a posted message...
#3
12-04-2012, 17:02
vapolise, Junior Member (Join Date: Nov 2011; Posts: 6)

How did you resolve it?
#4
13-04-2012, 04:16
Pada, Senior Member (Join Date: Apr 2012; Location: Stellenbosch, South Africa; Posts: 161)

I had 200+ SNMP items on a single host, and for some unknown reason Zabbix 1.8.11 could not monitor all of them. I kept getting errors like "SNMP item [ifOutOctets13] on host [Cisco 877 ADSL Modem] failed: first network error, wait for 15 seconds", which meant the other items on that same host sat in the queue for more than 10 minutes, leaving massive gaps in my graphs.

What bothered me was that snmpwalk NEVER had any trouble querying my SNMP items, yet Zabbix could not get through all of the items successfully even once.

I tried everything I could tweak in Zabbix and SNMP: using proxies, changing StartPollers and Timeout in the config, etc.

Because I was monitoring so many items, I never looked carefully at zabbix_server.log, which would have shown me that only 3 of the 200+ items were failing, because they had the wrong SNMP community or SNMP port configured!

Once I fixed the community and port on those 3 SNMP items, the errors stopped and my Zabbix queue cleared up!

So my advice is to double-check the individual items by running snmpget (part of the net-snmp-utils package on yum-based systems) from the monitoring machine with the same parameters you configured for each item. Check the community and port in particular!

If that doesn't help, I would suggest debugging Zabbix's own load by importing the Zabbix Server template found at: http://blog.zabbix.com/monitoring-ho...esses-are/457/

I hope this is helpful!

Last edited by Pada; 13-04-2012 at 04:20.
#5
14-04-2012, 16:02
vapolise, Junior Member (Join Date: Nov 2011; Posts: 6)

Thanks, but Zabbix is able to monitor only 2 of my 15 nodes successfully with SNMPv3 authNoPriv MD5. From the command line everything works fine without timeouts. In Zabbix, however, I get timeouts: it monitors only 2 nodes and has stopped receiving any latest data for the other 13 devices.
#6
22-04-2012, 19:01
ericgearhart, Senior Member (Join Date: Jan 2009; Posts: 114)

Please see the bug report I filed here: https://support.zabbix.com/browse/ZBX-4901

The symptoms I describe in my ticket seem to match the problem you guys are having.
#7
23-04-2012, 22:27
ericgearhart, Senior Member (Join Date: Jan 2009; Posts: 114)

vapolise - check this out. I think I've found a hacky resolution for this problem.

My original discovery rules used different SNMPv3 'auth' and 'priv' passwords. I changed the passwords to match, and SNMPv3-based discovery seems to be working now.

Second, in the Zabbix source tree, in the get_snmp function of src/zabbix_server/poller/checks_snmp.c, I changed the code that returns NETWORK_ERROR on a timeout so that it returns NOTSUPPORTED instead. I know this is a hacky fix (if the device goes down, my queue will back up), but I wanted to see whether I could get SNMPv3-based low-level discovery working.

This is the resulting code I now have running:
Code:
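                else if (STAT_TIMEOUT == status)
                {
                        SET_MSG_RESULT(value, zbx_dsprintf(NULL, "get_snmp: Timeout while connecting to [[%s]:%d]",
                                        item->interface.addr, (int)item->interface.port));
                        /* original: a timeout counts as a network error against the host */
                        /* ret = NETWORK_ERROR; */
                        /* hack: report only this item as unsupported instead */
                        ret = NOTSUPPORTED;
                }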
With my SNMPv3 auth and priv passwords matching, and the NETWORK_ERROR return value in get_snmp commented out and replaced with NOTSUPPORTED, SNMPv3 low-level discovery and polling are working flawlessly for me on two Cisco 6509 switches with close to 2,000 (!) items.

All times are GMT +2.