**zychonatic** · 18-02-2011, 10:27

Hi,

I've got the same problem.

Any solution?

br zychonatic

**fmrapid** · 18-02-2011, 19:22

The pre-1.8.5 builds (the CVS builds) do include a fix for something related: SNMP queries being doubled or delayed.

I suggest you see whether this fixes the problem, or open a support request.

Have you taken a Wireshark packet trace to see whether data is actually being requested from and returned by the problematic host without any errors?

Cheers

fmrapid

**bonobo_slr** · 21-02-2011, 09:59

I am noticing the same issue. I have devices that I poll via SNMP, and they are not under any load at all. I am using Zabbix and Xymon to monitor the load, interface traffic, etc.

Xymon seems to have no problems with the data collection but Zabbix is erratic and therefore reporting incorrectly for items that require the delta between values.

The devices are on the same VLAN as the zabbix server, plugged into the same switch, so I am ruling out a network issue. The problem is not confined to one device - it is many.

Some tips on how to troubleshoot this would be appreciated.
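For items stored as a delta, every missed or late poll feeds straight into the stored values. The Python sketch below is a minimal illustration, not Zabbix's own code, of the two ideas behind Zabbix's "Delta (speed per second)" and "Delta (simple change)" store options: a rate computed from two samples and their timestamps, and the plain difference between two samples. The sample values are made up, and discarding a counter that moves backwards is this sketch's own choice.

```python
from typing import List, Optional, Tuple

# One poll result: (unix timestamp in seconds, raw counter value).
Sample = Tuple[float, int]


def speed_per_second(prev: Sample, cur: Sample) -> Optional[float]:
    """Rate between two polls: value difference divided by time difference."""
    (t0, v0), (t1, v1) = prev, cur
    if t1 <= t0 or v1 < v0:
        # No elapsed time, or the counter went backwards (wrap or reset):
        # this sketch discards the value instead of guessing.
        return None
    return (v1 - v0) / (t1 - t0)


def simple_change(prev: Sample, cur: Sample) -> Optional[int]:
    """Plain difference between two polls, ignoring how much time passed."""
    (_, v0), (_, v1) = prev, cur
    return None if v1 < v0 else v1 - v0


def deltas(samples: List[Sample]) -> List[Tuple[Optional[float], Optional[int]]]:
    """Pair up consecutive polls and compute both kinds of delta."""
    return [(speed_per_second(a, b), simple_change(a, b))
            for a, b in zip(samples, samples[1:])]


# Constant 1000 bytes/s, polled every 60 s, but the poll at t=120 was missed.
samples = [(0, 0), (60, 60_000), (180, 180_000), (240, 240_000)]
print(deltas(samples))  # → [(1000.0, 60000), (1000.0, 120000), (1000.0, 60000)]
```

In this example the per-second rate stays at 1000.0 across the gap, while the simple change doubles to 120000 for the interval that spans the missed poll; either way, a failed poll means a value that is missing or computed over an unexpected interval.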

**untergeek** · 24-02-2011, 20:49

Agreed. From what I've read, the problem I'm seeing may or may not be fixed by the 1.8.5 update. I wish there were some other way to get info, as debuglevel=4 is WAY more info than I need.

**dminstrel** · 02-03-2011, 21:49

Is there a JIRA ticket for this? I'm also having this problem on 1.8.5rc1.

Thanks,

**ericgearhart** · 21-04-2012, 02:20

Did anyone find a resolution to this? All the symptoms mentioned here sound eerily similar to the SNMP issues I'm currently having in Zabbix.

I'm wondering if rolling back the changes that were made to checks_snmp.c and poller.c to close this ticket: https://support.zabbix.com/browse/ZBX-4026 might resolve this. I'll give it a shot.

See http://git.zabbixzone.com/trunk/.git...d02726405800d8 for the related git commit

**ericgearhart** · 22-04-2012, 17:54

Please see https://support.zabbix.com/browse/ZBX-4901 for a bug report that seems to match the symptoms described in this thread.

**PhilSynek** · 06-11-2014, 12:03

Hi Everyone,

I am experiencing the same problem. I am monitoring 5 appliances from the same manufacturer via a Zabbix proxy 2.2.6 (same as server) with SNMPv3. Four of them are the exact same model, the fifth one is a different appliance and this is the one causing problems. Let’s call them the four “switches” and the fifth one “router”, just for easier explanation.

At first we were monitoring only the router. There were no problems; everything worked like a charm, and LLD of SNMP items worked fine.
Then one day (no changes had been made, according to the Zabbix audit) the queue started to fill up with almost 200 items, the graphs showed only sporadic lines, and the zabbix_proxy.log filled up with lines like these:

Code:

    26158:20141106:100010.785 SNMP agent item "ifOperStatus[GigabitEthernet-10]" on host "Router" failed: first network error, wait for 15 seconds
    26161:20141106:100025.092 resuming SNMP agent checks on host "Router": connection restored
    26151:20141106:100040.871 SNMP agent item "ifHCInUcastPkts[GigabitEthernet-5]" on host "Router" failed: first network error, wait for 15 seconds
    26161:20141106:100055.118 resuming SNMP agent checks on host "Router": connection restored
    26154:20141106:100110.473 SNMP agent item "ifHCInOctets[GigabitEthernet-3]" on host "Router" failed: first network error, wait for 15 seconds
    26161:20141106:100125.154 resuming SNMP agent checks on host "Router": connection restored
    26151:20141106:100140.675 SNMP agent item "ifHCInOctets[GigabitEthernet-12]" on host "Router" failed: first network error, wait for 15 seconds
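When lines like these scroll by, it helps to count which items and hosts are affected before digging into templates. Below is a minimal Python sketch, assuming the exact message wording in the excerpt above (other Zabbix versions may phrase it differently); `summarize` is a hypothetical helper, not part of Zabbix.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# "PID:YYYYMMDD:HHMMSS.mmm message", as in the zabbix_proxy.log excerpt.
LINE = re.compile(r"^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$")
# Message wording copied from the excerpt; other versions may differ.
FAILED = re.compile(
    r'SNMP agent item "(?P<item>[^"]+)" on host "(?P<host>[^"]+)" failed: first network error'
)
RESTORED = re.compile(r'resuming SNMP agent checks on host "(?P<host>[^"]+)": connection restored')


def summarize(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count 'first network error' events per (host, item) and recoveries per host."""
    failures: Counter = Counter()
    restores: Counter = Counter()
    for raw in lines:
        m = LINE.match(raw.strip())
        if m is None:
            continue  # not a standard log line
        msg = m.group("msg")
        failed = FAILED.search(msg)
        if failed:
            failures[(failed.group("host"), failed.group("item"))] += 1
            continue
        restored = RESTORED.search(msg)
        if restored:
            restores[restored.group("host")] += 1
    return {"failures": failures, "restores": restores}


excerpt = [
    '26158:20141106:100010.785 SNMP agent item "ifOperStatus[GigabitEthernet-10]" '
    'on host "Router" failed: first network error, wait for 15 seconds',
    '26161:20141106:100025.092 resuming SNMP agent checks on host "Router": connection restored',
]
report = summarize(excerpt)
print(report["failures"].most_common())
print(report["restores"])
```

Pointed at the real proxy log (its location is whatever LogFile is set to in zabbix_proxy.conf), this shows at a glance whether one item keeps failing or the errors are spread across many items of one host.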

I had seen this problem at a company I worked for before, so I started checking the items, discovery rules and item prototypes of the template for any wrong configuration (an additional dot in front of the OIDs, wrong port or security settings, etc.). I could not find anything, so I started looking for changes made to the systems involved. I found out that the kernel of the proxy had been downgraded. Just to be sure, the proxy was reinstalled with the right kernel from the beginning. Unfortunately this didn't solve the problem.

In the meantime the four switches were installed and monitoring started, using the same interface template as the router. The LLD for the switches is also the same as for the router: interface discovery based on standard MIB OIDs. Everything is working fine for the switches!

We are using two templates in this scenario. “Template switch” and “Template router”, both are linked to our “Template Network Interfaces SNMPv3”. And the items from this network template are the only ones popping up in the logs.

The authentication passphrase and security name are controlled via macros and set on all hosts. The only difference between the switches and the router is that on the switches the authentication passphrase and the security name are different strings, while the router uses the same string for both. But I can't believe this is the problem, because it worked this way before.

snmpwalks performed from the Zabbix proxy work flawlessly. I checked every item in the router template, every discovery rule and every item prototype directly via snmpwalk, and all of them work without problems.

I hope someone can help me.
Thanks!
Philipp

**tchjts1** · 07-11-2014, 01:40

I would double-check that your SNMPv3 username and security strings are correct.

I just went through this same scenario you are describing. Took days to figure it out. One of our NetApp devices was upgraded to SNMPv3, but they failed to create an SNMPv3 user and passphrase.

All of the above are worth checking. It is also worth drilling down to the items for your problem device and checking whether the passphrase or security name somehow inherited the wrong macro or string. Also make sure you have the correct security level applied to the items.
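Checking what a macro actually resolves to is easier with the precedence in mind. Zabbix's user macro documentation describes resolving a macro on the host first, then in the linked templates level by level, then globally. The Python sketch below approximates that order under simplified assumptions: the `Template` class and `resolve_macro` are hypothetical, templates on the same level are checked in list order (Zabbix applies its own ordering rule), and the macro values are placeholders.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Template:
    name: str
    macros: Dict[str, str] = field(default_factory=dict)
    linked: List["Template"] = field(default_factory=list)  # templates this one links to


def resolve_macro(name: str, host_macros: Dict[str, str], templates: List[Template],
                  global_macros: Dict[str, str]) -> Optional[str]:
    """Host macros win; then templates level by level (breadth-first); then globals."""
    if name in host_macros:
        return host_macros[name]
    queue = deque(templates)
    seen = set()
    while queue:
        tpl = queue.popleft()
        if tpl.name in seen:
            continue  # a template reachable by two paths is checked once
        seen.add(tpl.name)
        if name in tpl.macros:
            return tpl.macros[name]
        queue.extend(tpl.linked)
    return global_macros.get(name)


# Template names from the thread; macro values are placeholders.
network = Template("Template Network Interfaces SNMPv3", {"{$SNMP_SECURITY}": "template-default"})
router_tpl = Template("Template router", linked=[network])
print(resolve_macro("{$SNMP_SECURITY}", {}, [router_tpl], {}))  # → template-default
print(resolve_macro("{$SNMP_SECURITY}", {"{$SNMP_SECURITY}": "router-user"},
                    [router_tpl], {}))  # → router-user
```

The first call is the case worth ruling out: if the host-level macro is missing or misspelled, the template value is quietly used instead.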

Outside of that, have a look at your Zabbix internal process health and make sure you have enough resources allocated. See the last paragraph of this post, as well as the graphs that follow it: https://www.zabbix.com/forum/showthread.php?t=41219

**PhilSynek** · 07-11-2014, 11:33

Hi!

Thank you for the answer. Again, I checked all the items for the correct settings:

    Type = SNMPv3 agent
    Security name = {$SNMP_SECURITY}
    Security level
    Authentication protocol
    Authentication passphrase = {$SNMP_AUTHENTICATION}
    Port = empty (the port is set on the host)

Everything was correct. I double-checked the macros, and they are fine. I had checked the Zabbix internals before, and checked them again now that you mentioned them. I have already raised the number of pollers on the server and the proxy.

Just to be sure, see the attachment for screenshots.

I turned on debug logging on the proxy, and here is what I found:

Code:

    16335:20141107:100240.499 In zbx_snmp_get_values() num:94 level:0
    16335:20141107:100240.500 zbx_snmp_get_values() snmp_synch_response() status:1 errstat:-1 mapping_num:94
    16335:20141107:100240.500 End of zbx_snmp_get_values():NETWORK_ERROR
    16335:20141107:100240.500 End of zbx_snmp_process_standard():NETWORK_ERROR
    16335:20141107:100240.500 In zbx_snmp_close_session()
    16335:20141107:100240.500 End of zbx_snmp_close_session()
    16335:20141107:100240.500 getting SNMP values failed: Cannot connect to "10.255.242.95:161": Too long.
    16335:20141107:100240.500 End of get_values_snmp()
    16335:20141107:100240.500 In deactivate_host() hostid:10286 itemid:39984 type:6
    16335:20141107:100240.501 query [txnlev:1] [begin;]
    16335:20141107:100240.501 query [txnlev:1] [update hosts set snmp_errors_from=1415350960,snmp_disable_until=1415350975,snmp_error='Cannot connect to "10.255.242.95:161": Too long.' where hostid=10286]
    16335:20141107:100240.501 query [txnlev:1] [commit;]
    16335:20141107:100240.503 SNMP agent item "ifHCOutOctets[GigabitEthernet-2]" on host "router" failed: first network error, wait for 15 seconds
    16335:20141107:100240.503 deactivate_host() errors_from:1415350960 available:1
    16335:20141107:100240.503 End of deactivate_host()
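The UPDATE query in this excerpt stores when the errors started and until when the host is disabled, as Unix timestamps. A small Python sketch decodes them; the regex and `disable_window` are hypothetical helpers written against this one query. The window comes out at 15 seconds, matching the "wait for 15 seconds" message, and the start is 09:02:40 UTC on 7 November 2014, which agrees with the 10:02:40 log timestamp if the proxy's local time is UTC+1.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Matches the two timestamps in the "update hosts set ..." query from the debug log.
QUERY = re.compile(r"snmp_errors_from=(\d+),snmp_disable_until=(\d+)")


def disable_window(query: str) -> Tuple[datetime, int]:
    """Return (start of errors in UTC, seconds the host stays disabled)."""
    m = QUERY.search(query)
    if m is None:
        raise ValueError("query has no snmp_errors_from/snmp_disable_until pair")
    errors_from, disable_until = (int(g) for g in m.groups())
    start = datetime.fromtimestamp(errors_from, tz=timezone.utc)
    return start, disable_until - errors_from


query = ("update hosts set snmp_errors_from=1415350960,snmp_disable_until=1415350975,"
         "snmp_error='Cannot connect to \"10.255.242.95:161\": Too long.' where hostid=10286")
start, seconds = disable_window(query)
print(start.isoformat(), seconds)  # → 2014-11-07T09:02:40+00:00 15
```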

Does "getting SNMP values failed: Cannot connect to "10.255.242.95:161": Too long." mean that the proxy is hitting some kind of timeout?

Thank you!
Philipp


**tchjts1** · 07-11-2014, 19:21

A few things here.

You mentioned you raised your pollers. Did you increase your unreachable pollers also? If not, I would bump those up. I would also allocate some more configuration cache. You are not hitting the alert threshold, but I personally don't like to run that close.
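One way to reason about how many pollers are "enough" is Little's law, a general queueing result rather than anything from Zabbix's documentation: the average number of busy processes equals the rate of checks times the average time each check takes. Checks that hang until the timeout dominate that average, so a slow or failing device can tie up pollers out of proportion to its item count. The Python sketch below treats each check as occupying one poller for its whole duration (ignoring that one SNMP request can carry several values) and aims for a hypothetical 75% utilisation; all input numbers are made up.

```python
import math


def mean_check_seconds(ok_seconds: float, timeout_seconds: float, timeout_fraction: float) -> float:
    """Average time per check when a fraction of checks run into the timeout."""
    return (1 - timeout_fraction) * ok_seconds + timeout_fraction * timeout_seconds


def pollers_needed(checks_per_second: float, avg_check_seconds: float,
                   target_busy: float = 0.75) -> int:
    """Little's law: busy pollers = arrival rate x time in service, padded to a target load."""
    if not 0 < target_busy <= 1:
        raise ValueError("target_busy must be in (0, 1]")
    busy = checks_per_second * avg_check_seconds
    return max(1, math.ceil(busy / target_busy))


# Hypothetical load: 20 checks per second, 0.1 s per healthy check.
print(pollers_needed(20, 0.1))                                # → 3
# Same load, but 5% of checks hit a 15 s Timeout.
print(pollers_needed(20, mean_check_seconds(0.1, 15, 0.05)))  # → 23
```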

As for your timeout. No, that doesn't necessarily mean that you can't reach the device. You would also get that error if there was something wrong with the credentials. I know you have double-checked the settings in the Zabbix frontend. Have you also checked with your SNMP Admin to make sure you are matching what they have set in SNMPv3 on that device?

I would explicitly tell him/her what you are using for username, passphrase and security level and ask them to validate that they are an exact match.

Also - Any chance you can point that device directly to the Zabbix server to take the proxy out of the mix for troubleshooting purposes?

Lastly, there is a Timeout= setting on your Zabbix server in zabbix_server.conf. By default, it is set to 3. I always make it a point to set that to at least 15. (restart Zabbix server process any time you make conf changes)

**PhilSynek** · 07-11-2014, 23:06

Thanks a lot for your time and support! I really appreciate that.

> Originally posted by tchjts1
>
> You mentioned you raised your pollers. Did you increase your unreachable pollers also? If not, I would bump those up. I would also allocate some more configuration cache. You are not hitting the alert threshold, but I personally don't like to run that close.

No, I didn't raise the unreachable pollers. What do you suggest?

My settings on the server:

    StartPollers=15
    StartPollersUnreachable=1
    StartPingers=15
    StartDiscoverers=15
    CacheSize=8M
    Timeout=5

My settings on the proxy:

    StartPollers=10
    StartPollersUnreachable=1
    StartPingers=5
    StartDiscoverers=5
    CacheSize=8M
    Timeout=3

> Originally posted by tchjts1
>
> As for your timeout. No, that doesn't necessarily mean that you can't reach the device. You would also get that error if there was something wrong with the credentials. I know you have double-checked the settings in the Zabbix frontend. Have you also checked with your SNMP Admin to make sure you are matching what they have set in SNMPv3 on that device?
>
> I would explicitly tell him/her what you are using for username, passphrase and security level and ask them to validate that they are an exact match.

I ran snmpwalk against the host directly from my Zabbix proxy shell. Everything is fine from there; I get values every time. So I guess that's not the solution to my problem, right?

> Originally posted by tchjts1
>
> Also - Any chance you can point that device directly to the Zabbix server to take the proxy out of the mix for troubleshooting purposes?

Unfortunately not. The devices' network segment is a non-routed management network. I could try to get a temporary management interface on the Zabbix server, but first I would like to eliminate all the other possibilities. Let's keep that in mind.

> Originally posted by tchjts1
>
> Lastly, there is a Timeout= setting on your Zabbix server in zabbix_server.conf. By default, it is set to 3. I always make it a point to set that to at least 15. (restart Zabbix server process any time you make conf changes)

See my configuration above. Should I increase both, server and proxy, timeouts or only the server timeout?

Again, thank you for the help!
Philipp

**tchjts1** · 08-11-2014, 23:15

Without knowing what your new values per second are, how many devices you are monitoring, the number of items and triggers - I am going strictly by your internal process graphs and my personal experience.

Since your proxy and server are both fairly close to the stock default settings, I would give them both these settings. I've marked the changed lines with <---. I'm sure you know that if you change any values in the conf, you have to either remove the comment (#) at the beginning of the line or add a new line without the comment. Otherwise it will still use the default settings.

On the server:

Code:

    StartPollers=25   <---
    StartPollersUnreachable=5   <---
    StartPingers=15
    StartDiscoverers=15
    CacheSize=32M    <---
    Timeout=15   <---

On the proxy:

    StartPollers=25   <---
    StartPollersUnreachable=5   <---
    StartPingers=5
    StartDiscoverers=5
    CacheSize=32M    <---
    Timeout=15    <---

The above changes will probably not solve your SNMP issue, but your setup is going to run more smoothly.
These settings are somewhat liberal, but they give you some room for growth without having to adjust them again if you add a few more hosts/devices or items/triggers.
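The point about commented lines is easy to get wrong when editing by hand. The Python sketch below mimics that behaviour for a KEY=value file: lines starting with # are skipped, so the default stays in force. It is a simplified stand-in, not Zabbix's parser; in particular it lets a later line override an earlier one, and Zabbix may treat duplicate keys differently. The defaults passed in are illustrative (Timeout=3 is the default mentioned above).

```python
from typing import Dict, Iterable


def effective_settings(lines: Iterable[str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Apply uncommented KEY=value lines on top of defaults; '#' lines change nothing."""
    settings = dict(defaults)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented out: the default stays in force
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()  # later lines win in this sketch
    return settings


conf = [
    "# Timeout=3",
    "Timeout=15",
    "StartPollers=25",
    "# CacheSize=32M",  # edited, but still commented out
]
print(effective_settings(conf, {"Timeout": "3", "CacheSize": "8M"}))
# → {'Timeout': '15', 'CacheSize': '8M', 'StartPollers': '25'}
```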

**PhilSynek** · 20-11-2014, 14:44

tchjts1, thank you for your help! I adjusted the config as you suggested. You were right: it didn't help with my SNMP problem, but Zabbix performance looks better now.

Our network specialist looked at the Zabbix proxy with tcpdump and noticed that some SNMP packets were never sent by the proxy. They appeared in the debug logs, as I had already seen, but not in tcpdump. I had checked tcpdump too, but not closely enough: I did not expect the proxy to send only some of the SNMP requests. I thought that if one goes out, all go out.

tl;dr:
The solution was to upgrade the Zabbix proxy to version 2.2.7 and deactivate SNMP bulk requests (snmpbulk).

Cheers,
Philipp
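The debug log earlier in the thread shows the proxy asking for 94 values in one request (num:94) just before the "Too long" error, and deactivating bulk requests, together with the upgrade to 2.2.7, made the problem go away. The Python sketch below illustrates the general idea of sending fewer values per request: if a request is rejected as too large, retry the same OIDs in halves. It is a hypothetical illustration, not Zabbix's implementation; `TooLong` and the `fetch` callback stand in for whatever SNMP library actually sends the request.

```python
from typing import Callable, Dict, List, Sequence


class TooLong(Exception):
    """Stand-in for an SNMP library error meaning 'this request is too large'."""


def get_in_batches(oids: Sequence[str],
                   fetch: Callable[[Sequence[str]], Dict[str, str]],
                   max_per_request: int) -> Dict[str, str]:
    """Fetch all OIDs, halving the batch size whenever a request is too large."""
    results: Dict[str, str] = {}
    size = max(1, max_per_request)
    i = 0
    while i < len(oids):
        batch = oids[i:i + size]
        try:
            results.update(fetch(batch))
        except TooLong:
            if size == 1:
                raise  # even a single OID fails: nothing left to split
            size = max(1, size // 2)  # retry the same OIDs in smaller requests
            continue
        i += len(batch)
    return results


# Hypothetical device that rejects requests carrying more than 30 OIDs.
sizes: List[int] = []


def fake_fetch(batch: Sequence[str]) -> Dict[str, str]:
    sizes.append(len(batch))
    if len(batch) > 30:
        raise TooLong()
    return {oid: "0" for oid in batch}


# 94 OIDs (ifHCInOctets.1 to .94, for illustration only).
oids = [f"1.3.6.1.2.1.31.1.1.1.6.{n}" for n in range(1, 95)]
values = get_in_batches(oids, fake_fetch, max_per_request=94)
print(len(values), sizes)  # → 94 [94, 47, 23, 23, 23, 23, 2]
```

Passing `max_per_request=1` sends one OID per request, which is roughly what switching bulk requests off amounts to, at the cost of many more round trips.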

Inconsistent SNMP, and timeouts
