Hello
I have a problem with SNMP. This is not really a Zabbix problem, but since it causes errors in Zabbix too (missing data), I thought maybe someone here could help me a bit.
I have a lot of SNMP devices, and monitoring mostly works fine. Still, there are a few hosts that Zabbix fails to collect data from every now and then. Since data is still received a few times per hour, I suspected the network or the host itself being overloaded. So I ran tcpdump to check for networking issues, and this is where I got confused: I can see that the Zabbix proxy sends a "get request" and receives a response, but snmpbulkwalk still reports "Timeout". The situation is the same when using the regular snmpwalk instead of the bulk one. For example, here is the command and a bit of tcpdump captured at the same time:
Code:
zabbix-proxy.mgmt:~ $ snmpbulkwalk -v 3 -u user -a MD5 -A auth -l authNoPriv host-x.mgmt .1.3.6.1.2.1.31.1.1.1.10
Timeout: No Response from host-x.mgmt
Code:
zabbix-proxy.mgmt:~ $ tcpdump -r /tmp/snmpdebug.pcap | grep 31.1.1.1.10
reading from file /tmp/snmpdebug.pcap, link-type LINUX_SLL (Linux cooked)
14:16:10.780974 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:12.782931 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:12.919682 IP host-x.mgmt.snmp > zabbix-proxy.mgmt.46534: F=a U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetResponse(263) 31.1.1.1.10.2134639363=38532361352997 31.1.1.1.10.2134639364=22003841887865 31.1.1.1.10.2134639490=0 31.1.1.1.10.2134639491=149588147286 31.1.1.1.10.2134647780=83613246992568 31.1.1.1.10.2134647781=98246014871898 31.1.1.1.10.2134672258=369772994858 31.1.1.1.10.2146632833=0 31.1.1.1.10.2146632834=0 31.1.1.1.10.2147238657=0
14:16:14.783941 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:14.863256 IP host-x.mgmt.snmp > zabbix-proxy.mgmt.46534: F=a U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetResponse(263) 31.1.1.1.10.2134639363=38532361352997 31.1.1.1.10.2134639364=22003841887865 31.1.1.1.10.2134639490=0 31.1.1.1.10.2134639491=149588147286 31.1.1.1.10.2134647780=83613246992568 31.1.1.1.10.2134647781=98246014871898 31.1.1.1.10.2134672258=369773004133 31.1.1.1.10.2146632833=0 31.1.1.1.10.2146632834=0 31.1.1.1.10.2147238657=0
14:16:16.784938 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:16.918551 IP host-x.mgmt.snmp > zabbix-proxy.mgmt.46534: F=a U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetResponse(263) 31.1.1.1.10.2134639363=38532363159218 31.1.1.1.10.2134639364=22003846646776 31.1.1.1.10.2134639490=0 31.1.1.1.10.2134639491=149588148986 31.1.1.1.10.2134647780=83613258103748 31.1.1.1.10.2134647781=98246019891033 31.1.1.1.10.2134672258=369773011016 31.1.1.1.10.2146632833=0 31.1.1.1.10.2146632834=0 31.1.1.1.10.2147238657=0
14:16:18.786899 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:18.874084 IP host-x.mgmt.snmp > zabbix-proxy.mgmt.46534: F=a U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetResponse(263) 31.1.1.1.10.2134639363=38532364749243 31.1.1.1.10.2134639364=22003846646776 31.1.1.1.10.2134639490=0 31.1.1.1.10.2134639491=149588148986 31.1.1.1.10.2134647780=83613258103748 31.1.1.1.10.2134647781=98246019891033 31.1.1.1.10.2134672258=369773019373 31.1.1.1.10.2146632833=0 31.1.1.1.10.2146632834=0 31.1.1.1.10.2147238657=0
14:16:20.801928 IP zabbix-proxy.mgmt.46534 > host-x.mgmt.snmp: F=ar U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetBulk(30) N=0 M=10 31.1.1.1.10
14:16:20.939664 IP host-x.mgmt.snmp > zabbix-proxy.mgmt.46534: F=a U="user" E=_00_00_90_55_ae_0e_ad_01_80_00_00_c1 C="" GetResponse(263) 31.1.1.1.10.2134639363=38532366479595 31.1.1.1.10.2134639364=22003851364591 31.1.1.1.10.2134639490=0 31.1.1.1.10.2134639491=149588149894 31.1.1.1.10.2134647780=83613269580091 31.1.1.1.10.2134647781=98246024927683 31.1.1.1.10.2134672258=369773025972 31.1.1.1.10.2146632833=0 31.1.1.1.10.2146632834=0 31.1.1.1.10.2147238657=0
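One way to read the timing in a capture like this is to measure, for every reply, how long after the most recent request (and the one before it) the reply arrived. Below is a minimal sketch in POSIX sh and awk. `reply_delays` is a hypothetical helper name, not part of tcpdump or net-snmp, and the sketch assumes tcpdump's default `HH:MM:SS.ffffff` timestamps (so a capture that crosses midnight is not handled), one packet per line, and the agent listening on port `snmp`/`161`.

```shell
#!/bin/sh
# Hypothetical helper (not part of tcpdump or net-snmp): reads tcpdump text
# output on stdin, one packet per line, and for every reply from the agent
# prints how long after the latest request (and the one before it) it arrived.
reply_delays() {
    awk '
    # Convert a "HH:MM:SS.ffffff" timestamp to seconds since midnight.
    function secs(ts,    parts) {
        split(ts, parts, ":")
        return parts[1] * 3600 + parts[2] * 60 + parts[3]
    }
    # Request: the destination field ($5) is the agent SNMP port.
    $5 ~ /\.(snmp|161):$/ {
        prev = last
        last = secs($1)
        n++
        next
    }
    # Reply: the source field ($3) is the agent SNMP port.
    $3 ~ /\.(snmp|161)$/ && n > 0 {
        now = secs($1)
        if (n > 1)
            printf "%s reply: +%.3f s after latest request, +%.3f s after previous request\n", $1, now - last, now - prev
        else
            printf "%s reply: +%.3f s after latest request\n", $1, now - last
    }
    '
}
```

Usage, assuming the same capture file: `tcpdump -r /tmp/snmpdebug.pcap | grep 31.1.1.1.10 | reply_delays`. Worked out by hand for the packets above, each reply arrives between about 0.08 and 0.14 s after the latest request and between about 2.08 and 2.15 s after the previous one. One possible reading, not verified, is that the agent answers each request roughly 2.1 s later, just after the next retry has already been sent.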
Digging further, I enabled debug output in SNMP itself. Unfortunately I cannot understand most of it, but the difference I found in the failed (timeout) case is these lines (the first 8 lines repeat every time, before each of the five retries):
Code:
trace: _sess_process_packet(): snmp_api.c, 5314:
sess_process_packet: unmatched msg id: 234799250 != 234799249
trace: _sess_process_packet(): snmp_api.c, 5469:
sess_process_packet: unhandled PDU
trace: snmp_sess_select_info2_flags(): snmp_api.c, 6041:
sess_select: for all sessions: 3 (to in 780937.066682 sec)
verbose:sess_select: timer due in 1.910358 sec
verbose:sess_select: setting timer to 1.910358 sec, clear block (was 1)
trace: snmp_synch_input(): snmp_client.c, 183:
snmp_synch: Response (ReqID: 811951478 - Cmd 165)
Timeout: No Response from host-x.mgmt
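With the debug output saved to a file, the mismatched message IDs can be pulled out and compared across the retries. A minimal sketch, assuming the debug messages are written to stderr one per line; `unmatched_ids` and `/tmp/snmpdebug.log` are hypothetical names, the `-D` token in the commented example is taken from the trace above, and the two IDs are printed in whatever order net-snmp logs them.

```shell
#!/bin/sh
# Hypothetical helper: print the two msgIDs from every
# "unmatched msg id: A != B" line in net-snmp debug output read on stdin,
# one pair per line, in the order net-snmp logs them.
unmatched_ids() {
    sed -n 's/.*unmatched msg id: \([0-9][0-9]*\) != \([0-9][0-9]*\).*/\1 \2/p'
}

# Example usage (commented out because it needs the real agent; assumes the
# -D debug output goes to stderr):
# snmpbulkwalk -v 3 -u user -a MD5 -A auth -l authNoPriv \
#     -Dsess_process_packet host-x.mgmt .1.3.6.1.2.1.31.1.1.1.10 \
#     2> /tmp/snmpdebug.log
# unmatched_ids < /tmp/snmpdebug.log
```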
Can someone point out what else I should check, or what the problem might be?
Raido