SNMP problems with Zabbix
-
Well, after going through my logs from last night and watching them this morning, it seems my issue is not resolved. I still see the SNMP errors, each followed immediately by the "connection restored" message, but they appear far less often and affect far fewer hosts. Disabling housekeeping helped, but didn't completely resolve the issue.
-
I have re-enabled housekeeping, and it seems to be more stable now. I am still getting the "SNMP item failed" message, followed by "connection restored", for 8-10 random hosts every so often. I am closer, but something still isn't right.
SNMP item [ifAdminStatus[IP11]] on host [ENODEDC601] failed: first network error, wait for 15 seconds
resuming SNMP checks on host [ENODEDC601]: connection restored
SNMP item [ifOperStatus[TenGigabitEthernet 0/32]] on host [EPCOTECT-FT01-01] failed: first network error, wait for 15 seconds
SNMP item [ifInOctets[TenGigabitEthernet 0/26]] on host [DDSMWDC6-FT01-01] failed: first network error, wait for 15 seconds
resuming SNMP checks on host [EPCOTECT-FT01-01]: connection restored
resuming SNMP checks on host [DDSMWDC6-FT01-01]: connection restored
SNMP item [ifOutErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds
SNMP item [ifAdminStatus[Ciena CN 3920 10/100/G 11]] on host [TEAMDISN-CN01-01] failed: first network error, wait for 15 seconds
SNMP item [ifInErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds
SNMP item [ifOutErrors[Port3-GigabitEthernet]] on host [ENODEPCT01] failed: first network error, wait for 15 seconds
SNMP item [ifInErrors[Port5-GigabitEthernet]] on host [ENODEDTDMANAQUIN01] failed: first network error, wait for 15 seconds
SNMP item [ifOutOctets[IP0]] on host [ENODEDAAR01] failed: first network error, wait for 15 seconds
SNMP item [ifInErrors[TenGigabitEthernet 0/47]] on host [CELHUB01-FT01-01] failed: first network error, wait for 15 seconds
SNMP item [ifInOctets[IP0]] on host [ENODEOSCLIBRARY01] failed: first network error, wait for 15 seconds
SNMP item [ifOutOctets[Port3-GigabitEthernet]] on host [ENODECSBC01] failed: first network error, wait for 15 seconds
SNMP item [ifOutOctets[IP0]] on host [ENODE1170CELBV] failed: first network error, wait for 15 seconds
SNMP item [ifNumber] on host [ENODEC202] failed: first network error, wait for 15 seconds
SNMP item [ifOutOctets[Port3-GigabitEthernet]] on host [ENODEPERIDOT01] failed: first network error, wait for 15 seconds
SNMP item [ifOutErrors[remote]] on host [POPCENTU-CN01-01] failed: first network error, wait for 15 seconds
SNMP item [ifInOctets[TenGigabitEthernet 0/40]] on host [WCCSCTCO-FT01-01] failed: first network error, wait for 15 seconds
resuming SNMP checks on host [CELHUB01-FT01-01]: connection restored
resuming SNMP checks on host [ENODEDAAR01]: connection restored
resuming SNMP checks on host [TEAMDISN-CN01-01]: connection restored
resuming SNMP checks on host [ENODE1180CP01]: connection restored
resuming SNMP checks on host [ENODEDTDMANAQUIN01]: connection restored
resuming SNMP checks on host [ENODECSBC01]: connection restored
resuming SNMP checks on host [ENODE1170CELBV]: connection restored
resuming SNMP checks on host [ENODEC202]: connection restored
resuming SNMP checks on host [ENODEOSCLIBRARY01]: connection restored
resuming SNMP checks on host [ENODEPCT01]: connection restored
resuming SNMP checks on host [ENODEPERIDOT01]: connection restored
resuming SNMP checks on host [WCCSCTCO-FT01-01]: connection restored
resuming SNMP checks on host [POPCENTU-CN01-01]: connection restored
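To tell whether the same hosts keep flapping or whether it really is random, one option is to tally the two message types per host. This is a minimal sketch with a hypothetical helper (`summarize` is not part of Zabbix); it assumes only the message wording shown in the log above and works whether or not each line carries the `PID:date:time` prefix.

```python
import re
from collections import Counter

# Message shapes copied from the log excerpt above.
FAILED = re.compile(r"on host \[(?P<host>[^\]]+)\] failed: first network error")
RESTORED = re.compile(r"resuming SNMP checks on host \[(?P<host>[^\]]+)\]: connection restored")

def summarize(lines):
    """Count 'first network error' failures per host and track recoveries.

    Returns (failures, recovered, still_failing).
    """
    failures = Counter()
    recovered = set()
    still_failing = set()
    for line in lines:
        m = FAILED.search(line)
        if m:
            host = m.group("host")
            failures[host] += 1
            still_failing.add(host)
            continue
        m = RESTORED.search(line)
        if m and m.group("host") in still_failing:
            # A restore only counts if we saw the host fail first.
            still_failing.discard(m.group("host"))
            recovered.add(m.group("host"))
    return failures, recovered, still_failing

# A few lines taken from the excerpt above.
SAMPLE = [
    "SNMP item [ifAdminStatus[IP11]] on host [ENODEDC601] failed: first network error, wait for 15 seconds",
    "resuming SNMP checks on host [ENODEDC601]: connection restored",
    "SNMP item [ifOutErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds",
    "SNMP item [ifInErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds",
    "resuming SNMP checks on host [ENODE1180CP01]: connection restored",
]

if __name__ == "__main__":
    failures, recovered, still_failing = summarize(SAMPLE)
    for host, count in failures.most_common():
        print(host, count)
```

Pointed at a full server log (for example by passing an open file object instead of `SAMPLE`), hosts with high counts over several days would suggest a per-device problem rather than a server-side bottleneck.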
-
Having this exact same issue. Note that this is happening in a freshly installed Zabbix 2.0.6, running the latest Debian on kernel 3.2.whatever. Using a fresh, clean MySQL database and only monitoring 2 hosts so far.
Code:
7391:20130626:150744.333 enabling SNMP checks on host [MyDumbHostname]: host became available
snmp_build: unknown failure
7386:20130626:150844.349 SNMP item [0] on host [MyDumbHostname] failed: first network error, wait for 15 seconds
snmp_build: unknown failure
7391:20130626:150859.345 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
snmp_build: unknown failure
7391:20130626:150914.351 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
snmp_build: unknown failure
7391:20130626:150929.352 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
snmp_build: unknown failure
7391:20130626:150944.356 temporarily disabling SNMP checks on host [MyDumbHostname]: host unavailable
7391:20130626:151144.373 enabling SNMP checks on host [MyDumbHostname]: host became available

And it keeps looping like that, for all monitored hosts. Now, it doesn't seem to affect anything: data keeps coming in just fine. But it probably means SOMETHING is wrong, and I'd like to keep this log as clean as possible.
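Because each prefixed line carries `PID:YYYYMMDD:HHMMSS.mmm`, the spacing of this loop can be measured directly. A minimal sketch, assuming only that prefix format; `events` and `gaps` are hypothetical helpers, and the unprefixed `snmp_build: unknown failure` lines are skipped because they have no timestamp.

```python
import re
from datetime import datetime

# Log prefix as seen in the excerpt above: PID:YYYYMMDD:HHMMSS.mmm
PREFIX = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")

def events(lines):
    """Yield (timestamp, message) for lines that carry the PID:date:time prefix."""
    for line in lines:
        m = PREFIX.match(line.strip())
        if not m:
            continue  # e.g. 'snmp_build: unknown failure' has no timestamp
        _pid, date, clock, message = m.groups()
        yield datetime.strptime(date + clock, "%Y%m%d%H%M%S.%f"), message

def gaps(lines):
    """Seconds between consecutive timestamped events, rounded to milliseconds."""
    stamps = [ts for ts, _ in events(lines)]
    return [round((later - earlier).total_seconds(), 3)
            for earlier, later in zip(stamps, stamps[1:])]

LOG = [
    "7391:20130626:150744.333 enabling SNMP checks on host [MyDumbHostname]: host became available",
    "snmp_build: unknown failure",
    "7386:20130626:150844.349 SNMP item [0] on host [MyDumbHostname] failed: first network error, wait for 15 seconds",
    "snmp_build: unknown failure",
    "7391:20130626:150859.345 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds",
    "snmp_build: unknown failure",
    "7391:20130626:150914.351 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds",
    "snmp_build: unknown failure",
    "7391:20130626:150929.352 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds",
    "snmp_build: unknown failure",
    "7391:20130626:150944.356 temporarily disabling SNMP checks on host [MyDumbHostname]: host unavailable",
    "7391:20130626:151144.373 enabling SNMP checks on host [MyDumbHostname]: host became available",
]

if __name__ == "__main__":
    print(gaps(LOG))
```

On this excerpt the gaps are roughly 60 s from "host became available" to the first error, four retries about 15 s apart (matching "wait for 15 seconds"), then 120 s before SNMP checks are re-enabled. I believe this cadence is governed by the `UnreachableDelay`, `UnreachablePeriod` and `UnavailableDelay` settings in zabbix_server.conf, but check the manual for your version before tuning them.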
-
I finally got everything working. I edited the template to increase the intervals for alias, description, operational status, etc., and left traffic at 60 seconds. I also made sure to start an unreachable poller for every poller I started, I have the housekeeper running every hour, and I increased the number of DB syncers. I am no longer getting the SNMP errors in my logs unless there is actually an issue.
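The reason longer intervals help is simple arithmetic: an item polled every s seconds adds 1/s values per second, per port, and every one of those is an SNMP request the pollers must fit in. As far as I recall, the frontend reports the total as "Required server performance, new values per second". A minimal sketch; the item keys, intervals and 48-port count below are hypothetical illustrations, not the poster's actual template.

```python
# Hypothetical per-port items and intervals in seconds (not the real template).
BEFORE = {
    "ifInOctets": 60, "ifOutOctets": 60,
    "ifAlias": 60, "ifDescr": 60,
    "ifOperStatus": 60, "ifAdminStatus": 60,
}
AFTER = {
    "ifInOctets": 60, "ifOutOctets": 60,      # traffic left at 60 s
    "ifAlias": 3600, "ifDescr": 3600,         # rarely changes, poll hourly
    "ifOperStatus": 300, "ifAdminStatus": 300,
}

def values_per_second(intervals, ports):
    """Each item polled every s seconds contributes 1/s values per second, per port."""
    return ports * sum(1.0 / seconds for seconds in intervals.values())

if __name__ == "__main__":
    ports = 48  # hypothetical 48-port switch
    before = values_per_second(BEFORE, ports)
    after = values_per_second(AFTER, ports)
    print(f"before: {before:.2f} values/s, after: {after:.2f} values/s")
```

For this made-up switch the load drops from about 4.8 to about 1.95 values per second, while traffic graphs keep their 60-second resolution.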
-
Ooooo. I don't think I would do that. At one time I had Zabbix support review my settings and they strongly suggested I back down from having mine set at 16.
In fact, this is a quote from that review:
I think it's too many to have StartDBSyncers=16. The extra parallelism in DB writing depends on DB performance. It's rarely needed to increase default StartDBSyncers=4, and only then in big environments, with very good DB performance. I don't remember cases where we needed to setup more than 8 syncers.
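To catch this before restarting the server, a config can be checked against the numbers in that quote. A minimal sketch: `parse_conf` and `review` are hypothetical helpers, the defaults are my assumption for Zabbix 2.0 (verify against the comments in your own zabbix_server.conf), and the one-unreachable-poller-per-poller rule is the earlier poster's own practice, not an official recommendation.

```python
# Assumed Zabbix 2.0 defaults; verify against your own zabbix_server.conf.
DEFAULTS = {"StartPollers": 5, "StartUnreachablePollers": 1, "StartDBSyncers": 4}
SYNCER_CEILING = 8  # support quote: rarely needed more than 8 syncers

def parse_conf(text):
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf

def review(conf):
    """Return human-readable notes about the poller and DB syncer settings."""
    def get(name):
        return int(conf.get(name, DEFAULTS[name]))

    notes = []
    syncers = get("StartDBSyncers")
    if syncers > SYNCER_CEILING:
        notes.append(f"StartDBSyncers={syncers} exceeds {SYNCER_CEILING}; "
                     "support advised backing down")
    pollers = get("StartPollers")
    unreachable = get("StartUnreachablePollers")
    if unreachable < pollers:  # the poster's 1:1 practice, not a Zabbix rule
        notes.append(f"StartUnreachablePollers={unreachable} is below "
                     f"StartPollers={pollers}")
    return notes

if __name__ == "__main__":
    sample = "# tuned\nStartPollers=10\nStartUnreachablePollers=10\nStartDBSyncers=16\n"
    for note in review(parse_conf(sample)):
        print(note)
```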