**Libiana** · 11-01-2018, 11:14

Hello,

this topic might be a mix of a few different problems, but at this point I'm not sure how to separate them. About 2 weeks ago I started repeatedly receiving alerts about busy poller processes. Checking the graph, it's basically at 97-100% all the time, with occasional drops to 70%. I'm also seeing unreachable pollers at around 65-80% busy, which is already down from 80-90%. I've already tried upgrading the server (now on 3.4.5, which seems to have a problem with agent connections after a server restart, as mentioned in another topic here) and modifying some of the server parameters. The server has more resources than it can use, and the database is on a separate machine. This is how the configuration looks now:

Code:

StartPollers=750
StartIPMIPollers=1
StartPollersUnreachable=80
StartTrappers=20
StartPingers=96
StartDiscoverers=40
StartHTTPPollers=10
StartTimers=40
StartEscalators=1
StartVMwareCollectors=5
VMwareFrequency=600
VMwarePerfFrequency=1800
VMwareCacheSize=16M
VMwareTimeout=10
StartSNMPTrapper=1
SenderFrequency=30
CacheSize=1024M
CacheUpdateFrequency=60
StartDBSyncers=20
HistoryCacheSize=128M
HistoryIndexCacheSize=128M
TrendCacheSize=64M
ValueCacheSize=128M
Timeout=10
TrapperTimeout=300
UnreachablePeriod=120
UnavailableDelay=60
UnreachableDelay=15
StartProxyPollers=5
ProxyConfigFrequency=3600
ProxyDataFrequency=1

I've looked through the server log and it's basically full of messages like these:

Code:

 12323:20180111:095228.800 SNMP agent item "ifInOctets.[ge-0/0/24]" on host "VC_PPD-1-1" failed: first network error, wait for 15 seconds
 12810:20180111:095228.813 resuming SNMP agent checks on host "Switch A": connection restored
 12813:20180111:095228.816 resuming SNMP agent checks on host "Switch B": connection restored
 12804:20180111:095228.817 resuming SNMP agent checks on host "Switch C": connection restored
 12776:20180111:095228.819 resuming SNMP agent checks on host "Switch D": connection restored
 12791:20180111:095228.855 resuming SNMP agent checks on host "Switch E": connection restored
 12809:20180111:095235.014 resuming SNMP agent checks on host "S3-S4": connection restored
 12364:20180111:095243.280 SNMP agent item "1.3.6.1.4.1.2636.3.3.1.1.6.[524]" on host "VC_PPD-1-1" failed: another network error, wait for 15 seconds
 12290:20180111:095251.680 SNMP agent item "ifOutQLen[8]" on host "Switch A" failed: first network error, wait for 15 seconds
 12786:20180111:095258.026 resuming SNMP agent checks on host "VC_PPD-1-1": connection restored
 12541:20180111:095301.495 SNMP agent item "ifOutErrors[XGigabitEthernet0/0/3]" on host "Switch D" failed: first network error, wait for 15 seconds
 12168:20180111:095304.997 SNMP agent item "ifOutErrors[GigabitEthernet0/0/16]" on host "Switch E" failed: first network error, wait for 15 seconds
 12285:20180111:095305.622 SNMP agent item "ifOutErrors[GigabitEthernet0/0/18]" on host "Switch B" failed: first network error, wait for 15 seconds
 12655:20180111:095307.071 SNMP agent item "ifInErrors[NULL0]" on host "Switch C" failed: first network error, wait for 15 seconds
 11997:20180111:095315.611 cannot connect to proxy "PaaS proxy": cannot connect to [[185.33.38.196]:10051]: [110] Connection timed out
 12857:20180111:095324.800 cannot send list of active checks to "172.18.81.140": host [compute02] not found
 12066:20180111:095327.146 SNMP agent item "ifOutOctets.[pimd]" on host "VC_PPD-1-1" failed: first network error, wait for 15 seconds
 12818:20180111:095327.222 resuming SNMP agent checks on host "Switch A": connection restored
 12777:20180111:095330.514 resuming SNMP agent checks on host "Switch D": connection restored
 12782:20180111:095330.521 resuming SNMP agent checks on host "Switch B": connection restored
 12805:20180111:095330.521 resuming SNMP agent checks on host "Switch E": connection restored
 12793:20180111:095330.526 resuming SNMP agent checks on host "Switch C": connection restored
 12862:20180111:095333.203 cannot send list of active checks to "172.18.20.111": host [redmine] not found
 12850:20180111:095339.912 cannot send list of active checks to "172.18.81.159": host [network01] not found
 12850:20180111:095341.517 cannot send list of active checks to "172.18.81.162": host [lb02] not found
 11999:20180111:095341.981 sending configuration data to proxy "C4C proxy" at "185.33.38.74", datalen 146458
 12847:20180111:095345.367 cannot send list of active checks to "172.18.81.124": host [sql02] not found
 12856:20180111:095345.693 cannot send list of active checks to "172.18.20.112": host [vm06] not found
 12861:20180111:095349.260 cannot send list of active checks to "172.18.20.72": host [horizon] not found
 12817:20180111:095349.385 resuming SNMP agent checks on host "VC_PPD-1-1": connection restored

What is annoying about these:
1. I don't really have access to the hosts (or, of course, the agents installed on them) named in the "cannot send list of active checks" lines, and those aren't hosts I'm planning to monitor for now.
2. The PaaS proxy and all hosts connected through it are disabled, so I don't get why it's still being checked.
3. Only those switches, and occasionally other hosts from the last page of the hosts list, are having connection problems and, judging by the log file, it happens basically every minute.
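
To put numbers on the three points above, a rough sketch like the following could tally the log per host. It is only an illustration: the regular expressions are derived from the message shapes in the excerpt above, the function name `summarize` is made up for this example, and other Zabbix versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns match only the message shapes visible in the log excerpt above;
# they are assumptions, not an exhaustive list of Zabbix server messages.
NETWORK_ERROR = re.compile(
    r'on host "([^"]+)" failed: (?:first|another) network error')
RESTORED = re.compile(
    r'resuming SNMP agent checks on host "([^"]+)": connection restored')
UNKNOWN_HOST = re.compile(
    r'cannot send list of active checks to "([^"]+)": host \[([^\]]+)\] not found')


def summarize(lines):
    """Count network errors, restores and unknown active agents per host."""
    errors, restored, unknown = Counter(), Counter(), Counter()
    for line in lines:
        m = NETWORK_ERROR.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = RESTORED.search(line)
        if m:
            restored[m.group(1)] += 1
            continue
        m = UNKNOWN_HOST.search(line)
        if m:
            # Key is (hostname reported by the agent, source IP address).
            unknown[(m.group(2), m.group(1))] += 1
    return errors, restored, unknown


# Example use (the log path is an assumption, adjust to your LogFile setting):
# with open("/var/log/zabbix/zabbix_server.log") as log:
#     errors, restored, unknown = summarize(log)
#     print(errors.most_common(10))
```

Hosts that show up with many errors and an equal number of restores are the ones flapping every minute, while the `unknown` counter lists agents that keep asking for active checks under a hostname the server does not know.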

I appreciate any advice and help. Thanks in advance.

100% busy poller processes + cannot send list of active checks
