Hi,
Since our upgrade from Zabbix 3.4.4 to 3.4.6 we have been getting large waves of "..Agent unreachable.." alerts. Last night alone we received more than 400 mails with unreachable messages between 02:03 and 03:30.
I was able to check some of the affected hosts during the "mail storm" with ping/fping and agent.ping, and all of them were up and reachable. I also checked the switch ports: no errors on the interfaces and no bandwidth problems.
The only thing we changed was the upgrade to 3.4.6.
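For anyone who wants to repeat the agent.ping check from a script instead of with `zabbix_get -s <host> -k agent.ping`, here is a minimal Python sketch. It assumes the passive-check framing used by 3.4 agents (a `ZBXD\x01` header, an 8-byte little-endian length, then the item key); the function names, the port default of 10050 and the 3-second timeout are my own illustrative choices, not part of Zabbix.

```python
import socket
import struct

# Zabbix protocol header: "ZBXD" plus protocol flag 0x01
HEADER = b"ZBXD\x01"


def pack_request(key: str) -> bytes:
    """Frame an item key as header + 8-byte little-endian length + payload."""
    data = key.encode("utf-8")
    return HEADER + struct.pack("<Q", len(data)) + data


def unpack_response(raw: bytes) -> str:
    """Strip the header and length field from an agent reply."""
    if not raw.startswith(HEADER):
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", raw[5:13])
    return raw[13:13 + length].decode("utf-8")


def agent_ping(host: str, port: int = 10050, timeout: float = 3.0) -> bool:
    """Send agent.ping to a passive agent; True if it answers "1"."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(pack_request("agent.ping"))
        chunks = []
        # The agent closes the connection after replying, so read until EOF
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return unpack_response(b"".join(chunks)) == "1"
```

Running `agent_ping("<host>")` in a loop during one of the alert waves would show whether the agent answers while the server reports it as unreachable.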
Our setup:
Host: bare-metal Thomas Krenn server, 4x Xeon E5-2403, 12 GB RAM
OS: FreeBSD 11.1 with ZFS root, 2 SATA disks in a mirror :/
DB: PostgreSQL 9.5.10, data files on ZFS
Zabbix: 3.4.6, compiled from the Ports Collection
I've included the Zabbix internal process busy graph and checked queue sizes, memory, swap, etc.
Some changed config options:
Code:
StartPollers=25
StartIPMIPollers=4
StartPreprocessors=4
StartPollersUnreachable=10
StartTrappers=4
StartPingers=20
StartDiscoverers=2
StartHTTPPollers=6
StartTimers=5
StartEscalators=4
StartAlerters=6
StartJavaPollers=4
StartVMwareCollectors=10
StartDBSyncers=6
CacheSize=64M
HistoryCacheSize=32M
HistoryIndexCacheSize=8M
TrendCacheSize=16M
ValueCacheSize=128M
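To line the settings above up against the internal process busy graph, one option is to graph one internal item per configured process type, using the `zabbix[process,<type>,avg,busy]` key pattern. The Python sketch below parses the option list and builds those keys. The mapping from `Start*` parameters to process type names is my reading of the 3.4 documentation and should be double-checked; the names `parse_options` and `busy_item_keys` are just illustrative.

```python
from typing import Dict, List

# The options from the post, as a single whitespace-separated line
CONFIG_LINE = (
    "StartPollers=25 StartIPMIPollers=4 StartPreprocessors=4 "
    "StartPollersUnreachable=10 StartTrappers=4 StartPingers=20 "
    "StartDiscoverers=2 StartHTTPPollers=6 StartTimers=5 StartEscalators=4 "
    "StartAlerters=6 StartJavaPollers=4 StartVMwareCollectors=10 "
    "StartDBSyncers=6 CacheSize=64M HistoryCacheSize=32M "
    "HistoryIndexCacheSize=8M TrendCacheSize=16M ValueCacheSize=128M"
)

# Assumed mapping: config parameter -> process type in zabbix[process,...]
PROCESS_TYPES = {
    "StartPollers": "poller",
    "StartIPMIPollers": "ipmi poller",
    "StartPreprocessors": "preprocessing worker",
    "StartPollersUnreachable": "unreachable poller",
    "StartTrappers": "trapper",
    "StartPingers": "icmp pinger",
    "StartDiscoverers": "discoverer",
    "StartHTTPPollers": "http poller",
    "StartTimers": "timer",
    "StartEscalators": "escalator",
    "StartAlerters": "alerter",
    "StartJavaPollers": "java poller",
    "StartVMwareCollectors": "vmware collector",
    "StartDBSyncers": "history syncer",
}


def parse_options(line: str) -> Dict[str, str]:
    """Split 'Key=Value Key=Value ...' into a dict."""
    return dict(pair.split("=", 1) for pair in line.split())


def busy_item_keys(options: Dict[str, str]) -> List[str]:
    """Build an avg-busy internal item key for each process type started."""
    keys = []
    for name, value in options.items():
        ptype = PROCESS_TYPES.get(name)
        # Cache sizes are not process counts and are skipped here
        if ptype is not None and int(value) > 0:
            keys.append(f"zabbix[process,{ptype},avg,busy]")
    # The single preprocessing manager has no Start* parameter of its own
    keys.append("zabbix[process,preprocessing manager,avg,busy]")
    return keys


for key in busy_item_keys(parse_options(CONFIG_LINE)):
    print(key)
```

Graphing the preprocessing manager separately from the workers should show which of the two is actually saturated.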
The only possible problem I see is that the preprocessing processes show a high busy percentage, and the server specs may be too low.
Aside from that, I don't have a clue where to start looking.
Any thoughts?
Thanks in advance,
Gerald