Good Morning
We have started seeing a very high number of unreachable triggers in our system. This started after a network device issue disconnected 6 of the 42 proxies we run. We fixed the network device and the 6 proxies reconnected. About 6 hours later, the number of unreachable triggers jumped very high.
I have run some queries to get example counts (a sketch of the kind of query follows the numbers):
July 8th to 9th we had 525
Sept 5th to Sept 6th we had 30840
We disabled all hosts (~3,700), rebooted the Zabbix server, and increased it from 4 to 6 cores on Sept 7th
Sept 7th to Sept 8th we had ~21024
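Roughly, the queries were along these lines (a sketch only: the LIKE filter on the event name and the year are illustrative, and it assumes counting trigger problem events in the events table, where source=0 is trigger events and value=1 is PROBLEM):

-- Sketch: count trigger PROBLEM events in a date range (year is a placeholder)
SELECT COUNT(*)
FROM events
WHERE source = 0
  AND value = 1
  AND name LIKE '%unreachable%'
  AND clock BETWEEN UNIX_TIMESTAMP('2020-09-05 00:00:00')
                AND UNIX_TIMESTAMP('2020-09-06 00:00:00');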
Screenshots are attached as queue.doc.
OK, let me back up a bit and describe my system. We are running Zabbix 4.4.10 on three systems: a web server, a Zabbix server, and a database server. Before the event we had ~3,700 hosts across 42 proxies, in physical data centers and cloud systems across the US.
The web server is a 4-core, 16 GB VM; the Zabbix server is a 6-core, 20 GB VM; and the database server is a 16-core, 128 GB VM. The database is large: 778 GB.
When I look at the host systems, I notice the Zabbix server has something I did not expect: a lot of disk IO (attached as disk IO.doc).
Also attached are the system stats as stats.doc.
My Zabbix server config is:
# This is a configuration file for Zabbix server daemon
# To get more information about Zabbix, visit http://www.zabbix.com
############ GENERAL PARAMETERS #################
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=16
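# (Note: DebugLevel 4 is debug-level logging; the Zabbix default is 3)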
DebugLevel=4
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=10.96.110.44
DBName=zabbix
DBUser=zabbix
DBPassword=Z@bB1x123456
DBPort=3306
############ ADVANCED PARAMETERS ################
StartPollers=30
# StartIPMIPollers=0
StartPreprocessors=8
StartPollersUnreachable=2
StartTrappers=160
StartPingers=8
StartDiscoverers=36
StartHTTPPollers=8
StartTimers=6
# StartEscalators=1
StartAlerters=18
# HousekeepingFrequency=1
# MaxHousekeeperDelete=10000
CacheSize=2048M
CacheUpdateFrequency=120
StartDBSyncers=6
HistoryCacheSize=2G
HistoryIndexCacheSize=1024M
TrendCacheSize=512M
ValueCacheSize=512M
Timeout=30
# TrapperTimeout=300
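# (The three parameters below control host unreachability timing; commented out = Zabbix defaults)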
# UnreachablePeriod=45
# UnavailableDelay=60
# UnreachableDelay=15
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
# FpingLocation=/usr/sbin/fping
My question is: is it normal to have this much disk IO on the Zabbix server?