Very High Number of Unreachable Hosts
  • mellis
    Senior Member
    • Oct 2017
    • 145

    #1

    Good morning,

    We have started seeing a very high number of unreachable triggers in our system. This began after a network device issue disconnected 6 of the 42 proxies we run. We fixed the network device and the 6 proxies reconnected. About 6 hours later, the number of unreachable triggers jumped sharply.

    I ran some queries; for example:
    July 8th to 9th: 525
    Sept 5th to 6th: 30,840

    On Sept 7th we disabled all hosts (~3700), rebooted the Zabbix server, and increased it from 4 to 6 cores.
    Sept 7th to 8th: ~21,024
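To put the counts above in perspective, here is a small Python sketch that turns them into ratios. It uses only the figures quoted in this post, assumes each window is a full 24 hours, and takes the approximate Sept 7th to 8th figure at face value.

```python
# Daily counts of unreachable-trigger events quoted in the post.
# Each window is assumed to cover a full 24 hours.
baseline = 525         # July 8th to 9th
peak = 30840           # Sept 5th to 6th
after_changes = 21024  # Sept 7th to 8th (approximate)


def increase_factor(before: int, after: int) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def percent_change(before: int, after: int) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100


def per_minute(daily_count: int) -> float:
    """Average events per minute over a 24-hour window."""
    return daily_count / (24 * 60)


if __name__ == "__main__":
    print(f"Peak vs baseline: {increase_factor(baseline, peak):.1f}x")
    print(f"After the Sept 7th changes: {percent_change(peak, after_changes):.1f}%")
    print(f"Peak rate: {per_minute(peak):.1f} events/minute")
```

Run as a script, it reports roughly a 59-fold jump over the July baseline, a drop of about 32% after the Sept 7th changes, and a peak of about 21 unreachable events per minute.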

    Screenshots are attached as queue.doc.

    OK, let me back up a bit and describe my system. We are running Zabbix 4.4.10 on three systems: a web server, a Zabbix server, and a database server. Before the event we had ~3700 hosts across 42 proxies, located in physical data centers and cloud systems across the US.

    The web server is a 4-core, 16 GB VM, the Zabbix server is a 6-core, 20 GB VM, and the database server is a 16-core, 128 GB VM. The database is large: 778 GB.

    When I look at the host system, I notice something I did not expect: the Zabbix server has a lot of disk IO (attached as disk IO.doc).

    The system stats are also attached (stats.doc).

    My Zabbix server config is:


    # This is a configuration file for Zabbix server daemon
    # To get more information about Zabbix, visit http://www.zabbix.com

    ############ GENERAL PARAMETERS #################


    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=16
    # DebugLevel 4 = debugging (produces lots of log output)
    DebugLevel=4
    PidFile=/var/run/zabbix/zabbix_server.pid
    SocketDir=/var/run/zabbix
    DBHost=10.96.110.44
    DBName=zabbix
    DBUser=zabbix
    DBPassword=Z@bB1x123456
    DBPort=3306
    ############ ADVANCED PARAMETERS ################
    StartPollers=30
    # StartIPMIPollers=0
    StartPreprocessors=8
    StartPollersUnreachable=2
    StartTrappers=160
    StartPingers=8
    StartDiscoverers=36
    StartHTTPPollers=8
    StartTimers=6
    # StartEscalators=1
    StartAlerters=18
    # HousekeepingFrequency=1
    # MaxHousekeeperDelete=10000
    CacheSize=2048M
    CacheUpdateFrequency=120
    StartDBSyncers=6
    HistoryCacheSize=2G
    HistoryIndexCacheSize=1024M
    TrendCacheSize=512M
    ValueCacheSize=512M
    Timeout=30
    # TrapperTimeout=300
    # UnreachablePeriod=45
    # UnavailableDelay=60
    # UnreachableDelay=15
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    # FpingLocation=/usr/sbin/fping
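Since the number of Start* processes is one of the knobs being tuned in this thread, this Python sketch tallies how many worker processes the active Start* lines in the config above request. It assumes the simple KEY=VALUE format shown here and counts only lines that are not commented out. Commented-out Start* parameters fall back to their defaults, and the server also runs other internal processes that are not counted, so the total is only a rough indication. The helper names are hypothetical.

```python
from __future__ import annotations

# The Start* lines from the configuration above; lines starting with '#'
# are commented-out defaults and are skipped by the parser.
SAMPLE_CONF = """
StartPollers=30
# StartIPMIPollers=0
StartPreprocessors=8
StartPollersUnreachable=2
StartTrappers=160
StartPingers=8
StartDiscoverers=36
StartHTTPPollers=8
StartTimers=6
# StartEscalators=1
StartAlerters=18
StartDBSyncers=6
"""


def parse_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def start_processes(params: dict[str, str]) -> dict[str, int]:
    """Return every explicitly set Start* parameter as an integer."""
    return {k: int(v) for k, v in params.items() if k.startswith("Start")}


if __name__ == "__main__":
    procs = start_processes(parse_conf(SAMPLE_CONF))
    for name, count in sorted(procs.items(), key=lambda kv: -kv[1]):
        print(f"{name:<26}{count}")
    print("Total:", sum(procs.values()))  # 282 for the sample above
```

With these values, StartTrappers=160 alone accounts for more than half of the requested processes.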




    My question is: is it normal to have this much IO on the Zabbix server?



  • mellis
    Senior Member
    • Oct 2017
    • 145

    #2
    I have lowered the Start* process counts and restarted the Zabbix server. After about 30 to 45 minutes the high volume of unreachable alerts returns; we are getting over 25,000 per day. Lowering the Start* processes did reduce the disk IO on the server host somewhat. More information:
    I run an agent ping every 15 minutes, and the trigger uses agent ping nodata with a 30-minute period.
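To reason about how much slack a 15-minute ping with a 30-minute nodata period leaves, here is a small Python simulation. It is not Zabbix's implementation: it assumes the trigger is re-evaluated every 30 seconds and fires when no value arrived in the half-open window (now - 30 min, now]. The function `first_alert_time` and its parameters are hypothetical.

```python
from __future__ import annotations


def first_alert_time(
    received: list[float], window: float, end: float, step: float = 0.5
) -> float | None:
    """Simulate a nodata-style check (hypothetical model, not Zabbix code).

    received: sorted ping arrival times in minutes.
    window:   nodata period in minutes (30 in this thread).
    end:      last evaluation time to simulate, in minutes.
    step:     assumed re-evaluation interval in minutes (0.5 = 30 seconds).
    Returns the first evaluation time at which no value arrived in
    (t - window, t], or None if that never happens up to `end`.
    """
    t = 0.0
    while t <= end:
        # Values that landed inside the look-back window ending at t.
        recent = [r for r in received if t - window < r <= t]
        # Do not fire before a full window has elapsed since the start.
        if not recent and t >= window:
            return t
        t += step
    return None


if __name__ == "__main__":
    print(first_alert_time([0, 15, 30, 45, 60], 30, 60))  # no pings missed
    print(first_alert_time([0, 30, 45, 60], 30, 60))      # one miss, next on time
    print(first_alert_time([0, 31, 45, 60], 30, 60))      # one miss, next 1 min late
```

Under these assumptions, a single missed ping is tolerated only if the next ping arrives on time; if it is even a minute late, the check fires at the 30-minute mark.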

    This problem started last Friday at 5:00 pm. I have asked high and low whether anything changed outside the Zabbix hosts, but no one will admit to a change.
