Hi All,
I am hoping to get some help troubleshooting a problem, plus some tips on what to look at, because I have been scratching my head over this for a while. To get to the point: our Zabbix platform has been having performance issues lately (see the attached picture), which eventually bring the whole Zabbix server to a halt, with the mysqld process on the server hitting 100% constantly for a few hours. After some investigation, here is what I found. We are currently using an AWS server instance with the DB on the same server (on a separate GP2 volume with 300 IOPS, if it matters), and we are currently using the following settings in the config file:
LogFile=/var/log/zabbix/zabbix_server.log
PidFile=/var/run/zabbix/zabbix_server.pid
LogFileSize=0
DebugLevel=3
DBHost=127.0.0.1
DBName=zabbix
DBUser=zabbix
DBPassword=XXXXXX
DBSocket=/var/run/mysqld/mysqld.sock
DBPort=3306
StartPollers=200
StartPollersUnreachable=60
StartTrappers=50
Timeout=30
CacheSize=32M
HistoryCacheSize=8M
TrendCacheSize=8M
#HistoryTextCacheSize=8M
ValueCacheSize=2048M
# Run House Keeping every 8 hours.
HousekeepingFrequency=8
UnreachablePeriod=120
UnavailableDelay=60
UnreachableDelay=30
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
# SSHKeyLocation=
# LogSlowQueries=0
# TmpDir=/tmp
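In case it makes the settings above easier to compare at a glance, here is a minimal Python sketch that parses a Zabbix-style `Key=Value` config and summarizes the process counts and cache sizes. The function names `parse_zabbix_conf` and `size_to_bytes` are my own (hypothetical, not part of Zabbix), and the sample text is just a subset of the values listed above.

```python
def parse_zabbix_conf(text):
    """Parse Key=Value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def size_to_bytes(value):
    """Convert a size such as '32M' to bytes (K/M/G suffixes, 1024-based)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


# Subset of the config posted above.
SAMPLE = """
StartPollers=200
StartPollersUnreachable=60
StartTrappers=50
CacheSize=32M
HistoryCacheSize=8M
TrendCacheSize=8M
#HistoryTextCacheSize=8M
ValueCacheSize=2048M
"""

conf = parse_zabbix_conf(SAMPLE)

# Total number of pre-forked poller/trapper processes requested.
process_keys = ("StartPollers", "StartPollersUnreachable", "StartTrappers")
total_processes = sum(int(conf.get(k, 0)) for k in process_keys)
print("Configured pollers/trappers:", total_processes)

# Cache sizes in bytes, for side-by-side comparison.
for key in ("CacheSize", "HistoryCacheSize", "TrendCacheSize", "ValueCacheSize"):
    print(key, size_to_bytes(conf[key]))
```

This only reads the file contents; it does not talk to the Zabbix server or the database.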
I also noticed that when Zabbix is having one of these crises, there is a flood of queued events (see the attached picture).
I would be glad of any help with this.