We've recently moved from a single Zabbix server (2.4.7) running on CentOS 6 to a front-end/back-end setup. The back-end is the original CentOS 6 server and now only hosts the PostgreSQL database. The tables are partitioned, we keep 30 days' worth of data, and the database is just under 300GB.
For the front-end I've built a new CentOS 7 server, got everything up and running, and it all seemed fine at first. However, after a few days we get periods where data just stops coming in for everything other than the Zabbix server itself. The two attached graphs are from proxies where the data stops, one in the same DC and one remote. Data also stops coming in from servers on the same LAN as the Zabbix server, where traffic doesn't pass through any routers or firewalls.
I've found that if I log onto the Zabbix server while the data isn't coming in and restart the zabbix_server process, everything starts checking in again, which implies the server process itself could be the problem. I've gone through the server log and can't see anything out of the ordinary in there. On the database server there doesn't seem to be anything happening that would explain it either; no vacuuming on the tables at that time that I can see (autovacuum is on).
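For what it's worth, a query along these lines should show any sessions on the Zabbix database that are waiting on locks or running long during an outage (just a sketch; column names assume PostgreSQL 9.2-9.5, older versions use procpid and current_query instead of pid and query):

-- Sessions on the zabbix database that are waiting on a lock or have
-- been running for more than 30 seconds (PostgreSQL 9.2-9.5 column names)
SELECT pid, state, waiting, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE datname = 'zabbix'
  AND (waiting OR now() - query_start > interval '30 seconds')
ORDER BY query_start;

Nothing like that shows up when I run checks of this sort during the outages.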
I've just moved up to 2.4.8 to see if that resolves the issue, but it doesn't seem to have made a difference.
The server config file is below. The server itself has 12GB of RAM assigned. All of our agents and proxies are active.
ListenPort=10052
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBHost=X.X.X.X
DBName=zabbix
DBUser=dbuser
DBPassword=Password
StartPollers=10
StartPollersUnreachable=5
StartTrappers=60
StartDiscoverers=10
SNMPTrapperFile=/tmp/zabbix_traps.tmp
StartSNMPTrapper=1
CacheSize=128M
CacheUpdateFrequency=300
StartDBSyncers=4
HistoryCacheSize=128M
TrendCacheSize=32M
ValueCacheSize=64M
Timeout=30
LogSlowQueries=3000
StartProxyPollers=0
I've also attached a graph of the Zabbix internal processes, which shows the history syncer process dropping to nothing during the outage.
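The internal items behind that graph are roughly these standard keys (from memory, the exact keys I have configured may differ slightly):

zabbix[process,history syncer,avg,busy]
zabbix[wcache,history,pfree]
zabbix[rcache,buffer,pfree]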
I'm out of ideas as to what it could be. Any suggestions for things I've missed or where to look?