The zabbix-server process on our master server takes about 4 minutes to shut down fully. In zabbix-server.log I can see that most of this time is spent syncing trend data. Has anyone seen a similar shutdown time, and have you done anything to reduce it? I ran strace on the main zabbix-server process during that time, and I mostly see SELECT and INSERT operations against the trends_uint table.
A quick summary of our environment: 8 proxy servers across multiple data centers feed into a single master, which connects to a PostgreSQL database. We monitor about 7000 hosts at around 3000 NVPS and keep 1 year of trend data. In zabbix-server.conf I've set TrendCacheSize=256M. The trend write cache graphs show that usage is pretty stable at about 150MB (% free is usually around 45%). We previously ran with a higher TrendCacheSize, and I noticed that the time it takes to sync the trend data, and therefore the shutdown time, grew with the larger value. At one point during testing someone had set it to 1GB, and syncing the trend cache took 12 minutes, even though usage was still around 150MB. So in theory I have some room to reduce the value further, but I want to keep at least 30% free. Ideally I'd like the process to stop within about 15 seconds, though at our scale I'm not sure what a reasonable shutdown time for zabbix-server would be. Either way, 4 minutes seems excessive. I appreciate any feedback.
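For reference, here is the sizing arithmetic behind the "keep at least 30% free" goal, as a small sketch. It only uses the approximate figures above (about 150MB used, 256MB configured) and assumes usage stays flat. The function names are made up for illustration. Note that 150MB of 256MB works out to roughly 41% free, a bit below the ~45% the graphs show, so actual usage may be closer to 140MB.

```python
def min_trend_cache_mb(used_mb: float, min_free_fraction: float) -> float:
    """Smallest cache size (MB) that still leaves min_free_fraction of it free.

    Hypothetical helper: assumes trend cache usage stays roughly constant.
    """
    if not 0 <= min_free_fraction < 1:
        raise ValueError("min_free_fraction must be in [0, 1)")
    # used / size <= 1 - free  =>  size >= used / (1 - free)
    return used_mb / (1 - min_free_fraction)


def free_fraction(cache_mb: float, used_mb: float) -> float:
    """Fraction of the cache left free at a given usage (hypothetical helper)."""
    return 1 - used_mb / cache_mb


# ~150MB used with at least 30% free -> about 214MB minimum
print(round(min_trend_cache_mb(150, 0.30)))
# 256MB configured with ~150MB used -> about 41.4% free
print(round(free_fraction(256, 150) * 100, 1))
```

Rounding up from the first result, something like TrendCacheSize=216M would still meet the 30% target at current usage. Going from 256M to that value is a fairly small reduction, so it may not change the shutdown time much on its own.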