Hi everyone,
I recently took over a Zabbix 3.0 LTS infrastructure running on CentOS 7 and PostgreSQL 9.x (with partitioning). The frontend, Zabbix server and PostgreSQL each run on a separate VM. We have about 1,500 hosts, 95k items and 63k triggers. Trend data is kept for 12 months, and the database is about 250 GB on disk.
We plan to upgrade to Zabbix 5.0 LTS on Ubuntu 20.04; we can't go to 6.0 LTS yet because of OS support limitations on our remote proxies. We have set up a new environment and are doing upgrade dry runs to make sure everything works. The database upgrade is the step that takes most of the time, essentially because we need to keep the history and trend data. Unfortunately, we don't have SSDs available to maximize disk I/O, but tuning the PostgreSQL configuration has already helped: the upgrade now takes 2.5 hours, and we'd like to reduce downtime further.
We analyzed the server log during the upgrade: two steps take most of the time, together about 150 minutes, i.e. almost all of the 2.5 hours:
- Step at the 3% mark (60 min): slow queries such as "select source,object,objectid,eventid,value from events where eventid>607307518 and source in (0,3) order by eventid limit 10000"
- Step at the 16% mark (90 min): slow queries such as "update alerts set p_eventid=786671521 where eventid=786678760;"
One approach I'm considering (still to be tested):
- A few days BEFORE the migration, take a backup of the DB and upgrade it on the new instance.
- On migration day, take another backup of the configuration plus the history/trends/alerts data written since the first backup.
- Would Zabbix then detect that the upgrade has already been done and ONLY update the latest entries, or would it redo the full tables?
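The migration-day export in the second step could look roughly like this: a small Python helper (hypothetical, not part of Zabbix) that prints one psql `\copy` command per table, exporting only rows whose `clock` is at or after the time of the first backup. It assumes the standard Zabbix table names for history, trends and alerts, and that each has a `clock` column in Unix seconds; the cutoff date and output directory are placeholders. The exported rows keep the 3.0 layout, so whether they can be loaded as-is into the already-upgraded schema is exactly the open question in the last bullet.

```python
from datetime import datetime, timezone
from typing import List

# Assumption: standard Zabbix table names, each with an integer `clock`
# column holding Unix epoch seconds.
HISTORY_TABLES = ["history", "history_uint", "history_str", "history_text", "history_log"]
OTHER_TABLES = ["alerts"]
TREND_TABLES = ["trends", "trends_uint"]


def delta_copy_commands(cutoff: int, outdir: str = "/tmp/zbx_delta") -> List[str]:
    """Build psql \\copy commands exporting rows with clock >= cutoff.

    Trend rows are assumed to be stamped with the start of their hour, so the
    trend cutoff is floored to the hour; this may overlap rows already in the
    first backup, which would need de-duplicating before loading (not handled here).
    """
    hour_floor = cutoff - cutoff % 3600
    commands = []
    for table in HISTORY_TABLES + OTHER_TABLES + TREND_TABLES:
        since = hour_floor if table in TREND_TABLES else cutoff
        # psql requires each \copy meta-command to fit on a single line
        commands.append(
            f"\\copy (SELECT * FROM {table} WHERE clock >= {since}) "
            f"TO '{outdir}/{table}.csv' WITH (FORMAT csv)"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical time of the first backup: 2022-05-01 10:30 UTC.
    first_backup = int(datetime(2022, 5, 1, 10, 30, tzinfo=timezone.utc).timestamp())
    for cmd in delta_copy_commands(first_backup):
        print(cmd)
```

Flooring only the trend cutoff keeps history rows from being exported twice while still catching the hourly trend row that was probably not yet written when the first backup ran.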
Regards,
Sylvain