Hi all,
I inherited a Zabbix 5.4.4 with Postgres 13 as the database backend. The former admin who has left the company did not setup auto vacuum or a regularly scheduled vacuum job. So regardless of any retention/expiry policy set in Zabbix, the database just grows and grows until it filled the virtual disk and Postgres crashed.
I analysed the problem, found the first cause - full disk - and then tried to clean up the installation. Since this is a virtual environment, enlarging the volume hosting the database was easy. I then tried to vacuum the database which failed for the "zabbix" DB. I did not copy the error message, unfortunately, and I am reluctant to try it again just now, because it will break my installation. I used "vacuumdb -a -f" for that. The error messages were about some invalid content "" for a certain field. I can of course reproduce the problem during off-hours if necessary. This left the DB in some inconsistent state and Zabbix stopped working. Complains about a missing "database version" right when there should be a login dialog.
OK. The former admin did setup regular backups that look like this:
Create new database, import schema, import data - no worky. Postgres complains about a ton of foreign key constraint violations. So I have a backup that I cannot restore in a single simple operation. Is this a Zabbix provided tool that creates these? Seems to run via some systemd thingy named "zabbix-server-backup.timer".
Additionally we have daily snapshots of the underlying BTRFS, even on a second machine. So I was able to restore the complete Postgres DB in binary fashion from the snapshot and restart everything after I enlarged the volume. So Zabbix is running again, phew.
The problem I now face is that I need to clean up this mess to ensure continuing operation. I am much more experienced with MySQL than Postgres and honestly flabbergasted there is no equivalent to mysqlcheck to repair an internal database structure or that there is a dump tool producing backups that cannot be re-imported.
What do you suggest how I go about restoring everything to a consistent state? I already talked to the teams and we need NO historical data. All we really need is the complete configuration.
Thanks for all comments, pointers to docs equally welcome, of course.
Kind regards,
Patrick
I inherited a Zabbix 5.4.4 with Postgres 13 as the database backend. The former admin who has left the company did not setup auto vacuum or a regularly scheduled vacuum job. So regardless of any retention/expiry policy set in Zabbix, the database just grows and grows until it filled the virtual disk and Postgres crashed.
I analysed the problem, found the first cause - full disk - and then tried to clean up the installation. Since this is a virtual environment, enlarging the volume hosting the database was easy. I then tried to vacuum the database which failed for the "zabbix" DB. I did not copy the error message, unfortunately, and I am reluctant to try it again just now, because it will break my installation. I used "vacuumdb -a -f" for that. The error messages were about some invalid content "" for a certain field. I can of course reproduce the problem during off-hours if necessary. This left the DB in some inconsistent state and Zabbix stopped working. Complains about a missing "database version" right when there should be a login dialog.
OK. The former admin did setup regular backups that look like this:
Code:
root@monitor:/var/zabbix-server-backup# ll [...] -rw-r--r-- 1 root root 6470805 Jun 19 05:44 zabbix-server_2022-06-19T05:44:02+02:00_data.sql.gz -rw-r--r-- 1 root root 20958 Jun 19 05:44 zabbix-server_2022-06-19T05:44:02+02:00_schema.sql.gz -rw-r--r-- 1 root root 6472745 Jun 20 05:44 zabbix-server_2022-06-20T05:44:02+02:00_data.sql.gz -rw-r--r-- 1 root root 20958 Jun 20 05:44 zabbix-server_2022-06-20T05:44:02+02:00_schema.sql.gz
Additionally we have daily snapshots of the underlying BTRFS, even on a second machine. So I was able to restore the complete Postgres DB in binary fashion from the snapshot and restart everything after I enlarged the volume. So Zabbix is running again, phew.
The problem I now face is that I need to clean up this mess to ensure continuing operation. I am much more experienced with MySQL than Postgres and honestly flabbergasted there is no equivalent to mysqlcheck to repair an internal database structure or that there is a dump tool producing backups that cannot be re-imported.
What do you suggest how I go about restoring everything to a consistent state? I already talked to the teams and we need NO historical data. All we really need is the complete configuration.
Thanks for all comments, pointers to docs equally welcome, of course.
Kind regards,
Patrick

Comment