Dear Zabbix Users,
we are having a problem with our Zabbix database. The log file of the PostgreSQL database shows:
Code:
2019-04-26 23:14:38.697 CEST [29676] ERROR: invalid page in block 1211023 of relation base/16386/81870
2019-04-26 23:14:38.697 CEST [29676] STATEMENT: COPY public.history_uint (itemid, clock, value, ns) TO stdout;
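The path in the error encodes the database OID (16386) and the relation's filenode (81870). The STATEMENT line already points at history_uint, but if anyone wants to double-check which table a corrupt block belongs to, a lookup like this (run inside the zabbix database) should confirm it:
Code:
-- map the filenode from the error message back to a relation name
SELECT relname FROM pg_class WHERE relfilenode = 81870;
-- or with the built-in helper (first argument 0 = the default tablespace)
SELECT pg_filenode_relation(0, 81870);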
The error appears when the cron job that makes a regular backup of the Zabbix DB via pg_dump is being run. I have searched around and found people saying that this error normally occurs when you have hardware problems. In this case Zabbix runs as a virtual machine (KVM) and gets its disks as raw image files from the hypervisor. I checked the file system inside the VM (XFS) and it had no problems. I also checked the underlying file system on the hypervisor (ZFS), which is OK too; a weekly scrub job runs there. No errors.
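For completeness, the checks I'm talking about are roughly these; the XFS check needs the file system unmounted, and the device and pool names here are just placeholders for my setup:
Code:
# inside the VM: read-only XFS check (file system must be unmounted)
xfs_repair -n /dev/vdb1
# on the hypervisor: ZFS pool health plus a manual scrub
zpool status -v tank
zpool scrub tank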
I tried to fix the problem by zeroing out the broken pages in PostgreSQL:
Code:
psql zabbix
SET zero_damaged_pages = on;
VACUUM FULL VERBOSE ANALYZE history_uint;
REINDEX TABLE history_uint;
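As a side note, a quick way to re-check just the affected table afterwards, without waiting for the nightly backup job, is to dump only that table and throw the output away; a clean exit means every block was readable. Run as the postgres user, or adjust the connection options to your setup:
Code:
pg_dump -d zabbix -t public.history_uint > /dev/null && echo "history_uint reads cleanly"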
Those VACUUM/REINDEX commands corrected the problem, and afterwards I was able to do a clean pg_dump again. Then, after a few days, the problem reoccurred. I did the VACUUM FULL commands again, and I also created a new disk image for the VM, formatted it with ext4, and copied the database file system over to it. I also mounted it with new options (nobarrier, noatime).
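For reference, the new mount looks roughly like this in /etc/fstab; the device path is specific to my VM, so treat it as a placeholder:
Code:
# PostgreSQL data volume (device name is an assumption)
/dev/vdc1   /var/lib/pgsql   ext4   defaults,noatime,nobarrier   0 2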
Now, after another few days, the problem is here again. Maybe someone has a hint on how to solve this?
It's a PostgreSQL 11 database on a CentOS 7 system (repo from postgresql.org).
/var/lib/pgsql is on a separate ext4 file system, 200 GB with 63 GB used.
Zabbix Server 4.0.7 (repo from zabbix.com).
Hypervisor: CentOS 7; the disk image storage is a ZFS pool, raidz1 with 6 SSDs.
The hypervisor has 96 GB of ECC RAM.
Cheers and many thanks
Timo