Today Zabbix raised this problem on the Zabbix DB server: "/var/lib/pgsql: Disk space is low (used > 80%)".
My first thought was: "Guess I'll have to extend the disk then." Seems legit, since we haven't extended the DB disk at all since the system was installed around the start of June 2020.
Upon first inspection, something seems off:
Disk usage graph from the install date until now.
Zooming in, the sharp incline starts around the 15th of Feb.
The angle of increase looks similar to the previous days, but the data reduction that used to kick in around noon every day doesn't seem to happen anymore.
(The data reduction is handled by Zabbix using the TimescaleDB extension in PostgreSQL.)
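As a first sanity check I want to confirm that the extension and the hypertables are actually in place. A rough sketch of how I plan to do that, assuming the database is named zbxdb, psql is run as the postgres user, and the TimescaleDB 1.x information views look the way I think they do:

# Confirm the TimescaleDB extension is installed and at the expected version
sudo -u postgres psql -d zbxdb -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb';"
# List the hypertables and how many chunks each currently has
sudo -u postgres psql -d zbxdb -c "SELECT table_name, num_chunks FROM timescaledb_information.hypertable;"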
I've been digging around trying to find the reason. The usual culprits (increased VPS and an influx of new items) turned up nothing, leaving me to suspect it's indeed something to do with the lack of data reduction.
If I had to guess, it looks like the expired history and trends table partitions aren't being dropped.
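To confirm that guess, my plan is to compare the oldest rows still present with the configured retention, and to list the chunks directly. A minimal sketch, assuming the zbxdb database name and that show_chunks() behaves as it does in TimescaleDB 1.x:

# Oldest data still present in history/trends (clock is a Unix timestamp)
sudo -u postgres psql -d zbxdb -c "SELECT to_timestamp(min(clock)) FROM history;"
sudo -u postgres psql -d zbxdb -c "SELECT to_timestamp(min(clock)) FROM trends;"
# List the chunks backing the history hypertable; old ones should disappear after housekeeping
sudo -u postgres psql -d zbxdb -c "SELECT show_chunks('history');"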
VPS:
Items:
Also verified that it is indeed the Zabbix DB that's consuming the disk space.
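For anyone who wants to reproduce that check, this is roughly how it can be done; a sketch, again assuming the zbxdb database name:

# Per-database sizes, to confirm which database is eating the disk
sudo -u postgres psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database ORDER BY pg_database_size(datname) DESC;"
# Largest relations inside zbxdb (TimescaleDB chunks live in the _timescaledb_internal schema)
sudo -u postgres psql -d zbxdb -c "SELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 15;"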
Neither the Zabbix server logs nor the PostgreSQL logs show anything of value regarding this issue.
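One thing I'm considering is bumping the housekeeper's log level and forcing a housekeeping run to see what it actually does. A sketch using the zabbix_server runtime control options, with the default RHEL log path assumed:

# Raise log verbosity for the housekeeper processes, then trigger a housekeeping cycle
zabbix_server -R log_level_increase=housekeeper
zabbix_server -R housekeeper_execute
# Watch the server log for housekeeper / drop_chunks related messages
tail -f /var/log/zabbix/zabbix_server.log | grep -i housekeeper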
The only thing I've found so far is a change in the behavior of these two items: "Bgwriter: Buffers written directly by a backend per second" and "DB zbxdb: Tuples inserted per second".
But I have so far failed to figure out why they changed. Unfortunately, I know nothing about PostgreSQL in particular, or databases in general.
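The counters behind those two items should be visible directly in PostgreSQL's statistics views, so they can be cross-checked outside Zabbix. A sketch; note these are cumulative counters, while the items report per-second rates:

# Raw counter presumably behind "Buffers written directly by a backend per second"
sudo -u postgres psql -c "SELECT buffers_backend FROM pg_stat_bgwriter;"
# Raw counter presumably behind "Tuples inserted per second" for the zbxdb database
sudo -u postgres psql -c "SELECT tup_inserted FROM pg_stat_database WHERE datname = 'zbxdb';"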
The last time the Zabbix server was updated was on the 23rd of Dec last year:
rpm -qa --last | grep zabbix
zabbix-server-pgsql-5.0.19-1.el8.x86_64 Thu 23 Dec 2021 07:46:02 PM CET
zabbix-sender-5.0.19-1.el8.x86_64 Thu 23 Dec 2021 07:46:02 PM CET
zabbix-agent-5.0.19-1.el8.x86_64 Thu 23 Dec 2021 07:46:02 PM CET
zabbix-web-pgsql-5.0.19-1.el8.noarch Thu 23 Dec 2021 07:46:01 PM CET
zabbix-web-deps-5.0.19-1.el8.x86_64 Thu 23 Dec 2021 07:46:01 PM CET
zabbix-web-5.0.19-1.el8.noarch Thu 23 Dec 2021 07:46:01 PM CET
zabbix-nginx-conf-5.0.19-1.el8.noarch Thu 23 Dec 2021 07:46:01 PM CET
And the DB server on the 30th of Sep of the same year:
rpm -qa --last
postgresql11-server-11.13-1PGDG.rhel8.x86_64 Thu 30 Sep 2021 09:36:24 AM CEST
postgresql11-devel-11.13-1PGDG.rhel8.x86_64 Thu 30 Sep 2021 09:36:23 AM CEST
postgresql11-11.13-1PGDG.rhel8.x86_64 Thu 30 Sep 2021 09:36:22 AM CEST
postgresql11-libs-11.13-1PGDG.rhel8.x86_64 Thu 30 Sep 2021 09:36:21 AM CEST
zabbix-agent-5.0.16-1.el8.x86_64 Thu 30 Sep 2021 09:36:01 AM CEST
timescaledb-postgresql-11-1.7.5-0.el7.x86_64 Wed 07 Jul 2021 03:35:55 PM CEST
The last time the servers were rebooted was ~172 days ago.
And the zabbix-server service hasn't been restarted since it was updated on the 23rd of Dec:
systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-12-23 19:46:03 CET; 2 months 26 days ago
Both servers are CentOS 8.
We run Zabbix server 5.0.19,
and PostgreSQL 11 with TimescaleDB v1.7.5.
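One more thing on my list: double-checking the housekeeping settings stored in the DB, since as far as I understand the TimescaleDB chunk dropping only applies when the global override periods are set in the frontend. A sketch of the query, with the column names as I believe they are in the Zabbix config table:

# Housekeeping settings as saved by the frontend (Administration -> General -> Housekeeping)
sudo -u postgres psql -d zbxdb -c "SELECT hk_history_mode, hk_history_global, hk_history, hk_trends_mode, hk_trends_global, hk_trends FROM config;"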
Any pointers for troubleshooting this issue further would be much appreciated.
Thanks in advance.