First of all, sorry for my bad English.
I've seen Pacemaker cluster designs where the currently active node uses shared storage. I want this shared storage to be highly available as well, while avoiding expensive storage hardware and its licenses.
So I thought a DRBD-based design would be right for that. Each cluster node would contain a separate disk, kept in sync with DRBD, holding the PostgreSQL database (with TimescaleDB). The two nodes will operate in two separate data centers, so it is important that the database data remains available in both data centers if one of them becomes unavailable. I think DRBD is a good solution for this.
The cluster would also include a virtual IP address that would be used by the active node. Zabbix will connect to the database through this virtual IP address.
The Zabbix servers would also work in failover mode: I plan to use Zabbix 6 with its own native active-passive HA solution.
(The two nodes in my test environment are currently running in VirtualBox, but the final deployment will be in a VMware environment, including a quorum device and fencing.)
I think this environment design could be useful for many other Zabbix operators in the future.
It would look like this:

I found a Zabbix high-availability design with PostgreSQL here:
https://assets.zabbix.com/files/even...tsu_SRAOSS.pdf
But I find the Pacemaker cluster settings in it superficial. To avoid split-brain, I think it would be better for Pacemaker to manage DRBD as well, as in the following solution:
In that description, the design and the timeouts of the cluster resources seem well developed, and Pacemaker also manages DRBD, which I think will help prevent a possible split-brain.
Using the https://www.postgresql.vn/blog/pg_pacemaker_drbd description as a basis, I performed the following operations on the two cluster nodes:
Cluster nodes in Virtualbox:
node1 / 192.168.14.31 / +1 1GB HDD (sdb)
node2 / 192.168.14.32 / +1 1GB HDD (sdb)
192.168.14.30 - Cluster VIP address
The /etc/hosts file on both nodes:
192.168.14.31 node1
192.168.14.32 node2
On both nodes:
Chronyd enabled (for NTP)
Firewalld and SELinux are disabled
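For completeness, this is roughly how I prepared both nodes; the package and service names are what I'd expect on a RHEL 8-like system:
systemctl enable --now chronyd    # NTP time sync
systemctl disable --now firewalld
setenforce 0    # SELinux permissive for the running system
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # takes effect after a reboot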
============= PostgreSQL install =============
I installed PostgreSQL 13 from the official PostgreSQL repository:
On both nodes:
dnf -y install https://download.postgresql.org/pub/...est.noarch.rpm
dnf -qy module disable postgresql
dnf search postgresql | grep server
dnf -y install postgresql13 postgresql13-server
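As a quick sanity check that the packages landed correctly:
/usr/pgsql-13/bin/postgres --version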
============= DRBD install =============
I installed DRBD 9 from the ELRepo repository:
On both nodes:
dnf install -y elrepo-release
dnf install -y kmod-drbd90 drbd90-utils drbd90-utils-sysvinit
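To check that the kernel module and the userspace tools are in place (assuming the kmod package loads cleanly):
modprobe drbd
cat /proc/drbd       # loaded kernel module version
drbdadm --version    # drbd-utils version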
====== Creating a 512MB LVM volume on the 1GB sdb ======
On both nodes:
fdisk /dev/sdb
keys: n, p, 1, Enter, Enter, w   (one primary partition spanning the whole disk, then write)
lsblk:
sdb 8:16 0 1G 0 disk
└─sdb1 8:17 0 1023M 0 part
pvcreate /dev/sdb1
vgcreate vg /dev/sdb1
lvcreate --size 512MB --name lv vg
lsblk:
sdb 8:16 0 1G 0 disk
└─sdb1 8:17 0 1023M 0 part
└─vg-lv 253:3 0 512M 0 lvm
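As a side note, the same partitioning and LVM steps could also be scripted non-interactively; this is only a sketch of what I believe is an equivalent sequence:
parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary 1MiB 100%
pvcreate /dev/sdb1
vgcreate vg /dev/sdb1
lvcreate --size 512M --name lv vg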
========= Creating DRBD on the LVM volume =========
mcedit /etc/drbd.d/pgres.res
resource pgres {
    protocol B;
    device minor 1;
    meta-disk internal;
    disk /dev/mapper/vg-lv;
    on node1 {
        address 192.168.14.31:7001;
    }
    on node2 {
        address 192.168.14.32:7001;
    }
}
mcedit /etc/drbd.d/global_common.conf
global {
    usage-count no;
}
drbdadm create-md pgres
systemctl start drbd
drbdadm status pgres   # shows Inconsistent (in red); this is normal at this point
On node1:
drbdadm --force primary pgres
drbdadm status pgres # green UpToDate
mkfs.xfs /dev/drbd1 -L PGDATA -f
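As an optional sanity check of the new filesystem (only on the current DRBD Primary), before Pacemaker takes over the device:
mount /dev/drbd1 /mnt
df -h /mnt
umount /mnt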
On both nodes:
dnf config-manager --set-enabled ha
yum install -y pacemaker pcs fence-agents-all psmisc
echo "hacluster" | sudo passwd hacluster --stdin
On node1:
pcs host auth node1 node2 -u hacluster -p hacluster
pcs cluster setup firstcluster node1 node2 --force --start
crm_mon -Arf1 # All nodes are online
pcs cluster enable --all
pcs property set stonith-enabled=false
pcs property set cluster-recheck-interval=10m
pcs property set no-quorum-policy=ignore
pcs resource create drbd-fisrtcluster ocf:linbit:drbd drbd_resource=pgres op monitor interval=15s op start timeout=60 op stop timeout=60 op promote timeout=90
pcs resource promotable drbd-fisrtcluster master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs resource cleanup
crm_mon -Arf1
mkdir /pgdata
pcs resource create pgFS-firstcluster ocf:heartbeat:Filesystem device="/dev/drbd1" directory=/pgdata fstype=xfs options=defaults,noatime,nodiratime,attr2 op start timeout=60 op stop timeout=60 --group masterg-firstcluster
pcs status
Error:
Failed Resource Actions:
* pgFS-firstcluster_start_0 on node1 'error' (1): call=23, status='complete', exitreason='Couldn't mount device [/dev/drbd1] as /pgdata', last-rc-change='2022-01-29 19:29:22 +01:00', queued=0ms, exec=2781ms
chown postgres /pgdata/ -R
df -h /pgdata
pcs resource cleanup
pcs status   # problem resolved, the filesystem started on node2
su - postgres
/usr/pgsql-13/bin/initdb -D /pgdata/pg13data -E utf8 --no-locale
exit   # back to the root user
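Since Zabbix will reach the database through the cluster VIP from other machines, I assume the fresh data directory also needs remote access enabled; these values are only examples, adjust them to your network:
# /pgdata/pg13data/postgresql.conf
listen_addresses = '*'
# /pgdata/pg13data/pg_hba.conf  (Zabbix server subnet as an example)
host    all    all    192.168.14.0/24    md5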
pcs resource create pgsql-firstcluster ocf:heartbeat:pgsql pgctl=/usr/pgsql-13/bin/pg_ctl psql=/usr/pgsql-13/bin/psql pgdata=/pgdata/pg13data pgport=5432 stop_escalate=1 op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=7s on-fail=restart op monitor timeout=60s interval=2s on-fail=restart role=Master op promote timeout=60s interval=0s on-fail=restart op demote timeout=60s interval=0s on-fail=stop op stop timeout=60s interval=0s on-fail=block op notify timeout=60s interval=0s --group masterg-firstcluster
crm_mon -Arf1
pcs status   # got the following errors:
pgsql-firstcluster_start_0 on node2 'not installed' (5): call=40, status='complete', exitreason='Configuration file /pgdata/pg13data/postgresql.conf doesn't exist', last-rc-change='2022-01-29 19:42:00 +01:00', queued=0ms, exec=187ms
* pgFS-firstcluster_start_0 on node1 'error' (1): call=41, status='complete', exitreason='Couldn't mount device [/dev/drbd1] as /pgdata', last-rc-change='2022-01-29 19:42:01 +01:00', queued=0ms, exec=2776ms
pcs node standby node2   # so the filesystem resource moves back to node1
pcs resource cleanup
pcs status
error:
Migration Summary:
* Node: node1:
* pgsql-firstcluster: migration-threshold=1 fail-count=1000000 last-failure='Sat Jan 29 19:42:28 2022'
Failed Resource Actions:
* pgsql-firstcluster_start_0 on node1 'not installed' (5): call=58, status='complete', exitreason='Configuration file /pgdata/pg13data/postgresql.conf doesn't exist', last-rc-change='2022-01-29 19:42:28 +01:00', queued=0ms, exec=314ms
[root@node1 ~]# ls -la /pgdata/
total 0
drwxr-xr-x 2 root root 6 Jan 29 19:16 .
dr-xr-xr-x. 18 root root 258 Jan 29 19:27 ..
[root@node1 ~]#
Went back to the /pgdata folder creation step and repeated it:
chown postgres /pgdata/ -R
su - postgres
/usr/pgsql-13/bin/initdb -D /pgdata/pg13data -E utf8 --no-locale
exit   # back to the root user
pcs resource cleanup
pcs status   # OK, everything is running on node1
[root@node1 ~]# ps -ef | grep postgres: | grep -v grep
postgres 7584 7578 0 19:45 ? 00:00:00 postgres: logger
postgres 7596 7578 0 19:45 ? 00:00:00 postgres: checkpointer
postgres 7597 7578 0 19:45 ? 00:00:00 postgres: background writer
postgres 7598 7578 0 19:45 ? 00:00:00 postgres: walwriter
postgres 7599 7578 0 19:45 ? 00:00:00 postgres: autovacuum launcher
postgres 7600 7578 0 19:45 ? 00:00:00 postgres: stats collector
postgres 7601 7578 0 19:45 ? 00:00:00 postgres: logical replication launcher
[root@node1 ~]#
pcs resource create vip-firstcluster ocf:heartbeat:IPaddr2 ip=192.168.14.30 cidr_netmask=24 op start timeout=60s interval=0s on-fail=restart op monitor timeout=120s interval=20s on-fail=restart op stop timeout=60s interval=0s on-fail=block --group masterg-firstcluster
ip a | grep -w inet   # the cluster VIP address is shown as well
pcs constraint colocation add started masterg-firstcluster with Master drbd-fisrtcluster-clone INFINITY
pcs constraint order promote drbd-fisrtcluster-clone then start masterg-firstcluster symmetrical=false score=INFINITY
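Just to verify that both constraints really ended up in the CIB:
pcs constraint --full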
On node2:
mkdir /pgdata
chown postgres /pgdata/ -R
pcs node unstandby node2
pcs status # Everything is fine on node1
[root@node1 ~]# pcs status
Cluster name: firstcluster
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Sat Jan 29 21:12:41 2022
* Last change: Sat Jan 29 21:12:22 2022 by root via cibadmin on node1
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ node1 node2 ]
Full List of Resources:
* Clone Set: drbd-fisrtcluster-clone [drbd-fisrtcluster] (promotable):
* Masters: [ node1 ]
* Slaves: [ node2 ]
* Resource Group: masterg-firstcluster:
* pgFS-firstcluster (ocf::heartbeat:Filesystem): Started node1
* pgsql-firstcluster (ocf::heartbeat:pgsql): Started node1
* vip-firstcluster (ocf::heartbeat:IPaddr2): Started node1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 ~]#
It's time to test.
I migrated the cluster resources back and forth between the two nodes by putting the currently active node into standby and back with the following commands:
pcs node standby node1
pcs node unstandby node1
pcs node standby node2
pcs node unstandby node2
Then I started all over again...
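While the resources were moving, I watched the cluster and DRBD from a second terminal, roughly like this:
crm_mon -Arf               # live cluster view
drbdadm status pgres       # DRBD role and replication state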
However, unfortunately, about 3 out of 5 times when I migrate the cluster resources from one node to the other, the "drbd-fisrtcluster" resource remains stuck in the "Demoting" state during the migration (I think until the configured 90-second timeout expires):

After that, the resources start on the other node, but the cluster reports the following error:

There must be some small flaw in the design, but unfortunately I don't know where. Can you advise me what to change so the resource stops getting stuck in "Demoting"?
I also tried PostgreSQL 14 and TimescaleDB 2, which together could provide a fairly robust database backend for Zabbix, but I got the same result. I think the problem may be in the DRBD configuration, or possibly in the Pacemaker configuration.
I am confident that with your help we can find the problem and develop a definitive step-by-step description for a low-cost yet reliable active-passive high-availability PostgreSQL cluster that you may find useful.