Planning High-Availability PostgreSQL database backend with Pacemaker and DRBD

  • Spectator
    Member
    • Sep 2021
    • 71

    #1


    First of all, sorry for my bad English.

    I've seen Pacemaker cluster designs where the currently active node uses shared storage. I want this shared storage to be highly available as well, avoiding expensive hardware storage arrays and their licenses.
    So I thought a DRBD-based design would be right for that. Each cluster node would have its own separate disk, kept in sync with DRBD, holding the PostgreSQL database (with TimescaleDB). The two nodes will operate in two separate data centers, so it is important that the database data stays available in both data centers if one of them becomes unavailable. I think DRBD is a good solution for this.
    The cluster would also include a virtual IP address held by the active node. Zabbix will connect to the database through this virtual IP address.
    The Zabbix servers would also work in failover mode: I plan to use Zabbix 6 with its own native active-passive HA solution.
    (The two nodes in my test environment are currently running in VirtualBox, but they will eventually be deployed in a VMware environment, including a quorum device and fencing.)
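
    For the Zabbix side, a minimal sketch of the native HA settings in zabbix_server.conf might look like this (the Zabbix node name and address below are only assumptions for illustration; the second Zabbix server would get its own HANodeName/NodeAddress and the same DBHost):

    # /etc/zabbix/zabbix_server.conf on the first Zabbix server (hypothetical values)
    HANodeName=zbx-node1
    NodeAddress=192.168.14.41
    DBHost=192.168.14.30   # the cluster VIP, so the active Zabbix node always reaches the active DB node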

    I think this environment design could be useful for many other Zabbix operators in the future.

    It would look like this:

    [Image: PostgreSQL_DRBD_Cluster.png]


    I found a Zabbix High-Availability design with PostgreSQL here:
    https://assets.zabbix.com/files/even...tsu_SRAOSS.pdf

    But I find the Pacemaker cluster settings there rather superficial. To avoid split brain, I think it would be better for Pacemaker to manage DRBD as well, as in the following solution (the postgresql.vn description referenced below):


    In that description, the design and the startup ordering of the cluster resources seem well developed, and Pacemaker also manages DRBD, which I think will help prevent a possible split brain.


    Using the https://www.postgresql.vn/blog/pg_pacemaker_drbd description as a basis, I performed these operations on the two cluster nodes:

    Cluster nodes in Virtualbox:

    node1 / 192.168.14.31 / +1 1GB HDD (sdb)
    node2 / 192.168.14.32 / +1 1GB HDD (sdb)

    192.168.14.30 - Cluster VIP address

    The /etc/hosts file on both nodes:

    192.168.14.31 node1
    192.168.14.32 node2

    On both nodes:
    Chronyd enabled (for NTP)
    Firewalld and SELinux are disabled
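
    A sketch of how those prerequisites can be applied on both nodes (the SELinux change becomes fully effective after a reboot):

    systemctl enable --now chronyd
    systemctl disable --now firewalld
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config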

    ============= PostgreSQL install =============

    I installed PostgreSQL 13 from the PGDG repository:
    On both nodes:
    dnf -y install https://download.postgresql.org/pub/...est.noarch.rpm
    dnf -qy module disable postgresql
    dnf search postgresql | grep server
    dnf -y install postgresql13 postgresql13-server
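
    One detail worth noting: since Pacemaker will be the one starting and stopping PostgreSQL, the packaged systemd unit should stay disabled on both nodes (unit name as shipped by the PGDG packages):

    systemctl disable postgresql-13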


    ============= DRBD install =============

    I installed DRBD 9 from the ELRepo repository:

    On both nodes:
    dnf install -y elrepo-release
    dnf install -y kmod-drbd90 drbd90-utils drbd90-utils-sysvinit

    ====== Creating 512MB LVM on the 1GB sdb ======
    On both nodes:
    fdisk /dev/sdb
    n, p, 1, enter, enter, w
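
    For reference, a non-interactive equivalent of those fdisk keystrokes could be:

    parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%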

    lsblk:
    sdb 8:16 0 1G 0 disk
    └─sdb1 8:17 0 1023M 0 part

    pvcreate /dev/sdb1
    vgcreate vg /dev/sdb1
    lvcreate --size 512MB --name lv vg

    lsblk:
    sdb 8:16 0 1G 0 disk
    └─sdb1 8:17 0 1023M 0 part
    └─vg-lv 253:3 0 512M 0 lvm

    ========= Creating DRBD on the LVM volume =========

    mcedit /etc/drbd.d/pgres.res
    resource pgres {
        protocol B;
        device minor 1;
        meta-disk internal;
        disk /dev/mapper/vg-lv;
        on node1 {
            address 192.168.14.31:7001;
        }
        on node2 {
            address 192.168.14.32:7001;
        }
    }
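
    In connection with the split-brain concern above, DRBD itself can also be told to coordinate with Pacemaker through its fence-peer handlers. A rough sketch of the extra sections this would add to the resource, assuming the helper scripts shipped with drbd90-utils (exact section names and script paths may differ by DRBD version):

    resource pgres {
        net {
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
            unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
        }
        # ... existing device/disk/on sections as above ...
    }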

    mcedit /etc/drbd.d/global_common.conf
    usage-count no;
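
    Both files have to be identical on the two nodes; assuming they were edited on node1, a sketch of copying them over:

    scp /etc/drbd.d/pgres.res /etc/drbd.d/global_common.conf node2:/etc/drbd.d/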

    drbdadm create-md pgres
    systemctl start drbd
    drbdadm status pgres # red inconsistent, this is normal at this point

    On node1:
    drbdadm --force primary pgres
    drbdadm status pgres # green UpToDate
    mkfs.xfs /dev/drbd1 -L PGDATA -f

    On both nodes:
    dnf config-manager --set-enabled ha
    yum install -y pacemaker pcs fence-agents-all psmisc

    echo "hacluster" | sudo passwd hacluster --stdin

    On node1:
    pcs host auth node1 node2 -u hacluster -p hacluster
    pcs cluster setup firstcluster node1 node2 --force --start

    crm_mon -Arf1 # All nodes are online
    pcs cluster enable --all
    pcs property set stonith-enabled=false
    pcs property set cluster-recheck-interval=10m
    pcs property set no-quorum-policy=ignore
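
    These properties match a test setup without fencing. In the planned VMware production environment, stonith-enabled=false would presumably be replaced by a real fence agent; a rough sketch, where the vCenter address, credentials and VM names are placeholders and the exact parameter names depend on the fence-agents version:

    pcs stonith create vmfence fence_vmware_soap ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl=1 ssl_insecure=1 pcmk_host_map="node1:vm-node1;node2:vm-node2"
    pcs property set stonith-enabled=true
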
    pcs resource create drbd-fisrtcluster ocf:linbit:drbd drbd_resource=pgres op monitor interval=15s op start timeout=60 op stop timeout=60 op promote timeout=90
    pcs resource promotable drbd-fisrtcluster master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    pcs resource cleanup
    crm_mon -Arf1
    mkdir /pgdata
    pcs resource create pgFS-firstcluster ocf:heartbeat:Filesystem device="/dev/drbd1" directory=/pgdata fstype=xfs options=defaults,noatime,nodiratime,attr2 op start timeout=60 op stop timeout=60 --group masterg-firstcluster

    pcs status
    Error:
    Failed Resource Actions:
    * pgFS-firstcluster_start_0 on node1 'error' (1): call=23, status='complete', exitreason='Couldn't mount device [/dev/drbd1] as /pgdata', last-rc-change='2022-01-29 19:29:22 +01:00', queued=0ms, exec=2781ms

    chown postgres /pgdata/ -R
    df -h /pgdata
    pcs resource cleanup
    pcs status # problem resolved, the filesystem started on the node2
    su - postgres
    /usr/pgsql-13/bin/initdb -D /pgdata/pg13data -E utf8 --no-locale
    back to the root user
    pcs resource create pgsql-firstcluster ocf:heartbeat:pgsql pgctl=/usr/pgsql-13/bin/pg_ctl psql=/usr/pgsql-13/bin/psql pgdata=/pgdata/pg13data pgport=5432 stop_escalate=1 op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=7s on-fail=restart op monitor timeout=60s interval=2s on-fail=restart role=Master op promote timeout=60s interval=0s on-fail=restart op demote timeout=60s interval=0s on-fail=stop op stop timeout=60s interval=0s on-fail=block op notify timeout=60s interval=0s --group masterg-firstcluster
    crm_mon -Arf1

    pcs status - got error:
    pgsql-firstcluster_start_0 on node2 'not installed' (5): call=40, status='complete', exitreason='Configuration file /pgdata/pg13data/postgresql.conf doesn't exist', last-rc-change='2022-01-29 19:42:00 +01:00', queued=0ms, exec=187ms
    * pgFS-firstcluster_start_0 on node1 'error' (1): call=41, status='complete', exitreason='Couldn't mount device [/dev/drbd1] as /pgdata', last-rc-change='2022-01-29 19:42:01 +01:00', queued=0ms, exec=2776ms

    pcs node standby node2 # so the filesystem resource comes back to node1
    pcs resource cleanup
    pcs status
    error:
    Migration Summary:
    * Node: node1:
    * pgsql-firstcluster: migration-threshold=1 fail-count=1000000 last-failure='Sat Jan 29 19:42:28 2022'

    Failed Resource Actions:
    * pgsql-firstcluster_start_0 on node1 'not installed' (5): call=58, status='complete', exitreason='Configuration file /pgdata/pg13data/postgresql.conf doesn't exist', last-rc-change='2022-01-29 19:42:28 +01:00', queued=0ms, exec=314ms

    [root@node1 ~]# ls -la /pgdata/
    total 0
    drwxr-xr-x 2 root root 6 Jan 29 19:16 .
    dr-xr-xr-x. 18 root root 258 Jan 29 19:27 ..
    [root@node1 ~]#

    Go back to the /pgdata folder creation step and repeat:
    chown postgres /pgdata/ -R
    su - postgres
    /usr/pgsql-13/bin/initdb -D /pgdata/pg13data -E utf8 --no-locale

    back to the root user
    pcs resource cleanup
    pcs status # ok everything on node1
    [root@node1 ~]# ps -ef | grep postgres: | grep -v grep
    postgres 7584 7578 0 19:45 ? 00:00:00 postgres: logger
    postgres 7596 7578 0 19:45 ? 00:00:00 postgres: checkpointer
    postgres 7597 7578 0 19:45 ? 00:00:00 postgres: background writer
    postgres 7598 7578 0 19:45 ? 00:00:00 postgres: walwriter
    postgres 7599 7578 0 19:45 ? 00:00:00 postgres: autovacuum launcher
    postgres 7600 7578 0 19:45 ? 00:00:00 postgres: stats collector
    postgres 7601 7578 0 19:45 ? 00:00:00 postgres: logical replication launcher
    [root@node1 ~]#

    pcs resource create vip-firstcluster ocf:heartbeat:IPaddr2 ip=192.168.14.30 cidr_netmask=24 op start timeout=60s interval=0s on-fail=restart op monitor timeout=120s interval=20s on-fail=restart op stop timeout=60s interval=0s on-fail=block --group masterg-firstcluster
    ip a | grep -w inet # the cluster VIP address is now shown as well
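
    To confirm that the database is actually reachable through the VIP (the same way Zabbix will connect), something like this can be run from a client host, assuming listen_addresses and pg_hba.conf allow remote connections:

    psql -h 192.168.14.30 -p 5432 -U postgres -c "SELECT version();"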

    pcs constraint colocation add started masterg-firstcluster with Master drbd-fisrtcluster-clone INFINITY
    pcs constraint order promote drbd-fisrtcluster-clone then start masterg-firstcluster symmetrical=false score=INFINITY

    On node2:
    mkdir /pgdata
    chown postgres /pgdata/ -R
    pcs node unstandby node2

    pcs status # Everything is fine on node1
    [root@node1 ~]# pcs status
    Cluster name: firstcluster
    Cluster Summary:
    * Stack: corosync
    * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
    * Last updated: Sat Jan 29 21:12:41 2022
    * Last change: Sat Jan 29 21:12:22 2022 by root via cibadmin on node1
    * 2 nodes configured
    * 5 resource instances configured

    Node List:
    * Online: [ node1 node2 ]

    Full List of Resources:
    * Clone Set: drbd-fisrtcluster-clone [drbd-fisrtcluster] (promotable):
    * Masters: [ node1 ]
    * Slaves: [ node2 ]
    * Resource Group: masterg-firstcluster:
    * pgFS-firstcluster (ocf::heartbeat:Filesystem): Started node1
    * pgsql-firstcluster (ocf::heartbeat:pgsql): Started node1
    * vip-firstcluster (ocf::heartbeat:IPaddr2): Started node1

    Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled
    [root@node1 ~]#


    It's time to test.
    I migrated the cluster resources back and forth between the two nodes by putting the currently active node into standby and back with the following commands:
    pcs node standby node1
    pcs node unstandby node1
    pcs node standby node2
    pcs node unstandby node2
    Then I started all over again...

    However, unfortunately, about 3 out of 5 times when I want to migrate cluster resources from one node to another, the "drbd-firstcluster" resource remains in the "Demoting" state during the migration (I think until the set 90-second timeout expires):

    [Image: DRBD_error_during_migration.jpg]


    After that, the resources start on the other node, but the cluster gives the following error:

    [Image: DRBD_error_after_migration.jpg]

    There must be some small flaw in the design, but unfortunately I don't know where. Can you advise me on what to change so the cluster stops getting stuck in "Demoting"?

    I also tried PostgreSQL 14 and TimescaleDB 2, which together could provide a fairly robust database backend for Zabbix, but I got the same result. I think the problem may be in the DRBD configuration, or possibly in the Pacemaker configuration.

    I am confident that with your help we can find the problem and develop a definitive step-by-step description for a low-cost yet reliable active-passive high-availability PostgreSQL cluster that you may find useful.


    Last edited by Spectator; 31-01-2022, 13:00.
  • splitek
    Senior Member
    • Dec 2018
    • 101

    #2
    But what is the goal?
    1) have volume (disk) in HA mode
    2) have PostgreSQL database in HA mode
    For point two, here is a table comparing some well-known HA solutions for PostgreSQL: https://www.postgresql.org/docs/13/d...ABILITY-MATRIX
    In my opinion, if someone wants to build "budget" HA for PG, they should use streaming replication (see the sketch at the end of this post). Setup is fast and easy. The DBs can be on completely different filesystems. Want another replica? No problem. Want to remove one? Also no problem.
    If someone also needs automated failover, then of course other tools must be used/configured (some of them are in the table linked above).
    And what are the steps in DRBD to add a third device? What if I want to move my DB to a new, larger volume? The ROI (return on investment) is definitely not on the DRBD side, especially for "budget" HA.
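
    For reference, the minimal streaming-replication sketch mentioned above (PostgreSQL 13 syntax; the replication role name is an assumption):

    # on the primary:
    #   postgresql.conf: wal_level = replica, max_wal_senders = 10
    #   pg_hba.conf:     host replication replicator 192.168.14.32/32 scram-sha-256
    # on the standby, clone the primary and start it as a replica:
    pg_basebackup -h node1 -U replicator -D /pgdata/pg13data -R -P
    # -R writes primary_conninfo into postgresql.auto.conf and creates standby.signal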





    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4806

      #3
      Just one suggestion: pg_auto_failover for keeping the DB in sync and switching over if needed, with HAProxy and Keepalived (also an HA solution) in front of it to direct queries to the active instance.
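
      For reference, a rough sketch of bringing up pg_auto_failover (hostnames, paths and auth settings are placeholders):

      # on a dedicated monitor host:
      pg_autoctl create monitor --pgdata /var/lib/pgsql/monitor --hostname monitor.example.com --auth trust --run
      # on each database node:
      pg_autoctl create postgres --pgdata /pgdata/pg13data --hostname node1 --monitor 'postgres://autoctl_node@monitor.example.com/pg_auto_failover' --auth trust --run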
