Postgresql

Postgresql

Available solutions




Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/postgresql


Template DB PostgreSQL

Overview

Templates to monitor PostgreSQL by Zabbix.
This template was tested on Zabbix 4.2.1 and PostgreSQL vesions 9.6, 10 and 11 on Linux and Windows.

Setup

  1. Install Zabbix agent and create a read-only zbx_monitor user with proper access to your PostgreSQL server.

    For PostgreSQL version 10 and above:

    CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
    GRANT pg_monitor TO zbx_monitor;
    

    For older PostgreSQL versions:

    CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>';
    GRANT SELECT ON pg_stat_database TO zbx_monitor;
    
  2. Copy postgresql/ to Zabbix agent home directory /var/lib/zabbix/. The postgresql/ directory contains the files needed to obtain metrics from PostgreSQL.

  3. Copy template_db_postgresql.conf to Zabbix agent configuration directory /etc/zabbix/zabbix_agentd.d/ and restart Zabbix agent service.

  4. Edit pg_hba.conf to allow connections from Zabbix agent https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.

    Add rows (for example):

    host all zbx_monitor 127.0.0.1/32 trust
    host all zbx_monitor 0.0.0.0/0 md5
    host all zbx_monitor ::0/0 md5
    
  5. If you need to monitor the remote server then create .pgpass file in Zabbix agent home directory /var/lib/zabbix/ and add the connection details with the instance, port, database, user and password information in the below format https://www.postgresql.org/docs/current/libpq-pgpass.html.

    Example 1:

    <REMOTE_HOST1>:5432:postgres:zbx_monitor:<PASSWORD>
    
    <REMOTE_HOST2>:5432:postgres:zbx_monitor:<PASSWORD>
    ...
    <REMOTE_HOSTN>:5432:postgres:zbx_monitor:<PASSWORD>
    

    Example 2:

    *:5432:postgres:zbx_monitor:<PASSWORD>
    
  6. Import template_db_postgresql.xml to Zabbix and link it to the target host

Zabbix configuration

If PostgreSQL is installed from the PGDG repository, then add the path to pg_isready to the PATH environment variable for zabbix user.

Macros used

MacroDescriptionDefault
{$PG.HOST}Database server host or socket directory127.0.0.1
{$PG.PORT}Database server port5432
{$PG.USER}Database user namezbx_monitor
{$PG.DB}Database name to connect to the serverpostgres
{$PG.LLD.FILTER.DBNAME}Regular expression for filtering names of discovered databases(.*)
{$PG.CHECKPOINTS_REQ.MAX.WARN}Requested checkpoints threshold for trigger expression5
{$PG.PING_TIME.MAX.WARN}Maximum ping time for trigger expression1s
{$PG.CACHE_HITRATIO.MIN.WARN}Minimum cache hit ratio for trigger expression90
{$PG.CONN_TOTAL_PCT.MAX.WARN}Maximum number of open connections for trigger expression90
{$PG.CONN_WAIT.MAX.WARN}Maximum number of waiting connections for trigger expression0
{$PG.CONN_IDLE_IN_TRANS.MAX.WARN}Maximum number of 'idle in transaction' connections for trigger expression5
{$PG.DEADLOCKS.MAX.WARN}Maximum number of deadlocks for trigger expression0
{$PG.CONFLICTS.MAX.WARN}Maximum number of recovery conflicts for trigger expression0
{$PG.REPL_LAG.MAX.WARN}Maximum replication lag for trigger expression10m
{$PG.TRANS_ACTIVE.MAX.WARN}Maximum active transaction time for trigger expression30s
{$PG.TRANS_IDLE.MAX.WARN}Maximum 'idle in transaction' connection time for trigger expression30s
{$PG.TRANS_WAIT.MAX.WARN}Maximum waiting transaction time for trigger expression30s
{$PG.LOCKS.MAX.WARN}Maximum number of locks for trigger expression100
{$PG.QUERY_ETIME.MAX.WARN}Maximum query execution time in seconds30
{$PG.SLOW_QUERIES.MAX.WARN}Maximum number of slow queries for trigger expression5
{$PG.FROZENXID_PCT_STOP.MIN.HIGH}Minimum percentage of frozen XID75

Template links

There are no template links in this template.

Discovery rules

NameDescriptionType
Databases discoveryUse the macro {$PG.LLD.FILTER.DBNAME} to filter the discovered databasesZabbix agent

Items collected

NameDescriptionType
BgwriterStatistics about the background writer process's activityZabbix agent
Bgwriter: Checkpoint write timeTotal amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in millisecondsDependent item
Bgwriter: Buffers backend fsyncNumber of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)Dependent item
Bgwriter: Checkpoint sync timeTotal amount of time that has been spent in the portion of checkpoint processing where files are synchronized to diskDependent item
Bgwriter: Requested checkpointsNumber of requested checkpoints that have been performedDependent item
Bgwriter: Max writtenNumber of times the background writer stopped a cleaning scan because it had written too many buffersDependent item
Bgwriter: Scheduled checkpointsNumber of scheduled checkpoints that have been performedDependent item
Bgwriter: Buffers written during checkpointsNumber of buffers written during checkpointsDependent item
Bgwriter: Buffers written directly by a backendNumber of buffers written directly by a backendDependent item
Bgwriter: Buffers written by the background writerNumber of buffers written by the background writerDependent item
Bgwriter: Buffers allocatedNumber of buffers allocatedDependent item
Status: VersionZabbix agent
Status: PingZabbix agent
Status: Ping timeZabbix agent
Status: UptimeZabbix agent
Status: Cache hit ratio %Cache hit ratioZabbix agent
Status: DB { #DBNAME}: SizeDatabase sizeZabbix agent
Connections sumCollect all metrics from pg_stat_activity https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEWZabbix agent
Connections sum: Total %Total number of connections in percentageDependent item
Connections sum: TotalTotal number of connectionsDependent item
Connections sum: WaitingTotal number of waiting connections https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLEDependent item
Connections sum: Idle in transactionTotal number of connections in a transaction state, but not executing a queryDependent item
Connections sum: IdleTotal number of connections waiting for a new client commandDependent item
Connections sum: ActiveTotal number of connections executing a queryDependent item
Connections sum: PreparedTotal number of prepared transactions https://www.postgresql.org/docs/current/sql-prepare-transaction.htmlDependent item
DbstatCollect all metrics from pg_stat_database per database https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEWZabbix agent
DB { #DBNAME}: Commits per secondNumber of transactions in this database that have been committedDependent item
DB { #DBNAME}: Tuples updated per secondTotal number of rows updated by queries in this databaseDependent item
DB { #DBNAME}: Tuples returned per secondTotal number of rows updated by queries in this databaseDependent item
DB { #DBNAME}: Tuples inserted per secondTotal number of rows inserted by queries in this databaseDependent item
DB { #DBNAME}: Tuples fetched per secondTotal number of rows fetched by queries in this databaseDependent item
DB { #DBNAME}: Tuples deleted per secondTotal number of rows deleted by queries in this databaseDependent item
DB { #DBNAME}: Temp_files created per secondTotal number of temporary files created by queries in this databaseDependent item
DB { #DBNAME}: Temp_bytes written per secondTotal amount of data written to temporary files by queries in this databaseDependent item
DB { #DBNAME}: Rollbacks per secondTotal number of transactions in this database that have been rolled backDependent item
DB { #DBNAME}: Detected deadlocks per secondTotal number of detected deadlocks in this databaseDependent item
DB { #DBNAME}: Detected conflicts per secondTotal number of queries canceled due to conflicts with recovery in this databaseDependent item
DB { #DBNAME}: Disk blocks read per secondTotal number of disk blocks read in this databaseDependent item
DB { #DBNAME}: Blocks hit per secondTotal number of times disk blocks were found already in the buffer cache, so that a read was not necessaryDependent item
Replication: standby countNumber of standby serversZabbix agent
Replication: recovery roleReplication role: 1 — recovery is still in progress (standby mode), 0 — master mode.Zabbix agent
Replication: statusReplication status: 0 — streaming is down, 1 — streaming is up, 2 — master modeZabbix agent
Replication: lag in secondsReplication lag with Master in secondsZabbix agent
TransactionsCollect metrics by transaction execution timeZabbix agent
Transactions: Max active transaction timeCurrent max active transaction timeDependent item
Transactions: Max idle transaction timeCurrent max idle transaction timeDependent item
Transactions: Max prepared transaction timeCurrent max prepared transaction timeDependent item
Transactions: Max waiting transaction timeCurrent max waiting transaction timeDependent item
Status: Config hashPostgreSQL configuration hashZabbix agent
WALMaster item to collect WAL metricsZabbix agent
WAL: Bytes written per secondWAL write in bytesDependent item
WAL: Segments countNumber of WAL segmentsDependent item
LocksCollect all metrics from pg_locks per databasehttps://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLESZabbix agent
DB { #DBNAME} locks: TotalTotal number of locks in the databaseDependent item
DB { #DBNAME} scansNumber of scans done for table/index in the databaseZabbix agent
DB { #DBNAME} scans: IndexNumber of index scans in the databaseDependent item
DB { #DBNAME} scans: SequentialNumber of sequential scans in the databaseDependent item
QueriesCollect all metrics by query execution timeZabbix agent
DB { #DBNAME} queries: Max maintenance timeMax maintenance query timeDependent item
DB { #DBNAME} queries: Max query timeMax query timeDependent item
DB { #DBNAME} queries: Max transaction timeMax transaction query timeDependent item
DB { #DBNAME} queries: Sum maintenance timeSum maintenance query timeDependent item
DB { #DBNAME} queries: Sum query timeSum query timeDependent item
DB { #DBNAME} queries: Sum transaction timeSum transaction query timeDependent item
DB { #DBNAME} queries: Slow maintenance countSlow maintenance query countDependent item
DB { #DBNAME} queries: Slow query countSlow query query countDependent item
DB { #DBNAME} queries: Slow transaction countSlow transaction query countDependent item
DB { #DBNAME} frozen XIDZabbix agent
DB { #DBNAME} frozen XID: before stop %Preventing Transaction ID Wraparound Failures https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUNDDependent item
DB { #DBNAME} frozen XID: before avtovacuum %Preventing Transaction ID Wraparound Failures https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUNDDependent item

Triggers

NameSeverityExpressionDescriptionDependencies
PostgreSQL: Required checkpoints occurs too frequently (over {$PG.CHECKPOINTS_REQ.MAX.WARN})Averagepgsql.bgwriter.checkpoints_req.last() > {$PG.CHECKPOINTS_REQ.MAX.WARN}Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. https://www.postgresql.org/docs/current/wal-configuration.html
PostgreSQL: Response too long (over {$PG.PING_TIME.MAX.WARN})Averagepgsql.ping.time.min(5m) > {$PG.PING_TIME.MAX.WARN}PostgreSQL: Service is down
PostgreSQL: Service is downHighpgsql.ping.last() = 0
PostgreSQL: Failed to get items (no data for 30m)Warningpgsql.bgwriter.nodata(30m) = 1Zabbix has not received data for items for the last 30 minutesPostgreSQL: Service is down
PostgreSQL: Service has been restarted (uptime < 10m)Infopgsql.uptime.last() < 10mPostgreSQL uptime is less than 10 minutes
PostgreSQL: Cache hit ratio too low (under {$PG.CACHE_HITRATIO.MIN.WARN} in 5m)Warningpgsql.cache.hit.max(5m) < {$PG.CACHE_HITRATIO.MIN.WARN}
PostgreSQL: Total number of connections is too high (over {$PG.CONN_TOTAL_PCT.MAX.WARN} in 5m)Averagepgsql.connections.sum.total_pct.min(5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN}
Database { #DBNAME}: Deadlock occurred (over {$PG.DEADLOCKS.MAX.WARN:"{ #DBNAME}"} in 5m)Highpgsql.dbstat.deadlocks["{ #DBNAME}"].min(5m) > {$PG.DEADLOCKS.MAX.WARN:"{ #DBNAME}"}
PostgreSQL: Too many recovery conflicts (over {$PG.CONFLICTS.MAX.WARN} in 5m)Averagepgsql.dbstat.sum.conflicts.min(5m) > {$PG.CONFLICTS.MAX.WARN}The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT
PostgreSQL: Streaming lag is too high (over {$PG.REPL_LAG.MAX.WARN} in 5m)Averagepgsql.streaming.lag.sec.min(5m) > {$PG.REPL_LAG.MAX.WARN}
PostgreSQL: Configuration has changedInfopgsql.config.diff() = 1 and pgsql.config.strlen() > 0
PostgreSQL: Version has changed (new version value received: {ITEM.VALUE})Infopgsql.version.diff() = 1 and pgsql.version.strlen() > 0
Database { #DBNAME}: Number of locks is too high (over {$PG.LOCKS.MAX.WARN:"{ #DBNAME}"} in 5m)Warningpgsql.locks.total["{ #DBNAME}"].min(5m) > {$PG.LOCKS.MAX.WARN:"{ #DBNAME}"}
Database { #DBNAME}: Too many slow queries (over {$PG.SLOW_QUERIES.MAX.WARN:"{ #DBNAME}"} in 5m)Warningpgsql.queries.query.slow_count["{ #DBNAME}"].min(5m) > {$PG.SLOW_QUERIES.MAX.WARN:"{ #DBNAME}"}
Database { #DBNAME}: VACUUM FREEZE is required to prevent wraparound (frozen XID less then {$PG.FROZENXID_PCT_STOP.MIN.HIGH:"{ #DBNAME}"} %)Averagepgsql.db.frozenxid_prc.before_stop["{ #DBNAME}"].last() < {$PG.FROZENXID_PCT_STOP.MIN.HIGH:"{ #DBNAME}"}
PostgreSQL: Replication is downAveragepgsql.replication.status.max(5m) = 0 

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

References

Add your solution