     Discussion thread for official Zabbix Template DB PostgreSQL

    This thread is designed to provide grounds for discussion of the official Zabbix template for the PostgreSQL database.
    The template and its details are available in the Git repository: https://git.zabbix.com/projects/ZBX/.../db/postgresql

    Zabbix is always looking for ways to improve our services and to make our users happier.
    We pride ourselves on doing our best each and every day, but we know that there is always something more to learn.
    We would like to hear from you: what did you like, and what would you improve in the template?



    #2
    I will have a lot of comments on your announcement, and I suppose there will be long conversations about PostgreSQL and MySQL as well.
    Nevertheless, first it would be good to know where the Git repo is, so it can be cloned (cloning from https://git.zabbix.com/ does not seem to work).
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates



      #3
      Hello,
      It is available in the master Zabbix repo https://git.zabbix.com/scm/zbx/zabbix.git, in the folders templates/db/postgresql and templates/db/mysql.
      Alternatively, you can download just the template from the Bitbucket UI.



        #4
        OK, so let's start.
        First, I think there will be many sub-subjects in this conversation, so I'll try to keep this thread divided a bit.

        Under my current contract I started my own PostgreSQL template a few days ago (I must finish this part in about a month).
        I see that you are heading in almost the same direction as I am, in terms of goals and the exact techniques used. Because of those small differences I think that we may end up with two different templates. Despite that fact, I think it is really worth circling a few areas and/or exchanging some thoughts and/or discussing a few points :P

        I started my work from the template https://raw.githubusercontent.com/ca...Server_3.0.xml. That template is pretty well organised, but the design of the queries is a bit overcomplicated, which makes the backend loadable module quite complicated as well. Additionally, the author was forced to deal with the lack of many modern Zabbix techniques such as master/dependent items.

        So... my goals are to:
        1) assemble a template which can be used both with a set of UserParameter entries injected into the agent configuration (+ maybe a few backend scripts) and with a loadable module
        2) collect as much raw data as possible (or only raw data) and do all filtering/processing/calculations in the template, as that processing is then easy to offload outside the monitored host (to the Zabbix proxy)
        3) use only active monitoring (a relatively small, but still real, difference)
        4) use an exact naming convention https://github.com/kloczek/zabbix-te...and-guidelines

        Ad 1) this means that:
        a) I'm assuming that my template will be/must be attached to an exact host, and that interface details will be hidden in the UserParameter parameters or in the loadable module configuration. As a consequence, things like the hostname of the monitored host are not exposed.
        Your template has an item which uses "proc.num[{$PG_PROCESS_NAME}]", and that obviously will not work if the monitored PostgreSQL is not running on the same host as the Zabbix agent. The second macro, {$PG.HOST}, can be dropped as well, because it must always be "localhost".

        Ad 2) this means that:
        a) avoid at all costs using SQL procedures, joins and other more complicated SQL query features
        b) no filtering/calculations/processing should be done in the SQL queries.
        The first example which violates my assumption is the discovery of the databases (from pgsql.discovery.db.sql):
        Code:
        WITH T AS (
            SELECT
                datname AS "{#DBNAME}"
            FROM pg_database
            WHERE
                NOT datistemplate
                AND datname != 'postgres'
        )
        SELECT '{"data":'|| regexp_replace(coalesce(json_agg(T), '[]'::json)::text, E'[\\n\\r\\s]+', '', 'g') || '}'
        FROM T
        I will stop here a bit longer to explain some hierarchy issues which I found.

        Two days ago, while trying to organise my main loop to populate all monitoring objects, I found that I'm not able to reliably sample all table stats data: as long as I'm sampling those data per database, from time to time they fail with timeouts.
        Simply put, from time to time the master item sampling everything per database stalled, and for the next few minutes (IIRC up to even 10 min) all the per-table queries froze as the query hung. On the host which I'm using as my PostgreSQL monitoring playground I have my "Service Zabbix Agent" template, which samples the zabbix[host,,items_unsupported] key, so I'm even able to see the fluctuating number of unsupported items caused by this effect.
        So... yesterday I started diagnosing this issue, and I found that it is some locking issue with the queries against the "postgres" system database.
        Simply put, from time to time the Zabbix agent, executing some queries against the "postgres" database in parallel, was hitting data already locked by some other query.
        I think that I'm hitting this issue because I'm using a server which hosts multiple databases, and it has been used by developers over the last few days to squeeze the maximum non-prod performance possible out of that SQL engine.

        This is exactly what made me start thinking about the simplest possible SQL queries, to shorten the query time. As a result I added point 2a): "avoid at all costs using SQL procedures, joins and other more complicated SQL query features".

        I would not have been able to discover this issue if, like you, I had hardcoded "datname != 'postgres'" in the main SELECT.
        So my current main iterator item listing all databases is:
        Code:
        UserParameter=service.pgsql.discovery.db,/usr/bin/psql -U monitoring -qAtX postgres -c "SELECT json_agg(row_to_json(row)) FROM (SELECT datname, datallowconn, datistemplate FROM pg_catalog.pg_database ORDER BY 1) row;"
        (I really don't care that this line is long and less readable; it saves a few syscalls by not opening an additional file.)
        This query obtains the raw list of databases with all the parameters needed to decide on filtering: to skip monitoring some databases, or to include even the "postgres" DB to obtain some self-monitoring data (an analogue of the SNMPv2-MIB data in the SNMP area).
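        As an illustration of doing the filtering on the Zabbix side, here is a minimal Python sketch of the decision an LLD filter would make on that raw list (the sample rows below are hypothetical, and in the real template this would be done with LLD filter macros, not Python):

```python
import json

# Hypothetical output of the discovery UserParameter above: every database,
# including "postgres" and the templates, with the columns needed to decide
# what to monitor.
raw = json.dumps([
    {"datname": "postgres",  "datallowconn": True,  "datistemplate": False},
    {"datname": "template0", "datallowconn": False, "datistemplate": True},
    {"datname": "appdb",     "datallowconn": True,  "datistemplate": False},
])

# The same decision an LLD filter would make on the Zabbix side: keep
# connectable, non-template databases ("postgres" kept here on purpose,
# for self-monitoring).
monitored = [
    row["datname"]
    for row in json.loads(raw)
    if row["datallowconn"] and not row["datistemplate"]
]
```

        Keeping "postgres" in the raw list leaves the include/skip decision to the template instead of hardcoding it in the SQL.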

        Things which I'm going to investigate today are:
        1) check whether there is any possibility to do SELECTs against the "postgres" database without locking tables/the database. I remember that before MySQL 5.x there was a parameter which allowed not locking system tables, at the cost of the precision of the data returned in query results. Maybe it is possible to apply some workaround for this issue?
        2) if no workaround can be found, I'm going to assemble obtaining all database metrics in a single query, putting them in a JSON table like {[{"datname":"foo","stats":{<all_foo_db_data>}},{"datname":"bar","stats":{<all_bar_db_data>}},...]}, then per DB extract those data into a next level of master items with only a single JSON record per database, and extract the per-database metrics data from those records.

        This may be better because:
        - processing of all databases' data will be offloaded to the Zabbix proxy
        - something which I call the "monitoring quantum effect" (monitoring affecting the state of the monitored object) will be/could be minimal
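        The two-level extraction in option 2) above can be pictured as one master item returning a single JSON array, with per-database dependent items picking out their own record. A rough Python sketch of that step (field names and values are hypothetical; in Zabbix this would be JSONPath preprocessing on dependent items, not Python):

```python
import json

# Hypothetical payload of the single master item: one JSON record per
# database, each carrying all of that database's stats.
master_value = json.dumps([
    {"datname": "foo", "stats": {"xact_commit": 120, "blks_hit": 900}},
    {"datname": "bar", "stats": {"xact_commit": 45,  "blks_hit": 300}},
])

def extract_db_stats(payload, datname):
    # First-level dependent item: pick the single record for one database
    # (in Zabbix this would be a JSONPath preprocessing step on a
    # dependent item rather than Python code).
    for record in json.loads(payload):
        if record["datname"] == datname:
            return record["stats"]
    return None

foo_stats = extract_db_stats(master_value, "foo")
bar_stats = extract_db_stats(master_value, "bar")
```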

        This is not everything there is to say about the databases LLD, but I think that will be enough for now.
        (I must do a few other things today despite my work on the PostgreSQL template.)

        PS. BTW, the results of my current work can be observed at https://github.com/kloczek/zabbix-te...e%20PostgreSQL
        Last edited by kloczek; 14-08-2019, 14:39.



          #5
          BTW, about this "postgres" database locking issue: I think it should be possible to reproduce it with passive items as well.



            #6
            Thank you for your positive review.
            We do hope that you succeed with your template for monitoring PostgreSQL.
            The SQL is located in separate files to make it easier for a DBA to read and understand what the SQL is doing, and to provide general information about the SQL there.



              #7
              Second round.
              For collecting per-database stats you are using:
              Code:
              SELECT json_object_agg(datname, row_to_json(T)) FROM (
                      SELECT datname,
                                      numbackends AS numbackends,
                                      xact_commit AS xact_commit,
                                      xact_rollback AS xact_rollback,
                                      blks_read AS blks_read,
                                      blks_hit AS blks_hit,
                                      tup_returned AS tup_returned,
                                      tup_fetched AS tup_fetched,
                                      tup_inserted AS tup_inserted,
                                      tup_updated AS tup_updated,
                                      tup_deleted AS tup_deleted,
                                      conflicts AS conflicts,
                                      temp_files AS temp_files,
                                      temp_bytes AS temp_bytes,
                                      deadlocks AS deadlocks
                      FROM pg_stat_database) T
              This can be simplified to:
              Code:
              SELECT json_object_agg(datname, row_to_json(T)) FROM (
                      SELECT datname,
                                      numbackends,
                                      xact_commit,
                                      xact_rollback,
                                      blks_read,
                                      blks_hit,
                                      tup_returned,
                                      tup_fetched,
                                      tup_inserted,
                                      tup_updated,
                                      tup_deleted,
                                      conflicts,
                                      temp_files,
                                      temp_bytes,
                                      deadlocks
                      FROM pg_stat_database) T
              However, even this query can be simplified further, to just:
              Code:
              SELECT json_object_agg(datname, row_to_json(T)) FROM (SELECT * FROM pg_stat_database) T
              The advantage of that query is that if some new column is added to this view in a future PostgreSQL version, the same query can still be used to sample all the necessary data.

              I've just committed my changes for today; I'm combining the size of the database into the item which is used to sample all the per-database metrics:
              Code:
              UserParameter=service.pgsql.db.stats[*],/usr/bin/psql -U monitoring -qAtX postgres -c "SELECT row_to_json(row) FROM (SELECT *, pg_database_size('$1') FROM pg_stat_database WHERE datname = '$1') row;"
              If I find any new per-database metric, I'm going to add it to this key, to keep a single key for obtaining all per-database stats.



                #8
                And yet another (small) thing. You've chosen to use "zbx_monitor" as the role name.
                Many years ago I found that such accounts are sometimes used by other monitoring infrastructures, so wherever it is necessary to add some access for monitoring software, it is better to use a generic name. This is why I'm using "monitoring" everywhere (in the same way that nowadays no one using Zabbix to monitor something over SNMP asks for a special RO community for Zabbix; the "public" SNMP community is used).
                Last edited by kloczek; 15-08-2019, 18:43.



                  #9
                  Originally posted by kloczek
                  And yet another (small) thing. You've chosen to use "zbx_monitor" as the role name.
                  Many years ago I found that such accounts are sometimes used by other monitoring infrastructures, so wherever it is necessary to add some access for monitoring software, it is better to use a generic name. This is why I'm using "monitoring" everywhere (in the same way that nowadays no one using Zabbix to monitor something over SNMP asks for a special RO community for Zabbix; the "public" SNMP community is used).
                  Thank you.



                    #10
                    Hello!
                    I have the following problem using the template:
                    The item pgsql.ping always returns 0, telling me my database is down. I've checked: it is up and accepting connections.
                    zabbix_get -s 192.168.56.110 -k pgsql.ping["192.168.56.110","5432","postgres","mydb"] returns sh: pg_isready: command not found
                    I have added the location of pg_isready to $PATH and can execute it directly from the console:
                    # pg_isready -h 192.168.56.110 -p 5432
                    192.168.56.110:5432 - accepting connections

                    How can I get this pgsql.ping working?
                    P.S. OS: CentOS 7
                    Last edited by aforever; 19-09-2019, 15:09.



                    • bristic
                      bristic commented
                      Hi aforever,

                      it might be that pg_isready is not in the agent's environment search path.
                      One way to solve the problem would be to make a symlink to it.

                      example:
                      Code:
                      centos# ln -s /usr/pgsql-11/bin/pg_isready /usr/bin/pg_isready
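                      Alternatively, since a PATH exported in a login shell is not seen by the agent service, the service's own PATH can be extended with a systemd drop-in. This is a sketch only, assuming the PostgreSQL 11 paths from the example above and a unit named zabbix-agent:

```
# /etc/systemd/system/zabbix-agent.service.d/override.conf (hypothetical paths)
[Service]
Environment="PATH=/usr/pgsql-11/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
```

                      followed by systemctl daemon-reload and systemctl restart zabbix-agent.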

                    #11
                    Hi AlexL ,

                    I have a question related to the preprocessing step on pgsql.ping.time. The regex only allows a single digit before the decimal point, so if there is a delay of 10+ ms it fails with
                    Code:
                    Preprocessing failed for: 1.Time: 11.710 ms 1. Failed: cannot perform regular expression "Time:\s(\d.\d+)\sms" match for value of type "string": pattern does not match
                    This way the check is useless, because it won't report properly.

                    I suggest the change:
                    Code:
                    Time:\s(\d+\.\d+)\sms
                    Is this the place to report such things?
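                    A quick way to see the difference, with the patterns from above (the suggested pattern also escapes the dot) and hypothetical timing strings:

```python
import re

# The template's original pattern vs. the widened one; the timing strings
# below are hypothetical psql "\timing" output.
OLD_PATTERN = r"Time:\s(\d.\d+)\sms"
NEW_PATTERN = r"Time:\s(\d+\.\d+)\sms"

def extract(pattern, value):
    # Return the captured duration, or None when the pattern does not match.
    m = re.search(pattern, value)
    return m.group(1) if m else None

old_fast = extract(OLD_PATTERN, "Time: 9.418 ms")   # single digit: matches
old_slow = extract(OLD_PATTERN, "Time: 11.710 ms")  # 10+ ms: no match
new_slow = extract(NEW_PATTERN, "Time: 11.710 ms")  # widened pattern matches
```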



                      #12
                      It might be wise to add some PostgreSQL version requirements to the installation notes.

                      For example, json_agg is not available in PostgreSQL 9.2 or lower, which is the default for CentOS 7.



                        #13
                        Hi,

                        We are using Zabbix 4.4 with this official PostgreSQL template; however, when we remove a database from our monitored server, the following items are still checked:
                        - numbackends
                        - xact_commit
                        - xact_rollback
                        - blks_read
                        - blks_hit
                        - tup_returned
                        - tup_fetched
                        - tup_inserted
                        - tup_updated
                        - tup_deleted
                        - conflicts
                        - temp_files
                        - temp_bytes
                        - deadlocks
                        Is there any query that we can use to remove deleted databases? Are we using the database listing incorrectly? That is, is there any way to refresh the database list? Thanks!

