Hello,
We're managing a medium-sized Zabbix instance, monitoring ~3000 servers and storing its data in PostgreSQL. As usual, history and history_uint are the big tables, with history_uint reaching a few tens of gigabytes with 10 days of history for most metrics.
We recently upgraded the history* tables to add a primary key, following the procedure in the Zabbix documentation. Since then, we've been observing a very long-running query on history_uint, so CPU-intensive that we have to regularly delete the oldest rows to keep the monitoring working.
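For reference, this is roughly what that procedure boils down to for history_uint (a simplified sketch from memory, not the exact statements we ran):
Code:
-- rename the old table and recreate it with the new primary key (simplified)
ALTER TABLE history_uint RENAME TO history_uint_old;
CREATE TABLE history_uint (LIKE history_uint_old INCLUDING DEFAULTS);
ALTER TABLE history_uint ADD PRIMARY KEY (itemid, clock, ns);
-- copy the data back, skipping duplicate rows
INSERT INTO history_uint SELECT * FROM history_uint_old
ON CONFLICT (itemid, clock, ns) DO NOTHING;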
I first tried to optimize the PostgreSQL server configuration, among other things tuning work_mem to avoid spilling to a temp file on disk while processing this huge query. However, I wasn't able to lower the load, and each time the history_uint table fills back up, the same query starts exhausting the resources again.
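For what it's worth, this is roughly how I checked for temp-file spills and experimented with work_mem (the value is just what I tried, not a recommendation):
Code:
-- see how much the database has been spilling to temporary files
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = 'zabbix';
-- raise work_mem for the current session while testing
SET work_mem = '256MB';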
The query, which asks for the MAX timestamp of each configured web scenario (full query here):
Code:
SELECT h.itemid, MAX(h.clock) AS clock FROM history_uint h WHERE (h.itemid IN (132523,132811,...) OR h.itemid IN (...) ...) AND h.clock > 1678921992 GROUP BY h.itemid
If I split this huge query into five smaller queries, one per "OR" group, each one runs fine. Am I missing something obvious? Is it normal for the Zabbix server to issue such a big query, requiring so much memory and CPU? Is there a way to manage it, maybe by partitioning the tables with TimescaleDB? I was also thinking about an index on itemid plus clock in descending order, to get the latest timestamp without killing the CPU (see the sketch after the schema below).
The schema for the problematic table:
Code:
zabbix=# \d history_uint
Table "zabbix.history_uint"
Column | Type | Collation | Nullable | Default
--------+---------------+-----------+----------+--------------
itemid | bigint | | not null |
clock | integer | | not null | 0
value | numeric(20,0) | | not null | '0'::numeric
ns | integer | | not null | 0
Indexes:
"history_uint_pkey" PRIMARY KEY, btree (itemid, clock, ns)
