**cyber** · 17-01-2024, 11:02

First of all, ELK stack support is experimental...

Originally posted by artem.kh

Hi!
I use Zabbix 6.0 with PostgreSQL and Elasticsearch 7 as a history storage.
My config is:
3 Zabbix servers with HA manager
20 Zabbix proxies
PostgreSQL 13.4
Elasticsearch 7.10.2
OS: Oracle Linux 9.1
Over 9000 monitored servers, 1.4M items, 800k triggers.

Such a number of servers, triggers and items does not allow using one database for both data and history.

And this does not hold true either... I have very similar numbers... a few more hosts, a few fewer items and triggers... PG with Timescale handles that easily...

I doubt there is anything to set in your Zabbix config... My gut feeling says your ELK stack lags... (but it may as well be my empty stomach rumbling...)
Both your PG and ELK versions are quite far behind. Updating them may bring you some improvements.

**artem.kh** · 17-01-2024, 15:46

Thank you for your response.

Yes, I understand that ELK support is experimental.
I chose ELK because TimescaleDB is a database extension with its own restrictions, such as replication problems, locks and complexity in scaling.
I have some doubts about its performance and scaling without sharding.

I've collected and checked a lot of statistics and run many experiments. I found no metric or error that could be the cause of the lags.
I have system resource graphs for the Zabbix server, PG and ELK, ELK performance graphs, and graphs of Zabbix errors when sending to ELK, based on the Zabbix logs.
None of these has helped me find the cause of this problem.

And I'll say more: when Zabbix starts actively downloading history, its performance gets very high: triggers and values in the history syncer stats are about 20000-30000 per 10 sec. ELK latency stays good and doesn't differ from the usual values. This means that Zabbix can get history very fast, but for unknown reasons it doesn't. It seems like Zabbix is simply waiting for something...
I've also tried querying data from ELK the way Zabbix does, and I got the results very fast.
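To reproduce the per-syncer numbers quoted above from the process titles, a small parser can help. This is a minimal Python sketch, assuming the title format shown later in this thread (`history syncer #N [processed V values, T triggers in S sec, ...]`); the function name `summarize_syncers` is hypothetical, and it simply sums the counters over all syncers it finds in the text.

```python
import re

# Matches the stats part of a Zabbix history syncer process title, e.g.
# "history syncer #3 [processed 4 values, 429 triggers in 103.734200 sec, syncing history]"
SYNCER_RE = re.compile(
    r"history syncer #(\d+) \[processed (\d+) values, (\d+) triggers in ([\d.]+) sec"
)


def summarize_syncers(ps_output: str) -> dict:
    """Sum values, triggers and busy time over all history syncer titles found in the text."""
    total_values = 0
    total_triggers = 0
    total_busy = 0.0
    syncers = 0
    for match in SYNCER_RE.finditer(ps_output):
        syncers += 1
        total_values += int(match.group(2))
        total_triggers += int(match.group(3))
        total_busy += float(match.group(4))
    return {
        "syncers": syncers,
        "values": total_values,
        "triggers": total_triggers,
        "busy_sec": total_busy,
    }


if __name__ == "__main__":
    sample = """
    /usr/sbin/zabbix_server: history syncer #3 [processed 4 values, 429 triggers in 103.734200 sec, syncing history]
    /usr/sbin/zabbix_server: history syncer #13 [processed 1 values, 413 triggers in 101.494687 sec, syncing history]
    """
    print(summarize_syncers(sample))
```

Feeding it the output of `ps -eo args` (or `systemctl status zabbix-server`) gives a quick total across all syncers. The counters are whatever each syncer last reported, so the sum is a snapshot rather than a precise rate.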

My last idea is a wrong cache setting; maybe that is the cause.
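Before changing cache settings, it can help to list what is actually configured. A minimal Python sketch, assuming the size values use the `K`/`M`/`G` suffixes; `CacheSize`, `HistoryCacheSize`, `HistoryIndexCacheSize`, `TrendCacheSize` and `ValueCacheSize` are real `zabbix_server.conf` parameters, while the helper names and the example values are hypothetical and not a recommendation.

```python
import re

# Cache-related parameters from zabbix_server.conf.
CACHE_PARAMS = (
    "CacheSize",
    "HistoryCacheSize",
    "HistoryIndexCacheSize",
    "TrendCacheSize",
    "ValueCacheSize",
)

# Assumed size suffixes; a bare number means bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> int:
    """Convert a size value such as '256M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", size.strip().upper())
    if not match:
        raise ValueError(f"unsupported size value: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def cache_settings(conf_text: str) -> dict:
    """Return the cache parameters set explicitly in the config text, in bytes."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in CACHE_PARAMS:
            settings[key] = to_bytes(value)
    return settings


if __name__ == "__main__":
    # Hypothetical example values, not a recommendation.
    example = """
    # HistoryCacheSize=16M
    CacheSize=2G
    HistoryCacheSize=512M
    ValueCacheSize=4G
    StartDBSyncers=16
    """
    for name, size in cache_settings(example).items():
        print(f"{name}: {size / 1024**2:.0f} MiB")
```

The configured sizes are only half the picture: the internal items `zabbix[wcache,history,pfree]` and `zabbix[vcache,cache,pfree]` show how much of the history write cache and the value cache is actually free at runtime.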

Could you please share your approximate performance with TSDB and its size? Do you use replication and HA?
Maybe it will fit me too.

**cyber** · 18-01-2024, 09:27

Performance as NVPS? 4800+. DB size is currently ~650G, but we do not keep very long history, just 14 days + 1 year of trends. PG 14.5 + TS 2.7.2 + pg_auto_failover (which manages failovers and replication). Using compression in TS reduced the DB size from ~1.4T to the current size... The hosts themselves have 16 CPUs and 128G of memory... Maybe a bit oversized, but there were "reasons"...
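For scale, the figures above can be turned into rough numbers. A back-of-the-envelope Python sketch, assuming a steady 4800 NVPS, decimal units (1T = 1000G) and that the 1.4T and 650G sizes are comparable totals; since the database also holds trends and indexes, this does not give an exact per-value cost.

```python
# Back-of-the-envelope figures from the post above (steady load, decimal units).
NVPS = 4800               # new values per second
HISTORY_DAYS = 14         # history retention in days
SIZE_BEFORE_GB = 1400     # ~1.4T before TimescaleDB compression
SIZE_AFTER_GB = 650       # ~650G after compression

# Values kept in history at steady state with 14 days of retention.
history_rows = NVPS * HISTORY_DAYS * 24 * 3600
compression_ratio = SIZE_BEFORE_GB / SIZE_AFTER_GB
space_saved = 1 - SIZE_AFTER_GB / SIZE_BEFORE_GB

print(f"history values retained: {history_rows:,}")
print(f"compression ratio: {compression_ratio:.2f}x, space saved: {space_saved:.0%}")
```

That works out to roughly 5.8 billion history values and a ~2.15x reduction (about 54% less space).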

I am pretty sure someone with a bit more PG knowledge can squeeze out some hidden performance there...

**Jun.Liu** · 14-06-2024, 14:38

Originally posted by cyber

Performance as NVPS? 4800+. DB size is currently ~650G, but we do not keep very long history, just 14 days + 1 year of trends. PG 14.5 + TS 2.7.2 + pg_auto_failover (which manages failovers and replication). Using compression in TS reduced the DB size from ~1.4T to the current size... The hosts themselves have 16 CPUs and 128G of memory... Maybe a bit oversized, but there were "reasons"...

I am pretty sure someone with a bit more PG knowledge can squeeze out some hidden performance there...

Just wondering: is it a standalone server, or one with many proxies?

**cyber** · 17-06-2024, 13:53

Originally posted by Jun.Liu

Just wondering: is it a standalone server, or one with many proxies?

There are ~20 proxies involved.

**vso** · 20-12-2024, 12:32

How is it possible that 429 triggers were calculated for 4 values? Is it possible that those are time-based triggers? Maybe some delay should be introduced after a restart, so that they are calculated later, when the load is lower and the cache warmup is over?

Code:

├─241531 "/usr/sbin/zabbix_server: history syncer #3 [processed 4 values, 429 triggers in 103.734200 sec, syncing history]"

**cyber** · 20-12-2024, 12:59

Originally posted by vso

How is it possible that 429 triggers were calculated for 4 values?

This one is even better:

Code:

├─241541 "/usr/sbin/zabbix_server: history syncer #13 [processed 1 values, 413 triggers in 101.494687 sec, syncing history]"

I never thought the text there meant that those triggers and items were related... :P
So how do I interpret this one? No new values, but a bunch of time-based triggers recalculated?

Code:

[processed 0 values, 844 triggers in 0.014566 sec, idle 1 sec]

And this one? A load of data came in, almost all of it related to some trigger, and that caused recalculation?

Code:

[processed 31363 values, 30691 triggers in 8.070808 sec, idle 1 sec]
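One way to read these titles side by side is to compute triggers per value and flag cycles that are both trigger-heavy and slow. A minimal Python sketch, assuming the title format shown in this thread; the `SyncerStats` class, the `parse` and `looks_suspicious` helpers and both thresholds are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Stats part of a history syncer process title, e.g.
# "[processed 31363 values, 30691 triggers in 8.070808 sec, idle 1 sec]"
TITLE_RE = re.compile(r"\[processed (\d+) values, (\d+) triggers in ([\d.]+) sec, ([^\]]+)\]")


@dataclass
class SyncerStats:
    values: int
    triggers: int
    seconds: float
    state: str  # e.g. "idle 1 sec" or "syncing history"

    @property
    def triggers_per_value(self) -> float:
        # With no new values, any recalculated triggers came from something else
        # (e.g. time-based triggers), so report infinity instead of dividing by zero.
        if self.values == 0:
            return float("inf") if self.triggers else 0.0
        return self.triggers / self.values


def parse(title: str) -> SyncerStats:
    match = TITLE_RE.search(title)
    if match is None:
        raise ValueError(f"not a history syncer title: {title!r}")
    values, triggers, seconds, state = match.groups()
    return SyncerStats(int(values), int(triggers), float(seconds), state)


def looks_suspicious(stats: SyncerStats, max_ratio: float = 10.0, max_seconds: float = 10.0) -> bool:
    """Hypothetical heuristic: many triggers per value AND a long sync cycle."""
    return stats.triggers_per_value > max_ratio and stats.seconds > max_seconds


if __name__ == "__main__":
    titles = [
        "history syncer #3 [processed 4 values, 429 triggers in 103.734200 sec, syncing history]",
        "history syncer #13 [processed 1 values, 413 triggers in 101.494687 sec, syncing history]",
        "[processed 0 values, 844 triggers in 0.014566 sec, idle 1 sec]",
        "[processed 31363 values, 30691 triggers in 8.070808 sec, idle 1 sec]",
    ]
    for title in titles:
        stats = parse(title)
        print(f"{stats.values:>6} values {stats.triggers:>6} triggers "
              f"{stats.seconds:>10.3f} sec suspicious={looks_suspicious(stats)}")
```

On the four titles from this thread, the two ~100 sec cycles (4 values/429 triggers and 1 value/413 triggers) are flagged. The idle cycle with 0 values and 844 triggers finished in 0.015 sec, and the 31363-value cycle has fewer triggers than values, so neither of those is flagged.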

**vso** · 28-12-2024, 00:02

There should be some improvements under ZBX-24549. Yes, triggers are usually recalculated because of new values, but there are also time-based triggers that are recalculated every 30 seconds. This can be a problem, especially after a restart; delaying their calculation, or adding a way to control when they are calculated, could help.
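The idea of spreading time-based trigger recalculation out, and holding it back right after a restart, can be illustrated with a small scheduler model. This is a hypothetical Python sketch, not how Zabbix's timer processes actually work and not what ZBX-24549 implements: the 30-second period comes from the post above, while the slot assignment by `trigger_id % period`, the `warmup` delay and the function names are assumptions for illustration.

```python
import heapq
from collections import Counter
from typing import Iterable, Iterator, Tuple

PERIOD_SEC = 30  # time-based triggers are recalculated every 30 seconds (per the post above)


def first_run(trigger_id: int, start: int, warmup: int, period: int = PERIOD_SEC) -> int:
    """First evaluation time: not before start + warmup, and always in this trigger's own slot."""
    base = start + trigger_id % period  # fixed slot within each period spreads the load
    earliest = start + warmup
    if base >= earliest:
        return base
    cycles = -(-(earliest - base) // period)  # ceiling division on integers
    return base + cycles * period


def schedule(trigger_ids: Iterable[int], start: int, warmup: int, horizon: int,
             period: int = PERIOD_SEC) -> Iterator[Tuple[int, int]]:
    """Yield (time, trigger_id) evaluations in time order until start + horizon."""
    heap = [(first_run(t, start, warmup, period), t) for t in trigger_ids]
    heapq.heapify(heap)
    end = start + horizon
    while heap and heap[0][0] < end:
        when, trigger_id = heapq.heappop(heap)
        yield when, trigger_id
        heapq.heappush(heap, (when + period, trigger_id))


if __name__ == "__main__":
    runs = list(schedule(range(900), start=0, warmup=60, horizon=120))
    per_second = Counter(when for when, _ in runs)
    print(f"{len(runs)} evaluations, first at t={min(per_second)}, "
          f"peak {max(per_second.values())} per second")
```

With 900 time-based triggers, a 60-second warmup and a 30-second period, each second after the warmup sees 30 evaluations instead of all 900 landing at once. The trade-off is that a trigger's first evaluation after a restart can be delayed by up to about warmup + period seconds.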

Zabbix server 6.0 syncing history from Elasticsearch too slow
