Zabbix not writing data to PostgreSQL DB, queue growing

  • user.zabbix
    Junior Member
    • Feb 2020
    • 25

    #1

    Zabbix not writing data to PostgreSQL DB, queue growing

    Hello!
    We have a problem with a growing queue on our Zabbix server.
    OS: Red Hat Enterprise Linux Server release 7.7
    DB: postgresql11 + timescaledb-postgresql-11-1.6.0
    Zabbix 4.2.8
    RAM: 64 GB
    CPU: 10 cores
    Disk: SSD array

    dd if=/dev/zero of=./testfile bs=1G count=40 oflag=direct
    40+0 records in
    40+0 records out
    42949672960 bytes (43 GB) copied, 33.1436 s, 1.3 GB/s
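
    (A side note on this benchmark: dd with bs=1G and oflag=direct measures sequential streaming bandwidth, while a database commit path is bounded by per-commit flush latency rather than bandwidth. A minimal sketch of the distinction, using a hypothetical 1 ms flush time that is an assumption, not a measurement:)

    ```python
    # Sequential bandwidth does not bound a database's commit rate.
    # The 1 ms flush latency below is a hypothetical figure, not a measurement.
    seq_bandwidth_gbs = 1.3         # from the dd run above (GB/s, sequential)
    flush_latency_ms = 1.0          # hypothetical per-commit flush time
    commits_per_sec = 1000 / flush_latency_ms
    print(commits_per_sec)          # 1000.0 synchronous commits/s, regardless of bandwidth
    ```

    So a fast dd result does not by itself rule out a flush-bound write path.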

    Zabbix writes data to the DB very slowly.

    But iostat and top show no problem with disk or CPU.


    We don't have any errors in zabbix_server.log.

    Our Zabbix config:
    grep -v "^#" /etc/zabbix/zabbix_server.conf|grep -v "^[[:space:]]*$"
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=256
    PidFile=/var/run/zabbix/zabbix_server.pid
    SocketDir=/var/run/zabbix
    DBName=zabbix
    DBUser=zabbix
    DBPassword=-----------
    HistoryStorageDateIndex=1
    StartPollers=120
    StartPreprocessors=32
    StartPollersUnreachable=64
    StartPingers=8
    StartDiscoverers=64
    StartTimers=4
    StartEscalators=4
    StartAlerters=4
    SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
    HousekeepingFrequency=12
    CacheSize=1512M
    StartDBSyncers=16
    HistoryCacheSize=256M
    HistoryIndexCacheSize=128M
    TrendCacheSize=256M
    ValueCacheSize=256M
    Timeout=4
    AlertScriptsPath=/opt/zabbix/zabbix_scripts/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    LogSlowQueries=3000
    ProxyConfigFrequency=300
    ProxyDataFrequency=300
    StartLLDProcessors=16
    StatsAllowedIP=127.0.0.1
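
    For scale, the shared-memory caches in this config sum as follows (this is Zabbix server shared memory, separate from PostgreSQL's shared_buffers; a quick sanity check, not a tuning recommendation):

    ```python
    # Shared-memory cache parameters from the config above, in MB.
    caches_mb = {
        "CacheSize": 1512,
        "HistoryCacheSize": 256,
        "HistoryIndexCacheSize": 128,
        "TrendCacheSize": 256,
        "ValueCacheSize": 256,
    }
    total_mb = sum(caches_mb.values())
    print(total_mb)  # 2408 MB -- modest next to the 64 GB of RAM
    ```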


    Last edited by user.zabbix; 20-02-2020, 13:25.
  • user.zabbix
    Junior Member
    • Feb 2020
    • 25

    #2
    Upgraded Zabbix to 4.4.5 :-(, same result...
    Last edited by user.zabbix; 20-02-2020, 16:36.


    • user.zabbix
      Junior Member
      • Feb 2020
      • 25

      #3
      I tested the DB; the DB works fine:
      pgbench -c 10 -j 2 -t 10000 zabbix
      starting vacuum...end.
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 50
      query mode: simple
      number of clients: 10
      number of threads: 2
      number of transactions per client: 10000
      number of transactions actually processed: 100000/100000
      latency average = 0.952 ms
      tps = 10501.078070 (including connections establishing)
      tps = 10504.435852 (excluding connections establishing)

      pgbench -i -s 50 zabbix
      dropping old tables...
      creating tables...
      generating data...
      5000000 of 5000000 tuples (100%) done (elapsed 6.26 s, remaining 0.00 s)
      vacuuming...
      creating primary keys...
      done.
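
      One way to read these pgbench numbers: in a closed-loop benchmark, average latency ≈ clients / tps (Little's law), so the report is internally consistent and confirms sub-millisecond DB round trips:

      ```python
      # pgbench reports latency ~= clients / tps for a closed-loop run.
      clients = 10
      tps = 10501.078070  # "including connections establishing" figure above
      latency_ms = clients / tps * 1000
      print(round(latency_ms, 3))  # 0.952, matching the reported average
      ```

      That suggests raw DB transaction throughput is not the obvious bottleneck here.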


      • user.zabbix
        Junior Member
        • Feb 2020
        • 25

        #4

        There are messages in the system:
        Zabbix unreachable poller processes more than 75% busy
        Zabbix poller processes more than 75% busy
        More than 100 items having missing data for more than 10 minutes


        • tim.mooney
          Senior Member
          • Dec 2012
          • 1427

          #5
          You did a great job providing info about your install. I wish every question posed on these forums provided as much useful information as you have!

          How many hosts and more importantly, how many items are you monitoring? What is your server's "new values per second" (NVPS)? You can find # of items and NVPS in Monitoring->Dashboard, in the "widget" for Zabbix System Details.

          The numbers for several of your Zabbix processes have been increased by a large amount from the defaults. I'm talking about specifically

          StartPollers=120
          StartPreprocessors=32
          StartPollersUnreachable=64
          StartDiscoverers=64

          Out of curiosity, were these settings increased after consulting with e.g. Zabbix professional services or Zabbix support, or perhaps after the Zabbix template applied to your server suggested increasing some of these values? Or were they just set to large values when the system was initially configured, since you have a server with lots of available resources? I'm just trying to understand how your site arrived at those settings.

          Beyond the Zabbix settings, the main question I would have is whether any of the PostgreSQL performance tuning tools or suggestions have been applied to your database? The default config for PostgreSQL is generally OK for a variety of uses, but specific workloads can benefit greatly from careful tuning. If your environment is as large as I'm imagining it is, it may be very necessary to do some pgsql tuning. The pgsql wiki for performance tuning has lots of documentation (not all of it current, which makes it a bit more challenging), and the tools it mentions to help you analyze your config and suggest tuning changes are probably a good place to start.

          Bottlenecks and performance problems in a complex system are some of the most challenging problems to solve. I hope you'll post updates as you continue to diagnose and work on this problem. Probably the best advice I can give is to "make changes carefully". Even if you identify a bunch of things you want to change, I wouldn't change them all at once. Change settings one at a time or in small groups, and then allow enough time to determine whether that one change or set of changes had much impact.


          • user.zabbix
            Junior Member
            • Feb 2020
            • 25

            #6
            Foreword:
            before building this system, we had experience building a similar system for 1500 hosts with NVPS ~534 (avg CPU load 30%, peak CPU load 70%).

            That system was built on similar hardware and OS.

            We took the Zabbix and Postgres configs from the previous system,
            added 200 hosts, and now have NVPS ~671.

            Postgres was tuned by timescaledb-tune (with small manual tuning) on both systems.

            grep -v "^#" postgresql.conf|cut -d "#" -f1|grep -v "^[[:space:]]*$"
            listen_addresses='*'
            max_connections = 512
            shared_buffers = 7978MB
            work_mem = 10212kB
            maintenance_work_mem = 2047MB
            dynamic_shared_memory_type = posix
            effective_io_concurrency = 200
            max_worker_processes = 19
            max_parallel_workers_per_gather = 4
            max_parallel_workers = 8
            synchronous_commit = off
            wal_buffers = 32MB
            wal_writer_delay = 2000ms
            max_wal_size = 8GB
            min_wal_size = 4GB
            checkpoint_completion_target = 0.9
            random_page_cost = 1.1
            effective_cache_size = 23936MB
            default_statistics_target = 500
            log_destination = 'stderr'
            logging_collector = on
            log_directory = 'log'
            log_filename = 'postgresql-%a.log'
            log_truncate_on_rotation = on
            log_rotation_age = 1d
            log_rotation_size = 0
            log_error_verbosity = verbose
            log_line_prefix = '%m [%p] '
            log_timezone =---------
            autovacuum = off
            autovacuum_max_workers = 10
            autovacuum_naptime = 10
            datestyle = 'iso, mdy'
            timezone = ------------
            lc_messages = 'en_US.UTF8'
            lc_monetary = 'en_US.UTF8'
            lc_numeric = 'en_US.UTF8'
            lc_time = 'en_US.UTF8'
            default_text_search_config = 'pg_catalog.english'
            shared_preload_libraries = 'timescaledb'
            max_locks_per_transaction = 256
            timescaledb.max_background_workers = 8
            timescaledb.last_tuned = '2019-12-11T15:09:26+02:00'
            timescaledb.last_tuned_version = '0.7.0'

            We turned off synchronous_commit
            and increased wal_buffers.
            Last edited by user.zabbix; 22-02-2020, 14:57.


            • user.zabbix
              Junior Member
              • Feb 2020
              • 25

              #7
              The problem is we cannot see the bottleneck.
              We see huge queues, and "Utilization of unreachable poller data collector processes" at 100%.

              But top and pg_top show no process with high CPU load or waits :-(

              iostat shows no queue on the disks.

              As an experiment I left only one poller, but the queues grow and grow :-(



              • tim.mooney
                tim.mooney commented
                Editing a comment
                I would think that one poller would be too small for your environment, but I guess my hypothesis was that 120 pollers is much too large, and the pollers themselves are running into resource contention issues. That's just pure speculation on my part, though. I haven't run Zabbix with NVPS in the 500-700 range, so I don't have any estimate for what "normal" is for those settings. The Zabbix "Sizing" docs don't go into enough detail for situations like this either, unfortunately.

                If I were in your situation, and I was going to experiment, I would probably divide each of your StartPollers, StartPreprocessors, StartPollersUnreachable, and StartDiscoverers by a factor of 8 to 10, just to see if that makes a difference.

                Alternately, you may want to spend some time looking through the Large Environments section of the forums. Maybe you can find examples of what other people with 500-1000 NVPS are using for those settings? Keep in mind that if they're using a different database backend or if they're using traditional spinning disks, rather than your SSDs, their values still may not be best for your environment, but at least it would give you a range that has worked for other sites with about the same NVPS.
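
                The factor-of-8 reduction suggested above works out to roughly the following (a sketch; the exact divisor is a guess to be validated by observation, not a recommendation from the thread):

                ```python
                # Trial values from dividing the posted settings by 8 (the low
                # end of the suggested 8-10x reduction); purely illustrative.
                current = {
                    "StartPollers": 120,
                    "StartPreprocessors": 32,
                    "StartPollersUnreachable": 64,
                    "StartDiscoverers": 64,
                }
                trial = {name: max(1, value // 8) for name, value in current.items()}
                print(trial)
                # {'StartPollers': 15, 'StartPreprocessors': 4,
                #  'StartPollersUnreachable': 8, 'StartDiscoverers': 8}
                ```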
            • user.zabbix
              Junior Member
              • Feb 2020
              • 25

              #8
              I checked Postgres; no locks found.


              • tim.mooney
                Senior Member
                • Dec 2012
                • 1427

                #9
                Thanks for providing the additional information about the PostgreSQL tuning/settings too.

                With NVPS as large as yours, you may want to re-ask your question in the Zabbix for Large Environments area of the forums. I don't know for certain, but there may be people that watch that section of the forum and have experience scaling Zabbix to this size with PostgreSQL + TimeScale.

                I will watch this thread with interest, and if you post in the Large Environments area of the forum I'll watch that one too, but I don't think I have any other help I can offer. My environment is thankfully smaller, and so far scaling Zabbix hasn't been an issue for me.


                • Hamardaban
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional
                  • May 2019
                  • 2713

                  #10
                  You have a large SNMP queue... And I don't see the StartSNMPTrapper parameter in your config... Is it not listed by mistake, or is it really missing?


                  • user.zabbix
                    Junior Member
                    • Feb 2020
                    • 25

                    #11
                    We use only pollers, because we actively check our network equipment.
                    We have 400 devices with 48 ports per device and 12 metrics per port.
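
                    A rough back-of-the-envelope from these figures and the NVPS ~671 reported earlier (assumes all items are active; the arithmetic is illustrative, not from the thread):

                    ```python
                    # Item count implied by the network gear described above.
                    devices, ports_per_device, metrics_per_port = 400, 48, 12
                    items = devices * ports_per_device * metrics_per_port
                    print(items)  # 230400

                    # Average update interval implied by the reported ~671 NVPS.
                    nvps_reported = 671
                    implied_interval_s = items / nvps_reported
                    print(round(implied_interval_s))  # 343 seconds, if all items are polled
                    ```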


                    • Hamardaban
                      Senior Member
                      Zabbix Certified Specialist, Zabbix Certified Professional
                      • May 2019
                      • 2713

                      #12
                      Sorry, I made a mistake with "trapper". Of course, you are polling the devices, and traps have nothing to do with it... Have you tried changing the feature for using bulk queries? Do the polled devices themselves have enough resources to process requests?


                      • user.zabbix
                        Junior Member
                        • Feb 2020
                        • 25

                        #13
                        Today I tested my database again and got tps = 15828.479091. Why doesn't Zabbix write at this speed? :-(
                        createdb -O postgres -E Unicode -T template0 example
                        pgbench -i -s 500 example
                        pgbench -c 200 -j 200 -t 10000 example
                        result:
                        starting vacuum...end.
                        transaction type: <builtin: TPC-B (sort of)>
                        scaling factor: 500
                        query mode: simple
                        number of clients: 200
                        number of threads: 200
                        number of transactions per client: 10000
                        number of transactions actually processed: 2000000/2000000
                        latency average = 12.644 ms
                        tps = 15817.919698 (including connections establishing)
                        tps = 15828.479091 (excluding connections establishing)
                        -----------------------------------------------------------------------------------
                        pgbench -c 100 -j 100 -t 10000 example
                        starting vacuum...end.
                        transaction type: <builtin: TPC-B (sort of)>
                        scaling factor: 500
                        query mode: simple
                        number of clients: 100
                        number of threads: 100
                        number of transactions per client: 10000
                        number of transactions actually processed: 1000000/1000000
                        latency average = 5.925 ms
                        tps = 16876.988435 (including connections establishing)
                        tps = 16885.724589 (excluding connections establishing)
                        -----------------------------------------------------------------------------------
                        pgbench -c 50 -j 50 -t 10000 example
                        starting vacuum...end.
                        transaction type: <builtin: TPC-B (sort of)>
                        scaling factor: 500
                        query mode: simple
                        number of clients: 50
                        number of threads: 50
                        number of transactions per client: 10000
                        number of transactions actually processed: 500000/500000
                        latency average = 3.504 ms
                        tps = 14271.320883 (including connections establishing)
                        tps = 14275.799162 (excluding connections establishing)

                        Last edited by user.zabbix; 25-02-2020, 10:58.


                        • user.zabbix
                          Junior Member
                          • Feb 2020
                          • 25

                          #14
                          "Do the polled devices themselves have enough resources to process requests"
                          Yes, the equipment works properly; CPU load on the equipment is less than 20% :-[


                          • user.zabbix
                            Junior Member
                            • Feb 2020
                            • 25

                            #15
                            "Have you tried changing the feature for using bulk queries?"


                            What do you mean?

                            "Since Zabbix 2.2.3 Zabbix server and proxy daemons query SNMP devices for multiple values in a single request"

