Zabbix 3.0 StartTrappers are getting stuck processing data

  • ewestdal
    Junior Member
    • Jul 2016
    • 8

    #1

    Zabbix 3.0 StartTrappers are getting stuck processing data

    We currently handle about 7000 nvps in our environment, and that doesn't include the large number of zabbix trapper items we have on top of that. We have recently started having an issue where the zabbix trappers on our primary server get stuck processing data. As more of them get stuck, our busy trapper process monitor goes from around 20% used to 100% used. Once we hit 100% we have to recycle the zabbix primary server to fix the issue.

    [Attached screenshot: Capture.JPG]

    We currently have StartTrappers=50 and I'm wondering what other folks have theirs set to. Has anyone else had this problem, and do you have any suggestions?
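
    For reference, the trapper count is just the StartTrappers line in zabbix_server.conf, and a busy-trapper graph like ours normally comes from the standard internal item (the conf path below is the usual default and may differ per install):

        # current setting on the primary server
        grep -i '^StartTrappers' /etc/zabbix/zabbix_server.conf
        # the busy-trapper percentage is the stock internal item key:
        #   zabbix[process,trapper,avg,busy]
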
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    I think I may know what kind of bottleneck you are hitting. May I ask:
    - how many poller processes do you have started on the server?
    - how many items are monitored by the server? (and how many of them are passive, SNMP, or active checks?)
    - how many proxies do you have? (passive and active)
    - how many items and how many triggers? (in this case the ratio between those numbers is important)
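
    If it is easier, those counts can be pulled straight from the backend DB; a rough sketch, assuming a MySQL backend with the stock Zabbix schema (adjust credentials and DB name to your setup):

        # enabled items, enabled triggers and defined proxies
        mysql -u zabbix -p zabbix -e "
          SELECT COUNT(*) AS enabled_items    FROM items    WHERE status=0;
          SELECT COUNT(*) AS enabled_triggers FROM triggers WHERE status=0;
          SELECT COUNT(*) AS proxies          FROM hosts    WHERE status IN (5,6);  -- 5 = active proxy, 6 = passive proxy
        "
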
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • ewestdal
      Junior Member
      • Jul 2016
      • 8

      #3
      Kloczek,

      Here are the answers to your questions:
      • how many poller processes do you have started on the server?
        • we have 224 zabbix-related processes started on the server. More specifically we have:
          • 50 start pollers
          • 100 trappers
          • 50 history syncers
          • 1 escalator
          • 1 proxy poller
          • 1 self-monitoring
          • 2 vmware
          • 5 unreachable
          • 1 icmp poller
          • 1 http poller
          • 5 timers
          • and then a few other miscellaneous zabbix-related ones
      • how many items are monitored by the server? (and how many of them are passive, SNMP, or active checks?)
        • we currently have 1808477 items being monitored
        • we do not have any SNMP going into Zabbix
      • how many proxies do you have? (passive and active)
        • 27 proxies and they are all active
      • how many items and how many triggers? (in this case the ratio between those numbers is important)
        • 1808594 items
        • 874426 triggers

      Essentially, what we've noticed so far through debug logging and tcpdumps is that the trapper processes on the Zabbix primary are not actually stuck; they are just taking forever to process the data. We've been working on reducing that load. Let me know if there is any additional data that might help.
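
      For anyone hitting the same thing, this is roughly how we looked at it (a sketch; port 10051 is just the default trapper port and may differ in your setup):

          # raise the log level for the trapper processes only, no restart needed
          zabbix_server -R log_level_increase=trapper
          # ...reproduce the slowdown, then put it back
          zabbix_server -R log_level_decrease=trapper

          # watch proxy traffic arriving on the trapper port
          tcpdump -nn -i any tcp port 10051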


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        I'm assuming that with 1.8 million items you do not have any monitoring done directly by the server itself, apart from the zabbix server monitoring its own internals.
        1) The number of trappers should be lower than the number of proxies. Usually a ratio somewhere between 1:2 and 1:1 (trappers:proxies) is enough. This is especially the case with active proxies, where the proxy decides when it will push the next batch of monitoring data to the server.
        2) The number of history syncers should be no more than 2 * the number of CPU cores on the DB server when using MySQL 5.6 and below. Which type and version of DB backend are you using? More syncers may also cause congestion, which is easy to catch by looking at the number of table locks/s. Monitoring of the DB engine should show whether this is the case with your zabbix stack. You should also have a look at what the zabbix[preprocessing_queue] internal server metric shows.
        3) The number of other processes like icmp pingers and http pollers could be slashed to 0, as none of the monitoring handled by the server itself should be using those processes.
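
        As a sketch only (the values are illustrative and the DB core count is an assumption, not a recommendation), points 1-3 translate into zabbix_server.conf roughly like this; the Start* counts only take effect after a server restart:

            # zabbix_server.conf sketch, illustrative values only
            # trappers roughly 1:1 with the 27 active proxies
            StartTrappers=30
            # history syncers: 2 x DB CPU cores, assuming a 4-core DB host on MySQL <= 5.6
            StartDBSyncers=8
            # nothing run by the server itself should need icmp pingers or http pollers
            StartPingers=0
            StartHTTPPollers=0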

        The points above are not directly related to your main issue, but they may be part of the problem.

        OK, so now the main part. It is not documented anywhere in the zabbix documentation that poller processes are also responsible for processing triggers.
        I hit this while experimenting on my laptop (https://support.zabbix.com/browse/ZBX-14394), trying to minimise the memory footprint used by the zabbix server.
        In other words, to process data that has to be evaluated against trigger definitions fast enough (and you have a relatively high ratio of triggers to items), you may need to increase the number of pollers above 50. However, I would recommend really checking what happens on the DB side first, because congestion in trigger processing may have its root cause on the DB side (mainly, or as well).
        Before you try increasing pollers, check how many of those processes are busy. If they are saturated above roughly 80%, it is possible that this is your main bottleneck.
        Can you tell what your current poller utilisation is?
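
        If you want a rough shell-side check before touching the config (the authoritative number is the zabbix[process,poller,avg,busy] internal item, which the stock zabbix server template already graphs; the ps sampling below only looks at process titles, so treat it as an approximation):

            # list poller process titles, then count the ones not currently idle
            ps -C zabbix_server -o cmd= | grep ': poller'
            ps -C zabbix_server -o cmd= | grep ': poller' | grep -vc 'idle'
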
        Another possible cause of your issue is a DB backend that is not powerful enough. Questions in this area can be answered by looking at DB engine monitoring data. The main factor will be the ratio between read and write IOs on the storage layer. A well-tuned and well-architected DB engine should have no more than about one read IO per 20 write IOs (I usually try to keep this around 1:50).
        Slow trigger processing may be caused by too-high latency of write operations (like inserts and updates); however, it is not obvious to many people that the key to really low latency for those queries is to first get read IO latency down, which is only possible by having enough memory to cache most of the MRU/MFU data in memory without touching storage.
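
        If the backend is MySQL/InnoDB, a quick way to get that picture is from the server status counters (counter names below are InnoDB-specific; other engines expose different ones):

            # data file IO and buffer pool hit behaviour
            mysql -u zabbix -p -e "SHOW GLOBAL STATUS WHERE Variable_name IN
              ('Innodb_data_reads','Innodb_data_writes',
               'Innodb_buffer_pool_reads','Innodb_buffer_pool_read_requests');"
            # Innodb_data_reads vs Innodb_data_writes gives the read:write IO ratio;
            # Innodb_buffer_pool_reads staying tiny next to Innodb_buffer_pool_read_requests
            # means reads are served from memory, which is the goal described above.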

        PS. If you don't have good enough DB engine monitoring in place and you are using MySQL >= 5.7, you may try my Service MySQL template, which provides everything needed to diagnose your issue if it sits on the DB engine side.
        For diagnosing zabbix server bottlenecks you may also try my Service zabbix server template, which has a few more things than the standard OOTB zabbix template.
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Yet another thought: you are using Zabbix 3.0. Many scalability issues have been solved after this major version. You should try to upgrade to at least 3.4.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • vso
            Zabbix developer
            • Aug 2016
            • 190

            #6
            How many low-level discovery items do you have?


            • ggmojki
              Junior Member
              • Jan 2019
              • 1

              #7
              How many proxies do you have?
