Zabbix is struggling to collect data (crammed agent pollers don't fetch data)

  • joeblade
    Junior Member
    • Jun 2026
    • 8

    #1

    Hi there,

    Could somebody please shed some light on an issue I'm trying to understand and solve?
    I have a fairly large deployment where, recently, monitored items have stopped receiving data on time, which results in large data gaps.
    The workload is spread across 3 Zabbix proxy servers.
    Right after a fresh reboot of the entire environment there is a burst of activity across those 3 proxies and data comes through.
    After some time, however, the whole system deteriorates and comes to an almost complete halt; only occasional items manage to collect data.
    Increasing the number of async agent pollers did not help (I increased it from 8 to 20): they all get clogged up over time.

    Interestingly, server resources don't seem to be the constraint: CPU and RAM are underutilized and the Zabbix proxy servers appear idle.

    1. How can the "awaiting" value be ~1000 and maxed out almost all the time? Why are those items not picked up and moved out to the queue?

    2. Assuming these are problematic items that occupy the slots because they run close to the allotted timeout (Timeout=10 in my zabbix_proxy.conf, though I also tried lowering it to 3s), why are they not moved to the unreachable pollers for later attempts?

    3. How can I check what is holding back those 1000 items on a poller and preventing them from being processed?

    NOTE: When I open one of the hosts in the GUI, pick an item that is missing recent data (e.g. CPU), and press the "Get value and test" button, the value is returned instantly.

    Below is an example from one of the Zabbix proxy servers, with a summary of the Zabbix processes running on it (also see the screenshots that follow).

    Code:
    Parameter                                           Value       Details
    =========                                           =====       =======
    Zabbix server is running                            Yes         zabbix-srv:10051
    Zabbix server version                               7.0.25      New update available
    Zabbix frontend version                             7.0.25      New update available
    Latest release                                      7.0.26      Release notes
    Number of hosts (enabled/disabled)                  2756        2729 / 27
    Number of templates                                 433    
    Number of items (enabled/disabled/not supported)    301001      282348 / 4351 / 14302
    Number of triggers (enabled/disabled [problem/ok])  103761      91473 / 12288 [526 / 90947]
    Required server performance, new values per second  2486.43    
    High availability cluster                           Disabled


    Code:
    cat /etc/os-release
    NAME="Red Hat Enterprise Linux"
    VERSION="8.10 (Ootpa)"
    Code:
    free -h
                  total        used        free      shared  buff/cache   available
    Mem:           15Gi       9.1Gi       655Mi       129Mi       5.6Gi       5.8Gi
    Swap:           9Gi        49Mi         9Gi
    Code:
    cat /proc/cpuinfo | grep processor
    processor : 0
    processor : 1
    processor : 2
    processor : 3

    Code:
    ===
    Load: load average: 1.35, 1.07, 1.11 CPU idle: 83.3 id
    ===
    ### Agent pollers ###
    Active: 20 Idle: 1
    agent poller #1 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #2 [got 18 values, queued 18 in 5 sec, awaiting 1000]
    agent poller #3 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #4 [got 13 values, queued 13 in 5 sec, awaiting 1000]
    agent poller #5 [got 2 values, queued 2 in 5 sec, awaiting 1000]
    agent poller #6 [got 2 values, queued 4 in 5 sec, awaiting 1000]
    agent poller #7 [got 6 values, queued 6 in 5 sec, awaiting 1000]
    agent poller #8 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #9 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #10 [got 2 values, queued 2 in 5 sec, awaiting 1000]
    agent poller #11 [got 2 values, queued 2 in 5 sec, awaiting 1000]
    agent poller #12 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #13 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #14 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #15 [got 1 values, queued 1 in 5 sec, awaiting 1000]
    agent poller #16 [got 29 values, queued 29 in 5 sec, awaiting 1000]
    agent poller #17 [got 5 values, queued 5 in 5 sec, awaiting 1000]
    agent poller #18 [got 2 values, queued 2 in 5 sec, awaiting 1000]
    agent poller #19 [got 1 values, queued 0 in 5 sec, awaiting 999]
    agent poller #20 [got 28 values, queued 28 in 5 sec, awaiting 1000]
    
    
    ### HTTP agent pollers ###
    Active: 0 Idle: 1
    
    ### SNMP pollers ###
    Active: 0 Idle: 1
    
    ### Classic pollers ###
    Active: 1 Idle: 9
    poller #25 [got 0 values in 0.000016 sec, getting values]
    
    ### Unreachable pollers ###
    Active: 1 Idle: 9
    unreachable poller #15 [got 0 values in 0.000033 sec, getting values]
    
    ### Trappers ###
    Active: 0 Idle: 10
    
    ### Preprocessing manager ###
    preprocessing manager #1 [queued 147, processed 168 values, idle 5.067374 sec during 5.080943 sec]
    
    ### TCP ###
    ESTABLISHED: 11 TIME_WAIT: 7185
    Code:
    zabbix_proxy -R diaginfo
    == history cache diagnostic information ==
    Items:0 values:941 time:0.000042
    Memory.data:
      size: free:536870528 used:0
      chunks: free:1 used:0 min:536870528 max:536870528
        buckets:
          256+:1
    Memory.index:
      size: free:4145072 used:48736
      chunks: free:3 used:4 min:59656 max:3981456
        buckets:
          256+:3
    Top.values:
    ==
    == preprocessing diagnostic information ==
    Cached items:33111 pending tasks:0 finished tasks:0 task sequences:0 queued count:3262029 queued size:303204312 direct count:48959 direct size:782045102 history size:1518458 time:0.023078
    Top.sequences:
    Top.peak:
      itemid:14265032 tasks:2
      itemid:11402899 tasks:2
    Top.values_num:
      itemid:1398646 values_num:78
      itemid:14433803 values_num:78
      itemid:14433785 values_num:78
      itemid:14392035 values_num:78
      itemid:666550 values_num:78
      itemid:5014657 values_num:78
      itemid:4991563 values_num:78
      itemid:8078952 values_num:78
      itemid:1398794 values_num:78
      itemid:973346 values_num:78
      itemid:14392032 values_num:78
      itemid:14433786 values_num:78
      itemid:1398331 values_num:78
      itemid:14433779 values_num:78
      itemid:14433798 values_num:78
      itemid:11407413 values_num:78
      itemid:665912 values_num:78
      itemid:14392047 values_num:78
      itemid:14392048 values_num:78
      itemid:665978 values_num:78
      itemid:7442505 values_num:78
      itemid:5679346 values_num:78
      itemid:14392036 values_num:78
      itemid:14392037 values_num:78
      itemid:5679629 values_num:78
    Top.values_sz:
      itemid:5680793 values_sz:731492
      itemid:14265032 values_sz:594896
      itemid:7030418 values_sz:354208
      itemid:6105421 values_sz:282324
      itemid:14327602 values_sz:270972
      itemid:6105353 values_sz:268400
      itemid:13849001 values_sz:265888
      itemid:1144283 values_sz:262171
      itemid:5835317 values_sz:259812
      itemid:6685189 values_sz:256476
      itemid:14770706 values_sz:249500
      itemid:14320359 values_sz:244790
      itemid:13928290 values_sz:244784
      itemid:14320648 values_sz:244784
      itemid:14791985 values_sz:244784
      itemid:5903930 values_sz:241515
      itemid:9352624 values_sz:241094
      itemid:5702469 values_sz:238616
      itemid:9723496 values_sz:237712
      itemid:14324374 values_sz:237287
      itemid:11120242 values_sz:235750
      itemid:11829377 values_sz:235192
      itemid:6587899 values_sz:233548
      itemid:8376820 values_sz:233328
      itemid:6587830 values_sz:232674
    Top.time_ms:
      itemid:9259610 time_ms:10
      itemid:14194720 time_ms:10
      itemid:9272560 time_ms:10
      itemid:13152724 time_ms:10
      itemid:15042843 time_ms:10
      itemid:583540 time_ms:10
      itemid:10612986 time_ms:10
      itemid:9279594 time_ms:10
      itemid:12072248 time_ms:10
      itemid:12072041 time_ms:10
      itemid:9265000 time_ms:10
      itemid:12073827 time_ms:10
      itemid:14975884 time_ms:10
      itemid:759194 time_ms:10
      itemid:12074596 time_ms:10
      itemid:14391438 time_ms:10
      itemid:11171728 time_ms:10
      itemid:10042786 time_ms:10
      itemid:11829324 time_ms:10
      itemid:10671016 time_ms:10
      itemid:12899676 time_ms:10
      itemid:9887823 time_ms:10
      itemid:12074346 time_ms:10
      itemid:15043067 time_ms:10
      itemid:5376149 time_ms:10
    Top.total_ms:
      itemid:14265032 total_ms:30
      itemid:14433504 total_ms:30
      itemid:14433507 total_ms:30
      itemid:6436141 total_ms:30
      itemid:13332097 total_ms:20
      itemid:14433505 total_ms:20
      itemid:11829324 total_ms:20
      itemid:14327552 total_ms:20
      itemid:14433516 total_ms:20
      itemid:666616 total_ms:20
      itemid:14265239 total_ms:10
      itemid:13599084 total_ms:10
      itemid:6587709 total_ms:10
      itemid:14194720 total_ms:10
      itemid:9272560 total_ms:10
      itemid:7742752 total_ms:10
      itemid:13152724 total_ms:10
      itemid:583540 total_ms:10
      itemid:6763783 total_ms:10
      itemid:666550 total_ms:10
      itemid:759900 total_ms:10
      itemid:14490555 total_ms:10
      itemid:15566402 total_ms:10
      itemid:14433786 total_ms:10
      itemid:8865437 total_ms:10
    ==
    == locks diagnostic information ==
    Locks:
      ZBX_MUTEX_LOG:0x7facec1d0000
      ZBX_MUTEX_CACHE:0x7facec1d0028
      ZBX_MUTEX_TRENDS:0x7facec1d0050
      ZBX_MUTEX_CACHE_IDS:0x7facec1d0078
      ZBX_MUTEX_SELFMON:0x7facec1d00a0
      ZBX_MUTEX_CPUSTATS:0x7facec1d00c8
      ZBX_MUTEX_DISKSTATS:0x7facec1d00f0
      ZBX_MUTEX_VALUECACHE:0x7facec1d0118
      ZBX_MUTEX_VMWARE:0x7facec1d0140
      ZBX_MUTEX_SQLITE3:0x7facec1d0168
      ZBX_MUTEX_PROCSTAT:0x7facec1d0190
      ZBX_MUTEX_PROXY_HISTORY:0x7facec1d01b8
      ZBX_MUTEX_MODBUS:0x7facec1d01e0
      ZBX_MUTEX_TREND_FUNC:0x7facec1d0208
      ZBX_MUTEX_REMOTE_COMMANDS:0x7facec1d0230
      ZBX_MUTEX_PROXY_BUFFER:0x7facec1d0258
      ZBX_MUTEX_VPS_MONITOR:0x7facec1d0280
      ZBX_RWLOCK_CONFIG:0x7facec1d02a8
      ZBX_RWLOCK_CONFIG_HISTORY:0x7facec1d02e0
      ZBX_RWLOCK_VALUECACHE:0x7facec1d0318
    ==
    == proxy buffer diagnostic information ==
    Memory:
      size: free:536840904 used:21128
      chunks: free:1 used:530 min:536840904 max:536840904
        buckets:
          256+:1
    
    ==
    [Screenshot: img1.png]
    [Screenshot: img2.png]
    [Screenshot: img3.png]
    [Screenshot: img4.png]
    Last edited by joeblade; 03-06-2026, 10:25.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4931

    #2
    The agent poller handles passive checks; with that many of them, are you mainly running passive checks? Maybe let the agents do the work themselves and rely more on active checks. How often do you poll your items?
    You might also add a couple of proxies; ~2.4k NVPS is close to the limit for 3...
    I had a similar case with SNMP pollers, where I simply asked too much of them. They are meant to collect a lot with walks and gets, after which the data should be redistributed into dependent items, but we were generating too many walks and gets and I ended up with similar "awaiting 1000" lines and queues of data. After a redesign, the problem disappeared...
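
    To put rough numbers on this, here is a minimal Python sketch that parses the agent poller process titles from post #1 and compares the observed throughput with this proxy's share of the required NVPS. Two assumptions of mine, not facts from the thread: the load is split evenly across the 3 proxies, and the "required NVPS" figure is treated as if it were all passive agent checks (in reality it covers every item type, so read the comparison as an upper bound on the gap). The function name `summarize` and the constants are purely illustrative. As far as I know, the recurring "awaiting 1000" also coincides with the default value of MaxConcurrentChecksPerPoller in 7.0, which would mean every poller is sitting at its concurrency cap.

```python
import re

# Process titles copied verbatim from the proxy listing in post #1.
STATUS = """\
agent poller #1 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #2 [got 18 values, queued 18 in 5 sec, awaiting 1000]
agent poller #3 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #4 [got 13 values, queued 13 in 5 sec, awaiting 1000]
agent poller #5 [got 2 values, queued 2 in 5 sec, awaiting 1000]
agent poller #6 [got 2 values, queued 4 in 5 sec, awaiting 1000]
agent poller #7 [got 6 values, queued 6 in 5 sec, awaiting 1000]
agent poller #8 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #9 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #10 [got 2 values, queued 2 in 5 sec, awaiting 1000]
agent poller #11 [got 2 values, queued 2 in 5 sec, awaiting 1000]
agent poller #12 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #13 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #14 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #15 [got 1 values, queued 1 in 5 sec, awaiting 1000]
agent poller #16 [got 29 values, queued 29 in 5 sec, awaiting 1000]
agent poller #17 [got 5 values, queued 5 in 5 sec, awaiting 1000]
agent poller #18 [got 2 values, queued 2 in 5 sec, awaiting 1000]
agent poller #19 [got 1 values, queued 0 in 5 sec, awaiting 999]
agent poller #20 [got 28 values, queued 28 in 5 sec, awaiting 1000]
"""

# Fields: poller number, values got, values queued, interval (sec), awaiting.
LINE_RE = re.compile(
    r"agent poller #(\d+) \[got (\d+) values, queued (\d+) in (\d+) sec, "
    r"awaiting (\d+)\]"
)


def summarize(text):
    """Aggregate the per-poller counters found in the process titles."""
    rows = [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(text)]
    got = sum(r[1] for r in rows)
    # All titles in the listing report the same interval; use the first one.
    interval = rows[0][3] if rows else 0
    return {
        "pollers": len(rows),
        "got": got,
        "per_sec": got / interval if interval else 0.0,
        "awaiting": sum(r[4] for r in rows),
    }


summary = summarize(STATUS)

REQUIRED_NVPS = 2486.43  # "Required server performance" from post #1
PROXIES = 3              # assumption: load is spread evenly across proxies
per_proxy_nvps = REQUIRED_NVPS / PROXIES

print(f"pollers parsed:        {summary['pollers']}")
print(f"values got / interval: {summary['got']}")
print(f"observed values/sec:   {summary['per_sec']:.1f}")
print(f"checks awaiting:       {summary['awaiting']}")
print(f"per-proxy NVPS share:  {per_proxy_nvps:.2f}")
```

    On the posted listing this works out to 118 values in 5 seconds (about 23.6 values per second) with 19999 checks in flight, against roughly 829 NVPS per proxy under the even-split assumption. So the pollers are not busy; they are waiting on responses, which fits the idle CPU you describe.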
