More than 100 items having missing data for more than 10 minutes on a single host

  • EHRETic
    Member
    • Jan 2021
    • 45

    #1

    More than 100 items having missing data for more than 10 minutes on a single host

    Hello there,

    I need a little help troubleshooting a warning that I'm getting because of one single host.
    The queue looks like this (I have more than 100 items) and no information gets past the 15-minute delay:

    [Screenshot: the queue view showing the stuck items]
    Last lines of host agent log:
    Code:
    2024/03/10 20:35:49.953313 [101] active check configuration update from [192.168.X.X:10051] is working again
    2024/03/10 20:35:49.955537 [101] history upload to [192.168.X.X:10051] [SRV--XXX] is working again
    2024/03/10 20:36:41.312534 [101] sending of heartbeat message to [192.168.X.X:10051] is working again
    The "working again" is probably due to Zabbix server restart or the host restart when I tried fixing it but otherwise, I didn't spot any noticeable error.
    I've tried reinstalling the agent on host (with reboot before reinstall) and also disabling the agent and re-enabling it.
    This didn't fix the issue so where should I continue troubleshooting?

    Thanks in advance for your help! ;-)
  • The_Shadow
    Junior Member
    • Mar 2024
    • 6

    #2
    Hi,
    Same issue with the Zabbix Docker server (v6.4.12). After a reboot of the Docker container, everything works again. I'm monitoring whether the issue comes back.



    • cfrancis commented:
      I also solved the problem by restarting the server.
  • TheOddPerson
    Junior Member
    • Feb 2024
    • 12

    #3
    <Post Redacted - Please see next post>
    Last edited by TheOddPerson; 12-03-2024, 18:29.


    • TheOddPerson
      Junior Member
      • Feb 2024
      • 12

      #4
      I see the error in the log relating to Active checks.

      Is this host running a Zabbix Agent in Active mode?
      What is the timeout for the type of check that is clogging up the system?
      How many pollers/trappers do you have running?
      Are you receiving any warnings on your zabbix server?
      How many hosts do you have configured for that same type of poller?

      check /etc/zabbix/zabbix_server.conf

      For Zabbix Agent items, the defaults are:
      Timeout=3
      StartPollers=5

      For Active Agents it is this:
      StartTrappers=5


      The default timeout is reasonable for most installations.
      You may want to consider increasing the pollers if the timeout is default.
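
      If you want to double-check what the server is actually running with, something along these lines should do it (a quick sketch, assuming the default config path; lines that are still commented out simply mean the default applies):
      Code:
      grep -E '^(Timeout|StartPollers|StartTrappers)=' /etc/zabbix/zabbix_server.conf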
      Last edited by TheOddPerson; 12-03-2024, 18:29.


      • EHRETic
        Member
        • Jan 2021
        • 45

        #5
        Hi TheOddPerson,

        Thanks for your reply, I'll try to reply as concisely as possible!

        Originally posted by TheOddPerson
        I see the error in the log relating to Active checks.
        Is this host running a Zabbix Agent in Active mode?
        Yes, all of my hosts are active (except some SNMP ones) - today I have 43 active agents (a mix of Linux & Windows VMs, including the Zabbix server) + 9 SNMP hosts; one of the SNMP hosts is offline on purpose most of the time

        Originally posted by TheOddPerson
        What is the timeout for the type of check that is clogging up the system?
        I didn't change it, so it's the default

        Originally posted by TheOddPerson
        How many pollers/trappers do you have running?
        Also default

        Originally posted by TheOddPerson
        Are you receiving any warnings on your zabbix server?
        Not that I'm aware of (except the active warning "More than....")

        Originally posted by TheOddPerson
        How many hosts do you have configured for that same type of poller?
        I'm not sure about this, so a dumb question: what is the "same type of poller"?

        Yesterday I gave my server more resources (it's a single VM that now has 4 vCPUs and 4 GB of RAM), but it didn't help.
        Now that I roughly understand the concept of pollers, I understand why it didn't change anything... so increasing the configuration values should not be an issue (nor would giving it even more resources).

        Should I go to 10 for StartPollers & StartTrappers?

        Comment

        • TheOddPerson
          Junior Member
          • Feb 2024
          • 12

          #6
          Giving the server more resources won't help unless you're seeing Zabbix complain about high CPU or memory utilization.
          I would definitely increase the number of trappers. This is the number of items that Zabbix can receive simultaneously. I would start with 10 and see if that makes any difference.
          The default configuration is aimed at a fairly small installation, so expect to need to increase some values. Zabbix should trigger problems on itself if it sees poller / memory / cpu / cache usage getting too high, and that can guide you on which configuration values need to be changed.
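
          If you want numbers rather than guesswork, the server's internal self-monitoring items report how busy each process type is. A rough sketch of item keys you could add to the Zabbix server host (assuming your version supports the standard internal checks):
          Code:
          zabbix[process,trapper,avg,busy]
          zabbix[process,poller,avg,busy]
          zabbix[queue,10m]
          Anything sitting near 100% busy is the process type to scale up.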


          • EHRETic
            Member
            • Jan 2021
            • 45

            #7
            Edit: I tried the following, but the queue reappeared after a few minutes... :-/

            Code:
            ############ ADVANCED PARAMETERS ################
            
            ### Option: StartPollers
            #       Number of pre-forked instances of pollers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPollers=5
            StartPollers=10
            
            ### Option: StartIPMIPollers
            #       Number of pre-forked instances of IPMI pollers.
            #               The IPMI manager process is automatically started when at least one IPMI poller is started.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartIPMIPollers=0
            
            ### Option: StartPreprocessors
            #       Number of pre-forked instances of preprocessing workers.
            #               The preprocessing manager process is automatically started when preprocessor worker is started.
            #
            # Mandatory: no
            # Range: 1-1000
            # Default:
            # StartPreprocessors=3
            StartPreprocessors=5
            
            ### Option: StartPollersUnreachable
            #       Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
            #       At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
            #       are started.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPollersUnreachable=1
            StartPollersUnreachable=5
            
            ### Option: StartTrappers
            #       Number of pre-forked instances of trappers.
            #       Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
            #       At least one trapper process must be running to display server availability and view queue
            #       in the frontend.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartTrappers=5
            StartTrappers=10
            
            ### Option: StartPingers
            #       Number of pre-forked instances of ICMP pingers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPingers=1
            StartPingers=5
            
            ### Option: StartDiscoverers
            #       Number of pre-forked instances of discoverers.
            #
            # Mandatory: no
            # Range: 0-250
            # Default:
            # StartDiscoverers=1
            
            ### Option: StartHTTPPollers
            #       Number of pre-forked instances of HTTP pollers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartHTTPPollers=1
            StartHTTPPollers=10

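            In case it helps anyone else reading: the new values are only picked up after restarting the server process, e.g.:
            Code:
            systemctl restart zabbix-server
            systemctl status zabbix-server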

            • EHRETic
              Member
              • Jan 2021
              • 45

              #8
              Hi,

              I just doubled all the values from the post above (so 20 for both pollers and trappers).
              I also increased all the cache sizes.

              Still the same result, any clue?
              (It doesn't really change the resource consumption.)


              • TheOddPerson
                Junior Member
                • Feb 2024
                • 12

                #9
                Make sure you're editing the correct config file:
                systemctl status zabbix-server
                should tell you which config file the server was started with.
                You should also see child processes for each of the trappers.

                No Zabbix problems on the Zabbix server itself?
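
                Something along these lines shows both at once (a rough sketch):
                Code:
                systemctl status zabbix-server            # the command line in the output shows the config file passed with -c
                ps -ef | grep '[z]abbix_server: trapper'  # one line per running trapper process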


                • Zuzuka
                  Member
                  • Aug 2011
                  • 39

                  #10
                  Try running "ps -ax | grep zabbix" on the Zabbix server and look at how loaded the zabbix processes are (how busy they are):
                  [Screenshot: ps output listing the zabbix server processes and their status]
                  If all the syncers/discoverers/pollers/trappers/etc. are busy, then you need to add more. Don't forget that adding more processes to Zabbix may require adding more CPU cores.
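
                  If you just want the busy ones, you can filter out the idle processes, roughly like this (the exact status text can vary between versions):
                  Code:
                  ps -ax | grep '[z]abbix_server:' | grep -v idle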


                  • Zuzuka
                    Member
                    • Aug 2011
                    • 39

                    #11
                    Also, on the agent side, look at the "StartAgents" parameter in the "zabbix_agentd.conf" file. You may need to increase this value if you have many items to monitor on a single host. Test with different values - I'm using StartAgents=10.
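
                    For reference, in my zabbix_agentd.conf it looks roughly like this (if I remember right the default is 3, and StartAgents controls how many processes accept incoming passive-check connections):
                    Code:
                    ### Option: StartAgents
                    # Default:
                    # StartAgents=3
                    StartAgents=10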


                    • EHRETic
                      Member
                      • Jan 2021
                      • 45

                      #12
                      Originally posted by Zuzuka
                      Try running "ps -ax | grep zabbix" on the Zabbix server and look at how loaded the zabbix processes are (how busy they are):
                      If all the syncers/discoverers/pollers/trappers/etc. are busy, then you need to add more. Don't forget that adding more processes to Zabbix may require adding more CPU cores.
                      Well, they seem pretty much OK to me, no?
                      (The rest is also OK; it's a nice homelab, but it remains a lab with less traffic than a normal environment.)

                      Code:
                         1771 ?        S      1:47 /usr/sbin/zabbix_server: poller #1 [got 0 values in 0.000047 sec, idle 1 sec]
                         1772 ?        S      1:42 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000033 sec, getting values]
                         1773 ?        S      1:40 /usr/sbin/zabbix_server: poller #3 [got 7 values in 3.956291 sec, getting values]
                         1775 ?        S      1:43 /usr/sbin/zabbix_server: poller #4 [got 0 values in 0.000041 sec, idle 1 sec]
                         1776 ?        S      1:44 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.000059 sec, idle 1 sec]
                         1777 ?        S      1:46 /usr/sbin/zabbix_server: poller #6 [got 0 values in 0.000030 sec, idle 1 sec]
                         1778 ?        R      1:40 /usr/sbin/zabbix_server: poller #7 [got 5 values in 0.064381 sec, getting values]
                         1779 ?        S      1:42 /usr/sbin/zabbix_server: poller #8 [got 0 values in 0.000058 sec, idle 1 sec]
                         1781 ?        S      1:44 /usr/sbin/zabbix_server: poller #9 [got 0 values in 0.000046 sec, idle 1 sec]
                         1782 ?        S      1:42 /usr/sbin/zabbix_server: poller #10 [got 0 values in 0.000024 sec, idle 1 sec]
                         1783 ?        S      1:39 /usr/sbin/zabbix_server: poller #11 [got 0 values in 0.000045 sec, getting values]
                         1785 ?        S      1:41 /usr/sbin/zabbix_server: poller #12 [got 1 values in 0.052022 sec, idle 1 sec]
                         1786 ?        S      1:47 /usr/sbin/zabbix_server: poller #13 [got 0 values in 0.000027 sec, idle 1 sec]
                         1787 ?        S      1:43 /usr/sbin/zabbix_server: poller #14 [got 0 values in 0.000045 sec, idle 1 sec]
                         1788 ?        S      1:46 /usr/sbin/zabbix_server: poller #15 [got 4 values in 1.125960 sec, idle 1 sec]
                         1789 ?        S      1:42 /usr/sbin/zabbix_server: poller #16 [got 0 values in 0.000021 sec, idle 1 sec]
                         1790 ?        S      1:40 /usr/sbin/zabbix_server: poller #17 [got 0 values in 0.000031 sec, getting values]
                         1791 ?        S      1:47 /usr/sbin/zabbix_server: poller #18 [got 0 values in 0.000025 sec, getting values]
                         1792 ?        S      1:36 /usr/sbin/zabbix_server: poller #19 [got 0 values in 0.000046 sec, idle 1 sec]
                         1793 ?        S      1:39 /usr/sbin/zabbix_server: poller #20 [got 0 values in 0.000046 sec, getting values]


                      • EHRETic
                        Member
                        • Jan 2021
                        • 45

                        #13
                        Originally posted by Zuzuka
                        Also, on the agent side, look at the "StartAgents" parameter in the "zabbix_agentd.conf" file. You may need to increase this value if you have many items to monitor on a single host. Test with different values - I'm using StartAgents=10.
                        I could not find this parameter in my agent's configuration file, probably because I use Agent 2 with active checks (https://www.zabbix.com/forum/zabbix-...ter-in-agent-2)
                        Any clue about what I can do?


                        • TheOddPerson
                          Junior Member
                          • Feb 2024
                          • 12

                          #14
                          It's become apparent you're using active checks, in which case you want to check your TRAPPERS.
                          Run
                          ps -ax | grep trapper
                          to see how your trappers are doing.
                          You may want to increase the number of trappers.

                          Is the queue being consumed by 1 client? How many clients are configured as active agents?
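
                          Something like this gives a quick feel for it (a rough sketch; the status text in the brackets can differ between versions):
                          Code:
                          ps -ax | grep '[t]rapper'
                          # e.g. /usr/sbin/zabbix_server: trapper #1 [processed data in 0.000123 sec, waiting for connection]
                          If they all look permanently busy rather than waiting for a connection, the trappers are your bottleneck.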

