Ad Widget

**BradKnowles** · 13-06-2025, 22:13

I know we have some systems where `net.netfilter.nf_conntrack_max` has to be set to the maximum allowed value (524288 or larger), otherwise we experience some problems.

I don't know if this helps you, but at least it's something you can look at.

**AspenKle** · 15-06-2025, 11:05

Hi BradKnowles, curently we have.

Code:

cat /proc/sys/net/netfilter/nf_conntrack_max
262144

**BradKnowles** · 16-06-2025, 17:36

Originally posted by AspenKle

Hi BradKnowles, curently we have.

Code:

cat /proc/sys/net/netfilter/nf_conntrack_max
262144

Have you tried boosting that value to 524288 and seeing if the problem persists?

**AspenKle** · 16-06-2025, 19:39

Hi BradKnowles, I will check a bit more and try to set that parameters until next boot, i.e. setting it temporary.
Could you explain a bit more what you guy's are seeing in your systems?
After AI and Google I always end up with "high Recv-Q/Send-Q, you should focus on application and network tuning, not conntrack settings."
If you look at the picture this is what we see when Recv-Q is pilling up, a sudden high inbound flow to the Zabbix server.

In general:
Houskeeper is always fast
housekeeper [deleted 236436 hist/trends, 0 items/triggers, 28 events, 12 problems, 62 sessions, 0 alarms, 0 audit, 0 autoreg_host, 0 records in 95.029074 sec, idle for 1 hour(s)]
We do not see any zabbix alerts over 75% or similar when high Recv-Q sets in.
Regards

Attached Files

**AspenKle** · 17-06-2025, 14:23

BradKnowles I have just changed it temporarily :
sudo sysctl -w net.netfilter.nf_conntrack_max=524288
cat /proc/sys/net/netfilter/nf_conntrack_max
524288

**AspenKle** · 17-06-2025, 15:49

BradKnowles
We got the same result right now:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:12563 0.0.0.0:* LISTEN -
tcp 4097 4096 0.0.0.0:10051 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN -

and did a restart sudo service zabbix-server restart, then the queue went down.
Zabbix must be eating some data that takes way to long to process, but the funny thing is that we see no alerts over 75% for any zabbix util* (trapper*poller*histsync*unreach* etc process.
The thing we do see is Utilization of trapper data collector process, in % goes to 0, when Recv-Q goes over 2k

It is still:
cat /proc/sys/net/netfilter/nf_conntrack_max
524288

**AspenKle** · 17-06-2025, 17:19

This is very frustrating now

hm....

**AspenKle** · 18-06-2025, 13:36

It seems more stabile today or since last night. Did some changes on agent configs. Will update here after 48 h for next update on Recv-Q and changes done, if this was the fix.

**Markku** · 22-06-2025, 09:00

Originally posted by AspenKle

Houskeeper is always fast
housekeeper [deleted 236436 hist/trends, 0 items/triggers, 28 events, 12 problems, 62 sessions, 0 alarms, 0 audit, 0 autoreg_host, 0 records in 95.029074 sec, idle for 1 hour(s)]
Regards

I don't know if it is related to your original issue, but for me the housekeeper/database performance looks quite bad: 95 seconds spent on cleaning the database every hour. I would partition the database to release load from the database housekeeping.

"62 sessions" also hints that maybe you are using some API connections that you don't logout from. (Again, not probably related to your issue but as a general observation.)

Markku

**Markku** · 22-06-2025, 09:47

# 1357:20240130:133326.485 failed to accept an incoming connection: connection rejected, getpername() faild: [107] Transport endpoint is not connected.

I'm a bit puzzled about this: why are you showing server log from 1.5 years ago? Are the current server-side logs exactly the same now in 2025?

Markku

**AspenKle** · 22-06-2025, 12:18

Anyway Markku and BradKnowles, thanks for the information.
It turns out that this escalated after the number of hosts increased over the years.
Fix 1, made it better, we use passive but some agents had both configured, so we commented out :

Code:

Server=ZABBIX-IP
# ServerActive=ZABBIX-IP

When ServerActive is configured, it asks for new configuration after RefreshActiveChecks settings, this causes some tcp traffic on 10051, that we do not need, Recv-Q lower.
But this was not the root cause.

Fix 2, made it much better and seems like the root cause.
Some trapping agents (now 3 agents x 68 servers) uses https://www.nuget.org/packages/ZabbixSender.Async/1.2.0.
The

Code:

await sender.Send("MonitoredHost1", "trapper.item1", "12");

Was not handled correctly in the .Net agent, causing almost spam on tcp 10051, now it is handled correctly by dev, Recv-Q much lower.

Fix 3 or observation, the environment is scanned frequently using a Vulnerability Scanner, this also had in impact, that was the reason for this daily 01:00 problem.

Recv-Q is zero in general now.

Ad Widget

Zabbix server 6.0.40 Recv-Q is full tcp 10051

Zabbix server 6.0.40 Recv-Q is full tcp 10051

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment