Hi,
I have been tasked with building a proxy to monitor network devices in my company via SNMP. Some items will also have preprocessing. Initially, about 200 network devices (routers and switches) are to be monitored.
Our Zabbix server sometimes chokes and freezes, and the number of items will grow in the future. We would like to relieve the Zabbix server of SNMP agent items and move their polling and preprocessing to a separate proxy.
What I have done so far:
I set up two new proxies and started testing them. Both proxies have the same hardware / OS configuration, and I assume both will be connected to the Zabbix server in the same way.
Both proxies have working network connectivity to the network devices and to the Zabbix server.
Currently I am testing only one of the proxies, and I can see that it is struggling with only 1329 items (required VPS is 10.37).
What I have noticed so far:
The proxy I am currently testing is heavily utilised. The screenshot is from the last 7 days:
vSphere performance of the Zabbix proxy server:
Configuration of the new Zabbix proxy servers (identical on both):
Code:
Servers set up in VMware (ESXi 6.7 and later (VM version 14))
10 CPU
16GB Memory
HD1: 20GB
HD2: 40GB
Network: standard configuration, everything works on the network.
Operating system (both the same):
Ubuntu 20.04.6 LTS
Currently I am monitoring a small part of the network environment (only on one of the two proxies, nothing is connected to the other):
4 network devices
Number of items: 1329
Required VPS: 10.37
About a dozen ICMP items
The rest are SNMP agent items
Preprocessing: about half of the items have none, about half use "Change per second", and about 1/3 additionally use a custom multiplier of 8
Intervals: 30s, 1m (most), 3m, 5m
History retention: 90 days (imposed by the organization)
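To illustrate what these two preprocessing steps compute, here is a minimal Python sketch. It follows the documented "Change per second" formula, (value - prev_value) / (time - prev_time), and then applies the multiplier. The sample counter values are made up, and the reading that the multiplier of 8 converts octets to bits is an assumption, not something confirmed by the templates.

```python
def change_per_second(prev_value, prev_time, value, time):
    """Rate of change between two samples: (value - prev_value) / (time - prev_time)."""
    return (value - prev_value) / (time - prev_time)


def custom_multiplier(value, factor=8):
    """Multiply the value by a constant (8 here, assumed to mean octets -> bits)."""
    return value * factor


# Hypothetical samples of an interface octet counter, polled 60 seconds apart.
prev_octets, prev_ts = 1_000_000, 0
octets, ts = 1_750_000, 60

bytes_per_sec = change_per_second(prev_octets, prev_ts, octets, ts)
bits_per_sec = custom_multiplier(bytes_per_sec)
print(bytes_per_sec, bits_per_sec)
```

Both steps are plain arithmetic on two consecutive samples, so each one is cheap; the load comes from how many values pass through them per second.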
Proxy summary:
* In the future I plan to monitor the entire environment which will include:
Item appearance / sample monitoring configurations:
Items are mainly created from several templates that contain discovery rules.
Planned total number of items created: ~48000
Each device has additional ICMP monitoring:
item type: icmpping - 3 items (loss, ping, response time)
Preprocessing on items: none
The rest are SNMP Agent type items.
Intervals: 30s, 1m (most), 3m, 5m
Holding history: 90 days (imposed by the organization)
Preprocessing: about half of the items use "Change per second", and about 1/3 additionally use a custom multiplier of 8
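As a rough sanity check on the target load, the test figures can be scaled linearly to the planned item count. This is only a sketch: it assumes the full environment has the same mix of intervals as the 1329 test items, which may not hold once all templates and discovery rules are applied.

```python
# Figures taken from the test setup and the plan described above.
TEST_ITEMS = 1329
TEST_VPS = 10.37
PLANNED_ITEMS = 48_000
PROXIES = 2


def scaled_vps(planned_items, test_items=TEST_ITEMS, test_vps=TEST_VPS):
    """Linear extrapolation of required VPS, assuming the same interval mix."""
    return test_vps * planned_items / test_items


total_vps = scaled_vps(PLANNED_ITEMS)
per_proxy_vps = total_vps / PROXIES
print(f"total ~{total_vps:.1f} vps, per proxy ~{per_proxy_vps:.1f} vps")
```

Under that assumption the full environment needs roughly 375 new values per second in total, or about 187 per proxy when split evenly across the two proxies.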
Configuration of my Zabbix server (6.4):
Role: Monitoring EVERYTHING in the company. We monitor websites, Elasticsearch, Kubernetes, JMX, databases and SNMP agents (which we plan to migrate to the new proxies).
The configuration has been tuned many times and the server currently runs quite stably, but we sometimes notice slowdowns, for example when searching for items, and occasionally Zabbix freezes suddenly. Unfortunately, the last freeze happened a long time ago and I no longer have any data from that event.
Database: External, PostgreSQL
psql (PostgreSQL) 14.1 (Ubuntu 14.1-2.pgdg20.04+1)
Zabbix Server (6.4) config:
Code:
ListenPort=10051
LogType=file
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=1024
DebugLevel=3
PidFile=/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=<correct>
DBName=<correct>
DBUser=<correct>
DBPassword=<correct>
DBPort=<correct>
AllowUnsupportedDBVersions=1
StartPollers=600
StartIPMIPollers=0
StartPreprocessors=115
StartConnectors=0
StartPollersUnreachable=120
StartHistoryPollers=5
StartTrappers=16
StartPingers=160
StartDiscoverers=0
StartHTTPPollers=60
StartTimers=2
StartEscalators=16
StartAlerters=50
JavaGateway=127.0.0.1
JavaGatewayPort=10052
StartJavaPollers=1
StartVMwareCollectors=5
VMwareFrequency=3600
VMwarePerfFrequency=21600
VMwareCacheSize=512M
VMwareTimeout=10
SNMPTrapperFile=/tmp/zabbix_snmptrap.log.tmp
StartSNMPTrapper=1
ListenIP=0.0.0.0
HousekeepingFrequency=1
MaxHousekeeperDelete=250000
CacheSize=1408M
CacheUpdateFrequency=30
StartDBSyncers=6
HistoryCacheSize=1024M
HistoryIndexCacheSize=512M
TrendCacheSize=384M
TrendFunctionCacheSize=128M
ValueCacheSize=2G
Timeout=30
TrapperTimeout=60
UnreachablePeriod=300
UnavailableDelay=120
UnreachableDelay=300
AlertScriptsPath=/usr/share/pdagent-integrations/bin/
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=2200
TmpDir=/tmp
StartProxyPollers=0
ProxyConfigFrequency=600
ProxyDataFrequency=1
StartLLDProcessors=4
AllowRoot=0
User=zabbix
SSLCertLocation=/etc/zabbix/ssl/certs
SSLKeyLocation=/etc/zabbix/ssl/
StatsAllowedIP=127.0.0.1
StartReportWriters=1
WebServiceURL=<correct>
ProblemHousekeepingFrequency=45
StartODBCPollers=2
Zabbix Server utilisation (last 7 days):
Configuration of both new proxies (version 6.4; remember that 3 hosts are connected to only one of them, the other is not used):
Database: Internal = localhost, PostgreSQL
psql (PostgreSQL) 13.16 (Ubuntu 13.16-1.pgdg20.04+1)
Code:
ProxyMode=0
Server=<correct>
Hostname=<correct>
ListenPort=<correct>
LogType=file
LogFile=/var/log/zabbix/zabbix_proxy.log
LogFileSize=128
DebugLevel=3
EnableRemoteCommands=1
LogRemoteCommands=1
PidFile=/run/zabbix/zabbix_proxy.pid
SocketDir=/run/zabbix
DBHost=127.0.0.1
DBName=<correct>
DBUser=<correct>
DBPassword=<correct>
ProxyLocalBuffer=0
ProxyOfflineBuffer=12
HeartbeatFrequency=30
ConfigFrequency=300
DataSenderFrequency=5
StartPollers=100
StartIPMIPollers=5
StartPreprocessors=70
StartPollersUnreachable=20
StartTrappers=5
StartPingers=15
StartDiscoverers=5
StartHTTPPollers=30
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
StartSNMPTrapper=1
HousekeepingFrequency=1
CacheSize=2G
StartDBSyncers=5
HistoryCacheSize=1.5G
HistoryIndexCacheSize=1G
Timeout=30
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=
LogSlowQueries=300
AllowRoot=1
StatsAllowedIP=127.0.0.1,<correct>
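To put the proxy settings next to the VM size (10 CPU, 16GB memory), the sketch below totals the `Start*` process counts and the cache sizes from the configuration above. It is only a rough tally, not a sizing rule: the parsing helpers are hypothetical, the values are copied as written in the post, and the proxy also runs internal processes that are not counted here.

```python
import re

# Subset of the proxy configuration quoted above (only the values used here).
PROXY_CONF = """
StartPollers=100
StartIPMIPollers=5
StartPreprocessors=70
StartPollersUnreachable=20
StartTrappers=5
StartPingers=15
StartDiscoverers=5
StartHTTPPollers=30
StartSNMPTrapper=1
StartDBSyncers=5
CacheSize=2G
HistoryCacheSize=1.5G
HistoryIndexCacheSize=1G
"""

# Conversion factors from the size suffix to gigabytes.
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}


def parse_conf(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def size_in_gb(value):
    """Convert a size such as '512M' or '2G' to gigabytes."""
    number, unit = re.fullmatch(r"([\d.]+)([KMG])", value).groups()
    return float(number) * UNITS[unit]


conf = parse_conf(PROXY_CONF)
processes = sum(int(v) for k, v in conf.items() if k.startswith("Start"))
cache_gb = sum(size_in_gb(v) for k, v in conf.items() if k.endswith("CacheSize"))
print(f"{processes} configured worker processes, {cache_gb:.1f}G of caches")
```

With the values above this gives 256 configured worker processes and 4.5G of caches, for a proxy that currently handles about 10 new values per second.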
Can you advise what I can improve in this proxy's configuration to reduce its utilisation with the 4 hosts I am testing, and to prepare both proxies for the much higher SNMP load when I add the rest of the 200 devices (about 100 per proxy)?