Hello all - I'm only lightly experienced in Zabbix (and mostly with 4.x), and I'm taking over an installation from the previous person, who has left. I have little documentation and much less experience than I need. I'm looking for help in tuning this environment.
Maybe I need to spend my own $$$ and buy 1 or 2 support calls?
Please refer to the screenshots attached. Please feel free to ask clarifying questions ... I tried to include basic starting info.
Environment Summary:



- Monitored environment
- 100% AWS
- about 900 instances
- 5-20 scaling events per day average
- Zabbix 5.4
- Server running in AWS EKS (1.21) - 135 pods spread across 3 m6i.large compute instances
- One Zabbix node (see zabbix conf below)
- three nginx nodes
- Database in RDS
- Postgres on db.t3.medium (2 vCPU, 4 GB), engine v12.8
- four proxies on EC2 instances: one for the staging VPC, and one each for the VPCs in US, CA, EU
- All agents are active, and talk only to proxy. Mostly agents are v5.4.12
- Hosts are brought in from CloudQuery based on AWS tags (also in EKS)
- monitored instances are a mix of CentOS 6 (yes, really) and Amazon Linux 2
- we have some custom stuff, but also lots of generic templates
- LLD is in use, but I'm not too familiar with it
- about 522 VPS
- Overall UI is 'ok' ... sometimes there is slowness
- Zabbix housekeeper processes more than 75% busy - looking at graphs, this is a huge problem
- some DB tables I think have WAY too much data ... likely related to the above. No partitioning ... I'm not familiar with that process, and not much at all with Postgres
- history_uint - 853m
- history - 760m
- more in screen shot below
- can't even get count(*) on many of these .. this is just the 'estimate' from DBeaver!
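As a rough sanity check on those table sizes, here is a back-of-the-envelope sketch. It assumes that "VPS" above means new values per second, that the rate is steady, and that the estimated row counts are in the right ballpark; the variable names are only for illustration.

```shell
# Assumption: "522 VPS" means ~522 new values written per second, at a steady rate.
vps=522
rows_per_day=$((vps * 86400))   # 86400 seconds in a day

# Estimated row counts from the list above (history_uint + history).
history_rows=$(( (853 + 760) * 1000000 ))

# Integer division, so the result is rounded down to whole days.
days_retained=$((history_rows / rows_per_day))

echo "rows per day:  $rows_per_day"
echo "days retained: $days_retained"
```

With these numbers it works out to about 45 million rows per day, so the two tables would hold roughly 35 days of data. That ignores the other history and trends tables, so treat it as an order-of-magnitude figure only.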
Code:
bash-5.1$ grep -vi ^# zabbix_server.conf | tr -s "\n"
LogType=console
DBHost=alc-sre-zabbix-prod.*
DBName=xxxxx
DBSchema=xxxxx
DBUser=xxxxx
DBPassword=xxxxxxxx
DBPort=5432
StartTrappers=10 <<<< is this good/bad?
CacheSize=1024M <<<< is this good/bad?
ValueCacheSize=128M <<<< is this good/bad?
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
SSHKeyLocation=/var/lib/zabbix/ssh_keys
StartLLDProcessors=4 <<<< is this good/bad?
User=zabbix
SSLCertLocation=/var/lib/zabbix/ssl/certs/
SSLKeyLocation=/var/lib/zabbix/ssl/keys/
SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
LoadModulePath=/var/lib/zabbix/modules/
WebServiceURL=http://zabbix-web-service:10053/report