Help - tuning assistance

  • pcor877
    Junior Member
    • Aug 2022
    • 15

    #1

    Help - tuning assistance

    Hello all - I'm only lightly experienced in Zabbix (and mostly with 4.x), and I'm taking over an installation from the previous person, who has left. I have little documentation and much less experience than I need. I'm looking for help in tuning this environment.
    Maybe I need to spend my own $$$ and buy 1 or 2 support calls?

    Please refer to the attached screenshots, and feel free to ask clarifying questions ... I tried to include basic starting info.

    Environment Summary:
    1. Monitored environment
      1. 100% AWS
      2. about 900 instances
      3. 5-20 scaling events per day average
    2. Zabbix 5.4
      1. Server running in AWS EKS (1.21) - 135 PODS spread across 3 m6i.large compute instances
        1. One Zabbix node (see zabbix conf below)
        2. three nginx nodes
      2. Database in RDS
        1. Postgres on db.t3.medium (2 vcpu, 4GB), engine v 12.8
      3. four proxies on ec2 instances, one for staging VPC, one each for VPCs in US, CA, EU
      4. All agents are active, and talk only to proxy. Mostly agents are v5.4.12
      5. Hosts are brought in from CloudQuery based on AWS tags (also in EKS)
      6. monitored instances are a mix of CentOS 6 (yes, really) and Amazon Linux 2
      7. we have some custom stuff, but also lots of generic templates
      8. LLD is in use, but I'm not too familiar with LLD
    PROBLEMS:
    1. about 522 VPS
    2. Overall UI is 'ok' ... sometimes there is slowness
    3. Zabbix housekeeper processes more than 75% busy - looking at graphs, this is a huge problem
    4. some DB tables, I think, have WAY too much data ... likely related to the above. No partitioning ... I'm not familiar with that process, and not much at all with Postgres
      1. history_uint - 853m
      2. history - 760m
      3. more in screen shot below
      4. can't even get count(*) in many of these .. this is just 'estimate' from dbeaver!




    Code:
    bash-5.1$ grep -vi ^# zabbix_server.conf | tr -s "\n"
    LogType=console
    DBHost=alc-sre-zabbix-prod.*
    DBName=xxxxx
    DBSchema=xxxxx
    DBUser=xxxxx
    DBPassword=xxxxxxxx
    DBPort=5432
    StartTrappers=10       <<<< is this good/bad?
    CacheSize=1024M       <<<< is this good/bad?
    ValueCacheSize=128M       <<<< is this good/bad?
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    FpingLocation=/usr/sbin/fping
    SSHKeyLocation=/var/lib/zabbix/ssh_keys
    StartLLDProcessors=4         <<<< is this good/bad?
    User=zabbix
    SSLCertLocation=/var/lib/zabbix/ssl/certs/
    SSLKeyLocation=/var/lib/zabbix/ssl/keys/
    SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
    LoadModulePath=/var/lib/zabbix/modules/
    WebServiceURL=http://zabbix-web-service:10053/report
    [Attached image: Screen Shot 2022-08-24 at 6.58.30 PM.png]
    [Attached image: Screen Shot 2022-08-24 at 7.04.43 PM.png]
    [Attached image: Screen Shot 2022-08-24 at 7.09.18 PM.png]
    [Attached image: Screen Shot 2022-08-24 at 7.10.03 PM.png]
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    The logs should show housekeeper stats: how much it removes on each run.
    Another question: how long do you keep history and trends? That influences overall DB size.
    Check how busy your DB server is; that can influence how fast your housekeeper runs. Also, if a lot of slow queries are reported in the server log, that may indicate DB issues.
    Trapper/LLD processor counts: if it does not complain, then they should be enough.
    Your TrendCacheSize seems a bit small, as it reports over 75% used. IIRC it is advisable to keep caches no more than 40% occupied.
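
To make the link between retention and DB size concrete, here is a rough back-of-the-envelope sketch. It takes the "about 522 VPS" figure from post #1 and multiplies it out; the retention values of 7 and 14 days are only placeholders, and the helper name `estimate_rows` is made up for this illustration. The result is a combined estimate across all history tables, not a per-table figure.

```shell
#!/usr/bin/env bash
# Steady-state row estimate: values per second x seconds per day x retention days.
# 522 is the VPS figure quoted in post #1; the retention days are placeholders.
estimate_rows() {
  local nvps=$1 days=$2
  echo $(( nvps * 86400 * days ))
}

for days in 7 14; do
  printf '%2d days of history: ~%d rows\n' "$days" "$(estimate_rows 522 "$days")"
done
```

If the history tables hold far more rows than an estimate like this, that may be a hint that the housekeeper is not keeping up with deletions, though it is only a rough indicator.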


    • pcor877
      Junior Member
      • Aug 2022
      • 15

      #3
      Ok ...

      just to mention again ... I am not very familiar with Kubernetes unfortunately ... and the person who set this all up is gone.

      1. as this is in K8s ... logs are going to console only. I can look, but I'm not sure what I'm looking for here for housekeeper stats, if you can mention here what to look for ... and obviously a limited window to look (I can open a log console and watch logs for some time)

      2. History and Trends ... looks like most items have either 7 days or 14 days of history, with 365 days of trends.

      3. below is the AWS monitoring 1-week view for the RDS DB ... seems ok to my mind, perhaps you will have another opinion

      4. looking at https://www.zabbix.com/documentation.../zabbix_server, the TrendCacheSize DEFAULT is 4M, and since it is not set in my Zabbix K8s manifest yaml, I assume it is at the default. These are the only 4 items set in the manifest.

      Code:
      - name: "ZBX_CACHESIZE"
        value: "1024M"
      - name: "ZBX_STARTLLDPROCESSORS"
        value: "4"
      - name: "ZBX_VALUECACHESIZE"
        value: "128M"
      - name: ZBX_STARTTRAPPERS
        value: "10"





      [Attached image: Screen Shot 2022-08-26 at 9.39.51 AM.png]


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        grep "housekeeper" zabbix_server.log
        You should see lines like
        43867:20220829:002420.683 executing housekeeper
        43867:20220829:002423.680 housekeeper [deleted 0 hist/trends, 1 items/triggers, 405 events, 124 problems, 1 sessions, 0 alarms, 482 audit, 0 records in 2.981019 sec, idle for 1 hour(s)]
        43867:20220829:012424.674 executing housekeeper
        43867:20220829:012428.009 housekeeper [deleted 0 hist/trends, 0 items/triggers, 368 events, 478 problems, 0 sessions, 0 alarms, 480 audit, 0 records in 3.320441 sec, idle for 1 hour(s)]
        43867:20220829:022428.941 executing housekeeper
        43867:20220829:022432.800 housekeeper [deleted 0 hist/trends, 0 items/triggers, 578 events, 154 problems, 2 sessions, 0 alarms, 484 audit, 0 records in 3.825385 sec, idle for 1 hour(s)]

        At least it shows that your housekeeper is doing something...
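
If there are many of these lines, a small filter can reduce each one to the numbers that matter. The sketch below assumes the summary line format shown above and prints the timestamp, the hist/trends deleted count, and the run time in seconds. The function name `summarize_housekeeper` is invented for this example.

```shell
#!/usr/bin/env bash
# Read server log lines on stdin and print, for each housekeeper summary line:
#   <date:time> <hist/trends deleted> <seconds taken>
# Lines in any other format (e.g. "executing housekeeper") are skipped.
summarize_housekeeper() {
  sed -nE 's#^ *[0-9]+:([0-9]{8}:[0-9.]+) housekeeper \[deleted ([0-9]+) hist/trends,.* in ([0-9.]+) sec,.*$#\1 \2 \3#p'
}

# Example: grep "housekeeper" zabbix_server.log | summarize_housekeeper
```

A deleted count that stays at 0 for hist/trends while runs remain short, as in the sample lines above, would look quite different from runs that delete large batches and take a long time, so the two columns together give a quick picture of how hard the housekeeper is working.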


        • pcor877
          Junior Member
          • Aug 2022
          • 15

          #5
          Well, as I said ... there are no logs, because in K8s, they are going only to console. I will see if I can catch that log line in Lens ..
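
For a console-only setup, one possible approach is to pipe the pod's console output through the same kind of filter. This is only a sketch: the wrapper name `watch_housekeeper` is invented, and the namespace `monitoring` and deployment `zabbix-server` in the usage comment are hypothetical names that would need to be replaced with the real ones from the cluster.

```shell
#!/usr/bin/env bash
# Run whatever command prints the server log (passed as arguments) and keep
# only the lines that mention the housekeeper.
watch_housekeeper() {
  "$@" 2>&1 | grep --line-buffered 'housekeeper'
}

# Usage with hypothetical namespace and deployment names:
#   watch_housekeeper kubectl logs -n monitoring deploy/zabbix-server --since=3h
# Add -f to the kubectl command to keep following the console instead.
```

Since the sample log above shows the housekeeper idling for one hour between runs, a window of a few hours should normally be enough to catch at least one summary line.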
