[5.4][Ubuntu] Problem with high cpu load utilization and preprocessing pollers

  • seroslaw
    Member
    • Apr 2021
    • 32

    #1

    Hello,

    I have a problem with high CPU load on my Zabbix server, which is probably caused by heavy use of the preprocessing workers.

    Info about my Zabbix environment:

    - Zabbix Server 5.4.10
    - DB - postgresql
    - Number of hosts (enabled/disabled) 4066
    - Required server performance, new values per second 897.87
    - no proxies; the server has multiple network interfaces,
    - HA - no

    HW:

    - CPU: 16 cores
    - memory: 64 GB
    - HD - 30 GB

    Zabbix-server configuration:

    - DebugLevel=2
    - StartPollers=150
    - StartIPMIPollers=2
    - StartPreprocessors=180
    - StartPollersUnreachable=30
    - StartTrappers=30
    - StartPingers=50
    - StartDiscoverers=2
    - StartHTTPPollers=50
    - StartTimers=3
    - StartAlerters=2
    - StartJavaPollers=2
    - StartVMwareCollectors=5
    - VMwareFrequency=21600
    - VMwarePerfFrequency=21600
    - VMwareCacheSize=2G
    - VMwareTimeout=10
    - StartSNMPTrapper=1
    - HousekeepingFrequency=1
    - CacheSize=4G
    - CacheUpdateFrequency=60
    - StartDBSyncers=4
    - HistoryCacheSize=1G
    - HistoryIndexCacheSize=1G
    - TrendCacheSize=1G
    - ValueCacheSize=12G
    - Timeout=25
    - TrapperTimeout=30
    - LogSlowQueries=3000
    - StartProxyPollers=0

    A few days ago I noticed high CPU utilization on my Zabbix server, so I started to investigate. I monitored the CPU load and the processes generating it, and found that the preprocessing workers and the preprocessing manager were responsible.
    At first I added 8 CPU cores to the server (there were 8, now there are 16), but it didn't help. Next I started adjusting values in zabbix_server.conf: I set StartPollers to 500 (it was 300 before), but then the server would not start.

    Above I posted my current zabbix_server configuration; the server is running now, but I'm still seeing high CPU utilization. Below are some screenshots from my Zabbix server.

    CPU load, last 30 days:

    [Screenshot: cpu load from last 30 days.png]

    vSphere, last week's metrics of my Zabbix server:

    [Screenshot: vsphere look on metrics.png]

    Htop result:

    [Screenshot: load on zbx server and preprocessing workers.png]

    I read about Zabbix server performance tuning (https://www.zabbix.com/documentation...ormance_tuning), so here is what I did:
    - I decreased StartPollers to 150,
    - I decreased the number of the other *Pollers (my current config is above),
    - I see nothing in my log file. Is it safe to change DebugLevel to 3 or even 4? I know it can use a lot of disk space...
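    On the log level question: a lighter-weight alternative to raising DebugLevel globally in the config is Zabbix's runtime control, which can raise logging for selected process types only and reverts on the next server restart. A sketch (process-type names as in the runtime control documentation; verify them against your version):

```
# Raise logging only for the preprocessing processes (no restart needed)
zabbix_server -R log_level_increase="preprocessing manager"
zabbix_server -R log_level_increase="preprocessing worker"

# Bring it back down afterwards
zabbix_server -R log_level_decrease="preprocessing manager"
zabbix_server -R log_level_decrease="preprocessing worker"
```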

    The documentation says: "Optimal number of instances is achieved when the item queue, on average, contains minimum number of parameters (ideally, 0 at any given moment). This value can be monitored by using internal check zabbix[queue]." In my case it looks bad...
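    Besides zabbix[queue], a few other internal item keys are useful for watching exactly this situation (key names from the Zabbix internal checks documentation; configure them on the host that represents the Zabbix server itself):

```
zabbix[queue]                                   # items delayed by at least 6 seconds
zabbix[queue,10m]                               # items delayed by at least 10 minutes
zabbix[preprocessing_queue]                     # values waiting in the preprocessing queue
zabbix[process,preprocessing worker,avg,busy]   # average % of time the workers are busy
```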

    [Screenshot: zabbix queue values.png]

    [Screenshot: zabbix queue graph 7 day.png]

    Could you please help me verify why the preprocessing workers are flapping like this, and help me set appropriate values for my Zabbix environment? Is it possible to check which items use preprocessing and are "heavy" for Zabbix to preprocess?
    It's hard to find good information about how to set all these values and tune a Zabbix server properly. All answers are appreciated.
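    On finding which items use preprocessing at all: one option is to query the backend database directly. A sketch for a PostgreSQL backend, using table names from the Zabbix 5.x schema (verify against your version's schema before relying on it):

```sql
-- Count preprocessing steps per host, heaviest hosts first.
-- Note: this counts steps, not CPU cost; a single "JavaScript" step
-- can be far more expensive than many "Custom multiplier" steps.
SELECT h.name AS host, COUNT(*) AS preproc_steps
FROM item_preproc p
JOIN items i ON i.itemid = p.itemid
JOIN hosts h ON h.hostid = i.hostid
GROUP BY h.name
ORDER BY preproc_steps DESC
LIMIT 20;
```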

    Thanks in advance!
  • seroslaw
    Member
    • Apr 2021
    • 32

    #2
    Could someone advise me on choosing the right poller values for my Zabbix server? I'm stuck with this problem.


    • Markku
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
      • Sep 2018
      • 1782

      #3
      Here are a few questions/comments for you to consider.

      - What happened on May 18 and May 23 (noticeable changes in the processor load)?
      - What does it mean in the VMware Memory graph when "Granted" dropped from ~38G to ~15G on May 23?
      - What happened on May 23 when the VMware Disk latency was increased?
      - If you have identified that preprocessing consumes all your CPU, what prevents you from sharing that load to other servers by using Zabbix proxies?
      - Your 4000 hosts surely don't have their items configured by hand, host by host; most probably you have a couple of templates that contain the items and their preprocessing rules, so check those if you need to know what kind of preprocessing rules you have (or just check some of your hosts).
      - What do your Zabbix server graphs (for monitoring the Zabbix server processes) tell you about the situation? (Data gathering processes, Data handling processes, Internal processes, Reporting processes, Cache usage, Internal queues, etc.)

      Markku


      • ripperSK
        Member
        • Jul 2019
        • 42

        #4
        All in all, I agree with Markku on the troubleshooting steps.

        To me this indicates that you or a colleague probably added preprocessing to an item or items, and it is now dragging down the whole Zabbix server.

        Imagine the CPU needing to wait for I/O during preprocessing: the preprocessing process occupies that CPU core while it is in the wait state, unable to process anything else until it is done.

        For example, running complex string manipulation on a large text item can cause I/O wait. There are many ways preprocessing can consume a CPU core and start creating a problem on your server.

        Please have a look at the official DOCs for your installation here: https://www.zabbix.com/documentation.../preprocessing

        Go through your most used templates and re-check whether there is an error, wrong logic, too complex a task, or anything fishy in the items' preprocessing.

        Disabling an item in a template for a limited time may give you a clue as to which item is causing the problem, if nothing else helps: disable the item, observe for 2 or 3 check intervals; if the situation does not improve, re-enable it and go on to the next item.


        good luck


        • seroslaw
          Member
          • Apr 2021
          • 32

          #5
          Markku
          - What happened on May 18 and May 23 (noticable changes on the processor load)?
          It's hard to verify; there is no history of changes here. I think someone was testing something on the Zabbix server, maybe new items with preprocessing.

          - What does it mean in the VMware Memory graph when "Granted" dropped from ~38G to ~15G on May 23?
          Probably it's the result of restarting the Zabbix server service; the service wasn't working. I should have put that info in the main post, sorry.

          - What happened on May 23 when the VMware Disk latency was increased?
          Same as above: the service started, so the latency went up.

          If you have identified that preprocessing consumes all your CPU, what prevents you from sharing that load to other servers by using Zabbix proxies?
          We inherited an old configuration from the previous Zabbix administrators, and we have many problems with it. We are now planning how to use proxies in the future.

          Your 4000 hosts surely don't have their items configured by hand, host by host; most probably you have a couple of templates that contain the items and their preprocessing rules, so check those if you need to know what kind of preprocessing rules you have (or just check some of your hosts).
          That's true, we use one main template each for Linux and Windows, which is normal. We also monitor technologies like Elasticsearch, VMware, K8s, Docker, basically everything, so we have many different templates with various items and preprocessing types. We removed some templates to avoid duplicates on our hosts, and for now Zabbix works a little better.

          What do your Zabbix server graphs (for monitoring the Zabbix server processes) tell you about the situation? (Data gathering processes, Data handling processes, Internal processes, Reporting processes, Cache usage, Internal queues, etc.)
          For now it's a little better. The other processes look fine, but the preprocessing workers are flapping between roughly 20% and 100% busy a few times per hour.

          [Screenshot: answer1 utulizations.png]

          Graphs from the last 30 days:

          [Screenshot: answer1 graphs.png]


          • seroslaw
            Member
            • Apr 2021
            • 32

            #6
            Originally posted by ripperSK
            OK, thanks for your answer. We have a lot of different templates and we are always trying to verify the items.

            Sometimes I think there are better ways to monitor certain things with Zabbix, for example Python scripts that send their output to the Zabbix server as a trapper item. The whole preprocessing can then be done inside the script. But if you want to monitor absolutely everything, this can become a problem.
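            That offloading idea can be sketched as follows. The JSON field name and the bytes-to-KiB step are invented for illustration; the point is that the script performs what would otherwise be server-side "Regular expression" and "Custom multiplier" preprocessing steps, so the trapper item on the server needs no preprocessing at all (the result would then be pushed with zabbix_sender or similar):

```python
import json
import re

def preprocess_locally(raw: str) -> float:
    """Do the parsing that would otherwise be server-side preprocessing:
    extract a number with a regex, then scale it (bytes -> KiB)."""
    m = re.search(r'"used_bytes":\s*(\d+)', raw)
    if m is None:
        raise ValueError("pattern not found in input")
    return int(m.group(1)) / 1024

# The returned value is the final number pushed to a plain trapper item,
# so the Zabbix server has zero preprocessing steps to run for it.
raw = json.dumps({"used_bytes": 2048, "free_bytes": 4096})
print(preprocess_locally(raw))  # prints 2.0
```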

            I'm wondering: if I set up more preprocessing workers for my Zabbix server, could that resolve the problem with 100% preprocessing worker usage?
            How can I calculate the total number of the different pollers for my Zabbix server's resources and usage? I read that the queue size should be as low as possible; if it is high, that's bad for your Zabbix server. This is my current queue size graph from the last day:

            [Screenshot: answer2 queue size.png]


            • seroslaw
              Member
              • Apr 2021
              • 32

              #7
              We created a new machine for a Zabbix proxy and moved to it the most heavily monitored hosts, the ones with a lot of preprocessed items, but there is still no improvement. After adding more vCPUs the load drops, but it is still high. We are currently working on upgrading our Zabbix to 6.0.x.


              • ripperSK
                Member
                • Jul 2019
                • 42

                #8
                So please consult the newly created and very nicely done documentation on preprocessing in Zabbix:



                to quote the answer to your question:

                Zabbix server configuration file allows users to set count of preprocessing worker processes. StartPreprocessors configuration parameter should be used to set number of pre-forked instances of preprocessing workers. Optimal number of preprocessing workers can be determined by many factors, including the count of "preprocessable" items (items that require to execute any preprocessing steps), count of data gathering processes, average step count for item preprocessing, etc.
                Look up the StartPreprocessors config directive in the server conf file and try to tweak this value to match your preprocessing needs.
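                As a rough sanity check (a back-of-envelope estimate using Little's law, not an official Zabbix sizing formula): the average number of concurrently busy preprocessing workers is roughly NVPS times the average preprocessing time per value. With the ~898 new values per second reported earlier in this thread, even 10 ms per value would keep only about 9 workers busy on average, so 180 workers pegged at 100% suggests a few very expensive steps rather than too few workers:

```python
def busy_workers(nvps: float, avg_seconds_per_value: float) -> float:
    """Little's law: average concurrently busy workers
    = arrival rate (values/s) x average service time (s/value)."""
    return nvps * avg_seconds_per_value

# NVPS from the poster's setup; per-value times are assumptions.
nvps = 897.87
for t in (0.001, 0.010, 0.100):  # 1 ms, 10 ms, 100 ms per value
    print(f"{t * 1000:.0f} ms/value -> ~{busy_workers(nvps, t):.1f} busy workers")
```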

