Zabbix poller processes more than 75% busy

  • batchenr
    Senior Member
    • Sep 2016
    • 440

    #1

    Zabbix poller processes more than 75% busy

    Hello,

    I'm running Zabbix server 3.2.6 on CentOS 7 with a separate database machine. The server has 8 cores and 17 GB of memory (only 10 GB in use).
    I have tried every forum suggestion I could find, but the poller processes graph still sits at a stable 100% busy.

    I have 160 active hosts and devices, and 83171 items.

    My Zabbix poller settings:

    StartPollers=456
    StartPollersUnreachable=100

    I have also set max_connections=490 in my.cnf.

    Even with this trigger firing, the Zabbix server works perfectly fine,
    but I want everything to be clean and I haven't managed to fix it.
  • ovas
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Apr 2017
    • 138

    #2
    Hello batchenr!

    Have you got IPMI monitoring enabled, and if so, how heavily do you use it? How high is the Timeout setting? What are the Unreachable* and Unavailable* settings? Has the LogSlowQueries setting been changed from the default? If so, could you set it to "3000" and see whether any slow queries are logged in zabbix_server.log? Are there any errors logged at all?

    Are you able to provide the following graphs?
    • Zabbix server performance
    • Zabbix internal process busy
    • Zabbix data gathering process busy
    • Zabbix cache usage


    I mean, there must be some clues for such behavior. Have you tried inspecting database performance?
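
The LogSlowQueries check mentioned above can be scripted instead of eyeballed. Below is a minimal sketch, assuming the server writes slow queries as `slow query: <seconds> sec, "<sql>"` after the usual pid and timestamp prefix. The two sample lines are made up for illustration, and `find_slow_queries` is a hypothetical helper, not part of Zabbix.

```python
import re

# Assumed log shape: '<pid>:<YYYYMMDD>:<HHMMSS.mmm> slow query: <seconds> sec, "<sql>"'
SLOW_QUERY = re.compile(r'slow query: (?P<seconds>\d+(?:\.\d+)?) sec, "(?P<sql>.*)"')


def find_slow_queries(lines, min_seconds=3.0):
    """Return (seconds, sql) pairs for slow-query lines at or above min_seconds."""
    hits = []
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match and float(match.group("seconds")) >= min_seconds:
            hits.append((float(match.group("seconds")), match.group("sql")))
    return hits


# Invented sample lines; in practice, pass the open zabbix_server.log file instead.
sample = [
    ' 2301:20170529:142512.118 slow query: 4.512013 sec, "select itemid from items"',
    ' 2301:20170529:142513.004 server #12 started [poller #3]',
]
print(find_slow_queries(sample))
```

An empty result on a real log would match what a quiet database looks like with LogSlowQueries=3000 (the value is in milliseconds, hence the 3.0 second default here).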


    • batchenr
      Senior Member
      • Sep 2016
      • 440

      #3
      1. Have you got IPMI monitoring enabled, and if so, how heavily do you use it?
      I don't monitor IPMI, but I can see "IPMI monitoring: YES".
      How do I disable it?


      2. How high is the Timeout setting?
      Timeout is set to 30, the highest value.

      3. What are the Unreachable* and Unavailable* settings?
      What does that mean? Where do I see them?


      4. Has the LogSlowQueries setting been changed from the default? If so, could you set it to "3000" and see whether any slow queries are logged in zabbix_server.log?
      It is set to LogSlowQueries=3000,
      and I don't see any reference to slow queries in the logs.


      5. Are there any errors logged at all?
      Nope.

      Graphs attached.
      Last edited by batchenr; 29-05-2017, 14:27.


      • ovas
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Apr 2017
        • 138

        #4
        Originally posted by batchenr
        I don't monitor IPMI, but I can see "IPMI monitoring: YES".
        How do I disable it?
        That only means Zabbix server was built with IPMI support; it is not used by default (StartIPMIPollers=0 by default in the config file). It is already disabled, as the graph data shows.

        Originally posted by batchenr
        Timeout is set to 30, the highest value.
        This can be contributing to the poller overload if a large share of the checks hit the 30 s limit, because pollers stay busy waiting for the response. Is it critical for your environment? Are you able to experiment with this setting?
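
To put rough numbers on the Timeout concern: a poller that waits out the full Timeout can do nothing else for that long, so in the worst case N pollers complete only N / Timeout checks per second. This is a minimal sketch under that worst-case assumption (real checks usually return much faster). 456 is the StartPollers value from the first post, and 3 is, as far as I know, the stock Timeout default.

```python
def worst_case_checks_per_second(pollers: int, timeout_seconds: float) -> float:
    """Checks completed per second if every check blocks a poller for the full Timeout."""
    return pollers / timeout_seconds


# Compare the current Timeout=30 with two lower values.
for timeout in (30, 15, 3):
    rate = worst_case_checks_per_second(456, timeout)
    print(f"Timeout={timeout}s: at most {rate:.1f} timed-out checks/s with 456 pollers")
```

So even a modest fraction of checks running into a 30 s timeout can tie up a large pool of pollers.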

        Originally posted by batchenr
        What does that mean? Where do I see them?
        I mean UnreachablePeriod, UnreachableDelay and UnavailableDelay from the Zabbix server config.

        Originally posted by batchenr
        It is set to LogSlowQueries=3000, and I don't see any reference to slow queries in the logs.

        Graphs attached.
        The Zabbix server internal processes and caches indeed look good... Do the usage patterns remain the same for 1d and 7d?

        Do you use passive checks? If yes, then are all your 160 hosts using passive checks? What is the StartAgents setting on your Zabbix agents?
        Are you using any proxies? Are they passive or active? How are they performing (default Template App Zabbix Proxy assigned to proxy host)?


        • batchenr
          Senior Member
          • Sep 2016
          • 440

          #5
          Hi, thanks for the fast reply!

          This can be contributing to the poller overload if a large share of the checks hit the 30 s limit, because pollers stay busy waiting for the response. Is it critical for your environment? Are you able to experiment with this setting?
          We work a lot with dedicated bash scripts, and some of them take time. I changed it from 30 to 15 for the test.

          I mean UnreachablePeriod, UnreachableDelay and UnavailableDelay from the Zabbix server config.
          They are all commented out:
          # UnavailableDelay=60
          # UnreachableDelay=15
          # UnreachablePeriod=45


          What should I change them to?

          The Zabbix server internal processes and caches indeed look good... Do the usage patterns remain the same for 1d and 7d?
          The patterns remain the same.

          Do you use passive checks?
          I have these types of checks:
          Simple check 48, Zabbix agent 813, Zabbix agent (active) 140
          I guess "Zabbix agent" items are passive? So yes.
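
For a quick sense of proportion, the counts quoted above can be turned into a share of items that the server has to fetch itself. This is only a sketch: it assumes "Zabbix agent" means passive checks and lumps simple checks in with them as server-side collection, and it uses just the three counts listed in this post.

```python
# Item counts by type, as quoted in the post above.
counts = {"Simple check": 48, "Zabbix agent": 813, "Zabbix agent (active)": 140}

# Assumption: these two types are collected by the server rather than pushed by the agent.
server_side_types = ("Simple check", "Zabbix agent")

polled = sum(counts[t] for t in server_side_types)
total = sum(counts.values())
share = 100 * polled / total

print(f"{polled} of {total} items ({share:.1f}%) are collected by the server itself")
```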

          What is the StartAgents setting on your Zabbix agents?
          It is commented out. What do you recommend it be?

          Are you using any proxies?
          Nope, only Zabbix server and agents.


          • ovas
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Apr 2017
            • 138

            #6
            Originally posted by batchenr
            We work a lot with dedicated bash scripts, and some of them take time. I changed it from 30 to 15 for the test.
            Keep an eye on the log and figure out the best values here.

            Originally posted by batchenr
            They are all commented out:
            # UnavailableDelay=60
            # UnreachableDelay=15
            # UnreachablePeriod=45

            What should I change them to?
            No changes necessary; the defaults are good even for big deployments.

            Originally posted by batchenr
            I have these types of checks:
            Simple check 48, Zabbix agent 813, Zabbix agent (active) 140
            I guess "Zabbix agent" items are passive? So yes.

            It is commented out. What do you recommend it be?

            Nope, only Zabbix server and agents.
            This pretty much explains your situation. Since almost every check is performed by Zabbix server itself, it is no wonder that it struggles to collect these 83171 item values.
            The best practice is to make every check that can be active an active check, and to offload the server by implementing a proxy in active mode, so that Zabbix server is busy only with processing the data rather than spending resources on collecting it.
            On the other hand, you have a pretty powerful machine for your average 700 NVPS, so you can definitely try to max out the StartPollers setting to see whether this helps stabilize the current situation and gives you time to offload the server step by step (by moving to active checks). Even if 1000 pollers are enough to eliminate the queue and even out the performance, you will still lack scalability and end up with no resources to extend your monitoring with the current setup.

            So, in a nutshell:
            1) Think about adding a proxy to offload Zabbix server in the long term.
            2) Set StartPollers=1000 and see if your server is OK with this load. Adjust to your maximum possible value to buy yourself some time.
            3) Set the Zabbix agents to active mode and reconfigure the items to be active checks.
            4) Once you have finished moving to Zabbix agent active checks, set StartPollers to whatever suits you best.
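
As a back-of-the-envelope aid for steps 2 and 4, Little's law gives a rough idea of how many pollers a given load keeps busy: busy pollers ≈ polled values per second × average seconds per check. The sketch below rests on assumptions: it treats all 700 NVPS as polled (an upper bound, since some items are already active), the average check durations are hypothetical, and `pollers_needed` is an illustrative helper rather than anything Zabbix provides.

```python
import math


def pollers_needed(polled_values_per_second, avg_seconds_per_check, target_busy=0.75):
    """Estimate the pollers required to keep average utilisation at or below target_busy."""
    busy_pollers = polled_values_per_second * avg_seconds_per_check  # Little's law
    return math.ceil(busy_pollers / target_busy)


# Hypothetical average check durations; measure your own before trusting any of these.
for avg in (0.1, 0.5, 1.0):
    print(f"avg {avg:.1f}s per check -> about {pollers_needed(700, avg)} pollers")
```

The 0.75 target mirrors the "more than 75% busy" trigger threshold. If the estimate for your measured check duration lands near or above 1000, raising StartPollers alone is unlikely to be enough, which is the argument for moving to active checks or a proxy.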

            As for the StartAgents count, it only matters for passive checks: it is the number of agent processes that answer requests from the server, which is basically the number of simultaneous requests the agent can handle. Generally, each can process about 1 NVPS. If one agent process is busy with something, it cannot process further requests. Hope this answers your question.


            • batchenr
              Senior Member
              • Sep 2016
              • 440

              #7
              In case anyone comes across this:

              What eventually helped me was:

              "The root cause is that some discovered interfaces do not exist any more. The issue has been resolved by setting "Keep lost resources period" to be 0 and redo discovery."

              Go over all your Zabbix templates, change "Keep lost resources period" to 0 in every discovery rule, and boom, it's down!
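
With many templates, clicking through every discovery rule gets tedious, so the same change could be prepared through the Zabbix API. The sketch below only builds the JSON-RPC request bodies and sends nothing. It assumes the rules were fetched beforehand with `discoveryrule.get`, that `lifetime` is the API name for "Keep lost resources period", and that "0" is accepted as its value on your version. The sample rows and the token are made up, and `rpc` and `lifetime_updates` are hypothetical helpers.

```python
import json


def rpc(method, params, auth, request_id):
    """Build one JSON-RPC 2.0 request body in the shape the Zabbix API uses."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": request_id}


def lifetime_updates(rules, auth):
    """Build a discoveryrule.update request for every rule whose lifetime is not 0 yet."""
    pending = [r for r in rules if r["lifetime"] not in ("0", "0d")]
    return [
        rpc("discoveryrule.update", {"itemid": r["itemid"], "lifetime": "0"}, auth, i)
        for i, r in enumerate(pending, start=1)
    ]


# Made-up rows in the shape discoveryrule.get returns for output=["itemid", "lifetime"].
sample_rules = [
    {"itemid": "23001", "lifetime": "30"},
    {"itemid": "23002", "lifetime": "0"},
]
for request in lifetime_updates(sample_rules, auth="HYPOTHETICAL_TOKEN"):
    print(json.dumps(request))
```

Each printed body would then be POSTed to the frontend's api_jsonrpc.php endpoint. Try it on a single rule first, because the change propagates from templates to all linked hosts.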


              • Gknives
                Junior Member
                • Aug 2017
                • 2

                #8
                Hello batchenr! I've been stuck on the same problem for days (I'm totally new to this). Would you explain step by step what you did to solve it? Please add some screenshots if you can, I would appreciate it so much!!

                Regards from Chile


                • Rudlafik
                  Senior Member
                  • Nov 2018
                  • 144

                  #9
                  Hi, I have the same problem on Zabbix server 3.4.12. I tried setting "0" and waiting for the re-discovery process on all active templates. I also tried setting "StartPollers=10" in /etc/zabbix/zabbix_server.conf on the server and then restarting the server. Nothing happened.
                  "Zabbix busy unreachable poller processes" is still at 100%. Can you help me with this problem? I have 67 clients (27 of them are SNMP).


                  • dimir
                    Zabbix developer
                    • Apr 2011
                    • 1080

                    #10
                    You need to increase StartPollersUnreachable, not StartPollers.


                    • Rudlafik
                      Senior Member
                      • Nov 2018
                      • 144

                      #11
                      dimir: Thanks for your quick response. I tried setting StartPollersUnreachable=250 and restarting the server. Nothing happened. When I DISABLE the unreachable clients in Configuration / Hosts, "Zabbix busy unreachable poller processes" drops to 27%. (For all of those clients, ZBX, SNMP and IPMI are shown in red.) Setting "Keep lost resources period" to 0 and redoing discovery didn't help in my case.

                      Now, after the change described in comment #12.1, my pollers are OK.
                      Last edited by Rudlafik; 18-02-2019, 09:38.


                      • dimir
                        Zabbix developer
                        • Apr 2011
                        • 1080

                        #12
                        Some useful thoughts were listed here: https://www.zabbix.com/forum/zabbix-...e-than-90-busy



                          • Rudlafik commented
                            I was not satisfied with the 27% value, so I tried your recommendation to increase StartPollersUnreachable. I set it to 80 yesterday, and all is OK! :-)
                      • kloczek
                        Senior Member
                        • Jun 2006
                        • 1771

                        #13
                        Originally posted by batchenr
                        Hello,

                        I'm running Zabbix server 3.2.6 on CentOS 7 with a separate database machine. The server has 8 cores and 17 GB of memory (only 10 GB in use).
                        I have tried every forum suggestion I could find, but the poller processes graph still sits at a stable 100% busy.

                        I have 160 active hosts and devices, and 83171 items.

                        My Zabbix poller settings:

                        StartPollers=456
                        StartPollersUnreachable=100
                        Are you sure that you are using active monitoring?
                        Pollers are used for passive monitoring and agentless monitoring (web checks, SNMP/IPMI/ODBC/telnet/ssh/etc. items).
                        Just go to any host's list of items (Configuration -> Hosts -> click on Items for the first host).
                        Then remove the chosen host name from the "Host" input line, choose "Zabbix agent" in the "Type" drop-down list and hit Apply.
                        This filters all passive items on all hosts. Please share the exact number shown on the page in the line with "STATUS".
                        Then you can repeat the same selection with "Zabbix agent (active)" as the Type.
                        With such a big number of pollers, I'm almost sure that you are still using passive monitoring.
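
The same two counts could also be pulled through the Zabbix API rather than the frontend filter. This sketch only builds the `item.get` request bodies and does not contact a server. The type codes are taken from the API's item object as I remember them, the token is a placeholder, and `monitored` narrows the count to enabled items on monitored hosts, so the numbers may differ slightly from what the UI filter shows.

```python
import json

# Item type codes from the Zabbix API item object.
ITEM_TYPES = {"Zabbix agent": 0, "Simple check": 3, "Zabbix agent (active)": 7}


def count_request(type_name, auth, request_id=1):
    """Build an item.get request that returns only the number of items of one type."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "countOutput": True,   # return a count instead of item objects
            "monitored": True,     # enabled items on monitored hosts only
            "filter": {"type": ITEM_TYPES[type_name]},
        },
        "auth": auth,
        "id": request_id,
    }


for number, name in enumerate(("Zabbix agent", "Zabbix agent (active)"), start=1):
    print(json.dumps(count_request(name, auth="HYPOTHETICAL_TOKEN", request_id=number)))
```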
                        Last edited by kloczek; 15-02-2019, 20:54.
                        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                        https://kloczek.wordpress.com/
                        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                        My zabbix templates https://github.com/kloczek/zabbix-templates
