High Queue numbers

  • olegus
    Member
    • Dec 2023
    • 68

    #1

    We have ~2k hosts monitored through 10 or so proxies. The host connections are passive, and the proxies are active.
    After we added the last large batch of hosts, our queue values rose dramatically: we now have more than 20k items there.
    [Attached screenshot 1: image.png]
    [Attached screenshot 2: image.png]
    First of all, I'd like to understand what data this queue represents:
    - Is it data that cannot be collected because the proxy-to-host connection is down or limited?
    - Is it data already collected from the hosts that cannot be sent from the proxy to the Zabbix server?
    - Is it data that already came from the proxies, but the Zabbix server cannot process it or store it in the DB?

    Most of this data is in the 1-minute column (what does that mean exactly, by the way?). Is it OK to have it this way, or do I need to do something?
    If so, what should I do to mitigate or get rid of the queue?

    Links to good reading material are welcome, but if you can speak from your own experience, that's even better.

    Thanks,
    Oleg
  • vsergione
    Junior Member
    • Oct 2023
    • 28

    #2
    I suggest you check this post, maybe it answers your questions: https://www.zabbix.com/forum/zabbix-...server-problem


    • olegus
      Member
      • Dec 2023
      • 68

      #3
      Originally posted by vsergione
      I suggest you check this post, maybe it answers your questions: https://www.zabbix.com/forum/zabbix-...server-problem
      Thanks,
      I've read it through. The suggestion there, back in 2018, was to convert passive checks to active ones. I'm really hesitant to do that, because we just finished a setup that required collaboration with multiple product teams, so I'm trying to find another option. Maybe scaling the proxies up or out?
      As far as I understand the process, this high queue indicates performance issues on the proxies, right?


      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #4
        To your last question: yes. You said the items are passive, so they are polled by the proxies.

        Your first picture shows that the increased queue is not on all proxies. Compare the poller configurations and utilizations between the proxies.

        Note that Zabbix 7.0 introduced asynchronous pollers (https://www.zabbix.com/documentation...ronous-pollers) to improve polling performance significantly, so you may also want to experiment with 7.0.2 when it's out.

        Markku


        • olegus
          Member
          • Dec 2023
          • 68

          #5
          Originally posted by Markku
          To your last question, yes, you said the items are passive so they are polled by the proxies.

          Your first picture shows that the increased queue is not on all proxies. Compare the poller configurations and utilizations between the proxies.

          Note that Zabbix 7.0 introduced async pollers (https://www.zabbix.com/documentation...ronous-pollers) to add polling performance significantly so you may also want to experiment with 7.0.2 when it's out.

          Markku
          Thanks Markku,
          what are the defaults for the number of running pollers?
          And are there any recommendations for how many I should start, relative to the number of hosts?

          Thanks,
          Oleg


          • olegus
            Member
            • Dec 2023
            • 68

            #6
            Also, the numbers in the red zone: can it be that some hosts are offline, and that's why we see them? What else can the red zone represent? 10 minutes is a veeeeery long period.


            • Markku
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
              • Sep 2018
              • 1781

              #7
              You can see the defaults in the configuration file (unless the config file is from some old version with old comments) and in the documentation (https://www.zabbix.com/documentation...g/zabbix_proxy).

              The rule is that whenever your pollers can no longer keep up with the work, you need to either increase the number of pollers or reduce the work (by using active agents). You need to monitor the utilization of the pollers; use the official templates for Zabbix proxy and server for that.
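As a rough way to relate poller count to host count, one can apply Little's law: the number of checks in flight equals the check rate times the average check duration, and a synchronous poller works on one check at a time. The sketch below assumes exactly that model. `estimate_pollers`, the items-per-host figure, the average check time and the 75% busy target are all hypothetical inputs, not Zabbix defaults, and the model does not describe the asynchronous pollers introduced in 7.0. Checks that hang until a timeout raise the average check time sharply, so unreachable hosts can inflate the estimate a lot.

```python
import math


def estimate_pollers(items_per_host: float, hosts: int, interval_s: float,
                     avg_check_s: float, target_busy: float = 0.75) -> int:
    """Hypothetical helper: rough count of *synchronous* pollers needed.

    Little's law: checks in flight = check rate * average check duration.
    A synchronous poller works on one check at a time, so that number is the
    minimum pool size; dividing by target_busy (an assumed 75% by default)
    leaves headroom. This is not an official Zabbix formula.
    """
    if not 0 < target_busy <= 1:
        raise ValueError("target_busy must be in (0, 1]")
    checks_per_s = items_per_host * hosts / interval_s
    in_flight = checks_per_s * avg_check_s
    return math.ceil(in_flight / target_busy)


# 300 hosts x 20 items every 60 s = 100 checks/s; 0.5 s per check = 50 in flight.
print(estimate_pollers(20, 300, 60, 0.5))  # prints 67
```

The numbers in the usage line are made up for illustration; the point is the shape of the calculation, which is why more hosts, shorter intervals or slower checks all push the required pool size up.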

              But again, if your implementation is poller-heavy, you should be interested in Zabbix 7.0 (7.0.2 just came out today).

              Markku


              • Markku
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                • Sep 2018
                • 1781

                #8
                Originally posted by olegus
                Also , numbers in the red zone - can it be that some hosts are offline, that's why we see it? What else can red zone represent? 10 min it is a veeeeery long period.
                Absolutely, at least with active items that's the case.
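Roughly, an item is counted in the queue when its next value is overdue, and the queue overview sorts it into a column by how late it is. The sketch below assumes the column boundaries are 5 s, 10 s, 30 s, 1 min, 5 min and 10 min, and that items less than 5 s late are not shown; `queue_column` and `queue_overview` are hypothetical helpers. It illustrates why an offline host ends up in the red column: nothing arrives for its items, so they keep getting later until they pass 10 minutes.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable, Optional

# Assumed queue-overview columns: lower bound of each delay band in seconds.
THRESHOLDS_S = [5, 10, 30, 60, 300, 600]
LABELS = ["5 seconds", "10 seconds", "30 seconds", "1 minute",
          "5 minutes", "More than 10 minutes"]


def queue_column(expected_at: float, now: float) -> Optional[str]:
    """Hypothetical helper: column an overdue item falls into, or None if it
    is less than 5 s late (assumed not to be shown in the queue)."""
    delay = now - expected_at
    idx = bisect_right(THRESHOLDS_S, delay) - 1
    return LABELS[idx] if idx >= 0 else None


def queue_overview(expected_times: Iterable[float], now: float) -> Counter:
    """Hypothetical helper: count overdue items per column."""
    counts: Counter = Counter()
    for t in expected_times:
        column = queue_column(t, now)
        if column is not None:
            counts[column] += 1
    return counts


# An item whose host went offline keeps aging: 90 s late, later 700 s late.
print(queue_column(0, 90), "->", queue_column(0, 700))
# prints: 1 minute -> More than 10 minutes
```

Under these assumptions, the 1-minute column from the first post holds items that are between 1 and 5 minutes late.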

                Markku


                • olegus
                  Member
                  • Dec 2023
                  • 68

                  #9
                  We did some tuning with environment variables and had partial success. For the busiest proxy (all passive connections) we added these variables:
                  ZBX_STARTPOLLERS=200
                  ZBX_STARTPOLLERSUNREACHABLE=50
                  ZBX_STARTPINGERS=40
                  This brought the queue down to 0.
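The environment variables above are how the Docker image sets proxy configuration parameters. The sketch below renders them into `zabbix_proxy.conf` lines so it is visible which process pools they change. The variable-to-parameter table is an assumption based on the official zabbix-docker proxy images, the 0-1000 bound is assumed from the proxy configuration documentation (verify it for your version), and `render_conf` is a hypothetical helper. Note that all three variables used here size polling-side processes (passive checks, unreachable hosts, ICMP pings); none of them changes the trappers.

```python
from typing import Dict, List

# Assumed mapping used by the official zabbix-docker proxy images
# (verify against your image's documentation).
ENV_TO_PARAM = {
    "ZBX_STARTPOLLERS": "StartPollers",                        # passive checks
    "ZBX_STARTPOLLERSUNREACHABLE": "StartPollersUnreachable",  # unreachable hosts
    "ZBX_STARTPINGERS": "StartPingers",                        # ICMP pings
    "ZBX_STARTTRAPPERS": "StartTrappers",                      # active-agent data
}


def render_conf(env: Dict[str, str]) -> List[str]:
    """Hypothetical helper: turn env vars into zabbix_proxy.conf lines."""
    lines = []
    for var, param in ENV_TO_PARAM.items():
        if var not in env:
            continue
        value = int(env[var])
        if not 0 <= value <= 1000:  # assumed documented range
            raise ValueError(f"{var}={value} is outside 0-1000")
        lines.append(f"{param}={value}")
    return lines


print(render_conf({"ZBX_STARTPOLLERS": "200",
                   "ZBX_STARTPOLLERSUNREACHABLE": "50",
                   "ZBX_STARTPINGERS": "40"}))
# prints: ['StartPollers=200', 'StartPollersUnreachable=50', 'StartPingers=40']
```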

                  But then we tried to repeat this on another proxy, not realizing that all of its hosts use active checks, and things got much worse after the restart. Even after we reverted the YAML, it stays like this:

                  [Attached screenshot: image.png]
                  This is a proxy with active connections. What could be wrong here?


                  • Markku
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                    • Sep 2018
                    • 1781

                    #10
                    What do the queue details say?

                    Markku


                    • olegus
                      Member
                      • Dec 2023
                      • 68

                      #11
                      It lists a lot of items (~7k) from hosts behind that proxy.


                      • Markku
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                        • Sep 2018
                        • 1781

                        #12
                        Let me be more specific: You asked, "what can be wrong here", I asked you to check the queue details: is there anything that seems logical in the queue details for that proxy with active items? Are the hosts down or otherwise unable to send the data to the proxy, for example?

                        Markku


                        • olegus
                          Member
                          • Dec 2023
                          • 68

                          #13
                          I got more details on this proxy: there are ~1k hosts behind it, most of them in active mode. Some are Linux machines, others are Windows systems. The proxy was restarted, and after the restart the Zabbix agents on the Windows machines struggled to reconnect to it. So that *might* explain the hosts in red.
                          But the question still stands: if we see big queue numbers with active connections, what should we tune, and where (proxy or host), to improve it?


                          • Markku
                            Senior Member
                            Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                            • Sep 2018
                            • 1781

                            #14
                            As far as I can tell, increased queues with active items mean that the agents have not been able to get their data through to the Zabbix server, either because the agents are not connecting properly to the proxy, or because the proxy is not able to send the data to the server properly. Basically, you should check and fix connectivity between each of these components, and check the metrics on the proxies again: their trapper processes can be overwhelmed too if too many agents connect to them, in which case you need to increase the number of trappers.
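A first connectivity check for active agents is whether the agent host can open a TCP connection to the proxy's listening port (10051 by default). The sketch below, built around a hypothetical `can_reach` helper, only tests the TCP handshake. It does not exercise the Zabbix protocol, encryption settings or host name matching, so a success does not prove that the agent's data is accepted; it only rules out basic network or firewall problems.

```python
import socket


def can_reach(host: str, port: int = 10051, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds.

    Only the TCP handshake is tested, not the Zabbix protocol, encryption
    or host name matching.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


# Run this on an agent host; replace 127.0.0.1 with your proxy's address.
print(can_reach("127.0.0.1"))
```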

                            So, on the proxies:
                            - poller processes take care of the passive items (= they connect to the agents)
                            - trapper processes take care of the active items (= they receive the connections from the agents)
                            and both are subject to resource starvation if there is too much work (that's when increasing the number of pollers/trappers helps, up to a certain limit).
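The split above can be turned into a simple check: for each item mode in use on a proxy, look at the busy percentage of the process pool that serves it. The busy values would come from internal items such as `zabbix[process,poller,avg,busy]` and `zabbix[process,trapper,avg,busy]`, which the official proxy template collects; the 75% threshold and `pools_to_scale` are assumptions for illustration.

```python
from typing import Dict, List

# Per the explanation above: passive items are polled by pollers,
# active items are received by trappers.
POOL_FOR_MODE = {"passive": "poller", "active": "trapper"}


def pools_to_scale(busy_pct: Dict[str, float], modes_in_use: List[str],
                   threshold: float = 75.0) -> List[str]:
    """Hypothetical helper: pools serving the modes in use that are busier
    than the (assumed) threshold percentage."""
    pools = sorted({POOL_FOR_MODE[mode] for mode in modes_in_use})
    return [pool for pool in pools if busy_pct.get(pool, 0.0) > threshold]


# A proxy with only active agents: a busy poller pool is irrelevant here.
print(pools_to_scale({"poller": 92.0, "trapper": 88.5}, ["active"]))  # prints ['trapper']
```

For a proxy that serves only active agents, the trapper pool is the one to watch; adding pollers there, as in the earlier experiment, would not be expected to help.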

                            Markku


                            • olegus
                              Member
                              • Dec 2023
                              • 68

                              #15
                              Thanks Markku,
                              I will check the trappers next.
