
Zabbix 2.2 proxy queue is huge!

  • andyfry
    Junior Member
    • Mar 2013
    • 10

    #16
    well what do you know!

    So one proxy was out by 6 seconds and is now in sync.

    Problem resolved.

    I now wonder whether I should drop all those unused poller processes?


    • jhenry
      Junior Member
      • Jul 2013
      • 11

      #17
      Unfortunately that's not it for us, everything is in sync down to the second.

      proxy1:
      Tue Dec 17 15:42:41 MST 2013
      proxy2:
      Tue Dec 17 15:42:41 MST 2013
      master:
      Tue Dec 17 15:42:41 MST 2013
      DB:
      Tue Dec 17 15:42:41 MST 2013
      web:
      Tue Dec 17 15:42:41 MST 2013
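A minimal sketch of how a comparison like the one above could be automated, assuming the `date` output of each host has already been collected into a dict. The helper names `parse_date_output` and `max_skew_seconds` are hypothetical, and the collection step itself is not shown (gathering the readings one host at a time would add its own delay to the result).

```python
from datetime import datetime


def parse_date_output(line):
    """Parse `date` output such as 'Tue Dec 17 15:42:41 MST 2013'.

    The timezone token is dropped because strptime's %Z does not reliably
    accept abbreviations like MST. Assumption: all hosts report the same zone.
    """
    parts = line.split()
    without_tz = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_tz, "%a %b %d %H:%M:%S %Y")


def max_skew_seconds(readings):
    """Return the largest clock difference, in seconds, across all hosts."""
    times = [parse_date_output(value) for value in readings.values()]
    return (max(times) - min(times)).total_seconds()


# Readings copied from the post above.
readings = {
    "proxy1": "Tue Dec 17 15:42:41 MST 2013",
    "proxy2": "Tue Dec 17 15:42:41 MST 2013",
    "master": "Tue Dec 17 15:42:41 MST 2013",
    "DB": "Tue Dec 17 15:42:41 MST 2013",
    "web": "Tue Dec 17 15:42:41 MST 2013",
}
print(max_skew_seconds(readings))
```

`date` only has one-second resolution, so a result of zero here does not rule out sub-second differences.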


      • nomix
        Junior Member
        • Dec 2013
        • 5

        #18
        @andyfry: Reduce the value of the StartPollers parameter in your zabbix_server.conf and zabbix_proxy.conf. Don't forget to restart the server and proxy so the new configuration takes effect.

        @jhenry: Your problem is different. You really do have a high activity load that requires more physical resources. I agree that this doesn't explain why the upgrade caused it. If you check the Zabbix server and proxy stats, can you identify the bottleneck?

        If you've increased the physical amount of RAM on your database server, did you increase innodb_buffer_pool_size too? (Assuming the Zabbix database uses InnoDB, obviously.)

        If you can use the "nmon" tool on your CentOS box, check which component is under high load (CPU, RAM, I/O).
        Or just run "vmstat 2" and you'll see whether you are swapping, in I/O wait, or starved for CPU.

        Keep us posted.
        Last edited by nomix; 18-12-2013, 11:34.


        • andyfry
          Junior Member
          • Mar 2013
          • 10

          #19
          Hi nomix,

          It wasn't a question of how to reduce the pollers, more a question of whether to. As I have been told so many times: "If it ain't broke, don't fix it."

          Whilst I have plenty of idle pollers, I also have a working Zabbix...

          Leave it alone methinks.


          • c.mammoli
            Member
            Zabbix Certified Specialist
            • Feb 2012
            • 48

            #20
            Similar issue here:

            All proxies are mostly idle according to internal monitoring, but I have hundreds of queued items on the "Queue" page (see attachment).

            The server has no queued items.

            Keeping the time in sync on all the proxies is an issue (different hypervisors, hardware, etc.). A difference of a few seconds should be tolerated.

            P.S. This behaviour didn't happen in 2.0
            Attached Files


            • nomix
              Junior Member
              • Dec 2013
              • 5

              #21
              NTP

              Keeping time synchronized isn't a big deal today. NTP is quite efficient and not very complicated to set up.

              I agree with you, c.mammoli, that this behavior comes from a change between 2.0 and 2.2.

              Version 2.2 seems to be very sensitive to time synchronization.


              • c.mammoli
                Member
                Zabbix Certified Specialist
                • Feb 2012
                • 48

                #22
                Originally posted by nomix
                Keeping time synchronized isn't a big deal today.. NTP is quite efficient and not very complicated to setup..

                I agree with you c.mammoli that this behavior is coming from a change between 2.0 and 2.2.

                The v2.2 seems to be time synchro very sensitive..
                I have ntpd running and configured on all the proxies, but the synchronization doesn't run "continuously". Since most of my proxies are virtual machines a delta of a few seconds is totally possible and not easily fixable.


                • jmusbach
                  Member
                  • Sep 2013
                  • 37

                  #23
                  Hello, we want to upgrade from 2.0.8 to 2.2.1 as there are some bugfixes incorporated in the release that we'd benefit from. However we will hold off if these issues are continuing to plague the release. Are these still active issues with 2.2.1 or has it stabilized by now? If not, is there any ETA for stabilization of these outstanding performance problems? Thanks.


                  • elvar
                    Senior Member
                    • Feb 2008
                    • 226

                    #24
                    Wow, really glad I found this post and that I am not alone because I have been banging my head trying to troubleshoot why my queue has completely exploded since upgrading from 2.0.x to 2.2. According to the internal checks I'm using on both the Zabbix server and the proxy servers there are no performance bottlenecks anywhere. My postgresql database looks healthy as well. You can see where the upgrade took place and my queue exploded in the attached picture. My server and proxies are currently all running 2.2.1.

                    If anyone finds a solution to this please share.

                    Kind regards,
                    Attached Files


                    • elvar
                      Senior Member
                      • Feb 2008
                      • 226

                      #25
                      Well, despite most of my proxies only being off by a little, time-sync-wise, I decided to force syncs on several of them, as well as on the server, for testing, and the results were very noticeable. You can see in the attached picture how much the queue dropped once their times were completely in sync. It would seem that 2.2 is far more sensitive to time differences.

                      Kind regards,
                      Attached Files


                      • andyfry
                        Junior Member
                        • Mar 2013
                        • 10

                        #26
                        Hi Elvar,

                        Good to see this post was useful for you too.

                        My queues all seem a lot happier now.

                        What does concern me though is that a matter of a few seconds time difference could cause such big problems. It seems way too time sensitive don't you think?

                        Andy


                        • elvar
                          Senior Member
                          • Feb 2008
                          • 226

                          #27
                          Originally posted by andyfry
                          Hi Elvar,

                          Good to see this post was useful for you too.

                          My queues all seem a lot happier now.

                          What does concern me though is that a matter of a few seconds time difference could cause such big problems. It seems way too time sensitive don't you think?

                          Andy
                          I agree, it definitely seems way too sensitive.


                          • jmusbach
                            Member
                            • Sep 2013
                            • 37

                            #28
                            Interesting, good catch. Perhaps this deserves a bug report?


                            • jsribeiro
                              Junior Member
                              • Feb 2014
                              • 1

                              #29
                              We're seeing this problem after upgrading to 2.2.

                              Using ntp between servers keeps the queue cycling between 500 and 2000, lowering every hour.

                              Using ntpdate to sync clocks every 5 minutes via crontab keeps the queue graph as a sawtooth cycling between 100 and 500.

                              Is this being addressed (in 2.2.2, maybe)?

                              Regards.


                              • GArmao
                                Zabbix Certified Specialist
                                Zabbix Certified Trainer
                                • Mar 2010
                                • 135

                                #30
                                I've seen the same issue here: just a few seconds of time desynchronization can cause a huge "reported" queue, especially if you use proxies and have some items with a long update interval (3600 seconds in my case).

                                Here's my example:

                                In my detailed queue report I have an item "mem Heap Memory max" (update interval 3600 seconds) delayed by "56m 51s". Zabbix reports it should have been checked at "05 Mar 2014 17:00:04", so let's check the real "Last check" date reported under "Latest data" for that item: "05 Mar 2014 17:00:01" (3 seconds before the expected check).
                                So what happens is that Zabbix thinks it still needs to receive a value for that item, but the value was actually received 3 seconds before the scheduled time because of the clock difference.

                                Syncing the proxy clock exactly with the server's completely fixes the queue calculation.

                                Just a reminder: the Zabbix queue display is really just a calculation of item delays based on update intervals and last check times. It's not an actual queue, and there's no way to manually "clear" it.

                                I'm not sure what changed in 2.2 that made this queue estimation so "picky".
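The arithmetic in the example above can be sketched roughly as follows. This is a simplified model inferred from the post, not Zabbix's actual code: it assumes a value only satisfies a scheduled check if its timestamp is at or after the scheduled time. The function name `reported_delay` and the value of `now` (derived from the "56m 51s" figure) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def reported_delay(expected_check, last_check, now):
    """Return how long an item looks overdue, or None if it looks on time.

    Simplified model (an assumption, not Zabbix source code): a value
    stamped before the scheduled check time does not count toward it.
    """
    if last_check >= expected_check:
        return None  # the scheduled check is considered done
    if now <= expected_check:
        return None  # the check is not due yet
    return now - expected_check


expected = datetime(2014, 3, 5, 17, 0, 4)   # when the server expected the check
received = datetime(2014, 3, 5, 17, 0, 1)   # timestamp set by a proxy 3 s behind
now = expected + timedelta(minutes=56, seconds=51)

# Skewed proxy clock: the item looks overdue although a value did arrive.
print(reported_delay(expected, received, now))

# Clocks in sync: the same value is stamped at the expected time.
print(reported_delay(expected, expected, now))
```

Under this model a 3-second skew on an item with a 3600-second interval can keep the item in the reported queue for most of an hour, which matches the behaviour described in the post.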
