Zabbix 2.2 proxy queue is huge!

  • andyfry
    Junior Member
    • Mar 2013
    • 10

    #1

    Zabbix 2.2 proxy queue is huge!

    Hi,

    I've recently migrated from zabbix 2.0 to zabbix 2.2 running on a much more powerful system with faster disk.

    Everything looked good apart from missing data from my Passive proxies. I assumed this was because I hadn't upgraded those so I updated one to 2.2.

    I still had over 1,000 items from that proxy in the "over 10 minutes" column of the queue. So I reconfigured the proxy to monitor only one host. It now has one host connected with 51 items and a required VPS of 0.74. The proxy server hasn't changed and should be able to cope with far more than this (it was previously handling up to 120 VPS).

    I feel like I'm missing something really dumb.

    Does anybody have any similar experiences with the upgrade from 2.0 to 2.2?

    Help!

    Andy
  • MaxM
    Member
    • Sep 2011
    • 42

    #2
    Haven't seen anything like that. Have you grabbed the 2.2 template for proxies? Did your proxy config get updated by the install package and suddenly you're back to default pollers?
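    One quick way to check that theory is to list the uncommented tuning parameters in the proxy config. A minimal sketch, using a throwaway sample file in place of the real /etc/zabbix/zabbix_proxy.conf:

    ```shell
    # Build a sample config purely to illustrate; on a real proxy, point CONF
    # at /etc/zabbix/zabbix_proxy.conf instead.
    CONF=$(mktemp)
    printf '%s\n' '# StartPollers=5' 'StartPollers=50' 'CacheSize=32M' 'Timeout=10' > "$CONF"
    # Print only active (uncommented) settings; commented-out defaults are skipped.
    out=$(grep -E '^(StartPollers|CacheSize|Timeout)=' "$CONF")
    echo "$out"
    rm -f "$CONF"
    ```

    If StartPollers is missing from the output on a real system, the package upgrade has likely reverted it to its commented-out default.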

    Comment

    • andyfry
      Junior Member
      • Mar 2013
      • 10

      #3
      Hi,

      Well, I have upgraded all my proxies to 2.2, switched from SQLite to MySQL, and upped the pollers from 5 to 50, and it looks like there is still a bit of a problem. I also switched from passive to active, which was something I planned to do anyway.
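      For context, the switch described above only touches a few lines of zabbix_proxy.conf. A sketch with placeholder values (credentials elided, and treat the exact values as examples from this post, not recommendations):

      ```shell
      # zabbix_proxy.conf (excerpt) - SQLite -> MySQL, active mode, more pollers
      ProxyMode=0          # 0 = active, 1 = passive
      DBHost=localhost
      DBName=zabbix_proxy
      DBUser=zabbix
      StartPollers=50      # raised from the default of 5
      ```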

      There seems to be a discrepancy in the queues somewhere though.

      The proxy queues attachment shows the 3 proxy servers. 2 are doing ok now, but the third shows nearly 300 items over 10 minutes.

      The missing items attachment shows all the proxy servers have missing items over 10 minutes which conflicts with the above.

      The queue graphs show a remarkable similarity between all 3 proxy servers' queues... something isn't right here.

      This server is meant to be in production tomorrow so any help would be greatly appreciated.

      Cheers

      Andy
      Attached Files

      Comment

      • jhenry
        Junior Member
        • Jul 2013
        • 11

        #4
        We just upgraded our install from 2.0.8 to 2.2.1 today and are experiencing this issue, too. The number of items older than 10 minutes in the proxy queue has exploded and depending on when I check sits between 20,000 and 45,000. Nothing else in the environment changed; as soon as we upgraded Zabbix, performance went off a cliff.

        Additionally, the dashboard is spammed with false alerts from our "agent.ping.nodata(600)" trigger. It claims that the agents aren't polling, but when I go to Latest Data, they have data from less than 60 seconds ago. Something is seriously screwy here.

        Some stats:
        Master: 1
        Proxies: 2
        Dedicated MySQL host
        Dedicated Apache PHP host
        Hosts monitored: 1146
        Items: 164242
        VPS: 503.85 (which was being handled easily before the upgrade)
        CentOS 6.4 64-bit on all hosts. Hardware is all recent generation Dell PowerEdge R620's

        I double checked that our proxy and server configs are correct, they were not clobbered in the update.

        Comment

        • andyfry
          Junior Member
          • Mar 2013
          • 10

          #5
          Hi jhenry,

          I assume you upgraded your proxies to 2.2.1 at the same time?

          Are your proxies active or passive?

          Cheers

          Andy

          Comment

          • jhenry
            Junior Member
            • Jul 2013
            • 11

            #6
            Originally posted by andyfry
            Hi jhenry,

            I assume you upgraded your proxies to 2.2.1 at the same time?

            Are your proxies active or passive?

            Cheers

            Andy
            Thanks for the reply. Yes, the proxies are both also on 2.2.1 and they are both Active proxies. A majority of the agents have been upgraded to 2.2.1 as well.

            Comment

            • MaxM
              Member
              • Sep 2011
              • 42

              #7
              Did you do RPM/Deb type package installs? Did your /etc/zabbix/zabbix_proxy.conf file get preserved? It is quite possible/probable you have default values for pollers/caches that are not good.

              Comment

              • jhenry
                Junior Member
                • Jul 2013
                • 11

                #8
                Originally posted by MaxM
                Did you do RPM/Deb type package installs? Did your /etc/zabbix/zabbix_proxy.conf file get preserved? It is quite possible/probable you have default values for pollers/caches that are not good.
                Yes, as I stated in my first post, I checked and we are running the same configs as before the upgrades. They did not get clobbered by the RPM upgrade.

                Comment

                • MaxM
                  Member
                  • Sep 2011
                  • 42

                  #9
                  The new 2.2 template for proxies includes a heap of data on internal process performance (essentially the same data points as an internal server). Can you link that template, gather some data, and review whether anything is running hot there?
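                  That template is built on Zabbix internal items; a sketch of the kind of keys involved (key names believed correct for 2.2, but verify against your own template):

                  ```shell
                  # Internal item keys exposing proxy load (collected by the proxy itself):
                  #   zabbix[process,poller,avg,busy]   # average % of time pollers are busy
                  #   zabbix[queue,10m]                 # number of items delayed > 10 minutes
                  #   zabbix[wcache,values]             # values processed per second
                  ```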

                  Comment

                  • jhenry
                    Junior Member
                    • Jul 2013
                    • 11

                    #10
                    Thanks, I will check out that template.

                    We just disabled every host, dropped the DB on both of the proxies and rebuilt it. Slowly enabling hosts again and so far everything is running smoothly. We'll have to see if that continues once we get all hosts enabled but looking promising. Seems like maybe there is a problem in the proxy DB upgrade scripts?
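                    For reference, rebuilding a proxy's MySQL database as described above would look roughly like this. The schema path is an assumption based on typical 2.2 packaging; check your own package's doc directory before running anything:

                    ```shell
                    # Drop and recreate the proxy schema, then reload it from the
                    # packaged schema file. Path below is an assumption; verify it.
                    mysql -e "DROP DATABASE zabbix_proxy;
                              CREATE DATABASE zabbix_proxy CHARACTER SET utf8 COLLATE utf8_bin;"
                    zcat /usr/share/doc/zabbix-proxy-mysql-2.2.1/create/schema.sql.gz \
                      | mysql zabbix_proxy
                    ```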

                    Comment

                    • nomix
                      Junior Member
                      • Dec 2013
                      • 5

                      #11
                      Time desynchronization between server and proxy

                      Hi guys,

                      I spent a couple of hours figuring this issue out because I had the same symptoms:
                      the server queue was empty, the proxy queue was full (over 2,000 items), and physical resource (CPU, RAM, I/O) consumption was low/idle,
                      even after I significantly increased all kinds of poller processes.

                      I'm running these servers on a VMware platform.
                      The server had VMware Tools installed, but the proxy server didn't (I know, I know...).
                      I observed a 20-second gap between the two servers.

                      I installed VMware Tools on the proxy server, and in less than 10 minutes all queues were cleared. Everything now works perfectly.

                      I knew it was important to keep the server and proxy synchronized, and this proves it.

                      Enjoy! And big up to zabbix!
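                      The check described above boils down to comparing epoch timestamps. A sketch with fixed example values (in practice you would fetch the remote time, e.g. `remote=$(ssh proxy1 date +%s)`; the 20-second figure mirrors this post):

                      ```shell
                      # Flag a clock gap between server and proxy larger than a threshold.
                      # Fixed example values; replace with real timestamps in practice.
                      server_time=1387200000
                      proxy_time=1387200020   # 20 seconds ahead, as observed here
                      if [ "$proxy_time" -ge "$server_time" ]; then
                          gap=$((proxy_time - server_time))
                      else
                          gap=$((server_time - proxy_time))
                      fi
                      if [ "$gap" -gt 1 ]; then
                          echo "clock gap of ${gap}s - resynchronize with ntpd"
                      fi
                      ```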

                      Comment

                      • andyfry
                        Junior Member
                        • Mar 2013
                        • 10

                        #12
                        Hi nomix,

                        I'm not sure what difference VMware Tools would make, but all my proxies are virtual machines.

                        My core servers are running RHEL 6.5 and my proxies are on CentOS 6.4 or 6.5.

                        Packages were installed using yum from the zabbix repo and all configuration files were preserved.

                        We are still seeing the issue more on one proxy than the other 2 though.

                        The Zabbix proxy template I'm using is the one that came with the installation. I'm not sure if it has been updated since and whether I should be installing a new one. Maybe I have my proxies configured incorrectly.

                        When I first installed a proxy I realised that I'd lost sight of it in Zabbix and couldn't monitor how busy it was, so I created a real host and an alias for the proxy.

                        Proxies are all running the 2.2.1 agent and 2.2.1 proxy

                        Is this the right way to configure them?

                        Andy

                        Comment

                        • andyfry
                          Junior Member
                          • Mar 2013
                          • 10

                          #13
                          I still don't understand why there is such a discrepancy in the data here. Hopefully I'm doing something wrong?

                          Looking at the Zabbix Proxy Performance data for each proxy, they show very similar figures, i.e. an average queue of around 1200.

                          But looking at the queue in the Administration tab shows only one proxy with "issues", and even then there are not 1200 items in the queue.

                          Something ain't right here.

                          Comment

                          • jhenry
                            Junior Member
                            • Jul 2013
                            • 11

                            #14
                            We've been able to stabilize things (knock on wood). Our queues and proxy performance are back down to where they were on Zabbix 2.0. The proxy internal items were helpful; they revealed that the proxy pollers were totally maxed out at 100% busy at all times. We had to DRAMATICALLY increase them (currently set to 700!!!), but once we did that the queues cleaned out and the proxy pollers are only about 80% busy on average.

                            We still feel that there is a deeper issue here since 1) this happened immediately after updating to 2.2.1 and 2) 700 pollers is absurd. We were monitoring the same number of hosts with 50 before the upgrade. But in any case, increasing the number of pollers far past what seems reasonable did "fix" the issue.

                            A few other notes:

                            1) We had to increase the mysql max_connections variable to allow for that many pollers

                            2) We lowered our Timeout in zabbix_proxy.conf from 25 to 10

                            3) We upgraded the proxies' local database from MySQL 5.1 (the stock version with CentOS 6.4) to Percona's custom 5.6 version. Our DBA suggested this due to the huge performance improvements in 5.6. The master was already on Percona 5.5.

                            4) We upgraded our master DB server from 24 GB of RAM to 56.

                            Hopefully the MySQL changes at least give us some more headroom and we won't need to go past 700 (again, wtf) pollers for 1200 hosts.
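                            Pulling the tuning above together, the relevant fragments might look like this (values copied from this post; treat them as environment-specific, not recommendations):

                            ```shell
                            # /etc/zabbix/zabbix_proxy.conf (excerpt)
                            StartPollers=700    # raised from 50 after the upgrade
                            Timeout=10          # lowered from 25

                            # MySQL my.cnf (excerpt); value is hypothetical -
                            # allow at least one connection per poller plus headroom
                            max_connections=800
                            ```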
                            Last edited by jhenry; 16-12-2013, 23:23.

                            Comment

                            • nomix
                              Junior Member
                              • Dec 2013
                              • 5

                              #15
                              Time! Time! Time!

                              I strongly believe the root cause is time desynchronization between the Zabbix server and the proxy server.

                              You can check it by running "date" on both servers at the same time. Even a one-second drift is enough to produce the symptoms: high queue load with no Zabbix performance saturation.

                              I mentioned VMware Tools, but you can also use "ntpd" to synchronize the servers against the same time source. (It doesn't matter whether you have the "correct" time, but you need the same time on all Zabbix components.)

                              Even though I didn't take the time to check under the hood, I'm almost sure that item check scheduling is based on time (what else?).

                              The symptoms you describe are exactly what I had until I synchronized the time on both servers.

                              Enjoy!
                              Last edited by nomix; 17-12-2013, 10:47.

                              Comment
