Zabbix 2.2 proxy queue is huge!

  • andyfry
    Junior Member
    • Mar 2013
    • 10

    #1

    Zabbix 2.2 proxy queue is huge!

    Hi,

    I've recently migrated from zabbix 2.0 to zabbix 2.2 running on a much more powerful system with faster disk.

    Everything looked good apart from missing data from my Passive proxies. I assumed this was because I hadn't upgraded those so I updated one to 2.2.

    I still had over 1,000 items from that proxy in the "over 10 minutes" column of the queue. So I reconfigured the proxy to monitor only one host. It now has one host connected with 51 items and a required VPS of 0.74. The proxy server hasn't changed and should be able to cope with far more than this (it was previously handling up to 120 VPS).

    I feel like I'm missing something really dumb.

    Does anybody have any similar experiences with the upgrade from 2.0 to 2.2?

    Help!

    Andy
  • MaxM
    Member
    • Sep 2011
    • 42

    #2
    Haven't seen anything like that. Have you grabbed the 2.2 template for proxies? Did your proxy config get updated by the install package and suddenly you're back to default pollers?
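    One quick way to check that theory is to list the uncommented tuning parameters in the proxy config. A minimal sketch, using a throwaway sample file in place of the real /etc/zabbix/zabbix_proxy.conf:

    ```shell
    # Build a sample config purely to illustrate; on a real proxy, point CONF
    # at /etc/zabbix/zabbix_proxy.conf instead.
    CONF=$(mktemp)
    printf '%s\n' '# StartPollers=5' 'StartPollers=50' 'CacheSize=32M' 'Timeout=10' > "$CONF"
    # Print only active (uncommented) settings; commented-out defaults are skipped.
    out=$(grep -E '^(StartPollers|CacheSize|Timeout)=' "$CONF")
    echo "$out"
    rm -f "$CONF"
    ```

    If StartPollers is missing from the output on a real system, the package upgrade has likely reverted it to its commented-out default.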

    Comment

    • andyfry
      Junior Member
      • Mar 2013
      • 10

      #3
      Hi,

      Well, I have upgraded all my proxies to 2.2, switched from SQLite to MySQL, and upped the pollers from 5 to 50, and it looks like there is still a bit of a problem. I also switched from passive to active, which was something I planned to do anyway.
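      For context, the switch described above only touches a few lines of zabbix_proxy.conf. A sketch with placeholder values (credentials elided, and treat the exact values as examples from this post, not recommendations):

      ```shell
      # zabbix_proxy.conf (excerpt) - SQLite -> MySQL, active mode, more pollers
      ProxyMode=0          # 0 = active, 1 = passive
      DBHost=localhost
      DBName=zabbix_proxy
      DBUser=zabbix
      StartPollers=50      # raised from the default of 5
      ```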

      There seems to be a discrepancy in the queues somewhere though.

      The proxy queues attachment shows the 3 proxy servers. 2 are doing ok now, but the third shows nearly 300 items over 10 minutes.

      The missing items attachment shows all the proxy servers have missing items over 10 minutes which conflicts with the above.

      The queue graphs show a remarkable similarity between all 3 proxy servers' queues... something isn't right here.

      This server is meant to be in production tomorrow so any help would be greatly appreciated.

      Cheers

      Andy
      Attached Files

      Comment

      • jhenry
        Junior Member
        • Jul 2013
        • 11

        #4
        We just upgraded our install from 2.0.8 to 2.2.1 today and are experiencing this issue, too. The number of items older than 10 minutes in the proxy queue has exploded and depending on when I check sits between 20,000 and 45,000. Nothing else in the environment changed; as soon as we upgraded Zabbix, performance went off a cliff.

        Additionally, the dashboard is spammed with false alerts from our "agent.ping.nodata(600)" trigger. It claims that the agents aren't polling, but when I go to Latest Data, they have data from less than 60 seconds ago. Something is seriously screwy here.

        Some stats:
        Master: 1
        Proxies: 2
        Dedicated MySQL host
        Dedicated Apache PHP host
        Hosts monitored: 1146
        Items: 164242
        VPS: 503.85 (which was being handled easily before the upgrade)
        CentOS 6.4 64-bit on all hosts. Hardware is all recent generation Dell PowerEdge R620's

        I double checked that our proxy and server configs are correct, they were not clobbered in the update.

        Comment

        • andyfry
          Junior Member
          • Mar 2013
          • 10

          #5
          Hi jhenry,

          I assume you upgraded your proxies to 2.2.1 at the same time?

          Are your proxies active or passive?

          Cheers

          Andy

          Comment

          • jhenry
            Junior Member
            • Jul 2013
            • 11

            #6
            Originally posted by andyfry
            Hi jhenry,

            I assume you upgraded your proxies to 2.2.1 at the same time?

            Are your proxies active or passive?

            Cheers

            Andy
            Thanks for the reply. Yes, the proxies are both also on 2.2.1 and they are both Active proxies. A majority of the agents have been upgraded to 2.2.1 as well.

            Comment

            • MaxM
              Member
              • Sep 2011
              • 42

              #7
              Did you do RPM/Deb type package installs? Did your /etc/zabbix/zabbix_proxy.conf file get preserved? It is quite possible/probable you have default values for pollers/caches that are not good.

              Comment

              • jhenry
                Junior Member
                • Jul 2013
                • 11

                #8
                Originally posted by MaxM
                Did you do RPM/Deb type package installs? Did your /etc/zabbix/zabbix_proxy.conf file get preserved? It is quite possible/probable you have default values for pollers/caches that are not good.
                Yes, as I stated in my first post, I checked and we are running the same configs as before the upgrades. They did not get clobbered by the RPM upgrade.

                Comment

                • MaxM
                  Member
                  • Sep 2011
                  • 42

                  #9
                  The new 2.2 template for proxies includes a heap of data on internal process performance (essentially the same data points as an internal server). Can you link that template, gather some data, and review whether anything is running hot there?
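                  That template is built on Zabbix internal items; a sketch of the kind of keys involved (key names believed correct for 2.2, but verify against your own template):

                  ```shell
                  # Internal item keys exposing proxy load (collected by the proxy itself):
                  #   zabbix[process,poller,avg,busy]   # average % of time pollers are busy
                  #   zabbix[queue,10m]                 # number of items delayed > 10 minutes
                  #   zabbix[wcache,values]             # values processed per second
                  ```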

                  Comment

                  • jhenry
                    Junior Member
                    • Jul 2013
                    • 11

                    #10
                    Thanks, I will check out that template.

                    We just disabled every host, dropped the DB on both of the proxies and rebuilt it. Slowly enabling hosts again and so far everything is running smoothly. We'll have to see if that continues once we get all hosts enabled but looking promising. Seems like maybe there is a problem in the proxy DB upgrade scripts?
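                    For reference, rebuilding a proxy's MySQL database as described above would look roughly like this. The schema path is an assumption based on typical 2.2 packaging; check your own package's doc directory before running anything:

                    ```shell
                    # Drop and recreate the proxy schema, then reload it from the
                    # packaged schema file. Path below is an assumption; verify it.
                    mysql -e "DROP DATABASE zabbix_proxy;
                              CREATE DATABASE zabbix_proxy CHARACTER SET utf8 COLLATE utf8_bin;"
                    zcat /usr/share/doc/zabbix-proxy-mysql-2.2.1/create/schema.sql.gz \
                      | mysql zabbix_proxy
                    ```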

                    Comment

                    • nomix
                      Junior Member
                      • Dec 2013
                      • 5

                      #11
                      Time desynchronization between server and proxy

                      Hi guys,

                      I spent a couple of hours figuring this issue out because I had the same symptoms:
                      the server queue was empty, the proxy queue was full (over 2,000 items), and physical resource (CPU, RAM, I/O) consumption was low/idle,
                      even after I significantly increased all kinds of poller processes.

                      I'm running these servers on a VMware platform.
                      The server had VMware Tools installed, but the proxy server didn't (I know, I know...).
                      I observed a 20-second gap between the two servers.

                      I installed VMware Tools on the proxy server, and in less than 10 minutes all queues were cleared. Everything now works perfectly.

                      I knew it was important to keep the server and proxy synchronized, and this proves it.

                      Enjoy! And big up to zabbix!
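                      The check described above boils down to comparing epoch timestamps. A sketch with fixed example values (in practice you would fetch the remote time, e.g. `remote=$(ssh proxy1 date +%s)`; the 20-second figure mirrors this post):

                      ```shell
                      # Flag a clock gap between server and proxy larger than a threshold.
                      # Fixed example values; replace with real timestamps in practice.
                      server_time=1387200000
                      proxy_time=1387200020   # 20 seconds ahead, as observed here
                      if [ "$proxy_time" -ge "$server_time" ]; then
                          gap=$((proxy_time - server_time))
                      else
                          gap=$((server_time - proxy_time))
                      fi
                      if [ "$gap" -gt 1 ]; then
                          echo "clock gap of ${gap}s - resynchronize with ntpd"
                      fi
                      ```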

                      Comment

                      • andyfry
                        Junior Member
                        • Mar 2013
                        • 10

                        #12
                        Hi nomix,

                        I'm not sure what difference VMware Tools would make, but all my proxies are virtual machines.

                        My core servers are running RHEL 6.5 and my proxies are on CentOS 6.4 or 6.5.

                        Packages were installed using yum from the zabbix repo and all configuration files were preserved.

                        We are still seeing the issue more on one proxy than the other 2 though.

                        The Zabbix proxy template I'm using is the one that came with the installation. I'm not sure if it has been updated since and whether I should be installing a new one. Maybe I have my proxies configured incorrectly.

                        When I first installed a proxy I realised that I'd lost sight of it in Zabbix and couldn't monitor how busy it was, so I created a real host and an alias for the proxy.

                        Proxies are all running the 2.2.1 agent and 2.2.1 proxy

                        Is this the right way to configure them?

                        Andy

                        Comment

                        • andyfry
                          Junior Member
                          • Mar 2013
                          • 10

                          #13
                          I still don't understand why there is such a discrepancy in the data here. Hopefully I'm doing something wrong?

                          Looking at the Zabbix Proxy Performance data for each proxy, they show very similar figures, i.e. an average queue of around 1200.

                          But looking at the queue in the Administration tab shows only one proxy with "issues", and even then there are not 1200 items in the queue.

                          Something ain't right here.

                          Comment

                          • jhenry
                            Junior Member
                            • Jul 2013
                            • 11

                            #14
                            We've been able to stabilize things (knock on wood). Our queues and proxy performance are back down to where they were on Zabbix 2.0. The proxy internal items were helpful; they revealed that the proxy pollers were totally maxed out at 100% busy at all times. We had to DRAMATICALLY increase them (currently set to 700!!!), but once we did that the queues cleaned out and the proxy pollers are only about 80% busy on average.

                            We still feel that there is a deeper issue here since 1) this happened immediately after updating to 2.2.1 and 2) 700 pollers is absurd. We were monitoring the same number of hosts with 50 before the upgrade. But in any case, increasing the number of pollers far past what seems reasonable did "fix" the issue.

                            A few other notes:

                            1) We had to increase the mysql max_connections variable to allow for that many pollers

                            2) We lowered our Timeout in zabbix_proxy.conf from 25 to 10

                            3) We upgraded the proxies' local database from MySQL 5.1 (the stock version with CentOS 6.4) to Percona's custom 5.6 version. Our DBA suggested this due to the huge performance improvements in 5.6. The master was already on Percona 5.5.

                            4) We upgraded our master DB server from 24 GB of RAM to 56.

                            Hopefully the MySQL changes at least give us some more headroom and we won't need to go past 700 (again, wtf) pollers for 1200 hosts.
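                            Pulling the tuning above together, the relevant fragments might look like this (values copied from this post; treat them as environment-specific, not recommendations):

                            ```shell
                            # /etc/zabbix/zabbix_proxy.conf (excerpt)
                            StartPollers=700    # raised from 50 after the upgrade
                            Timeout=10          # lowered from 25

                            # MySQL my.cnf (excerpt); value is hypothetical -
                            # allow at least one connection per poller plus headroom
                            max_connections=800
                            ```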
                            Last edited by jhenry; 16-12-2013, 23:23.

                            Comment

                            • nomix
                              Junior Member
                              • Dec 2013
                              • 5

                              #15
                              Time! Time! Time!

                              I strongly believe the root cause is time desynchronization between the Zabbix server and the proxy server.

                              You can check it by running "date" on both servers at the same time. Even a one-second drift is enough to produce the symptoms: high queue load with no Zabbix performance saturation.

                              I mentioned VMware Tools, but you can also use "ntpd" to synchronize the servers against the same time source. (It doesn't matter whether you have the "correct" time, but you need the same time on all Zabbix components.)

                              Even though I didn't take the time to check under the hood, I'm almost sure that item check scheduling is based on time (what else?).

                              The symptoms you describe are exactly what I had until I synchronized the time on both servers.

                              Enjoy!
                              Last edited by nomix; 17-12-2013, 10:47.

                              Comment
