**andyfry** · 18-12-2013, 00:26

Well, what do you know!

So one proxy was out by 6 seconds and is now in sync.

Problem resolved.

I wonder now whether I should drop all those unused poller processes?

**jhenry** · 18-12-2013, 00:44

Unfortunately that's not it for us; everything is in sync down to the second.

proxy1:
Tue Dec 17 15:42:41 MST 2013
proxy2:
Tue Dec 17 15:42:41 MST 2013
master:
Tue Dec 17 15:42:41 MST 2013
DB:
Tue Dec 17 15:42:41 MST 2013
web:
Tue Dec 17 15:42:41 MST 2013
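A quick way to turn readings like these into numbers is sketched below in Python. It assumes the `date` outputs were captured at (nearly) the same moment and that every host reports the same timezone. `parse_date_output` and `clock_skew` are hypothetical helpers, not Zabbix tools, and one-second resolution cannot reveal sub-second drift.

```python
from datetime import datetime


def parse_date_output(line: str) -> datetime:
    """Parse `date` output such as 'Tue Dec 17 15:42:41 MST 2013'.

    Assumption: all hosts report the same timezone, so the zone token
    is simply dropped (strptime's %Z may reject abbreviations like MST).
    """
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected date format: {line!r}")
    del parts[4]  # drop the timezone abbreviation
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def clock_skew(readings: dict[str, str], reference: str) -> dict[str, float]:
    """Return each host's offset in seconds relative to the reference host."""
    ref = parse_date_output(readings[reference])
    return {host: (parse_date_output(text) - ref).total_seconds()
            for host, text in readings.items()}


# Usage with two of the readings above (hypothetical host keys).
readings = {
    "proxy1": "Tue Dec 17 15:42:41 MST 2013",
    "master": "Tue Dec 17 15:42:41 MST 2013",
}
print(clock_skew(readings, "master"))  # prints {'proxy1': 0.0, 'master': 0.0}
```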

**nomix** · 18-12-2013, 10:48

@andyfry: You would do that by reducing the value of the StartPollers parameter in your zabbix_server.conf and zabbix_proxy.conf. Don't forget to restart your server and proxy so they pick up the new configuration.
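For anyone who does decide to trim the pollers, here is a minimal Python sketch of that edit. It assumes the usual one-`Key=value`-per-line format of `zabbix_server.conf` and `zabbix_proxy.conf`; `set_conf_param` is a hypothetical helper, and the value `20` in the example is arbitrary, not a recommendation. The daemon still needs a restart afterwards.

```python
import re


def set_conf_param(text: str, key: str, value: str) -> str:
    """Set `key=value` in zabbix_server.conf / zabbix_proxy.conf style text.

    Replaces the first active (uncommented) `key=...` line; if there is
    none, appends a new line. Commented-out defaults such as
    '# StartPollers=5' are left untouched.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


# Usage on an in-memory example rather than a real file.
example = "# StartPollers=5\nStartPollers=100\n"
print(set_conf_param(example, "StartPollers", "20"), end="")
```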

@jhenry: Your problem is different. You "really" have a high activity load that requires more physical resources. I agree that it doesn't explain why the upgrade caused this. If you check the Zabbix server and proxy stats, can you identify the bottleneck?

If you've increased the amount of physical RAM on your database server, did you increase innodb_buffer_pool_size too? (Assuming the Zabbix database uses InnoDB, obviously.)
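As a rough starting point when resizing, the MySQL manual notes that on a dedicated database server the buffer pool can be set to as much as 80% of physical memory. The Python sketch below applies a more conservative fraction; the 0.7 default and the rounding down to a multiple of 128 MB are assumptions for illustration, and `suggest_buffer_pool_mb` is a hypothetical helper.

```python
def suggest_buffer_pool_mb(ram_mb: int, fraction: float = 0.7) -> int:
    """Suggest an innodb_buffer_pool_size in MB for a dedicated DB host.

    Assumption: `fraction` of physical RAM goes to the buffer pool, and
    the result is rounded down to a multiple of 128 MB.
    """
    if not 0 < fraction <= 0.8:
        raise ValueError("fraction should be in (0, 0.8] for a dedicated server")
    return int(ram_mb * fraction) // 128 * 128


# Usage: a 16 GB database host.
print(f"innodb_buffer_pool_size = {suggest_buffer_pool_mb(16384)}M")
# prints innodb_buffer_pool_size = 11392M
```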

If you can use the "nmon" tool on your CentOS machine, check which component is under high load (CPU, RAM, I/O).
Or just run "vmstat 2" and you'll see whether you are swapping, waiting on I/O, or starved for CPU.
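When watching "vmstat 2" for more than a few seconds, it can help to flag those three conditions automatically. Below is a minimal Python sketch that works on captured output. The thresholds (I/O wait of 20% or more, more runnable processes than CPUs) are assumptions, not vmstat or Zabbix defaults, and `assess_vmstat` is a hypothetical helper. The first data row of vmstat shows averages since boot, so it is skipped.

```python
def assess_sample(columns: list[str], sample: str, ncpus: int,
                  iowait_threshold: int = 20) -> list[str]:
    """Flag swapping, I/O wait and CPU starvation in one vmstat data row."""
    row = dict(zip(columns, (int(v) for v in sample.split())))
    findings = []
    # si/so: memory swapped in from / out to disk per second.
    if row.get("si", 0) > 0 or row.get("so", 0) > 0:
        findings.append("swapping")
    # wa: percentage of CPU time spent waiting for I/O (threshold assumed).
    if row.get("wa", 0) >= iowait_threshold:
        findings.append("io-wait")
    # r: runnable processes; more than the CPU count means they queue for CPU.
    if row.get("r", 0) > ncpus:
        findings.append("cpu-starvation")
    return findings


def assess_vmstat(text: str, ncpus: int) -> list[list[str]]:
    """Assess every live sample in captured `vmstat 2` output."""
    columns: list[str] = []
    results: list[list[str]] = []
    seen_first = False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":           # column header: r b swpd free ...
            columns = tokens
            continue
        if not tokens[0].isdigit():    # group banner: procs ---memory--- ...
            continue
        if not seen_first:             # first row = averages since boot
            seen_first = True
            continue
        results.append(assess_sample(columns, line, ncpus))
    return results


# Invented example output for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 500000  10000 200000    0    0     5    10  100  200  5  2 90  3  0
 9  2   1024 400000  10000 200000   12   40   300  2000  900 1500 30 10 20 40  0
 1  0      0 500000  10000 200000    0    0     5    10  100  200  5  2 92  1  0
"""
print(assess_vmstat(SAMPLE, ncpus=4))
```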

Keep us posted.

**andyfry** · 18-12-2013, 23:44

Hi nomix,

It wasn't a question of how to reduce the pollers, more a question of whether to. As I have been told so many times: "If it ain't broke, don't fix it."

Whilst I have plenty of idle pollers, I also have a working Zabbix...

Leave it alone, methinks.

**c.mammoli** · 27-12-2013, 15:45

Similar issue here:

All proxies are mostly idle according to internal monitoring, but I have hundreds of queued items on the "Queue" page (see attachment).

The server has no queued items.

Keeping time synced on all the proxies is an issue (different hypervisors, hardware, etc.). A difference of a few seconds should be tolerated.

P.S. This behaviour didn't happen in 2.0.

**nomix** · 27-12-2013, 16:04

NTP

Keeping time synchronized isn't a big deal today. NTP is quite efficient and not very complicated to set up.

I agree with you, c.mammoli, that this behavior comes from a change between 2.0 and 2.2.

v2.2 seems to be very sensitive to time synchronization.
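To make "in sync" concrete: NTP estimates a host's clock offset from the four timestamps of a single request/response exchange, as defined in RFC 5905. A worked Python sketch follows; the millisecond values are invented for illustration and model a proxy whose clock runs 3 s behind its time source. The formula assumes a symmetric network path, so asymmetric latency adds error to the estimate.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float,
                         t4: float) -> tuple[float, float]:
    """Clock offset and round-trip delay from one NTP exchange (RFC 5905).

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    A positive offset means the client clock is behind the server.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Invented example in milliseconds: client 3000 ms behind,
# 50 ms latency each way, 10 ms server processing time.
print(ntp_offset_and_delay(100_000, 103_050, 103_060, 100_110))
# prints (3000.0, 100)
```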

**c.mammoli** · 27-12-2013, 16:08

> Originally posted by nomix
>
> Keeping time synchronized isn't a big deal today. NTP is quite efficient and not very complicated to set up.
>
> I agree with you, c.mammoli, that this behavior comes from a change between 2.0 and 2.2.
>
> v2.2 seems to be very sensitive to time synchronization.

I have ntpd running and configured on all the proxies, but the synchronization doesn't run "continuously". Since most of my proxies are virtual machines, a delta of a few seconds is entirely possible and not easily fixable.

**jmusbach** · 30-12-2013, 23:28

Hello, we want to upgrade from 2.0.8 to 2.2.1 as there are some bugfixes incorporated in the release that we'd benefit from. However we will hold off if these issues are continuing to plague the release. Are these still active issues with 2.2.1 or has it stabilized by now? If not, is there any ETA for stabilization of these outstanding performance problems? Thanks.

**elvar** · 07-01-2014, 18:35

Wow, I'm really glad I found this post and that I am not alone, because I have been banging my head trying to figure out why my queue has completely exploded since upgrading from 2.0.x to 2.2. According to the internal checks I'm using on both the Zabbix server and the proxy servers, there are no performance bottlenecks anywhere. My PostgreSQL database looks healthy as well. You can see in the attached picture where the upgrade took place and my queue exploded. My server and proxies are currently all running 2.2.1.

If anyone finds a solution to this please share.

Kind regards,

**elvar** · 07-01-2014, 19:37

Well, although most of my proxies were only slightly off time-wise, I decided to force a sync on several of them, as well as on the server, for testing, and the results were very noticeable. You can see in the attached picture how much the queue dropped once their clocks were completely in sync. It would seem that 2.2 is far more sensitive to time differences.

Kind regards,

**andyfry** · 07-01-2014, 22:53

Hi Elvar,

Good to see this post was useful for you too.

My queues all seem a lot happier now.

What does concern me, though, is that a time difference of just a few seconds could cause such big problems. It seems way too time-sensitive, don't you think?

Andy

**elvar** · 07-01-2014, 23:18

> Originally posted by andyfry
>
> Hi Elvar,
>
> Good to see this post was useful for you too.
>
> My queues all seem a lot happier now.
>
> What does concern me, though, is that a time difference of just a few seconds could cause such big problems. It seems way too time-sensitive, don't you think?
>
> Andy

I agree, it definitely seems way too sensitive.

**jmusbach** · 09-01-2014, 22:00

Interesting, good catch. Perhaps this deserves a bug report?

**jsribeiro** · 04-02-2014, 12:13

We're seeing this problem after upgrading to 2.2.

With ntpd running between the servers, the queue keeps cycling between 500 and 2000, dropping every hour.

Syncing the clocks with ntpdate every 5 minutes via crontab turns the queue graph into a sawtooth cycling between 100 and 500.

Is this being addressed (in 2.2.2, maybe)?

Regards.

**GArmao** · 05-03-2014, 20:02

I've seen the same issue here: just a few seconds of time desynchronization can cause a huge "reported" queue, especially with proxies and with items that have a long update interval (3600 seconds in my case).

Here's my example:

My detailed queue reports that an item "mem Heap Memory max" (update interval 3600 seconds) is delayed by "56m 51s". Zabbix says it should have been checked at "05 Mar 2014 17:00:04", so let's look at the actual "Last check" time shown in "Latest data" for that item: "05 Mar 2014 17:00:01" (3 seconds before the expected check).
So what happens is that Zabbix thinks it still needs to receive a value for that item, but the value was actually received 3 seconds before the scheduled time, because of the clock difference between the proxy and the server.

Syncing the proxy clock exactly with the server's completely fixes the queue calculation.

Just a reminder: the Zabbix queue display is really just a calculation of item delays based on update intervals and last check times. It's not an actual queue, and there's no way to manually "clear" it.
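GArmao's example can be reproduced with a deliberately simplified model of the check. This is a hypothetical sketch, not Zabbix's actual queue code: it assumes a value only counts as received if its proxy-side timestamp is not earlier than the server's expected check time, and it takes "now" as 17:56:55, i.e. the expected check time plus the reported 56m 51s.

```python
from datetime import datetime, timedelta
from typing import Optional


def queue_delay(next_check: datetime, last_value: datetime,
                now: datetime) -> Optional[timedelta]:
    """Simplified model of how an item can look overdue in the queue.

    next_check: when the server expects a value (server clock).
    last_value: timestamp of the newest value, stamped by the proxy (proxy clock).
    Returns None if the value counts as received, otherwise the apparent delay.
    """
    if last_value >= next_check:
        return None
    return now - next_check


expected = datetime(2014, 3, 5, 17, 0, 4)
now = datetime(2014, 3, 5, 17, 56, 55)

# Proxy clock 3 s behind: the value stamped 17:00:01 looks like it never arrived.
print(queue_delay(expected, datetime(2014, 3, 5, 17, 0, 1), now))  # prints 0:56:51
# Clocks in sync: the same value is stamped 17:00:04 and is not queued.
print(queue_delay(expected, datetime(2014, 3, 5, 17, 0, 4), now))  # prints None
```

In this model the phantom delay keeps growing until the next value arrives, so with a 3600-second interval it can approach a full hour, which fits the observation that long update intervals make the reported queue much worse.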

I'm not sure what changed in 2.2 that made this queue estimation so "picky".

Zabbix 2.2 proxy queue is huge!
