
**ingus.vilnis** · 24-03-2015, 17:35

Hi,

Not much of a help but two things that I can add here.

1. Is it possible that you are having network or firewall issues at those times, so that connections from agents simply can't get through? Some smart firewalls block specific ports once traffic exceeds a certain amount.

2. Unrelated to your active checks issue: StartDBSyncers=32 is far too many for your 682 NVPS. Each DB syncer can process ~1000 NVPS, so you should be safe with the default of 4 here.
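
As a rough sanity check of that rule of thumb, the arithmetic can be sketched as follows. The ~1000 NVPS per syncer figure is the estimate quoted in this post, not a measured value, and `syncers_needed` is a hypothetical helper name used only for illustration.

```python
import math

# Rule-of-thumb capacity quoted in the post above; an assumption, not a benchmark.
NVPS_PER_SYNCER = 1000


def syncers_needed(nvps: float, per_syncer: float = NVPS_PER_SYNCER) -> int:
    """Return the minimum number of DB syncers for a given NVPS load."""
    return max(1, math.ceil(nvps / per_syncer))


if __name__ == "__main__":
    # 682 NVPS is the load reported in this thread.
    print(syncers_needed(682))
```

By this estimate a single syncer would already cover 682 NVPS, so the default of 4 leaves plenty of headroom.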

Best Regards,
Ingus

**gleepwurp** · 24-03-2015, 17:44

Thanks for your reply Ingus!

I too thought network contention might be the problem, but what made me doubt this is the fact that the Zabbix Agent (ACTIVE) running locally on the Zabbix Server itself has the same issue when trying to connect locally to "itself" using Active...

I'll try replacing the ServerActive IP from the server's IP to 127.0.0.1 to see if that makes a difference next time... That will at least be a good indication if the IP Stack/Port is the issue, or the Zabbix Trappers.
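
A minimal sketch of that comparison, assuming the server listens on the default trapper port 10051 (adjust if ListenPort was changed). `can_connect` is a hypothetical helper; it only tests whether a TCP connection can be opened, nothing Zabbix-specific.

```python
import socket
import sys

# Default Zabbix server (trapper) port; an assumption if ListenPort was changed.
ZABBIX_TRAPPER_PORT = 10051


def can_connect(host: str, port: int = ZABBIX_TRAPPER_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Pass the addresses to compare on the command line, e.g. loopback and the server IP.
    hosts = sys.argv[1:] or ["127.0.0.1"]
    for host in hosts:
        print(host, "reachable" if can_connect(host) else "NOT reachable")
```

Run during an outage against both 127.0.0.1 and the server's own IP. Note that a successful connect only shows the kernel accepted the connection; it does not prove a trapper process actually handled it.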

Thanks for the DBSyncer advice, I'll drop it down to your recommended value...

Thanks!

Gleepwurp.

**ingus.vilnis** · 24-03-2015, 17:48

Hi Gleepwurp,

Yes, strange that the checks fail also on server. Try the 127.0.0.1 for sure.

Also have a look at the "Zabbix data gathering process busy" graph for the period when you had these issues. Maybe your trappers are overloaded and you need more than 100 in server.conf?

Best Regards,
Ingus

**ingus.vilnis** · 24-03-2015, 17:50

And check all the other graphs and parameters for high spikes as well.

**gleepwurp** · 24-03-2015, 22:19

Hello Ingus,

I don't have any spikes to speak of, and the CPU is idle...

I'm posting the graphs for the Zabbix Server stats below, when I had this issue:

Well, forget posting the Perf/Process graphs, seems I have a 100k quota for total picture attachment... I'll just post the Min/Max/Avg for Poller and Processes during the period where there was around ~80k items in the queue (period is about 1 day, 20 hours).

The High MAX values are usually at the end, when Zabbix seems to wake up and process all the back log (lasts about 2-3 minutes at most).

Gleepwurp.


**ingus.vilnis** · 25-03-2015, 17:12

Hi,

Hard to tell much from these figures. AVG 1.1% trappers is not optimal. Alerter and history syncer also could be better. CPU is not so important here; I can hardly remember a case when there was any significant CPU load at all. But that's all.

Is there still a way to see complete graphs and get the overall picture?

Best Regards,
Ingus

**gleepwurp** · 25-03-2015, 17:21

Hi Ingus,

I have historical data/graphs from the last time it happened (less than 7 days ago)... however, I can only have 100k of attachment/graphics total in all my posts throughout this site, so each time I try to post a picture, I have to remove one from my earlier post... And most of my graphs are more than 100k, so they can't be posted here...

Do you have a 3rd party image-hosting site to suggest?

Thx,

Gleepwurp.

**ingus.vilnis** · 25-03-2015, 17:24

Hi,

I have never used such hosting before, so I can't recommend one. You could search for one yourself, or maybe share a Dropbox link.

Best Regards,
Ingus

**gleepwurp** · 25-03-2015, 17:39

Ok,

found a place to post the graphs...

Here it is: http://postimg.org/gallery/3if7qwxs/cbf1b4ca/

Let me know if you need more graphs...

Thanks for your help!

Gleepwurp.

**tchjts1** · 26-03-2015, 10:56

My 2 cents here:

Even though your NVPS is not astronomical, you are working with many hosts (~4,000), mostly collecting VMware data. Also, looking at your graphs (if they were for my setup), I would want to adjust the cache settings so they are a bit more efficient. Of course, this all depends on whether you have available resources to allocate for these.

Anyway, here are your settings as you show above, and my comments on which ones I would increment. These are simply suggestions. Any changes require a restart of Zabbix server process.

I am unsure about this one as I am going from memory at the moment, but it may also help to add the parameter UnreachablePeriod=120 to your Zabbix server.conf file. I am not sure what it is by default. Maybe 60.

Code:

DebugLevel=3
StartPollers=80
StartPollersUnreachable=40
StartTrappers=100
StartPingers=20
StartDiscoverers=10
CacheSize=512M            <---- Increment to 1G
CacheUpdateFrequency=300
StartDBSyncers=32         <---- As Ingus mentioned, put this back to 4
HistoryCacheSize=256M
TrendCacheSize=128M
Timeout=20                <---- I would put this to 30
ProxyConfigFrequency=300
StartVMwareCollectors=20  <---- Increment to maybe 40
VMwareFrequency=300
VMwarePerfFrequency=300
VMwareTimeout=30
VMwareCacheSize=512M
ValueCacheSize=512M

I am happy when my graphs are looking like this


https://www.zabbix.com/forum/showthread.php?t=47781


**gleepwurp** · 26-03-2015, 14:54

Hi,

Thanks both you (tchjts1 and Ingus) for the suggestions, I will give them a try!

A Linux-knowledgeable colleague of mine suggested that maybe I'm running out of sockets... The current ulimit for the Zabbix user (open files) is set at 1024... Have any of you ever needed to increase this limit for Zabbix?
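
For reference, a small sketch of how that limit could be read from Python on Linux, using the standard `resource` module. It reports the limit of the process that runs it, so it would need to be run as the Zabbix user to reflect that user's setting; `open_file_limits` is a hypothetical helper name.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits (RLIMIT_NOFILE) of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY means "no limit" for either value.
    print(f"open files: soft={soft} hard={hard}")
```

Keep in mind that RLIMIT_NOFILE applies per process, not per user, so each forked Zabbix process has its own budget of descriptors. For an already running process, /proc/<pid>/limits shows the values actually in effect.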

Thanks again for all your insights!

Gleepwurp.

**tchjts1** · 26-03-2015, 15:21

I will need to check my setup when I get into work and see exactly what I have set.

In the meantime, what version of Zabbix are you running on Zabbix Server?
Do your proxies match this same version?

**gleepwurp** · 26-03-2015, 15:26

I've made the suggested changes and I'll follow up with new stat graphs tomorrow to see how it impacts...

I'm running 2.4.4 on both the Server and the Proxies (I upgrade them every month)...

G.

**tchjts1** · 26-03-2015, 16:09

Originally posted by gleepwurp

(I upgrade them every month)...

G.

You like living on the edge, eh?

Just my personal rule of thumb with upgrading Zabbix - I never implement a new major version of Zabbix until it is at least on the x.x.5 release. That gives them time to work out the majority of issues.

And I don't upgrade unless there is a need for it (new features/security/functionality that I want). I am still on version 2.0.9 lol.


Zabbix_Server periodically stops accepting Active connections
