**tchjts1** · 02-05-2013, 15:35

What version of Zabbix are you running? What OS are you installed on?
What versions of Apache, PHP, and MySQL?

If you are using the Zabbix 2.x release, can you give us screenshots of the graphs that are attached to your Zabbix server? There are 2 graphs for internal items that show Zabbix processes busy %. Set the time period for the graphs to about 14 days. Go to Monitoring --> Graphs --> Zabbix server. The 2 graphs I am talking about have around 10 or 12 items each.

Regarding the "failed: another network error", barring any actual network problems causing the issue, you can go into zabbix_server.conf and increase the value for Timeout=. By default that is at 3 seconds. I have changed mine to 10 and rarely see those errors anymore. You'll have to restart Zabbix server after you make that change.
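
Because the stock zabbix_server.conf ships with commented-out defaults (for example `# Timeout=3`), it is easy to edit the wrong line. The following is a minimal Python sketch, not part of the original post, that lists the uncommented assignments for a given key so you can check what you actually set before restarting the server. The function name `assignments` and the sample text are illustrative assumptions.

```python
from typing import List


def assignments(conf_text: str, key: str) -> List[str]:
    """Return every uncommented value assigned to `key`, in file order."""
    values = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out defaults
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            values.append(rest.strip())
    return values


# Illustrative excerpt; in practice read the real file from disk.
sample = """\
### Option: Timeout
# Timeout=3
Timeout=10
"""

print(assignments(sample, "Timeout"))
print(assignments(sample, "StartPollers"))
```

An empty list means the key is only present as a comment (or not at all), so the built-in default applies.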

**ortz** · 02-05-2013, 15:44

Thank you for the quick reply.
Zabbix version is 2.0.0
OS is RHEL 6.2
Apache (httpd) is 2.2.15
PHP is 5.3.3
MySQL is 5.1

About the graphs you asked for: I don't see them in my system (maybe because I upgraded from 1.8 to 2.0 instead of doing a fresh install?).

I increased the Timeout to 10 seconds; I hope it helps a bit.

Anything else?

Thanks again!

**tchjts1** · 02-05-2013, 16:08

I can attach that template here when I get into work, or you can get the raw XML from this link and do an import, then attach it to your Zabbix server.

http://blog.zabbix.com/wp-content/uploads/2011/03/template_zabbix_server_v3.xml

It provides very valuable information about what is happening with your Zabbix processes.

In the meantime, is your zabbix_server.log giving you any indication of what is happening?
When you say the agents are "killed", are the services still running when you experience the issue?
When you upgraded the Zabbix server, did you also upgrade your agents?

**ortz** · 02-05-2013, 16:19

Hi, I've added the template and the graphs are attached.
Please note that I only have 5-10 minutes of data at the moment.

I hope it helps.

Once again, thank you!

**tchjts1** · 02-05-2013, 17:14

It is good that you got the template attached and the graphs going. That short period of time isn't going to tell the real story, but it will help with troubleshooting going forward.

**tchjts1** · 02-05-2013, 17:20

These questions were in a previous reply above:

In the meantime, is your zabbix_server.log giving you any indication of what is happening?
When you say the agents are "killed", are the services still running when you experience the issue?
When you upgraded the Zabbix server, did you also upgrade your agents?

Also, if you are actually on release 2.0.0, I would definitely upgrade to the latest stable release of 2.0.6 on your Zabbix server if you can.

**ortz** · 05-05-2013, 09:09

Hi,

zabbix_server.log only says that there was a network error and that it will try again in 15 seconds; after those 15 seconds it reports the host as unavailable.
When the agents are killed, their service is dead as well (on the agent side).
When I upgraded Zabbix, I of course upgraded the agents too (I removed the old installation and installed from scratch).

I'll have to schedule maintenance for this, but I'll do it as soon as I can.
In the meantime I created a PHP script that connects to the API every 10 minutes and, if it finds any hosts with an "agent unavailable" problem, restarts the agent.
Of course this is not the way to fix the problem, but it helps for now.

I've attached the graphs for 3 days now instead of a couple of hours; I hope that helps find the problem.
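
The stopgap described in this post (poll the API, restart agents on unavailable hosts) was written in PHP and was not shared. The following is a rough Python sketch of the same idea, under several assumptions: the URL is a placeholder, the `user.login` and `host.get` calls and the `available` value of 2 for "unavailable" follow the Zabbix 2.0 JSON-RPC API as I understand it, and `restart_agent` is a hypothetical callback you would supply (for example, something that runs a service restart over SSH).

```python
import json
import urllib.request

API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"  # placeholder URL


def rpc_payload(method, params, auth=None, req_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": req_id}


def call(method, params, auth=None):
    """Send one API request and return its 'result' field."""
    data = json.dumps(rpc_payload(method, params, auth)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def unavailable_hosts(hosts):
    """Keep the names of hosts whose agent availability flag is 2."""
    return [h["host"] for h in hosts if str(h.get("available")) == "2"]


def check_once(user, password, restart_agent):
    """One polling pass: log in, list unavailable hosts, restart each agent."""
    auth = call("user.login", {"user": user, "password": password})
    hosts = call("host.get",
                 {"output": "extend", "filter": {"available": 2}}, auth)
    for name in unavailable_hosts(hosts):
        restart_agent(name)  # hypothetical callback supplied by the caller
```

Running `check_once` from cron every 10 minutes would mirror the schedule mentioned in the post. As the post says, this only masks the symptom.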

**tchjts1** · 05-05-2013, 22:41

Looks like there is some tweaking you can do, based on those graphs.
One thing I would do to help alleviate the "Another network error" message is to go into zabbix_server.conf, change your Timeout= value from the default of 3, and try it at 10. This helped me tremendously when I was seeing a lot of those errors.

Another change I would make while in that config file is to increase your configuration cache (CacheSize). While 60% isn't horrible, I try to keep mine at 80% or above. (How far you can raise this setting depends on how much memory you have available.)

Can you provide a screenshot of your graphs for memory (free/used) and swap space (free/used) for your Zabbix server? A 7 day time period would be best.

None of this, though, would explain why your Zabbix agent service is being killed on your hosts. That would have nothing to do with your Zabbix server setup or performance.
Agents don't just shut down en masse.

**ortz** · 06-05-2013, 07:53

Hi,

I already increased the timeout to 10 seconds.
As for memory, I'm using almost all of it (100 MB free out of 17 GB). Note that a lot of this memory goes to MySQL, and about 6 GB is cached memory (images attached below).

I can reduce the MySQL buffers in favor of the Zabbix server if that will help...

I don't think the problem is on the Zabbix agents, because that wouldn't explain why, every time the agents are killed, it happens simultaneously on roughly 80-150 hosts at the very same second...

**tchjts1** · 06-05-2013, 18:48

I'd be a little curious as to what causes your free memory to dive like that over a 10 hour period.

Anyway, for comparison purposes, here are my pertinent zabbix_server.conf and my.cnf settings.

These are configured for a Zabbix app server and DB on 2 different VMs running Red Hat Linux. The Zabbix app server has 4 vCPUs and 8 GB of memory. The Zabbix DB server has 8 vCPUs and 16 GB of memory.

I would still be curious to see the graphs for your swap space usage.

zabbix_server.conf

Code:

```ini
### Option: StartPollers
#       Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5
StartPollers=100

### Option: CacheSize
#       Size of configuration cache, in bytes.
#       Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# CacheSize=8M
CacheSize=128M

### Option: HistoryCacheSize
#       Size of history cache, in bytes.
#       Shared memory size for storing history data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# HistoryCacheSize=8M
HistoryCacheSize=128M

### Option: HistoryTextCacheSize
#       Size of text history cache, in bytes.
#       Shared memory size for storing character, text or log history data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# HistoryTextCacheSize=16M
HistoryTextCacheSize=128M
```

my.cnf

Code:

```ini
## Added 10/10/12
port = 3306
skip-external-locking
max_allowed_packet = 1M
table_open_cache = 512
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
#thread_concurrency = 8

# Zabbix parameters
innodb_file_per_table
max_allowed_packet = 16M
innodb_data_home_dir = /data/mysql
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /data/mysql
innodb_buffer_pool_size = 8G
innodb_additional_mem_pool_size = 32M
innodb_lock_wait_timeout = 120
innodb_log_file_size = 120M
innodb_thread_concurrency = 8
key_buffer_size = 512M
max_connections=512
table_cache=4096
query_cache_size = 128M
tmp_table_size = 8M
thread_cache_size = 64
sort_buffer_size = 16M
```
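
For anyone comparing these my.cnf values against their own RAM, here is a rough Python sketch of the common rule-of-thumb estimate: global buffers plus per-connection buffers multiplied by max_connections. The grouping of settings into "global" and "per connection" is an assumption based on that rule of thumb, and the result is only a theoretical ceiling, since per-connection buffers are allocated as needed rather than all at once. It was not part of the original post.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a my.cnf size such as '8G' or '512M' to bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


# Values copied from the my.cnf posted above.
settings = {
    "innodb_buffer_pool_size": "8G",
    "innodb_additional_mem_pool_size": "32M",
    "key_buffer_size": "512M",
    "query_cache_size": "128M",
    "sort_buffer_size": "16M",
    "read_buffer_size": "2M",
    "read_rnd_buffer_size": "8M",
}
max_connections = 512

# Assumed grouping for the rule-of-thumb estimate.
GLOBAL_KEYS = ["innodb_buffer_pool_size", "innodb_additional_mem_pool_size",
               "key_buffer_size", "query_cache_size"]
PER_CONNECTION_KEYS = ["sort_buffer_size", "read_buffer_size",
                       "read_rnd_buffer_size"]

global_bytes = sum(parse_size(settings[k]) for k in GLOBAL_KEYS)
per_connection_bytes = sum(parse_size(settings[k]) for k in PER_CONNECTION_KEYS)
worst_case = global_bytes + max_connections * per_connection_bytes

gib = 1024 ** 3
print(f"global buffers:       {global_bytes / gib:.2f} GiB")
print(f"per connection:       {per_connection_bytes / 1024 ** 2:.0f} MiB")
print(f"theoretical ceiling:  {worst_case / gib:.2f} GiB")
```

If the ceiling is far above physical memory, the per-connection buffers or max_connections are the usual places to look first.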

**ortz** · 07-05-2013, 10:46

Problem solved!

Hi,

Just a little update: yesterday I changed the StartPollers parameter (it was at the default of 5, and I raised it to 50) and everything works fine now.
My required server performance was about 360, which is probably too much work for 5 pollers. After the increase, all queues cleared and the network errors stopped.

Thank you very much for helping in this issue.
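
To put the StartPollers change in numbers, here is a small Python sketch of the arithmetic. It assumes the figure of 360 is new values per second and, as a simplification, that every value is collected by a poller (in practice trappers and active checks take part of the load).

```python
REQUIRED_VPS = 360.0  # from the post; assumed to mean new values per second


def per_poller_rate(required_vps: float, pollers: int) -> float:
    """Values each poller must collect per second if load is spread evenly."""
    if pollers <= 0:
        raise ValueError("pollers must be positive")
    return required_vps / pollers


def budget_ms(required_vps: float, pollers: int) -> float:
    """Average time in milliseconds a poller can spend on one value."""
    return 1000.0 / per_poller_rate(required_vps, pollers)


for pollers in (5, 50):
    rate = per_poller_rate(REQUIRED_VPS, pollers)
    print(f"{pollers:>3} pollers: {rate:.1f} values/s each, "
          f"{budget_ms(REQUIRED_VPS, pollers):.1f} ms per value")
```

Under these assumptions, 5 pollers leave under 14 ms per value, so a single slow or unreachable host can stall the queue, while 50 pollers leave roughly ten times that.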

Agents killed
