Zabbix 3.0.1 trappers stop processing new data

  • fire555
    Junior Member
    • May 2011
    • 7

    #1

    Zabbix 3.0.1 trappers stop processing new data

    Hey all,

    I have been pulling my hair out over this one.

    I recently upgraded a Zabbix 1.8.5 instance to 3.0.1. The upgrade went well and everything was working fine, except that periodically the trappers stop processing new data. This seems to coincide with something the timer or housekeeping process is doing.

    I thought perhaps the upgrade had left behind some rubbish, so I did a fresh Zabbix 3.0.1 install on a brand new CentOS 7 instance, using a brand new database created from the creation scripts. I copied the template over from the old instance via the export/import method.

    Hoping this would resolve the issue, I changed the DNS and my monitored systems started sending data to the new instance. Everything was going fine, until a couple of hours later the same behaviour presented itself.

    I have tried turning logging up to level 4, but the amount of information logged is almost impossible to wade through and has not revealed anything useful.

    The Zabbix frontend reports that it cannot connect to the Zabbix server, even though other Zabbix threads seem to be connecting to the database and running commands fine.

    The database is MySQL. The monitored systems all use the 1.8.5 version of the Zabbix agent, as this is very difficult to mass-update at this stage. Could this be the source of my issue? From what I understand, the 3.0.1 server shouldn't have a problem with an old agent.

    Does anyone have any ideas how to isolate this issue?

    Thanks
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    It seems like you have a performance bottleneck in the storage used by the Zabbix server's database backend.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • fire555
      Junior Member
      • May 2011
      • 7

      #3
      A little more info.

      The pollers carry on without interruption. It is only the trappers that stop writing new values to the database.

      I cannot believe this is a storage bottleneck. This same environment barely worked up a sweat handling data from 500 monitored hosts on version 1.8.5. And if it were the case, why would the pollers continue to save data?

      The problem also does not appear to be linked to the housekeeper running. Last night the trappers stopped at 1:26 am, while the housekeeper was running at 14 minutes past the hour.


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        Originally posted by fire555
        I cannot believe this is a storage bottleneck.
        Engineering is not something that should be approached as a matter of belief.
        Housekeeping adds more read and write I/O while it is running.
        Do you have monitoring on your Zabbix DB? If not, just log in on the host where your DB backend is running and at least check the iostat/sar output.
        Do you know how many read and write IO/s the storage used by the DB is doing?
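
        For instance, a couple of one-off checks on the DB host with the standard sysstat tools (nothing specific to this setup) will show the per-device IO rates and utilisation:

        Code:
        # extended per-device stats (r/s, w/s, %util), three 5-second samples
        iostat -dxm 5 3

        # live per-device disk activity, five 1-second samples (requires sysstat)
        sar -d -p 1 5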
        Last edited by kloczek; 28-03-2016, 17:22.


        • mdiorio
          Junior Member
          • Mar 2016
          • 27

          #5
          I'm just getting up and running with Zabbix now, and I'm not seeing any data from trappers getting in either. I'm pulling data in from an Elasticsearch server. Agent data comes back properly, but trapper data does not.

          From the agent side, I'm seeing:

          Code:
          2016-03-28 11:21:22,214 DEBUG zbxsender send_to_zabbix:58 Got response from Zabbix: {u'info': u'processed: 0; failed: 4; total: 4; seconds spent: 0.000018', u'response': u'success'}
          2016-03-28 11:21:22,214 INFO zbxsender send_to_zabbix:59 processed: 0; failed: 4; total: 4; seconds spent: 0.000018
          Yet I'm seeing values in the server logs - and you're right, debug level 4 is insane; I can't imagine what 5 is like:

          Code:
            9185:20160328:112635.725 __zbx_zbx_setproctitle() title:'trapper #4 [processing data]'
            9185:20160328:112635.725 trapper got '{
                  "request":"sender data",
                  "data":[
                          {
                                  "host":"ghq-1delasticnode01.globalspec.net",
                                  "key":"health[initializing_shards]",
                                  "value":0,
                                  "clock":1459178782.36},
                          {
                                  "host":"ghq-1delasticnode01.globalspec.net",
                                  "key":"health[relocating_shards]",
                                  "value":0,
                                  "clock":1459178782.36},
                          {
                                  "host":"ghq-1delasticnode01.globalspec.net",
                                  "key":"health[unassigned_shards]",
                                  "value":62,
                                  "clock":1459178782.36},
                          {
                                  "host":"ghq-1delasticnode01.globalspec.net",
                                  "key":"health[delayed_unassigned_shards]",
                                  "value":0,
                                  "clock":1459178782.36}]
          }'
          But my host items do not get any data. I doubt my issue is a bottleneck either. I am only monitoring two hosts and about 10 web scenarios; one host is strictly a Zabbix agent on Windows, and the other is this one. I'm using an SQLite DB right now, but with this minimal number of hosts/items it should be working just fine.


          • fire555
            Junior Member
            • May 2011
            • 7

            #6
            Originally posted by kloczek
            Engineering is not something that should be approached as a matter of belief.
            Housekeeping adds more read and write I/O while it is running.
            Do you have monitoring on your Zabbix DB? If not, just log in on the host where your DB backend is running and at least check the iostat/sar output.
            Do you know how many read and write IO/s the storage used by the DB is doing?
            Sorry, that was a poor choice of words. I should have said that all the logging I have suggests the database is barely doing anything. Write IOPS sit at about 40/sec, reads maybe 10. The database is actually an RDS instance in AWS with SSD-backed storage; IOPS can easily surge to 3000. CPU usage is barely 2%.

            After looking a bit closer at this, I believe the trappers are no longer even bound to the TCP stack. Looking back at a few older logs I found these lines exactly when the trappers stopped processing data.

            Code:
            Cannot get socket IP address: [107] Transport endpoint is not connected
            This line appears numerous times for each trapper PID.
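
            One way to confirm whether the trappers are still listening (assuming the default ListenPort of 10051) is to check the listening sockets on the Zabbix server host:

            Code:
            # is anything still listening on the trapper port?
            ss -ltnp | grep 10051

            # which ports do the zabbix_server processes hold open?
            netstat -ltnp | grep zabbix_server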

            Any ideas?


            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Originally posted by mdiorio
              But my host items do not get any data. I doubt my issue is a bottleneck either. I am only monitoring two hosts and about 10 web scenarios; one host is strictly a Zabbix agent on Windows, and the other is this one. I'm using an SQLite DB right now, but with this minimal number of hosts/items it should be working just fine.
              SQLite stores all database tables in a single file, and every update/insert/delete locks that whole file for the duration of the statement.
              While something is deleting rows from the DB, nothing else can even run a select at the same time.
              In other words, SQLite as a DB backend does not provide any concurrency.

              You need to switch to, for example, MySQL.
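
              A minimal sketch of the relevant zabbix_server.conf settings for a MySQL backend, after creating the database and importing the schema from the creation scripts (the database name, user and password below are placeholders, not values from this thread):

              Code:
              # zabbix_server.conf - DB backend settings (placeholder credentials)
              DBHost=localhost
              DBName=zabbix
              DBUser=zabbix
              DBPassword=changeme
              # DBPort=3306   # only needed for a non-default MySQL port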

              In the context of Zabbix, SQLite is only good for some relatively small proxies, or for embedded systems running such a proxy. On a proxy the housekeeping overhead is far smaller than on the server.
              The proxy DB backend only stores data collected from agents, and from time to time runs a few queries to read a batch of data and send it to the server.
              On a proxy there are almost no selects, and the select pressure on the proxy's DB backend is only a fraction of what has to be sustained on the server.
              Last edited by kloczek; 29-03-2016, 14:13.


              • mdiorio
                Junior Member
                • Mar 2016
                • 27

                #8
                I made a booboo - I am using a MySQL database for Zabbix. I was originally using SQLite and reloaded with MySQL.

                I'm not seeing any network events in the log, unlike fire555. I'm only seeing the zbx_send_response with a 'failed' count in it for all trapper data received. But you can see the keys and values are returning valid results. It's the trapper that's not processing the data, even though it's sending a response of success.

                Code:
                3448:20160329:111444.433 __zbx_zbx_setproctitle() title:'trapper #4 [processing data]'
                  3448:20160329:111444.433 trapper got '{
                	"request":"sender data",
                	"data":[
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"health[active_shards]",
                			"value":63,
                			"clock":1459264491.56},
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"health[active_primary_shards]",
                			"value":63,
                			"clock":1459264491.56},
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"health[number_of_nodes]",
                			"value":1,
                			"clock":1459264491.56},
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"health[number_of_data_nodes]",
                			"value":1,
                			"clock":1459264491.56},
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"clusterstats[indices.count]",
                			"value":59,
                			"clock":1459264491.56},
                		{
                			"host":"ghq-1delasticnode01.globalspec.net",
                			"key":"clusterstats[indices.store.size_in_bytes]",
                			"value":231743576191,
                			"clock":1459264491.56}]
                }'
                  3448:20160329:111444.433 In recv_agenthistory()
                  3448:20160329:111444.433 In process_hist_data()
                  3448:20160329:111444.433 End of process_hist_data():SUCCEED
                  3448:20160329:111444.433 In zbx_send_response()
                  3448:20160329:111444.433 zbx_send_response() '{"response":"success","info":"processed: 0; failed: 6; total: 6; seconds spent: 0.000022"}'
                  3448:20160329:111444.433 End of zbx_send_response():SUCCEED
                  3448:20160329:111444.433 End of recv_agenthistory()
                  3448:20160329:111444.433 __zbx_zbx_setproctitle() title:'trapper #4 [processed data in 0.000384 sec, waiting for connection]'
                  3442:20160329:111444.458 get value from agent result: '0'
                  3442:20160329:111444.458 End of get_value_agent():SUCCEED
                  3442:20160329:111444.458 End of get_value():SUCCEED
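
                For reference, a single value can also be pushed by hand with zabbix_sender to see whether the server accepts it (the host and key below are taken from the log above; replace the server address with your own):

                Code:
                # send one test value and print the server's verbose response
                zabbix_sender -vv -z 127.0.0.1 -p 10051 -s "ghq-1delasticnode01.globalspec.net" -k "health[active_shards]" -o 63
                If that also reports "failed", it is worth double-checking that the items are defined as type "Zabbix trapper" and that the host name in the sender data exactly matches the technical host name configured in the frontend.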
