ZABBIX Forums  
  #1  
Old 04-12-2017, 18:52
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default Zabbix Performance Problem

Hi,

Iīm having huge performance problem with Zabbix Server since my NVPS has achieved the value of 3.500.

However the problem starts only at midnight, sometimes 21 h, and fineshes 1 or 2 hours later. Sometimes it lasts more too.

During this time, the items arenīt updated and some data is lost. The DBsyncers process stays very slow and
the CPU I/O wait increase very much in the DB Server. The Zabbix Dashboard doenīt work too.

At the DB server we can see a lot of queries like this one:

select clock,ns,value from history_uint where itemid=<id> and clock><end> and clock<=<begin>

During the day, the CPU utilization of the DB server is very low and everything works fine.

Has anyone any idea what is happening at this hour and what can be done do solve this problem?

Do you think that the problem is related with the DB Server or the Zabbix Server?

I think that it is related with the dbsyncer processes, but Iīm not sure.

As my environment is very big, the debug level in the Zabbix Server is set to 2 (Error) and I couldnīt see anything.

Nothing on the /var/log/messagestoo, on the DB Server.

Thanks,
Rodney



My configuration is:

- Zabbix Server version: 3.2.10

- Zabbix Server host: Virtual Machine running CentOS 7.3.1611 64 bits, 4 vCPs, 16 GB RAM

- Data Base: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment Iīm with just one node.

- Data Base host: Physical Host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM

- Several proxies version 3.2.10, on VMs CentOS 7.4.

- Partition tables enable on history and trend tables.

Zabbix Serve Configuration (main parameters changed):

StartTrappers=100
MaxHousekeeperDelete=100000
CacheSize=1536M
CacheUpdateFrequency=300
StartDBSyncers=8 (should I increase it?)
HistoryCacheSize=512M (during the problem, it goes to zero)
TrendCacheSize=256M
ValueCacheSize=1G
Timeout=4
LogSlowQueries=10000

MariaDB Configuration:

open_files_limit = 16364
max_connections = 500
binlog_format=ROW
default_storage_engine=InnoDB

innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_additional_mem_pool_size = 16M
innodb_autoextend_increment = 256
innodb_buffer_pool_instances = 32
innodb_buffer_pool_size = 32G
innodb_change_buffer_max_size = 50
innodb_concurrency_tickets = 5000
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
innodb_log_files_in_group = 4
innodb_old_blocks_time = 1000
innodb_open_files = 2048
innodb_stats_on_metadata = OFF
innodb_lock_wait_timeout = 50
innodb_io_capacity = 2000

large-pages
binlog-row-event-max-size = 8192
character_set_server = utf8
collation_server = utf8_bin
expire_logs_days = 1
join_buffer_size = 262144
max_allowed_packet = 32M
max_connect_errors = 10000
max_heap_table_size = 134217728
query_cache_type = 0
query_cache_size = 0
slow-query-log = ON
table_open_cache = 2048
thread_cache_size = 64
tmp_table_size = 134217728
wait_timeout = 86400
Reply With Quote
  #2  
Old 04-12-2017, 20:54
kloczek kloczek is offline
Senior Member
 
Join Date: Jun 2006
Location: UK/London
Posts: 872
Default

Quote:
Originally Posted by Rodney View Post
Iīm having huge performance problem with Zabbix Server since my NVPS has achieved the value of 3.500.

However the problem starts only at midnight, sometimes 21 h, and fineshes 1 or 2 hours later. Sometimes it lasts more too.
[..]
My configuration is:
[..]
- Data Base: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment Iīm with just one node.

- Data Base host: Physical Host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM
[..]
MariaDB Configuration:
[..]
innodb_buffer_pool_size = 32G
Why on the box with 87GB RAM you are using only 32GB on innodb_buffer_pool_size??? Do you have anything more running on the the DB backend host? If not you are causing that you engine id memory starving ..
You should give SQL engine almost all what is possible to innodb_buffer_pool_size.

If you have partitioned history table with 1d period just before midnight in innodb memory is not possible to cache all what is necessary to cache of growing size of the currently in us partition b-trees and by this DB engine is hitting with more read IOs you storage.

Just look on this host storage IO statistics to confirm above but almost blindly you can increase. With well cached DB content in memory on storage layer you should see way more write and read IOs (i ideal scenario it is possible to gain even 1:50 ratio between read and write IOs like you can see on my old blog entry https://kloczek.wordpress.com/2016/0...rade-surprise/).
innodb_buffer_pool_size (with 87GB you can double this memory and still you should have plenty of headroom).
Reply With Quote
  #3  
Old 05-12-2017, 14:14
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default

Hi!

Indeed, the server is running only the database, so I will increase the parameters as follow:
  • innodb_buffer_pool_instances = 52
  • innodb_buffer_pool_size = 52 GB
  • innodb_change_buffer_max_size = 25 (default) (Zabbix has much more select than insert/updates)

My patitioning script is running every day at 10:30 h.

Besides, Iīve read on the Zabbix Forum that there is a bug um MySQL for versions 5.6 and 5.7 with queries ORDER BY LIMIT.

https://www.zabbix.com/forum/showthread.php?t=58582

And I think that it is exact my problem because my database version is:

+-------------------------+---------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------+
| innodb_version | 5.6.32-78.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 10.1.18-MariaDB |
| version_comment | MariaDB Server |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| version_malloc_library | system jemalloc |
| version_ssl_library | OpenSSL 1.0.1e-fips 11 Feb 2013 |
| wsrep_patch_version | wsrep_25.16 |
+-------------------------+---------------------------------+

The recomendation is to set the parameter on the my.cnf file:

optimizer_switch = 'index_condition_pushdown=off'

I will make these changes on next Saturday and I will verify for two weeks If the problem was solved.

Thank you very much.
Reply With Quote
  #4  
Old 05-12-2017, 15:06
kloczek kloczek is offline
Senior Member
 
Join Date: Jun 2006
Location: UK/London
Posts: 872
Default

Quote:
Originally Posted by Rodney View Post
My patitioning script is running every day at 10:30 h.
Moment when you are creating new partitions and dropping old ones is not relevant.
What is in such case important is moment is time when SQL engine starts using new partition. Usually partitions time windows ends at 24:00 so just before this moment is highest IO impact.
BTW: you don't need to use external partitions maintenance script. To execute maintenance SQL procedures you can use SQL event.

Quote:
Besides, Iīve read on the Zabbix Forum that there is a bug um MySQL for versions 5.6 and 5.7 with queries ORDER BY LIMIT.

https://www.zabbix.com/forum/showthread.php?t=58582
I'm using this type of the partitioning with zabbix since MySQL 4.x without any issues.
What you've quoted is about slow queries which cause has nothing to do with partition.

Quote:
The recomendation is to set the parameter on the my.cnf file:

optimizer_switch = 'index_condition_pushdown=off'

I will make these changes on next Saturday and I will verify for two weeks If the problem was solved.
index_condition_pushdown=on (which is default) is relevant when selects are using LIKE. On query history and trends tables as long as used by zabbix queries are not using LIKE.
Disable index_condition_pushdown may have some non-zero (positive) impact but in case of the zabbix DB such impact will be quite low or barely measurable.
For now I think that you can leave this on the bottom of your ToDo list.

Question: do you have on your zabbix DB backend MySQL and base OS and especially IO subsystem monitoring? I have impression that you don't have access to such monitoring data and if I'm right try to focus on this part.

PS. You just gave me idea to add to my MySQL monitoring template add monitoring optimizer_switch settings. Thx.
Reply With Quote
  #5  
Old 06-12-2017, 14:47
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default

One thing that I canīt understand.

Why has Zabbix still changed old history and trends partitioning files?

-rw-rw---- 1 mysql mysql 36G Dec 5 22:29 trends_uint#P#p2017_08.ibd
-rw-rw---- 1 mysql mysql 45G Oct 1 03:14 trends_uint#P#p2017_09.ibd
-rw-rw---- 1 mysql mysql 46G Nov 17 19:45 trends_uint#P#p2017_10.ibd
-rw-rw---- 1 mysql mysql 54G Dec 6 04:16 trends_uint#P#p2017_11.ibd
-rw-rw---- 1 mysql mysql 6.0G Dec 6 10:38 trends_uint#P#p2017_12.ibd

As we can see above, why the file "trends_uint#P#p2017_11.ibd" was modified today, 06 december at 04:16 h?

Worst, why the file "trends_uint#P#p2017_08.ibd" was modified at "Dec 5 22:29"?

Does Zabbix make some insert/update on old history/trends data??
Reply With Quote
  #6  
Old 06-12-2017, 20:43
kloczek kloczek is offline
Senior Member
 
Join Date: Jun 2006
Location: UK/London
Posts: 872
Default

Did you disabled Administration->General->Housekeeping->Enable internal housekeeping?
Reply With Quote
  #7  
Old 06-12-2017, 21:15
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default

Yes, as I have table patitioning for the history and trend tables.
Reply With Quote
  #8  
Old 06-12-2017, 21:18
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default

Another thing that I have noted.

Almost every night I can see many queries like this:

select clock,ns,value from history where itemid=685444 and clock>1504224274 and clock<=1511385830;
select clock,ns,value from history_uint where itemid=39587285 and clock>1511107366 and clock<=1512524342;
select clock,ns,value from history_uint where itemid=31866406 and clock>1511107366 and clock<=1512524685;
select clock,ns,value from history_uint where itemid=39584648 and clock>1511107366 and clock<=1512524326;
select clock,ns,value from history_uint where itemid=41505528 and clock>1511107366 and clock<=1512523898;

What is strange is the fact that the start date in the query is a old date and always are the same:

1504224274 = GM time : 2017-09-01 00:04:34
1511107366 = GM time : 2017-11-19 16:02:46

These queries takes about 15 seconds to finish, each one, what makes the DB server stays very slow and Zabbix Server stops processing data.

Any one has a idea what is causing this behavior on the Zabbix Server?

May it be a problem with my "Value Cache Size" that is set to 1 GB today?

Why always the same old start date on the queries?

The items arenīt from the same key and are from different hosts, I mean, there is no pattern, except the start date.

Thanks.
Reply With Quote
  #9  
Old 06-12-2017, 21:49
kloczek kloczek is offline
Senior Member
 
Join Date: Jun 2006
Location: UK/London
Posts: 872
Default

Quote:
Originally Posted by Rodney View Post
Yes, as I have table patitioning for the history and trend tables.
OK, but if you are dropping older=st partitions you can as well disable internal housekeeping by go to web interface: Administration->General->Housekeeping and disable check boxes with in Trends and History sections with "Enable internal housekeeping".
Without this housekeeper still will be deleting and history and trends data using DELETE queries. Seems you did not disabled at least trend housekeeper and this is why you see old partitions with quite fresh mtime.
Reply With Quote
  #10  
Old 06-12-2017, 22:39
Rodney Rodney is offline
Junior Member
 
Join Date: Jun 2008
Posts: 14
Default

Iīve disabled history and trends housekeeping since Iīve enable table partitioning.

Thanks.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 06:28.