ZABBIX Forums

ZABBIX Forums (http://www.zabbix.com/forum/index.php)
-   Zabbix for Large Environments (http://www.zabbix.com/forum/forumdisplay.php?f=23)
-   -   Zabbix Performance Problem (http://www.zabbix.com/forum/showthread.php?t=60038)

Rodney 04-12-2017 18:52

Zabbix Performance Problem
 
Hi,

Im having huge performance problem with Zabbix Server since my NVPS has achieved the value of 3.500.

However the problem starts only at midnight, sometimes 21 h, and fineshes 1 or 2 hours later. Sometimes it lasts more too.

During this time, the items arent updated and some data is lost. The DBsyncers process stays very slow and
the CPU I/O wait increase very much in the DB Server. The Zabbix Dashboard doent work too.

At the DB server we can see a lot of queries like this one:

select clock,ns,value from history_uint where itemid=<id> and clock><end> and clock<=<begin>

During the day, the CPU utilization of the DB server is very low and everything works fine.

Has anyone any idea what is happening at this hour and what can be done do solve this problem?

Do you think that the problem is related with the DB Server or the Zabbix Server?

I think that it is related with the dbsyncer processes, but Im not sure.

As my environment is very big, the debug level in the Zabbix Server is set to 2 (Error) and I couldnt see anything.

Nothing on the /var/log/messagestoo, on the DB Server.

Thanks,
Rodney



My configuration is:

- Zabbix Server version: 3.2.10

- Zabbix Server host: Virtual Machine running CentOS 7.3.1611 64 bits, 4 vCPs, 16 GB RAM

- Data Base: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment Im with just one node.

- Data Base host: Physical Host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM

- Several proxies version 3.2.10, on VMs CentOS 7.4.

- Partition tables enable on history and trend tables.

Zabbix Serve Configuration (main parameters changed):

StartTrappers=100
MaxHousekeeperDelete=100000
CacheSize=1536M
CacheUpdateFrequency=300
StartDBSyncers=8 (should I increase it?)
HistoryCacheSize=512M (during the problem, it goes to zero)
TrendCacheSize=256M
ValueCacheSize=1G
Timeout=4
LogSlowQueries=10000

MariaDB Configuration:

open_files_limit = 16364
max_connections = 500
binlog_format=ROW
default_storage_engine=InnoDB

innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_additional_mem_pool_size = 16M
innodb_autoextend_increment = 256
innodb_buffer_pool_instances = 32
innodb_buffer_pool_size = 32G
innodb_change_buffer_max_size = 50
innodb_concurrency_tickets = 5000
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
innodb_log_files_in_group = 4
innodb_old_blocks_time = 1000
innodb_open_files = 2048
innodb_stats_on_metadata = OFF
innodb_lock_wait_timeout = 50
innodb_io_capacity = 2000

large-pages
binlog-row-event-max-size = 8192
character_set_server = utf8
collation_server = utf8_bin
expire_logs_days = 1
join_buffer_size = 262144
max_allowed_packet = 32M
max_connect_errors = 10000
max_heap_table_size = 134217728
query_cache_type = 0
query_cache_size = 0
slow-query-log = ON
table_open_cache = 2048
thread_cache_size = 64
tmp_table_size = 134217728
wait_timeout = 86400

kloczek 04-12-2017 20:54

Quote:

Originally Posted by Rodney (Post 207049)
Im having huge performance problem with Zabbix Server since my NVPS has achieved the value of 3.500.

However the problem starts only at midnight, sometimes 21 h, and fineshes 1 or 2 hours later. Sometimes it lasts more too.
[..]
My configuration is:
[..]
- Data Base: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment Im with just one node.

- Data Base host: Physical Host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM
[..]
MariaDB Configuration:
[..]
innodb_buffer_pool_size = 32G

Why on the box with 87GB RAM you are using only 32GB on innodb_buffer_pool_size??? Do you have anything more running on the the DB backend host? If not you are causing that you engine id memory starving ..
You should give SQL engine almost all what is possible to innodb_buffer_pool_size.

If you have partitioned history table with 1d period just before midnight in innodb memory is not possible to cache all what is necessary to cache of growing size of the currently in us partition b-trees and by this DB engine is hitting with more read IOs you storage.

Just look on this host storage IO statistics to confirm above but almost blindly you can increase. With well cached DB content in memory on storage layer you should see way more write and read IOs (i ideal scenario it is possible to gain even 1:50 ratio between read and write IOs like you can see on my old blog entry https://kloczek.wordpress.com/2016/0...rade-surprise/).
innodb_buffer_pool_size (with 87GB you can double this memory and still you should have plenty of headroom).

Rodney 05-12-2017 14:14

Hi!

Indeed, the server is running only the database, so I will increase the parameters as follow:
  • innodb_buffer_pool_instances = 52
  • innodb_buffer_pool_size = 52 GB
  • innodb_change_buffer_max_size = 25 (default) (Zabbix has much more select than insert/updates)

My patitioning script is running every day at 10:30 h.

Besides, Ive read on the Zabbix Forum that there is a bug um MySQL for versions 5.6 and 5.7 with queries ORDER BY LIMIT.

https://www.zabbix.com/forum/showthread.php?t=58582

And I think that it is exact my problem because my database version is:

+-------------------------+---------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------+
| innodb_version | 5.6.32-78.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 10.1.18-MariaDB |
| version_comment | MariaDB Server |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| version_malloc_library | system jemalloc |
| version_ssl_library | OpenSSL 1.0.1e-fips 11 Feb 2013 |
| wsrep_patch_version | wsrep_25.16 |
+-------------------------+---------------------------------+

The recomendation is to set the parameter on the my.cnf file:

optimizer_switch = 'index_condition_pushdown=off'

I will make these changes on next Saturday and I will verify for two weeks If the problem was solved.

Thank you very much.

kloczek 05-12-2017 15:06

Quote:

Originally Posted by Rodney (Post 207089)
My patitioning script is running every day at 10:30 h.

Moment when you are creating new partitions and dropping old ones is not relevant.
What is in such case important is moment is time when SQL engine starts using new partition. Usually partitions time windows ends at 24:00 so just before this moment is highest IO impact.
BTW: you don't need to use external partitions maintenance script. To execute maintenance SQL procedures you can use SQL event.

Quote:

Besides, Ive read on the Zabbix Forum that there is a bug um MySQL for versions 5.6 and 5.7 with queries ORDER BY LIMIT.

https://www.zabbix.com/forum/showthread.php?t=58582
I'm using this type of the partitioning with zabbix since MySQL 4.x without any issues.
What you've quoted is about slow queries which cause has nothing to do with partition.

Quote:

The recomendation is to set the parameter on the my.cnf file:

optimizer_switch = 'index_condition_pushdown=off'

I will make these changes on next Saturday and I will verify for two weeks If the problem was solved.
index_condition_pushdown=on (which is default) is relevant when selects are using LIKE. On query history and trends tables as long as used by zabbix queries are not using LIKE.
Disable index_condition_pushdown may have some non-zero (positive) impact but in case of the zabbix DB such impact will be quite low or barely measurable.
For now I think that you can leave this on the bottom of your ToDo list.

Question: do you have on your zabbix DB backend MySQL and base OS and especially IO subsystem monitoring? I have impression that you don't have access to such monitoring data and if I'm right try to focus on this part.

PS. You just gave me idea to add to my MySQL monitoring template add monitoring optimizer_switch settings. Thx.

Rodney 06-12-2017 14:47

One thing that I cant understand.

Why has Zabbix still changed old history and trends partitioning files?

-rw-rw---- 1 mysql mysql 36G Dec 5 22:29 trends_uint#P#p2017_08.ibd
-rw-rw---- 1 mysql mysql 45G Oct 1 03:14 trends_uint#P#p2017_09.ibd
-rw-rw---- 1 mysql mysql 46G Nov 17 19:45 trends_uint#P#p2017_10.ibd
-rw-rw---- 1 mysql mysql 54G Dec 6 04:16 trends_uint#P#p2017_11.ibd
-rw-rw---- 1 mysql mysql 6.0G Dec 6 10:38 trends_uint#P#p2017_12.ibd

As we can see above, why the file "trends_uint#P#p2017_11.ibd" was modified today, 06 december at 04:16 h?

Worst, why the file "trends_uint#P#p2017_08.ibd" was modified at "Dec 5 22:29"?

Does Zabbix make some insert/update on old history/trends data??

kloczek 06-12-2017 20:43

Did you disabled Administration->General->Housekeeping->Enable internal housekeeping?

Rodney 06-12-2017 21:15

Yes, as I have table patitioning for the history and trend tables.

Rodney 06-12-2017 21:18

Another thing that I have noted.

Almost every night I can see many queries like this:

select clock,ns,value from history where itemid=685444 and clock>1504224274 and clock<=1511385830;
select clock,ns,value from history_uint where itemid=39587285 and clock>1511107366 and clock<=1512524342;
select clock,ns,value from history_uint where itemid=31866406 and clock>1511107366 and clock<=1512524685;
select clock,ns,value from history_uint where itemid=39584648 and clock>1511107366 and clock<=1512524326;
select clock,ns,value from history_uint where itemid=41505528 and clock>1511107366 and clock<=1512523898;

What is strange is the fact that the start date in the query is a old date and always are the same:

1504224274 = GM time : 2017-09-01 00:04:34
1511107366 = GM time : 2017-11-19 16:02:46

These queries takes about 15 seconds to finish, each one, what makes the DB server stays very slow and Zabbix Server stops processing data.

Any one has a idea what is causing this behavior on the Zabbix Server?

May it be a problem with my "Value Cache Size" that is set to 1 GB today?

Why always the same old start date on the queries?

The items arent from the same key and are from different hosts, I mean, there is no pattern, except the start date.

Thanks.

kloczek 06-12-2017 21:49

Quote:

Originally Posted by Rodney (Post 207162)
Yes, as I have table patitioning for the history and trend tables.

OK, but if you are dropping older=st partitions you can as well disable internal housekeeping by go to web interface: Administration->General->Housekeeping and disable check boxes with in Trends and History sections with "Enable internal housekeeping".
Without this housekeeper still will be deleting and history and trends data using DELETE queries. Seems you did not disabled at least trend housekeeper and this is why you see old partitions with quite fresh mtime.

Rodney 06-12-2017 22:39

Ive disabled history and trends housekeeping since Ive enable table partitioning.

Thanks.


All times are GMT +2. The time now is 10:25.