Zabbix Performance Problem

    Hi,

    I'm having a huge performance problem with Zabbix Server since my NVPS reached 3,500.

    However, the problem only starts around midnight, sometimes at 21:00, and finishes 1 or 2 hours later. Sometimes it lasts longer.

    During this time, items aren't updated and some data is lost. The DB syncer processes become very slow, the CPU I/O wait on the DB server increases sharply, and the Zabbix dashboard doesn't work either.

    On the DB server we can see a lot of queries like this one:

    select clock,ns,value from history_uint where itemid=<id> and clock><begin> and clock<=<end>

    During the day, the CPU utilization of the DB server is very low and everything works fine.

    Does anyone have an idea what is happening at this hour and what can be done to solve this problem?

    Do you think the problem is related to the DB server or the Zabbix server?

    I think it is related to the dbsyncer processes, but I'm not sure.

    As my environment is very large, the debug level on the Zabbix server is set to 2 (Error) and I couldn't see anything.

    Nothing in /var/log/messages on the DB server either.

    Thanks,
    Rodney



    My configuration is:

    - Zabbix Server version: 3.2.10

    - Zabbix Server host: virtual machine running CentOS 7.3.1611 64-bit, 4 vCPUs, 16 GB RAM

    - Database: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment I'm running just one node.

    - Database host: physical host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM

    - Several proxies, version 3.2.10, on CentOS 7.4 VMs.

    - Table partitioning enabled on the history and trend tables.

    Zabbix Server configuration (main parameters changed):

    StartTrappers=100
    MaxHousekeeperDelete=100000
    CacheSize=1536M
    CacheUpdateFrequency=300
    StartDBSyncers=8 (should I increase it?)
    HistoryCacheSize=512M (during the problem, it goes to zero)
    TrendCacheSize=256M
    ValueCacheSize=1G
    Timeout=4
    LogSlowQueries=10000

    MariaDB Configuration:

    open_files_limit = 16364
    max_connections = 500
    binlog_format=ROW
    default_storage_engine=InnoDB

    innodb_autoinc_lock_mode=2
    innodb_flush_log_at_trx_commit=0
    innodb_additional_mem_pool_size = 16M
    innodb_autoextend_increment = 256
    innodb_buffer_pool_instances = 32
    innodb_buffer_pool_size = 32G
    innodb_change_buffer_max_size = 50
    innodb_concurrency_tickets = 5000
    innodb_file_per_table = 1
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 512M
    innodb_log_files_in_group = 4
    innodb_old_blocks_time = 1000
    innodb_open_files = 2048
    innodb_stats_on_metadata = OFF
    innodb_lock_wait_timeout = 50
    innodb_io_capacity = 2000

    large-pages
    binlog-row-event-max-size = 8192
    character_set_server = utf8
    collation_server = utf8_bin
    expire_logs_days = 1
    join_buffer_size = 262144
    max_allowed_packet = 32M
    max_connect_errors = 10000
    max_heap_table_size = 134217728
    query_cache_type = 0
    query_cache_size = 0
    slow-query-log = ON
    table_open_cache = 2048
    thread_cache_size = 64
    tmp_table_size = 134217728
    wait_timeout = 86400

    #2
    Originally posted by Rodney:
    I'm having a huge performance problem with Zabbix Server since my NVPS reached 3,500.

    However, the problem only starts around midnight, sometimes at 21:00, and finishes 1 or 2 hours later. Sometimes it lasts longer.
    [..]
    My configuration is:
    [..]
    - Database: mysql Ver 15.1 Distrib 10.1.18-MariaDB, for Linux (x86_64) using readline 5.1. At the moment I'm running just one node.

    - Database host: physical host running CentOS 7.4.1708 (Core), 32 CPUs, 87 GB RAM
    [..]
    MariaDB Configuration:
    [..]
    innodb_buffer_pool_size = 32G
    Why, on a box with 87 GB RAM, are you using only 32 GB for innodb_buffer_pool_size? Do you have anything else running on the DB backend host? If not, you are starving the engine of memory.
    You should give the SQL engine almost everything possible in innodb_buffer_pool_size.

    If you have the history tables partitioned with a 1-day period, then just before midnight InnoDB cannot keep in memory everything it needs to cache of the growing b-trees of the currently used partition, and because of this the DB engine hits your storage with more read IOs.

    Just look at this host's storage IO statistics to confirm the above, but you can almost blindly increase innodb_buffer_pool_size (with 87 GB you can double it and still have plenty of headroom). With well-cached DB content in memory, on the storage layer you should see far more write than read IOs (in an ideal scenario it is possible to reach even a 1:50 read-to-write ratio, as you can see in my old blog entry https://kloczek.wordpress.com/2016/0...rade-surprise/).
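    A quick sanity check (a sketch; these are standard InnoDB status counters) is to compare logical read requests with reads that had to go to disk:

    ```sql
    -- Current buffer pool size in GB
    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

    -- Logical read requests vs. reads served from disk. A well-sized
    -- cache keeps Innodb_buffer_pool_reads a tiny fraction of
    -- Innodb_buffer_pool_read_requests.
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
    ```

    If the ratio of disk reads to read requests climbs around midnight, that confirms the working set no longer fits in the pool.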
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates

    Comment


      #3
      Hi!

      Indeed, the server is running only the database, so I will increase the parameters as follows:
      • innodb_buffer_pool_instances = 52
      • innodb_buffer_pool_size = 52 GB
      • innodb_change_buffer_max_size = 25 (the default; Zabbix does many more selects than inserts/updates)


      My partitioning script runs every day at 10:30.

      Besides, I've read on the Zabbix forum that there is a bug in MySQL versions 5.6 and 5.7 with ORDER BY ... LIMIT queries.

      https://www.zabbix.com/forum/showthread.php?t=58582

      And I think this is exactly my problem, because my database version is:

      +-------------------------+---------------------------------+
      | Variable_name           | Value                           |
      +-------------------------+---------------------------------+
      | innodb_version          | 5.6.32-78.1                     |
      | protocol_version        | 10                              |
      | slave_type_conversions  |                                 |
      | version                 | 10.1.18-MariaDB                 |
      | version_comment         | MariaDB Server                  |
      | version_compile_machine | x86_64                          |
      | version_compile_os      | Linux                           |
      | version_malloc_library  | system jemalloc                 |
      | version_ssl_library     | OpenSSL 1.0.1e-fips 11 Feb 2013 |
      | wsrep_patch_version     | wsrep_25.16                     |
      +-------------------------+---------------------------------+

      The recommendation is to set this parameter in the my.cnf file:

      optimizer_switch = 'index_condition_pushdown=off'
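      For what it's worth, the flag can also be inspected and toggled at runtime (standard MySQL/MariaDB syntax), which makes it easy to test before committing the my.cnf change:

      ```sql
      -- Check whether index condition pushdown is currently enabled
      SELECT @@optimizer_switch LIKE '%index_condition_pushdown=on%' AS icp_enabled;

      -- Disable it for testing; the my.cnf entry is still needed to persist
      SET GLOBAL optimizer_switch = 'index_condition_pushdown=off';
      ```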

      I will make these changes next Saturday and verify for two weeks whether the problem is solved.

      Thank you very much.



        #4
        Originally posted by Rodney:
        My partitioning script runs every day at 10:30.
        The moment when you create new partitions and drop old ones is not relevant.
        What matters is the moment when the SQL engine starts using a new partition. Usually partition time windows end at 24:00, so the IO impact is highest just before that moment.
        BTW: you don't need an external partition-maintenance script; you can use an SQL event to execute the maintenance procedures.
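        A minimal sketch of such an event, assuming the maintenance logic already exists as a stored procedure (the procedure name partition_maintenance_all is illustrative; call whatever your script currently invokes):

        ```sql
        -- The event scheduler must be running for events to fire
        SET GLOBAL event_scheduler = ON;

        -- Run partition maintenance once a day from within the server itself
        CREATE EVENT zbx_partition_maintenance
            ON SCHEDULE EVERY 1 DAY
            STARTS CURRENT_TIMESTAMP + INTERVAL 1 HOUR
            DO CALL partition_maintenance_all('zabbix');
        ```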

        Besides, I've read on the Zabbix forum that there is a bug in MySQL versions 5.6 and 5.7 with ORDER BY ... LIMIT queries.

        https://www.zabbix.com/forum/showthread.php?t=58582
        I've been using this type of partitioning with Zabbix since MySQL 4.x without any issues.
        What you've quoted is about slow queries whose cause has nothing to do with partitioning.

        The recommendation is to set this parameter in the my.cnf file:

        optimizer_switch = 'index_condition_pushdown=off'

        I will make these changes next Saturday and verify for two weeks whether the problem is solved.
        index_condition_pushdown=on (the default) is relevant when selects use LIKE, and the history and trends queries issued by Zabbix do not use LIKE.
        Disabling index_condition_pushdown may have some non-zero (positive) impact, but in the case of the Zabbix DB that impact will be quite low or barely measurable.
        For now I think you can leave this at the bottom of your to-do list.

        Question: do you have monitoring of MySQL, the base OS and especially the IO subsystem on your Zabbix DB backend? I have the impression that you don't have access to such monitoring data, and if I'm right, try to focus on this part.

        PS. You just gave me the idea to add monitoring of the optimizer_switch settings to my MySQL monitoring template. Thx.

          #5
          One thing I can't understand.

          Why is Zabbix still modifying old history and trends partition files?

          -rw-rw---- 1 mysql mysql 36G Dec 5 22:29 trends_uint#P#p2017_08.ibd
          -rw-rw---- 1 mysql mysql 45G Oct 1 03:14 trends_uint#P#p2017_09.ibd
          -rw-rw---- 1 mysql mysql 46G Nov 17 19:45 trends_uint#P#p2017_10.ibd
          -rw-rw---- 1 mysql mysql 54G Dec 6 04:16 trends_uint#P#p2017_11.ibd
          -rw-rw---- 1 mysql mysql 6.0G Dec 6 10:38 trends_uint#P#p2017_12.ibd

          As we can see above, why was the file "trends_uint#P#p2017_11.ibd" modified today, 6 December, at 04:16?

          Worse, why was the file "trends_uint#P#p2017_08.ibd" modified on Dec 5 at 22:29?

          Does Zabbix make some inserts/updates on old history/trends data?



            #6
            Did you disable Administration->General->Housekeeping->"Enable internal housekeeping"?


              #7
              Yes, as I have table partitioning for the history and trend tables.



                #8
                Another thing I have noticed.

                Almost every night I can see many queries like these:

                select clock,ns,value from history where itemid=685444 and clock>1504224274 and clock<=1511385830;
                select clock,ns,value from history_uint where itemid=39587285 and clock>1511107366 and clock<=1512524342;
                select clock,ns,value from history_uint where itemid=31866406 and clock>1511107366 and clock<=1512524685;
                select clock,ns,value from history_uint where itemid=39584648 and clock>1511107366 and clock<=1512524326;
                select clock,ns,value from history_uint where itemid=41505528 and clock>1511107366 and clock<=1512523898;
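                To see how expensive these selects are, EXPLAIN PARTITIONS (available in MySQL 5.6 / MariaDB 10.1) shows how many partitions such a wide clock range has to touch; the itemid below is taken from the queries above:

                ```sql
                -- A clock range spanning weeks or months forces the engine to
                -- scan many partitions instead of just the current one
                EXPLAIN PARTITIONS
                SELECT clock, ns, value
                FROM history_uint
                WHERE itemid = 39587285
                  AND clock > 1511107366
                  AND clock <= 1512524342;
                ```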

                What is strange is that the start date in the queries is old and always the same:

                1504224274 = GMT: 2017-09-01 00:04:34
                1511107366 = GMT: 2017-11-19 16:02:46
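                The conversions can be reproduced directly in the database (a sketch; FROM_UNIXTIME uses the session time zone, so set it to UTC first to match the GMT values above):

                ```sql
                SET time_zone = '+00:00';  -- interpret the epochs as UTC
                SELECT FROM_UNIXTIME(1504224274) AS first_start,   -- 2017-09-01 00:04:34
                       FROM_UNIXTIME(1511107366) AS second_start;  -- 2017-11-19 16:02:46
                ```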

                These queries take about 15 seconds each to finish, which makes the DB server very slow and stops the Zabbix server from processing data.

                Does anyone have an idea what is causing this behavior on the Zabbix server?

                Could it be a problem with my ValueCacheSize, which is set to 1 GB today?

                Why always the same old start date in the queries?

                The items aren't from the same key and are from different hosts; I mean, there is no pattern except the start date.

                Thanks.



                  #9
                  Originally posted by Rodney:
                  Yes, as I have table partitioning for the history and trend tables.
                  OK, but if you are dropping the oldest partitions you can also disable internal housekeeping: in the web interface go to Administration->General->Housekeeping and untick the "Enable internal housekeeping" checkboxes in the History and Trends sections.
                  Without this, the housekeeper will still be deleting history and trends data using DELETE queries. It seems you did not disable at least the trend housekeeper, and this is why you see old partitions with quite fresh mtimes.
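                  One way to confirm whether the housekeeper is still queueing deletes, assuming the standard Zabbix schema (pending delete tasks live in the housekeeper table):

                  ```sql
                  -- Pending housekeeper delete tasks per table; rows for
                  -- history*/trends* tables mean internal housekeeping is
                  -- still active for them
                  SELECT tablename, COUNT(*) AS pending
                  FROM housekeeper
                  GROUP BY tablename;
                  ```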


                    #10
                    I've had history and trends housekeeping disabled since I enabled table partitioning.

                    Thanks.



                      #11
                      So now please make sure that no one is able to fiddle with the database content, for example via queries executed through the mysql command-line client or any other client.
                      Then wait a bit to confirm whether you still see changes to the old trends partitions.


                        #12
                        It seems the problem is related to the proxies.

                        I'm using Zabbix proxies with Pacemaker in a master/slave configuration. For some reason, the second node was starting at night and sending old data to the Zabbix server.

                        I've stopped the whole Pacemaker cluster and will check whether the problem is solved.

                        Besides, I must find a way to clean all the old data on the second node before starting Zabbix proxy on it.

                        Thank you very much!
