Disk IO usage with 3.4 over 80%

  • mellis
    Senior Member
    • Oct 2017
    • 145

    #1

    Disk IO usage with 3.4 over 80%

    I have been using Zabbix for several years now, mostly to monitor what I thought were small to medium site systems: 300 to 600 devices with 40,000 items and 6,500 triggers. Mostly there have not been many performance issues. We want to upgrade all of these to 3.4 and then move to a master server with a proxy at each site.

    I upgraded the first site from 3.2 to 3.4, and then found that the dashboard would show 10 to 15 false positives for devices being unreachable. I tried the normal steps: looking at the Zabbix performance data and making adjustments in the zabbix_server.conf and my.cnf files. That did not help much. I also moved the database to a second server and trimmed the history tables.

    In the end I ran atop and found that 3.4 was using 80 to 90% disk IO. I restored 3.2, checked atop again, and found the disk IO was back down to 20 to 30%.

    So my question is: does 3.4 really use that much more disk? What can I expect in 4.0?

    It is going to be very hard to sell a full-scale rollout of a nationwide Zabbix if we have this issue with 3.4 / 4.0.

    Any insight would be helpful.

  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Originally posted by mellis
    It is going to be very hard to sell a full-scale rollout of a nationwide Zabbix if we have this issue with 3.4 / 4.0.
    No, it uses the DB backend even less than 3.2.
    Do you have any basic DB backend monitoring, such as stats for select, insert, begin, end, and delete queries? IO stats?
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates
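
A minimal sketch of the kind of query-rate monitoring asked about above, assuming two snapshots of MySQL's `SHOW GLOBAL STATUS` counters have already been collected into dictionaries. The helper name `per_second_rates` and the sample numbers are hypothetical and are not taken from this thread.

```python
# Counter names as reported by MySQL's SHOW GLOBAL STATUS.
COUNTERS = (
    "Com_select",
    "Com_insert",
    "Com_update",
    "Com_delete",
    "Com_begin",
    "Com_commit",
)


def per_second_rates(before, after, interval_s):
    """Turn two snapshots of cumulative counters into per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / interval_s
        for name in COUNTERS
    }


# Made-up snapshots taken 60 seconds apart (illustration only).
before = {"Com_select": 1000, "Com_insert": 5000}
after = {"Com_select": 1600, "Com_insert": 8000}
rates = per_second_rates(before, after, 60)
print(rates["Com_select"], rates["Com_insert"])
```

Comparing these rates under 3.2 and under 3.4 would show whether the server is actually sending more statements to the database, or whether the same load is simply costing more IO.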


    • vso
      Zabbix developer
      • Aug 2016
      • 190

      #3
      Please provide more information. Is the housekeeper busy?


      • mellis
        Senior Member
        • Oct 2017
        • 145

        #4
        Sorry for the slow response. I had already reverted back to 3.2, so I grabbed some information under 3.2, then did the upgrade again and grabbed it again. I also have some screenshots of the housekeeping. All the raw information is in the attached Word doc. I will try to give you a good summary here.

        1) The hardware: these are VMs with 4 CPUs, 24GB RAM, and ~220GB disk. One server has the database; the other has the Web GUI and the Zabbix server.
        2) The OS: CentOS 7.3, fully updated.
        3) The database: MySQL 5.7.22, a little over 30GB.

        4) my.cnf
        show_compatibility_56 = ON
        performance_schema
        innodb_buffer_pool_size = 2G
        innodb_data_home_dir=/home/mysql
        innodb_file_per_table = 1
        innodb_buffer_pool_instances = 14
        innodb_buffer_pool_size = 18G
        innodb_page_cleaners=14
        tmp_table_size = 36M
        max_heap_table_size = 36M
        join_buffer_size = 1G
        sort_buffer_size = 4M
        read_rnd_buffer_size = 4M
        query_cache_size = 0
        query_cache_type = 0
        query_cache_limit = 2M
        max_connections = 376
        wait_timeout = 14400
        interactive_timeout = 14400
        datadir=/home/mysql
        socket=/var/lib/mysql/mysql.sock
        # Disabling symbolic-links is recommended to prevent assorted security risks
        symbolic-links=0
        #logging stuff
        log-error=/var/log/mysqld.log
        slow_query_log = 1
        slow_query_log_file = /var/log/slow_quiery.log
        pid-file=/var/run/mysqld/mysqld.
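
In the my.cnf listing above, `innodb_buffer_pool_size` appears twice (2G and 18G). When an option is repeated in an option file, MySQL generally uses the last occurrence, so it is worth confirming which value is intended. The following is a rough sketch for spotting repeated keys; the function name `find_duplicate_options` is hypothetical, and the sample is a shortened excerpt of the listing.

```python
from collections import defaultdict


def find_duplicate_options(text):
    """Return {option: [values...]} for options set more than once."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, _, value = line.partition("=")
        # MySQL accepts dashes and underscores interchangeably in option names.
        seen[key.strip().replace("-", "_")].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Shortened excerpt of the my.cnf posted above.
SAMPLE = """\
innodb_buffer_pool_size = 2G
innodb_file_per_table = 1
innodb_buffer_pool_instances = 14
innodb_buffer_pool_size = 18G
"""

duplicates = find_duplicate_options(SAMPLE)
print(duplicates)  # {'innodb_buffer_pool_size': ['2G', '18G']}
```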

        When I am running 3.2 the disk IO is in the 20 to 30% range; after the 3.4 upgrade it is over 80%.

        The housekeeping looks the same: it goes to 100% for 30 minutes every hour. The graph is in the doc.

        In the mysql.log file I see lots of [Note] Aborted connection 30399 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets) and [Note] InnoDB: page_cleaner: 1000ms intended loop took 6170ms. The settings might not be optimal. (flushed=437 and evicted=0, during the time.)
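
To track how far the page cleaner overruns over time, the message quoted above can be parsed from the log. This is only a sketch: the regular expression is written against the single sample line from this post, and `parse_page_cleaner` is a hypothetical helper name.

```python
import re

# Written against the one sample line quoted in the post.
PAGE_CLEANER_RE = re.compile(
    r"page_cleaner: (?P<intended>\d+)ms intended loop took (?P<actual>\d+)ms\."
    r".*?flushed=(?P<flushed>\d+) and evicted=(?P<evicted>\d+)"
)


def parse_page_cleaner(line):
    """Return the numeric fields of a page_cleaner note, or None if no match."""
    match = PAGE_CLEANER_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "[Note] InnoDB: page_cleaner: 1000ms intended loop took 6170ms. "
    "The settings might not be optimal. "
    "(flushed=437 and evicted=0, during the time.)"
)
info = parse_page_cleaner(line)
overrun_ms = info["actual"] - info["intended"]
print(info, overrun_ms)  # overrun_ms is 5170 for this line
```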

        I have chased these errors for months but can never seem to get them to clear out.

        I have also used the mysqltuner.pl script and have seen some improvement, but can never seem to completely solve the problem.

        Any help would be great.

        Attached Files


        • vso
          Zabbix developer
          • Aug 2016
          • 190

          #5
          Which exact versions of 3.2 and 3.4 do you use? It looks like your events table might be too big.


          • mellis
            Senior Member
            • Oct 2017
            • 145

            #6
            It was 3.2.11 to 3.4.8. I did the data purge described at http://www.michaelfoster82.co.uk/zab...lete-old-data/, trimming the tables back to 7 days of history and 90 days of trends. That helped for about 18 hours.
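
For reference, a small sketch of how the 7-day and 90-day cutoffs for such a purge can be computed. It assumes the purge compares against the Unix-epoch `clock` column of the Zabbix history and trends tables; the linked article's exact procedure is not reproduced here, and the date used is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def cutoff_epoch(now, keep_days):
    """Unix timestamp before which rows fall outside the retention window."""
    return int((now - timedelta(days=keep_days)).timestamp())


# Arbitrary example date; in practice use datetime.now(timezone.utc).
now = datetime(2018, 5, 16, tzinfo=timezone.utc)
history_cutoff = cutoff_epoch(now, 7)   # keep 7 days of history
trends_cutoff = cutoff_epoch(now, 90)   # keep 90 days of trends
print(history_cutoff, trends_cutoff)
```

Rows whose `clock` value is below the relevant cutoff would be the candidates for deletion.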


            • mellis
              Senior Member
              • Oct 2017
              • 145

              #7
              I am now seeing disk IO on the database server over 100%, and the housekeeping goes to 100% for 30 minutes once every hour. I would not have thought there was much housekeeping to do, since I did a data purge and removed any history over 7 days old. I am using atop to get the disk usage.
              In the mysql.log I am seeing a lot of messages like "InnoDB page_cleaner: 1000ms intended loop took 5000ms" (this number bounces around between 4000 and 6000).

              I have increased innodb_buffer_pool_instances and innodb_page_cleaners to 14 each.

              Again, this is a 4 vCPU box with 24GB RAM; it is using 21GB at this time.


              • vso
                Zabbix developer
                • Aug 2016
                • 190

                #8
                Can you please attach the log? It will show what the housekeeper has cleaned.


                • mellis
                  Senior Member
                  • Oct 2017
                  • 145

                  #9
                  It would only let me upload a small one. What should I look for? I will scan all of them and send the parts needed.
                  Attached Files


                  • vso
                    Zabbix developer
                    • Aug 2016
                    • 190

                    #10
                    Something like:
                    3314:20180516:155439.059 housekeeper [deleted 0 hist/trends, 3 items/triggers, 0 events, 0 problems, 0 sessions, 0 alarms, 0 audit items in 0.456314 sec, idle for 1 hour(s)]
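
When scanning a large set of logs for lines like the one above, a short script can pull out the counts. This is a sketch written against that single sample line only; the helper name `parse_housekeeper` is hypothetical, and other Zabbix versions may word the line differently.

```python
import re

# Matches the "housekeeper [deleted ... in N sec" part of the sample line.
HOUSEKEEPER_RE = re.compile(
    r"housekeeper \[deleted (?P<body>.*?) in (?P<sec>[\d.]+) sec"
)


def parse_housekeeper(line):
    """Return (counts, seconds) for a housekeeper summary line, else None."""
    match = HOUSEKEEPER_RE.search(line)
    if match is None:
        return None
    counts = {}
    for part in match.group("body").split(", "):
        number, _, label = part.partition(" ")
        counts[label] = int(number)
    return counts, float(match.group("sec"))


line = (
    "3314:20180516:155439.059 housekeeper [deleted 0 hist/trends, "
    "3 items/triggers, 0 events, 0 problems, 0 sessions, 0 alarms, "
    "0 audit items in 0.456314 sec, idle for 1 hour(s)]"
)
counts, seconds = parse_housekeeper(line)
print(counts["items/triggers"], seconds)  # 3 0.456314
```

A run that deletes very large numbers of hist/trends rows or events, or takes many minutes, would point at the housekeeper as a source of the disk IO.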


                    • mellis
                      Senior Member
                      • Oct 2017
                      • 145

                      #11
                      I did not see it. I set the debug level up to 5 and the size to 8. I will get back as soon as I find something.


                      • mellis
                        Senior Member
                        • Oct 2017
                        • 145

                        #12
                        Attached please find additional information. I never found a line in the log like the one above. What I did find is detailed in the doc.
                        Attached Files


                        • mellis
                          Senior Member
                          • Oct 2017
                          • 145

                          #13
                          Should I look at doing some partitioning? 100% of the hosts are now reporting that they are unreachable, but I know they are reachable; otherwise I would be having bigger problems than this.
