mysql cpu 100%+ 300000 reads/sec

  • Robert Wagnon
    Member
    • Jan 2008
    • 47

    #1

    mysql cpu 100%+ 300000 reads/sec

    Our Zabbix installation has become unstable and generates hundreds of false-positive triggers. We can clearly see that this is due to MySQL.

    However, we do not believe this is a MySQL tuning issue. We are monitoring 150 hosts / 3000 items / 2000 triggers, at roughly 10 new values per second. We are running on a quad-core CentOS server with 8 GB RAM and a 4-disk RAID 10.

    We have monitored this same workload with Zabbix on MySQL since version 1.4.2. When we upgraded to 1.8.5, the system stopped functioning properly. The problem appears between 4 and 24 hours after starting zabbix_server, at which point MySQL CPU utilization (per top) reaches 364%.

    We believe that zabbix_server is generating faulty queries against MySQL. We use "SHOW INNODB STATUS" to demonstrate this; NOTICE the Row Operations figures:

    14.70 inserts/s, 15.13 updates/s, 0.02 deletes/s, 351062.27 reads/s
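
    For reference, the same rate can be cross-checked outside the monitor output using the server status counters (a minimal sketch; Innodb_rows_read is a cumulative counter, so two samples are needed):

    -- Sample the cumulative row-read counter twice, 10 seconds apart;
    -- (second value - first value) / 10 approximates the reads/s figure above.
    SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';
    SELECT SLEEP(10);
    SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';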

    We have followed the tuning suggestions from this forum and used both mysqltuner.pl and tuning-primer.sh.

    Any suggestions?


    Below are both the results from "show innodb status" and our my.cnf:

    --------
    FILE I/O
    --------
    I/O thread 0 state: waiting for i/o request (insert buffer thread)
    I/O thread 1 state: waiting for i/o request (log thread)
    I/O thread 2 state: waiting for i/o request (read thread)
    I/O thread 3 state: waiting for i/o request (write thread)
    Pending normal aio reads: 0, aio writes: 0,
    ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
    Pending flushes (fsync) log: 0; buffer pool: 0
    53459 OS file reads, 63770 OS file writes, 7943 OS fsyncs
    0.02 reads/s, 16384 avg bytes/read, 10.75 writes/s, 1.18 fsyncs/s
    -------------------------------------
    INSERT BUFFER AND ADAPTIVE HASH INDEX
    -------------------------------------
    Ibuf: size 1, free list len 33, seg size 35,
    2248 inserts, 2649 merged recs, 2015 merges
    Hash table size 8850487, used cells 3534061, node heap has 6652 buffer(s)
    345976.85 hash searches/s, 4889.90 non-hash searches/s
    ---
    LOG
    ---
    Log sequence number 15 1676733115
    Log flushed up to 15 1676705567
    Last checkpoint at 15 1676503298
    1 pending log writes, 0 pending chkp writes
    16294 log i/o's done, 2.62 log i/o's/second
    ----------------------
    BUFFER POOL AND MEMORY
    ----------------------
    Total memory allocated 4794653176; in additional pool allocated 14855680
    Buffer pool size 262144
    Free buffers 185427
    Database pages 70065
    Modified db pages 634
    Pending reads 0
    Pending writes: LRU 0, flush list 0, single page 0
    Pages read 69653, created 412, written 60233
    0.02 reads/s, 0.08 creates/s, 9.23 writes/s
    Buffer pool hit rate 1000 / 1000
    --------------
    ROW OPERATIONS
    --------------
    1 queries inside InnoDB, 0 queries in queue
    2 read views open inside InnoDB
    Main thread process no. 10577, id 1176803648, state: flushing log
    Number of rows inserted 54195, updated 54729, deleted 43042, read 2484948168
    14.70 inserts/s, 15.13 updates/s, 0.02 deletes/s, 351062.27 reads/s
    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ============================


    Here is our my.cnf:

    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql

    # Default to using old password format for compatibility with mysql 3.x
    # clients (those using the mysqlclient10 compatibility package).
    #old_passwords=1

    # Disabling symbolic-links is recommended to prevent assorted security risks;
    # to do so, uncomment this line:
    # symbolic-links=0

    bind-address=0.0.0.0

    skip_bdb

    tmpdir=/mytmp
    # This is a tmpfs RAM disk. See /etc/fstab

    log-slow-queries=/var/log/mysql_slow_queries.log
    long_query_time=3
    log-queries-not-using-indexes

    # The number of open tables for all threads. Increasing this value
    # increases the number of file descriptors that mysqld requires.
    # Therefore you have to make sure to set the amount of open files
    # allowed to at least 4096 in the variable "open-files-limit" in
    # section [mysqld_safe]
    #thread_concurrency = 2
    # thread_concurrency is for Solaris only.
    table_cache = 256
    thread_cache_size = 4
    query_cache_size = 256M
    query_cache_limit = 256M

    key_buffer_size = 128K
    sort_buffer_size = 64M
    join_buffer_size = 256k
    read_buffer_size = 256k
    read_rnd_buffer_size = 256k

    tmp_table_size = 256M
    max_heap_table_size = 256M

    # Recommended to prevent Zabbix "MySQL server has gone away" errors.
    wait_timeout = 230400

    # *** INNODB Specific options ***

    #skip-innodb
    innodb_additional_mem_pool_size = 16M
    innodb_buffer_pool_size = 4G
    innodb_thread_concurrency = 8
    # innodb_thread_concurrency recommended value is 2 times the number of CPUs plus the number of disks.
    innodb_flush_log_at_trx_commit = 2
    innodb_max_dirty_pages_pct = 90
    innodb_flush_method=O_DIRECT

    #innodb_fast_shutdown
    #innodb_log_buffer_size = 8M
    #innodb_log_files_in_group = 3
    #innodb_file_io_threads = 4
    #innodb_force_recovery=1
    #innodb_lock_wait_timeout = 120

    #innodb_log_file_size = 256M
    # EDITING THIS TRASHES THE Zabbix database.
    # Should be 1G, but can't change without deleting ib_logfile0 and ib_logfile1.
    # Should be safe, but get a data backup first.

    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid

    ----------------------------
  • Robert Wagnon
    Member
    • Jan 2008
    • 47

    #2
    Probable cause and resolution

    We found that the problem was caused by the Windows Event Log items. When we disabled these, our MySQL CPU usage dropped to 4%.

    This is not a resolution, as we feel that Windows Event Log data is important.

    It _seems_ that pre-1.8, this was handled well, and that some code change has damaged the functionality.

    I'm willing to accept that we may have one machine generating large Event Logs. HOWEVER, the figures from "show innodb status;" still seem excessive.

    Comments are welcome. I'm willing to investigate further to confirm or refute.
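
    One way to confirm that (a rough sketch, assuming the standard Zabbix 1.8 schema tables history_log, items and hosts) would be to count the recent log rows per host directly in MySQL:

    -- Which hosts/items wrote the most log rows in the last hour?
    -- history_log holds log/eventlog item values in the Zabbix 1.8 schema.
    SELECT h.host, i.key_, COUNT(*) AS log_rows_last_hour
    FROM history_log hl
    JOIN items i ON i.itemid = hl.itemid
    JOIN hosts h ON h.hostid = i.hostid
    WHERE hl.clock > UNIX_TIMESTAMP() - 3600
    GROUP BY h.host, i.key_
    ORDER BY log_rows_last_hour DESC
    LIMIT 20;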


    • eskytthe
      Senior Member
      Zabbix Certified Specialist
      • May 2011
      • 363

      #3
      Hi,
      Not sure if this is your problem, but in 1.8.6 this improvement was made (http://www.zabbix.com/documentation/...at_s_new_1.8.6 - section 10.2):
      “Improved Zabbix server performance when gathering text data”
      There is a post in the forum regarding the problem (it caused major performance problems for the author of that post) and how to fix it, but I can’t find it right now. I will send the link if/when I find the post.
      BR
      Erik


      • eskytthe
        Senior Member
        Zabbix Certified Specialist
        • May 2011
        • 363

        #4
        Here it is:

        Hope it helps.
        (I did check the JIRA again - not sure whether the 1.8.6 change and this post are connected ...)
        BR
        Erik
        Last edited by eskytthe; 28-07-2011, 22:56.


        • Robert Wagnon
          Member
          • Jan 2008
          • 47

          #5
          Helpful information

          With any performance-related problem, I recommend looking at the Most Busy Triggers Top 100 report.

          If I had thought to look at this, I might have had a better idea of what the problem was.
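
          For anyone without easy access to that frontend report, a rough equivalent can be pulled straight from MySQL (a sketch assuming the Zabbix 1.8 schema, where events.source = 0 and events.object = 0 mark trigger events):

          -- Triggers that generated the most events in the last 24 hours.
          SELECT t.triggerid, t.description, COUNT(*) AS events_last_24h
          FROM events e
          JOIN triggers t ON t.triggerid = e.objectid
          WHERE e.source = 0
            AND e.object = 0
            AND e.clock > UNIX_TIMESTAMP() - 86400
          GROUP BY t.triggerid, t.description
          ORDER BY events_last_24h DESC
          LIMIT 100;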


          • Robert Wagnon
            Member
            • Jan 2008
            • 47

            #6
            Suggested Patch

            The suggested patch looks quite good. I don't have a few spare hours to implement it right now, but I certainly will if 1.8.6 doesn't bring it in.

            I hope it goes into 1.8.6. It would be kind of crazy if it doesn't; a 100x speed improvement would help everyone processing log files.

            We're going to re-activate the Event Logs for customers with clean results and "ban" anyone who won't clean up their Event Logs.
