Database compression for tables

  • fmrapid
    Member
    • Aug 2010
    • 43

    #1

    Database compression for tables

    In order to keep database size down and speed up I/O, I would like to propose the following breadcrumbs:

    1 - MySQL and PostgreSQL both support methods for compressing/decompressing data tables on the fly. Implementing this change seems to be fairly simple from what I have read concerning MySQL.
    This could be applied to the trend tables and potentially a host of other tables. Some time series are boolean, or contain data that compresses really well (text, blobs).

    Excerpt from InnoDB doc:
    Many workloads are I/O-bound. The idea of data compression is to pay a small cost in increased CPU utilization for the benefit of smaller databases and reduced I/O to improve throughput, potentially significantly.

    The ability to compress user data is an important new capability of the InnoDB Plugin. Compressed tables reduce the size of the database on disk, resulting in fewer reads and writes needed to access the user data. For many InnoDB workloads and many typical user tables (especially with read-intensive applications where sufficient memory is available to keep frequently-used data in memory), compression not only significantly reduces the storage required for the database, but also improves throughput by reducing the I/O workload, at a modest cost in processing overhead. The storage cost savings can be important, but the reduction in I/O costs can be even more valuable.

    ...

    To create a compressed table, you might use a statement like this:

    CREATE TABLE name
    (column1 INT PRIMARY KEY)
    ENGINE=InnoDB
    ROW_FORMAT=COMPRESSED
    KEY_BLOCK_SIZE=4;
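
    As a hedged sketch for Zabbix specifically (assuming MySQL 5.5+, innodb_file_per_table enabled, the Barracuda file format, and a stock schema where trend data lives in a table called trends), an existing table could be rebuilt in place like this:

    SET GLOBAL innodb_file_format = 'Barracuda';
    SET GLOBAL innodb_file_per_table = 1;

    -- Rebuild the Zabbix trends table with a compressed row format (4 KB pages).
    ALTER TABLE trends
      ENGINE = InnoDB,
      ROW_FORMAT = COMPRESSED,
      KEY_BLOCK_SIZE = 4;

    The ALTER rebuilds the whole table, so on a large trends table it can take a while and is best tried on a copy first.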

    fmrapid
    +1 SNMP
    +1 Scalability
  • fjrial
    Senior Member
    • Feb 2010
    • 140

    #2
    It looks more than interesting.

    I would like to know the opinion of technical people about this.

    As you mention:
    For many InnoDB workloads and many typical user tables (especially with read-intensive applications where sufficient memory is available to keep frequently-used data in memory), ...
    And Zabbix is more write-intensive than read-intensive, at least for me.

    Maybe data compression deserves a try.
    Thanks for the info.


    • fmrapid
      Member
      • Aug 2010
      • 43

      #3
      Meeting IT historian scalability - with a relational database

      The most efficient storage of time series data: RRD databases. Fast, fixed size, multiple levels of resolution for short to very long term trending.
      Zabbix: Two levels of trending. Coarse, but a good compromise.
      OSIsoft PI: a historian that stores the data in a non-relational data store using the swinging door compression algorithm.

      Having successfully used RRDs to store massive amounts of data with lightning-fast retrieval and display, I consider them the reference to beat in terms of operational flexibility for both support and engineering work.

      Going beyond some of RRD's limits for storing other data (events, boolean data, bulk updates, variable time series, no data averaging at high resolution) makes Zabbix very interesting. It does data-point consolidation and has a circular buffer like an RRD, but it does not address the typical bloat that accompanies relational databases...

      There is now a possibility to revolutionize this product space. No other IT monitoring system uses inherent database compression. (OpenNMS, Nagios, Groundwork, Zenoss, Hyperic, PRTG, Cricket, Hobbit, Cacti, HPOV, Tivoli, etc.).

      The only ones that do are process historians with custom databases.

      There are still tons of areas to improve in Zabbix, but this is probably one of those key differentiators that are now available out of the box from the underlying databases. It could also mean that the historical trend table could use a shorter consolidation period than what is used today.

      If anyone has used MySQL database compression, or has insight into what this could mean (or not), please speak up. If warranted, I will add it to the Tracker.

      If anyone is installing or testing an installation, perhaps you could convert some tables to use compression and compare the number of data updates per second before and after the change.
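
      For the size part of that comparison, one simple (and hedged) way to measure it on MySQL, assuming the Zabbix schema lives in a database called zabbix, is to query information_schema before and after the conversion:

      -- Approximate on-disk size of data plus indexes per trend table, in MB.
      SELECT TABLE_NAME,
             ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS total_mb
      FROM information_schema.TABLES
      WHERE TABLE_SCHEMA = 'zabbix'
        AND TABLE_NAME LIKE 'trends%'
      ORDER BY total_mb DESC;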

      Cheers

      fmrapid


      • fmrapid
        Member
        • Aug 2010
        • 43

        #4
        MySQL InnoDB database compression

        So, is anyone willing to give it a try and compare the performance and size results between two databases?

        I expect it is possible to export a database, recreate it with compression enabled and import the data.
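
        One way to compare without a full export/import cycle (just a sketch, assuming a test box with spare disk space and the stock Zabbix schema; trends_uint_comp is a made-up name) is to build a compressed copy of a single table side by side:

        -- Compressed copy of trends_uint for a size and update-rate comparison
        -- (ROW_FORMAT=COMPRESSED alone defaults to an 8 KB block size).
        CREATE TABLE trends_uint_comp LIKE trends_uint;
        ALTER TABLE trends_uint_comp ROW_FORMAT = COMPRESSED;
        INSERT INTO trends_uint_comp SELECT * FROM trends_uint;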

        The MySQL documentation has some good guidelines:

        An InnoDB KEY_BLOCK_SIZE of 8K, and selecting tables that do not contain random floats, integers, or binary data such as GIFs and JPEGs (see the sketch after these guidelines).

        Exporting the raw data, compressing it with gzip or WinZip, and comparing it to the uncompressed data to estimate the achievable ratio.

        Having plenty of memory (RAM) works best with compressed tables.
        Using a dedicated DB server with compression is also common sense, as a dedicated DB server typically has spare CPU cycles but is I/O-bound.
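
        A minimal sketch of that 8K guideline (history_text is just one candidate; the right tables and block size are assumptions to verify on a test copy, not a recommendation from the docs):

        -- Rebuild a text-heavy Zabbix table with an 8 KB compressed page size.
        ALTER TABLE history_text
          ROW_FORMAT = COMPRESSED,
          KEY_BLOCK_SIZE = 8;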


        fmrapid

        Excerpt from InnoDB 1.1 - MySQL 5.5 documentation:

        The InnoDB storage engine implements a novel type of compression with the help of the well-known zlib library, which implements the LZ77 compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is “lossless”, so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data determine how well it compresses, but typical user data often compresses by 50% or more.

        Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (the .ibd files) is 25% to 50% of the uncompressed size or possibly smaller. Depending on the workload, this smaller database can in turn lead to a reduction in I/O, and an increase in throughput, at a modest cost in terms of increased CPU utilization.


        • 0siris
          Member
          Zabbix Certified Specialist
          • Nov 2010
          • 76

          #5
          I wonder if this has an effect on the duration of a backup/restore as well. If so, that would be great. We're using a database of about 8 GB, which takes hours to restore, during which monitoring is not possible.
          Although we have a MySQL master/master replication setup, it's always good to know that a restore is a feasible option, IMHO.


          • Jason
            Senior Member
            • Nov 2007
            • 430

            #6
            Has anyone actually implemented compression, and if so, on which tables?

            I can imagine it would work on the trends_* tables, but what about the history_* tables, and particularly history_uint, which on our setup is by far the biggest?


            • ghoz
              Senior Member
              • May 2011
              • 204

              #7
              I made some tests, with some gain in database size.

              History tables are a big no-no performance-wise, as they are constantly updated and integers/floats don't compress that much.
              I don't have many text/log/char items, so I can't comment on those history tables; they should compress quite well.

              Right now I'm compressing the trends* tables with the default block size.
              I see spikes every hour in compression/decompression time, and a huge one at backup time, as expected.

              Disk IOPS did not change significantly: write ops down by ~10%, reads about the same.

              YMMV.
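
              For anyone wanting to watch those compression/decompression spikes themselves, a hedged option on MySQL 5.5+ is to sample the global InnoDB compression counters (they are per page size, not per table) before and after an interval:

              -- Cumulative compress/uncompress operations and time spent (seconds), per page size.
              SELECT PAGE_SIZE, COMPRESS_OPS, COMPRESS_OPS_OK, COMPRESS_TIME,
                     UNCOMPRESS_OPS, UNCOMPRESS_TIME
              FROM information_schema.INNODB_CMP;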


              • Colttt
                Senior Member
                Zabbix Certified Specialist
                • Mar 2009
                • 878

                #8
                And does it work faster?
                Debian-User

                Sorry for my bad English

