Updating the history to v7 using PKs and double when partitioned.
  • nprdev
    Junior Member
    • Jul 2023
    • 9

    #1

    Updating the history to v7 using PKs and double when partitioned.

    I've been trying to do this. Roughly, the steps so far have been:

    1. Stop zabbix.
    2. Run the initial database preparation script. This creates the new table structure and renames the existing history_X tables to history_X_old.
    3. Re-run the SQL procedure that creates the initial partitioning on the new tables, making (in my case) 186 new daily partitions (roughly the kind of statement sketched after this list).
    4. Reinsert the old data into the new tables.
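
    For reference, step 3 boils down to range-partitioning each new table on clock. A minimal hand-written illustration of what that amounts to (partition names and dates are only examples, not the exact output of the partitioning procedure):

    Code:
    ALTER TABLE history PARTITION BY RANGE (clock)
    (PARTITION p2025_07_01 VALUES LESS THAN (UNIX_TIMESTAMP('2025-07-02 00:00:00') div 1) ENGINE = InnoDB,
     PARTITION p2025_07_02 VALUES LESS THAN (UNIX_TIMESTAMP('2025-07-03 00:00:00') div 1) ENGINE = InnoDB,
     PARTITION p2025_07_03 VALUES LESS THAN (UNIX_TIMESTAMP('2025-07-04 00:00:00') div 1) ENGINE = InnoDB);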

    I've been stuck on step 4 and finding a way to do it effectively.
    If I try to just 'add the index' manually to the partitioned table, I first have to fix many duplicate records. Even though it should never happen, in a large enough environment undiscovered bugs (race conditions or errors) have caused Zabbix to insert the same data multiple times into the history table, and fixing those duplicates is not easy.
    a. Running a query to discover them (like the one sketched below) is effectively impossible: the table is too big and it takes too long.
    b. Fixing them one by one as you encounter them might be a fool's errand: there's no way of telling how many there will be, and waiting half an hour each time for the index-building query to barf on the next duplicate row could take forever.
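
    For reference, the kind of duplicate-detection query meant in (a); the column list assumes the usual history_uint layout of (itemid, clock, ns), and on a billion-row table it really does run forever:

    Code:
    -- Find rows that share the same (itemid, clock, ns); the full scan plus
    -- grouping over the whole table is what makes this impractical at this size.
    SELECT itemid, clock, ns, COUNT(*) AS copies
    FROM history_uint_old
    GROUP BY itemid, clock, ns
    HAVING COUNT(*) > 1;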

    If I try to copy the data wholesale, I run into a MySQL bug: the tables are too big, and MySQL crashes with a problem in low-level file/memory operations (I've already filed this bug).

    So I've resolved to try to move the history data bit by bit with smaller queries, say one hour at a time, from the most recent to the oldest. I've tried this code:

    Code:
    USE zabbix;
    DROP PROCEDURE IF EXISTS transferHistory;
    DELIMITER $$
    CREATE PROCEDURE transferHistory(
        duration INT
    )
    -- duration: how far back your longest history table goes, in seconds.
    -- Use 31 * 86400 * [number of months]. Defaults to 3 months (8035200 s).
    BEGIN
    -- MySQL stored procedures do not support default parameter values,
    -- so the default is applied here instead.
    IF duration IS NULL THEN
        SET duration = 8035200;
    END IF;
    SET @date = UNIX_TIMESTAMP();
    SET @startDate = @date - duration;
    WHILE @date > @startDate DO
        SELECT FROM_UNIXTIME(@date);
        START TRANSACTION;
            INSERT INTO history SELECT * FROM history_old WHERE history_old.clock > @date;
            DELETE FROM history_old WHERE history_old.clock > @date;
        COMMIT;
        SELECT 'history_str';
        START TRANSACTION;
        INSERT INTO history_str SELECT * FROM history_str_old WHERE history_str_old.clock > @date;
        DELETE FROM history_str_old WHERE history_str_old.clock > @date;
    COMMIT;
    SELECT 'history_text';
    START TRANSACTION;
        INSERT INTO history_text SELECT * FROM history_text_old WHERE history_text_old.clock > @date;
        DELETE FROM history_text_old WHERE history_text_old.clock > @date;
    COMMIT;
    SELECT 'history_uint';
    START TRANSACTION;
        INSERT INTO history_uint SELECT * FROM history_uint_old WHERE history_uint_old.clock > @date;
        DELETE FROM history_uint_old WHERE history_uint_old.clock > @date;
    COMMIT;
        SET @date = @date - 3600;
    END WHILE;
    END$$
    DELIMITER ;
    Unfortunately for me, it's been running for a couple of hours already and has failed to even finish the first transaction: no sign of 'history_str' being printed in the output so far, but it did print the first unixtime, so it's stuck on the first INSERT or DELETE.

    Clearly, this method is too slow if transferring the history data takes longer than it took to write it in the first place.

    Edit: It just did the first set in 1 hr 34 min, so the ETA so far is about 4.5 months.
    The interesting part is that history alone took that hour and a half; the others (_str, _text, _uint) took less than a millisecond each.
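
    It would be worth checking with EXPLAIN whether the statements against history_old are doing a full table scan on every iteration; that would explain why the big table dominates while the small _old tables finish instantly:

    Code:
    -- If the plan reports type: ALL (a full table scan) for history_old,
    -- every hourly chunk re-reads the whole multi-billion-row table.
    EXPLAIN SELECT * FROM history_old WHERE clock > UNIX_TIMESTAMP() - 3600;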

    Are there SQL experts who could help write faster code to get the poorly indexed old database's data into the new one? Or am I stuck with just ditching 3 months of history (people will probably complain about that)?



    Last edited by nprdev; 10-07-2025, 13:30.
  • PavelZ
    Senior Member
    • Dec 2024
    • 162

    #2
    So disconnect all clients from MySQL and do the update in one single batch script?
    There is no point in delving into the procedural SQL.

    Also, you can programmatically generate the batch of ALTER TABLE statements as text and then run them:
    ALTER TABLE trends_uint PARTITION BY RANGE (clock)
    (PARTITION p2023_01 VALUES LESS THAN (UNIX_TIMESTAMP('2023-02-01 00:00:00') div 1) ENGINE = InnoDB,
    PARTITION p2023_02 VALUES LESS THAN (UNIX_TIMESTAMP('2023-03-01 00:00:00') div 1) ENGINE = InnoDB,
    ....
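
    A rough sketch of generating those partition clauses programmatically (table name, start month, and month count are just examples); paste the output into a single ALTER TABLE ... PARTITION BY RANGE (clock) statement:

    Code:
    -- Emit one PARTITION clause per month boundary.
    SELECT CONCAT(
        'PARTITION p', DATE_FORMAT(d.first_day, '%Y_%m'),
        ' VALUES LESS THAN (UNIX_TIMESTAMP(''',
        DATE_FORMAT(d.first_day + INTERVAL 1 MONTH, '%Y-%m-01 00:00:00'),
        ''') div 1) ENGINE = InnoDB,') AS partition_clause
    FROM (SELECT DATE('2023-01-01') + INTERVAL seq MONTH AS first_day
          FROM (SELECT 0 AS seq UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) m) AS d;
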
    If MySQL is crashing, it's easier to deal with that once. It is a very old program; there is nothing mystical happening there.

    Make sure all clients are really disconnected.
    Make sure the /tmp directory is not in use. Temporarily increase innodb_buffer_pool_size.
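
    For example (a rough illustration; the right value depends entirely on the RAM available, and online resizing needs MySQL 5.7.5+ / MariaDB 10.2+):

    Code:
    -- Temporarily give InnoDB a much larger buffer pool for the migration.
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8 GiB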

    If you are doing the operation in a cloud environment, then temporarily rent a decently powerful machine.
    Last edited by PavelZ; 10-07-2025, 14:14.


    • nprdev
      Junior Member
      • Jul 2023
      • 9

      #3
      The reason for doing it as a procedure is that procedures allow WHILE loops.

      This construct can be used to transfer the data in parts.

      That can be useful if:

      1) Your disk space would run out if you did it in one go.
      2) It minimizes the risk that a crash means a complete restart: since SQL is transactional, doing everything in one transaction would mean you have to start again from nothing, plus there's even more disk space used by the inevitable swapping that ACID entails.

      There are better (lower-level) methods available though, such as copying the data out flat (using CLI programs and manipulating the data directly) and doing an import instead of an INSERT. However, you would still need to (re)build the index for this very big table manually (which may cause a MariaDB crash).
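
      A rough sketch of that flat-copy-then-import idea (the file path is just a placeholder, secure_file_priv has to allow it, and the bulk load still has to build the new primary key, so the expensive part doesn't disappear):

      Code:
      -- Dump the old rows to a flat file without going through row-by-row INSERTs...
      SELECT * FROM history_old
      INTO OUTFILE '/tmp/history_old.tsv'
      FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

      -- ...then bulk-load them; IGNORE skips duplicate-key rows instead of aborting.
      LOAD DATA INFILE '/tmp/history_old.tsv'
      IGNORE INTO TABLE history
      FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';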

      In the end, I transferred a few days of history, then cut the procedure short as it was interfering with query speeds too much, and then dropped all the old data. I'd still be interested in a better way of doing things (see the above), if there are clever ways of moving the data that don't entail billions of scans over billion-row tables (which is what makes it slow; 9 * 10^18 operations is a lot even for a modern computer).

      Note: I also suspect that indexing by parts (or direct indexing) runs into the 'worst case scenario' of some popular sorting algorithms because of the way this data is organized on disk (mostly already sequential): data that is mostly already sorted the way it's supposed to end up turns some variants of quicksort from O(n log n) [a few minutes] into O(n^2) [months].
      Last edited by nprdev; 14-07-2025, 17:07.


      • PavelZ
        Senior Member
        • Dec 2024
        • 162

        #4
        Leaving aside all the scientific nonsense: so that it doesn't take months, the easiest thing to do is a single ALTER TABLE.
        Perhaps with a service interruption.


        • nprdev
          Junior Member
          • Jul 2023
          • 9

          #5
          ALTER TABLE was the first thing I tried.

          Unfortunately, the database itself crashed with a SIGTERM when trying that, hitting an assertion somewhere after a few hours of seemingly nothing happening, though I suspect my disk was being heavily thrashed.

          I don't think ALTER TABLE can work on partitioned tables of that size. Transferring the data in chunks was the next best thing.

          Nor am I entirely sure how you could ALTER TABLE to add an index one partition at a time... without having this kind of thing happen anyway:

          1. Add index on partition 1. Sort partition 1.
          2. Add index on partition 2. Sort partition 1 and partition 2.
          3. Add index on partition 3. Sort partition 1 and partition 2 and partition 3.
          (...)
          180. Add index on partition 180. Sort partition 1 and partition 2 and (...) and partition 179 and partition 180.

          Depending on how naive the code is, you may be looking at something that scales as N^2 or even N^3.

          If the code doesn't understand that the partitioning is guaranteed to match the index, or the two simply don't match because this index sorts by itemid first, you get this inefficient shuffling.

          The latest database change isn't changing the DB storage engine like in the example above; it's changing the index on (clock, itemid) of type INDEX into an index on (itemid, clock, ns) of type PRIMARY KEY. The latter is much faster, but also bigger and structured differently, and it appears this is a much more intensive operation.
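
          Spelled out as DDL, the change amounts to something like this (a rough reconstruction for illustration, not the exact statements the Zabbix upgrade runs; the index name is just the usual default):

          Code:
          -- Old layout: plain secondary index, no primary key enforcing uniqueness.
          -- CREATE INDEX history_1 ON history (clock, itemid);

          -- New layout: rows are clustered by the primary key
          -- (this is the step that trips over duplicate rows).
          ALTER TABLE history
              DROP INDEX history_1,
              ADD PRIMARY KEY (itemid, clock, ns);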

          Edit: There's a second problem with ALTER table statements.

          Even though it's not supposed to happen, it is theoretically possible (there was no constraint enforcing uniqueness), and there are identical copies of records in my history_uint_old table. Meaning: an ALTER TABLE to add the index, even if it did not crash the database, would error out after wasting several hours with a message that there is some duplicate record.

          Great, so delete the duplicate record and restart the ALTER TABLE. It will error out on another duplicate. And another, and another, and so on. It's even worse than you might think, because all the work up to the error has to be done twice, the second time to roll back the in-progress index, shuffling the records around many, many times on disk! The only way to avoid this is to re-create the table and move the data with INSERT IGNORE (note: the code in the OP is from before I found this out; I just replaced INSERT with INSERT IGNORE to get the current proc).

          For completion's sake, my current transferHistory attempt:

          Code:
          USE zabbix;
          DROP PROCEDURE IF EXISTS transferHistory;
          DELIMITER $$
          CREATE PROCEDURE transferHistory(
              duration INT
          )
          -- duration: how far back your longest history table goes, in seconds.
          -- Use 31 * 86400 * [number of months]. Defaults to 3 months (8035200 s).
          BEGIN
          IF duration IS NULL THEN
              SET duration = 8035200;
          END IF;
          SET @date = UNIX_TIMESTAMP();
          SET @startDate = @date - duration;
          WHILE @date > @startDate DO
              SELECT FROM_UNIXTIME(@date);
              START TRANSACTION;
                  INSERT IGNORE INTO history SELECT * FROM history_old WHERE history_old.clock > @date;
                  DELETE FROM history_old WHERE history_old.clock > @date;
              COMMIT;
              SELECT 'history_str';
              START TRANSACTION;
                  INSERT IGNORE INTO history_str SELECT * FROM history_str_old WHERE history_str_old.clock > @date;
                  DELETE FROM history_str_old WHERE history_str_old.clock > @date;
              COMMIT;
              SELECT 'history_text';
              START TRANSACTION;
                  INSERT IGNORE INTO history_text SELECT * FROM history_text_old WHERE history_text_old.clock > @date;
                  DELETE FROM history_text_old WHERE history_text_old.clock > @date;
              COMMIT;
              SELECT 'history_uint';
              START TRANSACTION;
                  INSERT IGNORE INTO history_uint SELECT * FROM history_uint_old WHERE history_uint_old.clock > @date;
                  DELETE FROM history_uint_old WHERE history_uint_old.clock > @date;
              COMMIT;
              SET @date = @date - 3600;
          END WHILE;
          END$$
          DELIMITER ;
          Last edited by nprdev; 18-07-2025, 11:42.
