Ad Widget

Collapse

Changes to History Retention Settings Not Taking Effect

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • alexw-z
    Member
    • Dec 2021
    • 36

    #1

    Changes to History Retention Settings Not Taking Effect


    I've got a Zabbix 4.2 install, and I need to shave down the History tables in preparation for a server migration and upgrade.

    The original settings were 3w days History and 90d Trends, I've gradually been dropping these settings and manually re-running the housekeeper, but the number of records processed by the housekeeper seems to stay static. These tables look to be sparse files, but if I check their real size with either du. or use mysqltuner to check how much space an Optimise table would free up, there have been no significant changes.The operation takes maybe <10 minutes to run each time, and doesn't seem to be hammering the box resource wise.

    zabbix 20578 0.5 0.1 971848 125996 ? 0:05 /usr/sbin/zabbix_server: housekeeper [deleted 11041507 hist/trends, 0 items/triggers, 76268 events, 1 sessions, 0 alarms, 8 audit items, 0 records in 385.297637 sec, idle for 1 hour(s)]

    In addition, if I check the oldest records present in the history tables, there are still records present from 3 weeks ago.

    MariaDB [zabbix]> SELECT FROM_UNIXTIME(clock) FROM history_uint LIMIT 3;
    +----------------------+
    | FROM_UNIXTIME(clock) |
    +----------------------+
    | 2022-12-17 16:18:08 |
    | 2022-12-17 16:19:08 |
    | 2022-12-17 16:20:08 |
    +----------------------+
    3 rows in set (0.00 sec)


    In addition, I've tried upping the MaxHousekeeperDelete setting initially to 50000 and then to 9(Unlimited), but this equally has no effect on the amount of records cleared per run, which has remained surprisingly

    zabbix 20578 0.5 0.1 971848 125996 ? 0:05 /usr/sbin/zabbix_server: housekeeper [deleted 11041507 hist/trends, 0 items/triggers, 76268 events, 1 sessions, 0 alarms, 8 audit items, 0 records in 385.297637 sec, idle for 1 hour(s)]

    I can also see a DELETE operation running in MySQL, but if I convert the clock time to normal, it's from 3 weeks ago (1671337088 = Sun Dec 18 2022 04:18:08), so Zabbix looks to still be sending the previous retention setting into the query.


    | 1335 | zabbix | localhost | zabbix | Query | 0 | query end | delete from history where itemid=200200000069556 and clock<1671337088 | 0.000 |

    Does anybody have any idea what the issue could be? I've tried multiple service restarts and server reboots, and even (as this is a sandboxed test copy of the box) and even an upgrade to 4.4.10 as this is part of the upgrade plan anyway, but nothing has seemed to have any effect.​
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    Check in Administration->General and then flip the non-obvious drop-down from "GUI" to "Housekeeping" and see if either History or Trends has "Override" checked.

    It's been a year or so since I looked at the housekeeper logic, but I believe there are some internal factors that influence how much it will delete in one go. You can't force it to exceed whatever those internal limits are, without modifying the code.

    MySQL/MariaDB InnoDB files, and really most relational database files, are sparse by design. If you want to shrink the on-disk footprint, you have to free space in the file and then do the OPTIMIZE you mentioned. You may need to have free space on the filesystem for 2x the size of the table for the OPTIMIZE to be able to complete.

    I've been where you are: needing to shrink history_uint after it had grown far bigger than we intended. I scripted running the housekeeper basically in a loop, so I was running it 10-12 times per hour, every hour. That ultimately still wasn't making enough progress, so Zabbix support suggested the following:
    1. create a new table, history_uint_new, with the exact same schema as history_uint
    2. (quickly) use MySQL/MariaDB RENAME or ALTER TABLE to effectively move 'history_uint' out of the way and move history_uint_new to history_uint. At that point, your history_uint table is completely empty.
    3. use a query to INSERT INTO the new history_uint values from history_uint_old that you want to keep, e.g. something newer than a particular clock.
    4. when you're comfortable that your new history_uint has the data you need, you just DROP the old table.
    How you best accomplish 2 and 3 depend upon what version of MySQL/MariaDB you have. I know that recent versions of MariaDB support a nice "RENAME TABLE" that's both atomic and allows you do the swap in one SQL statement, vs. the older ALTER TABLE method that requires 2 separate statements.​

    Comment

    • alexw-z
      Member
      • Dec 2021
      • 36

      #3
      Thanks for the reply. Yep, to confirm Housekeeping is enabled for both History and Trends, and Override is enabled for both.

      On the optimise front I think I would potentially run into issues with the large history_uint table with the amount of space available on the partition it lives on at present. I'm not sure if optimising these tables to free up the empty space would buy me that much time on the export front however, so it wasn't something I was planning to do as part of prep for the the migration work.

      It might be that I do need to resort to a manual delete. I did test one set of instructions for a delete I found elsewhere on these forums on a previous sandboxed copy and it seemed to break MySQL, but I'm aware of at last one other set of instructions for doing this I can try (in addition to the above suggestion). I try to avoid manual manipulation of databases if I can possibly avoid it however just as a matter of course.

      I was also having a hunt around to try to find the DB table that contains the Housekeeper retention settings to see if they'd updated at a DB level, the "housekeeper" table doesn't seem to be it however so still working on trying to track those down.

      Comment

      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        Originally posted by alexw-z
        Thanks for the reply. Yep, to confirm Housekeeping is enabled for both History and Trends, and Override is enabled for both.
        I can't tell from your response if you expected the history and trend override to be enabled, but just so it's clear, if you've been adjusting the history for individual items, it's the "override" for history on the Housekeeping page that is causing those changes to be ignored. My expectation is that most sites do not run with the overrides enabled.

        I agree that there's not much point in an OPTIMIZE if you're exporting the data to a new system anyway. Mentioning it was more for other people that are going to find your post in the future, when they need to shrink a history-related table.

        I wholeheartedly agree about avoiding manual manipulation of the database.

        I'm not certain, but I expect that the housekeeper settings you're looking for are the hk_* columns in the config table.

        For your migration to the new system, are your current and new database versions close enough that you could just rsync the binary files and then run mysql_upgrade to make the binary files work with the target database version? That doesn't "clean up" any history bloat, but it's potentially faster than any other migration method.

        Comment

        • alexw-z
          Member
          • Dec 2021
          • 36

          #5

          I was expecting the Override to be enabled. I'm not certain of the server history as it was built my my predecessor, but I suspect Housekeeping settings have likely always been global. I've tried toggling this on and off anyway, and it doesn't make any difference either.

          The settings are in indeed in the config table, thanks for that. Playing around with these settings I can see they are changing in the config table

          hk_history_mode: 1
          hk_history_global: 1
          hk_history: 14d
          hk_trends_mode: 1
          hk_trends_global: 1
          hk_trends: 30d


          Interestingly if I do a Unixtime conversion on the clock value being used for the current deletion, this looks to be approx 18 days ago, so it's in between the original value of 30d and the current value of 14d:

          | 17608 | zabbix | localhost | zabbix | Query | 0 | updating | delete from history where itemid=200200000398706 and clock<1671884326 | 0.000 |
          Unixtime 1671884326
          Your Time Zone Sat Dec 24 2022 16:19:30 GMT+0000 (Greenwich Mean Time)
          Relative 18 days ago
          The DB on the source box is Maria 10.2, the target is 10.6. I'm not sure offhand what the chances are of being successful in jumping several versions, but I can certainly lab it out and seeing how it goes. I don't really care so much about the history bloat being there or not, as long as I can get the migration time down to ideally a window of <6 hours.

          Comment

          • cyber
            Senior Member
            Zabbix Certified SpecialistZabbix Certified Professional
            • Dec 2006
            • 4807

            #6
            Minor version jumping (from .2 to .6) should not cause any trouble.

            Maybe after some days your housekeeper reaches your desired 14d? It just takes time . It has already gone from 30 to 18... just 4 days to go..:P

            Comment

            • alexw-z
              Member
              • Dec 2021
              • 36

              #7
              Had a very interesting development today. From previous visual checks of graphs I was fairly sure we had data going back 3 months (which tied in with the Trend retention setting), but I dug a bit more into the oldest clock times present in the history files, and found items with clock times going back 3 or 4 years. It looks like the Housekeeper may have lapsed in keeping up with it's backlog some time ago!

              I used tim.mooney​'s suggestion to dump 3 weeks worth of data from each history table into a new one, which took less than an hour to complete, and has reduced the real size on disk of the Zabbix database from 147GBytes to 26Gbytes. This has subsequently brought the mysqldump time down to 11mins and the re-import down to 55m.

              I'll keep a close eye on housekeeping effectiveness moving forwards, but this gets my projected timescale well within the 6 hour maintenance window I have to complete the work.

              Thanks again for the replies/help all, appreciated.

              Comment

              • prchalsefira
                Junior Member
                • Jan 2023
                • 5

                #8
                Hi

                I observe a similar problem with the housekeeper. History tables are getting progressively larger. Despite the fact that I have set the global history override to 7 days (7d), the minimum clock values in the tables are more than a year old.

                When using pg_activity to observe history deletion, the clock condition (in unixtime) for the deletion really shows values corresponding to last year, not 7 days ago. On the zabbix server the time is set correctly. My version is 5.0.31.

                What could confuse housekeeper that it doesn't delete values according to the override settings?

                I can supply further information if needed. Currently my history_uint table is 160GB including index.

                Thank you in advance for your reply.​

                Comment

                • alexw-z
                  Member
                  • Dec 2021
                  • 36

                  #9
                  Hi prchalsefira

                  You might be better creating a fresh Forum post in order to get more attention to your issue. However, I identified that the old backlog of data looked to relate to items that were no longer present in the items data. When the Housekeeper process runs the delete statements also include a WHERE clause that references an item ID, so I believe that data was being left behind from previously deleted items.

                  I've since upgraded to 6.0, I haven't checked yet to see if the problem has persisted. I wasn't able to find any definite evidence that this was a known bug, but I believe that's likely to be the case.

                  As above, I migrated the most recent history data I needed to new tables, swapped them in and deleted the old data, this saved me over 100GBytes of space.

                  Another fix would be to move to multi-partitioned history tables. As these work by dropping whole tables of older data, it wouldn't be dependent on existing item ids for deletion of old data.

                  Comment

                  • prchalsefira
                    Junior Member
                    • Jan 2023
                    • 5

                    #10
                    Hi,
                    Thank you for your reply.

                    I checked the orphaned history records and found none. (records in history_uint that have no corresponding itemid in the items table).

                    In the end, I think I'll do as you did, create a new history_* tables with a limited number of records and then observe if housekeeper works correctly.

                    I also consider this a bug, so I tried upgrading to 6.0 on the clone environment but it behaves the same.

                    I don't think my Zibbix installation is so large that I need to deploy partitioning. But I'll go that route if there's nothing left to do.

                    Best regards.​​​

                    Comment

                    • tim.mooney
                      Senior Member
                      • Dec 2012
                      • 1427

                      #11
                      Originally posted by alexw-z
                      the delete statements also include a WHERE clause that references an item ID, so I believe that data was being left behind from previously deleted items.
                      Interesting find. I can understand why they might want to do that, but if that's the case then there should also be a cleanup process for history data that is no longer associated with a valid itemid. Seems like a serious oversight.

                      Comment

                      • tim.mooney
                        Senior Member
                        • Dec 2012
                        • 1427

                        #12
                        Originally posted by prchalsefira
                        Despite the fact that I have set the global history override to 7 days (7d), the minimum clock values in the tables are more than a year old.
                        The housekeeper has an internal maximum for how much it will delete each time it runs. If you have many items that were set to keep stuff for a year or more and you've just recently set the global override to be 7d, then the housekeeper​ won't just get rid of everything older than 7d all at once.

                        How much is the housekeeper deleting each time it runs, and how much was it deleting before you set the global override to 7d?

                        Comment

                        • alexw-z
                          Member
                          • Dec 2021
                          • 36

                          #13
                          Originally posted by tim.mooney

                          Interesting find. I can understand why they might want to do that, but if that's the case then there should also be a cleanup process for history data that is no longer associated with a valid itemid. Seems like a serious oversight.
                          I'll do some further testing on it later today, see if the behaviour has persisted on 6.0, and report back. If it's still there I'll look into whether I can raise a bug for it, if one doesn't already exist.

                          Comment

                          • prchalsefira
                            Junior Member
                            • Jan 2023
                            • 5

                            #14
                            Originally posted by tim.mooney
                            How much is the housekeeper deleting each time it runs, and how much was it deleting before you set the global override to 7d?
                            I've had the 7d override set up for a long time, maybe months. At the same time I had MAXHOUSEKEEPERDELETE set to 5000 and the database performance was poor. Housekeeper was set to run every hour and ran for approximately 50 hours. I understand that this probably caused the database to grow.

                            I have increased the performance of the database server significantly, but there are still quite limited I/O operations. MAXHOUSEKEEPERDELETE is now set to 100,000 with a frequency of 20 hours and its output is as follows:
                            housekeeper [deleted 47696858 hist/trends, 67686040 items/triggers, 17747 events, 41 problems, 0 sessions, 0 alarms, 0 audits, 0 records in 189551.468383 sec, idle for 20 hour(s)]

                            Overal Zabbix server statistics:
                            Number of hosts (enabled/disabled): 217211 / 6
                            Number of items (enabled/disabled/not supported): 1231010366 / 842 / 1102
                            Number of triggers (enabled/disabled [problem/ok]): 17281443 / 285 [6 / 1437]
                            Required server performance, new values per second: 161.59​


                            So I may be well on my way to cleaning up the database. After the next housekeeper run I'll try to perform a VACUUM FULL on the history_* tables and see if the situation has improved. Alternatively, I'll try to create new tables and copy only values 10d old.

                            Thanks

                            Comment

                            • alexw-z
                              Member
                              • Dec 2021
                              • 36

                              #15
                              The oldest entries in my history tables are now a uniform 7 days ago, the issue is no longer occurring for me on 6.0.

                              Comment

                              Working...