Hi everyone!
I know that housekeeper-related questions have been discussed a lot of times but unfortunately I couldn't find any solutions for our case so any help will be much appreciated.
Situation is the following:
- Partitioning is configured for trends and history tables. No issues here.
- Internal housekeeper is disabled by setting HousekeepingFrequency=0 in config.
- At the moment we have a huge amount of data in events, event_recovery and problem tables (700.000.000+ records and 100GB+ data for these 3 tables in total).
Some investigations have been performed and we found out that most part of these records (more than 99%) are related to some internal trigger-related events because of some initial triggers misconfiguration.
This has been fixed and amount of daily added events and problems is now significantly decreased, but there's still a lot of old events left in the database.
We tested internal housekeeper on test server by running it manually from command line with events category selected in housekeeper GUI section and faced with the following issues:
- on the first run it cleaned up all records from problem table that have been resolved older than 1 day ago ignoring the threshold set in GUI for events. First execution was running longer than 10 hours. We'd like to have more control over it and split this process into several bunches somehow when we will run it in production to minimize the impact.
- only several hundreds/thousands records are being cleaned up from events table during each housekeeper execution and it's running quite long. Changing MaxHousekeeperDelete parameter from zabbix server config doesn't affect this anyhow because this parameter works only for history data of items that have been removed as I know.
Our goals are:
1. Perform the cleanup of all the internal events backlog from Zabbix DB.
2. Configure regular housekeeping of events.
I did some testing with the following query:
In this case events are cleared up much faster than by running internal housekeeper and this can be controlled by running multiple times with different timestamp intervals.
Records from event_recovery and problem tables are also removed in this using by foreign keys.
This also shouldn't affect events related to real problems raised by triggers that we need to keep for 1 year.
So, the main questions are:
- What is the best way to perform such cleanup for both cases (one-time cleanup of the backlog and regular housekeeping)? Using built-in housekeeper or some scheduled script with custom SQL queries?
In case of using built-in housekeeper:
1. How to make it cleaning up more events on each execution and run faster?
2. How to configure it to have more granularity on cleaning up problems table to avoid running it for 10+ hours?
In case of running custom queries:
1. Is my query above is fine to perform the events cleanup and will not cause any problems in future?
2. If no - then which queries need to be executed for events cleanup and are safe to run to be sure that we will not get any data inconsistency?
3. Does it make sense to keep internal events in the database for longer than 1 day?
Thanks everyone in advance!
I know that housekeeper-related questions have been discussed a lot of times but unfortunately I couldn't find any solutions for our case so any help will be much appreciated.
Situation is the following:
- Partitioning is configured for trends and history tables. No issues here.
- Internal housekeeper is disabled by setting HousekeepingFrequency=0 in config.
- At the moment we have a huge amount of data in events, event_recovery and problem tables (700.000.000+ records and 100GB+ data for these 3 tables in total).
Some investigations have been performed and we found out that most part of these records (more than 99%) are related to some internal trigger-related events because of some initial triggers misconfiguration.
This has been fixed and amount of daily added events and problems is now significantly decreased, but there's still a lot of old events left in the database.
We tested internal housekeeper on test server by running it manually from command line with events category selected in housekeeper GUI section and faced with the following issues:
- on the first run it cleaned up all records from problem table that have been resolved older than 1 day ago ignoring the threshold set in GUI for events. First execution was running longer than 10 hours. We'd like to have more control over it and split this process into several bunches somehow when we will run it in production to minimize the impact.
- only several hundreds/thousands records are being cleaned up from events table during each housekeeper execution and it's running quite long. Changing MaxHousekeeperDelete parameter from zabbix server config doesn't affect this anyhow because this parameter works only for history data of items that have been removed as I know.
Our goals are:
1. Perform the cleanup of all the internal events backlog from Zabbix DB.
2. Configure regular housekeeping of events.
I did some testing with the following query:
delete from events where source=3 and object=0 and clock <[some_timestamp]
Records from event_recovery and problem tables are also removed in this using by foreign keys.
This also shouldn't affect events related to real problems raised by triggers that we need to keep for 1 year.
So, the main questions are:
- What is the best way to perform such cleanup for both cases (one-time cleanup of the backlog and regular housekeeping)? Using built-in housekeeper or some scheduled script with custom SQL queries?
In case of using built-in housekeeper:
1. How to make it cleaning up more events on each execution and run faster?
2. How to configure it to have more granularity on cleaning up problems table to avoid running it for 10+ hours?
In case of running custom queries:
1. Is my query above is fine to perform the events cleanup and will not cause any problems in future?
2. If no - then which queries need to be executed for events cleanup and are safe to run to be sure that we will not get any data inconsistency?
3. Does it make sense to keep internal events in the database for longer than 1 day?
Thanks everyone in advance!
Comment