A year ago, I began encountering serious issues with our Zabbix installation. https://www.zabbix.com/forum/zabbix-...ed-suggestions
It took me almost a year to get approval to fix it, and in the meantime all I could do was add CPU, RAM, and storage space. The DB grew to well past 3TB. Every few minutes the Zabbix Server process would crash and restart, so Housekeeper never got to run (it waits 30 minutes after the service starts). When it was still running (before the crashes became constant), it was configured with MaxHousekeeperDelete=300 and HousekeepingFrequency=1. Our NVPS sits at around 2000, with an across-the-board average retention period of 2 weeks. So even when Housekeeper did run, it never came close to removing all of the old data it was supposed to.
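To put those numbers in perspective, here is a rough back-of-the-envelope sketch using only the figures above. It assumes one stored row per collected value and a constant ingest rate, ignores trends tables, and does not try to model exactly how MaxHousekeeperDelete caps each deletion pass. The point is the scale: once past the retention window, Housekeeper has to delete about as many rows per cycle as were written during it just to stay even.

```python
# Back-of-the-envelope ingest vs. retention math using the figures from the post.
# Assumption: one stored row per collected value, constant ingest rate, trends ignored.
NVPS = 2000                    # new values per second (from the post)
RETENTION_DAYS = 14            # ~2 weeks average history retention (from the post)
HOUSEKEEPING_FREQUENCY_H = 1   # HousekeepingFrequency=1, i.e. one cycle per hour

SECONDS_PER_HOUR = 3600

# Rows written between two housekeeping cycles; at steady state Housekeeper
# must remove roughly this many expired rows per cycle to keep the DB flat.
rows_per_cycle = NVPS * SECONDS_PER_HOUR * HOUSEKEEPING_FREQUENCY_H

# Rows that legitimately live in history at steady state with 14-day retention.
steady_state_rows = NVPS * RETENTION_DAYS * 24 * SECONDS_PER_HOUR

print(f"rows written per housekeeping cycle: {rows_per_cycle:,}")
print(f"rows held at steady state: {steady_state_rows:,}")
```

That works out to about 7.2 million rows per hourly cycle and roughly 2.4 billion rows held at steady state, which is why a Housekeeper that rarely (or never) completes a cycle falls behind so quickly.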
My guess is that this had been going on for close to 2 years. Each time the DB filled up, someone added storage space without investigating further. Eventually the DB got so large, and the whole setup so far outstripped the box's available hardware, that the Zabbix Server process began to crash every few days, then every day, then every 5 minutes. At that point, Housekeeper wasn't even getting a chance to run. The DB grew, and grew, and grew...
Ultimately I had to export everything in the DB to a dump except for the history-related tables (alerts, history, trends, and housekeeper). I then recreated the DB, restored the data and structure, and essentially started over with monitoring data. Hopefully this helps others who run into similar issues.
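For anyone attempting the same thing, here is a minimal sketch of one way to do that split export. It assumes a MySQL/MariaDB backend (the post doesn't say which database was used), a database named `zabbix`, and the table names listed in the hypothetical `HISTORY_TABLES` list; Zabbix splits history and trends across several tables, so check the list against your own schema version. The helper only builds the `mysqldump` command lines and doesn't run anything.

```python
# Hypothetical helper: builds (but does not run) two mysqldump command lines.
# Assumptions: MySQL/MariaDB backend, database named "zabbix", table list below.
from typing import List, Tuple

# Assumed history-related Zabbix tables; verify against your schema version.
HISTORY_TABLES = [
    "alerts",
    "history", "history_uint", "history_str", "history_text", "history_log",
    "trends", "trends_uint",
    "housekeeper",
]


def build_dump_commands(db: str = "zabbix") -> Tuple[List[str], List[str]]:
    """Return (schema_cmd, data_cmd) argv lists for mysqldump."""
    # Pass 1: table definitions only, for every table (history ones included),
    # so the recreated DB has the full structure but empty history tables.
    schema_cmd = ["mysqldump", "--single-transaction", "--no-data", db]

    # Pass 2: row data only, skipping the history-related tables entirely.
    ignore_flags = [f"--ignore-table={db}.{t}" for t in HISTORY_TABLES]
    data_cmd = ["mysqldump", "--single-transaction", "--no-create-info",
                *ignore_flags, db]
    return schema_cmd, data_cmd
```

Splitting the dump into a schema pass and a data pass keeps the history tables' definitions while dropping their contents. Each command's output would be redirected to its own file, and the schema file restored into the recreated DB before the data file.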