Zabbix housekeeper processes more than 75% busy

  • mfemanuel
    Junior Member
    • Nov 2020
    • 2

    #1


    Hi
    We get this alert on a regular basis (several times a day) and it's getting annoying.

    Our relevant settings:
    Code:
    # HousekeepingFrequency=1
    # MaxHousekeeperDelete=5000
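    Note that both lines above still carry the leading #, and the values shown happen to be the shipped defaults (run housekeeping every hour, delete at most 5000 rows per item per cycle), so the server is effectively running with default housekeeping. One common mitigation, sketched here as an assumption rather than something from this thread, is to raise the per-cycle delete cap. Demonstrated on a scratch copy of the config; the real file is usually /etc/zabbix/zabbix_server.conf, followed by a zabbix-server restart:
    Code:
    # Sketch (an assumption, not from this post): raise MaxHousekeeperDelete so
    # each hourly housekeeper cycle may delete more rows per item.
    conf=$(mktemp)
    printf '# HousekeepingFrequency=1\n# MaxHousekeeperDelete=5000\n' > "$conf"
    # uncomment the line and raise the cap in one step
    sed -i 's/^#\{0,1\} *MaxHousekeeperDelete=.*/MaxHousekeeperDelete=100000/' "$conf"
    grep MaxHousekeeperDelete "$conf"    # MaxHousekeeperDelete=100000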
    We have adapted the trigger to average over 40 minutes:
    Code:
    {Template App Zabbix Proxy:zabbix[process,housekeeper,avg,busy].avg(40m)}>75
    Our Housekeeping settings were adapted like this:
    • History - Data storage period: 14d
    • Trends - Data storage period: 365d
    • Events and alerts - Trigger data storage period: 365d (all others are set to 1d)

    A normal log extract would be something like this:
    Code:
    housekeeper [deleted 831062 hist/trends, 0 items/triggers, 6 events, 1 problems, 0 sessions, 0 alarms, 0 audit items in 2294.853656 sec, idle for 1 hour(s)]
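    Plugging the numbers from that line into a quick shell calculation gives the effective delete rate, and shows why a single cycle takes roughly 38 minutes:
    Code:
    # Effective delete rate from the log extract above (numbers copied verbatim).
    line='housekeeper [deleted 831062 hist/trends, 0 items/triggers, 6 events, 1 problems, 0 sessions, 0 alarms, 0 audit items in 2294.853656 sec, idle for 1 hour(s)]'
    rows=$(printf '%s\n' "$line" | grep -oE '[0-9]+ hist/trends' | cut -d' ' -f1)
    secs=$(printf '%s\n' "$line" | grep -oE '[0-9.]+ sec' | cut -d' ' -f1)
    awk -v r="$rows" -v s="$secs" 'BEGIN { printf "%.0f rows/s over %.1f min\n", r/s, s/60 }'
    # 362 rows/s over 38.2 min
    At that rate a backlog of a few million rows takes hours of near-continuous deleting, which matches the >75% busy alarm.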
    The server is running on an SSD RAID 10 with two SSDs. We are fully aware that these delete statements are very I/O-expensive and plan to set up database partitioning with the next update, but that won't happen right now.

    My questions are:
    • What could we do better to keep the housekeeper below the threshold?
    • Why are only our four Zabbix proxies reporting this alarm, while the graph "Zabbix internal process busy %" on the Zabbix server itself looks like the server is sleeping? Is the cleanup not executed on the Zabbix server itself?
  • mfemanuel
    Junior Member
    • Nov 2020
    • 2

    #2
    No need to all come back with input at the same time ;)
    Jokes aside, I was able to fix it by running the housekeeper process again immediately after each cleanup finished, with a short line in crontab:
    Code:
    # crontab -e
    
    # add the following line:
    */5 * * * * /usr/sbin/zabbix_server -R housekeeper_execute
    This ran for four days straight; apparently only then had enough records been cleaned that the housekeeper finished within a 5-10 minute interval again (after which I commented the crontab line back out).
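    Since the alarms in this thread are reported by the proxies, it may be worth noting that the proxy daemon accepts the same runtime-control option. A hypothetical equivalent crontab line for a proxy host (binary path assumed to be the standard package location) would be:
    Code:
    # Crontab fragment (assumption: proxies use the standard package binary path).
    # zabbix_proxy supports the same -R housekeeper_execute runtime control.
    */5 * * * * /usr/sbin/zabbix_proxy -R housekeeper_execute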
