Utilization of Housekeeper processes over 75%

  • Jason-TELUS
    Junior Member
    • Jan 2024
    • 9

    #1

    Utilization of Housekeeper processes over 75%

    Hello everyone,

    I'm confused by this alarm, which fires and then clears on our system. Looking through other discussions of this alarm has not given me a clear way forward. Some background to start with: we run the Zabbix server in a Docker container, monitoring 7500+ hosts and processing ~225 values per second. The VM that hosts this set of Docker containers has 4 CPUs, 16 GB of memory, and 350 GB of HDD space (67% space utilization). The DB is PostgreSQL without TimescaleDB and with no partitioning.

    Should note: Zabbix 6.0

    For some items we do not keep history/trends; for other items in our templates we do want history and trends.

    The housekeeper settings I have found are commented out, so I assume this process is running with the default configuration:

    # ZBX_HOUSEKEEPINGFREQUENCY=1
    # ZBX_MAXHOUSEKEEPERDELETE=5000
    # ZBX_PROBLEMHOUSEKEEPINGFREQUENCY=60 # Available since 6.0.0

    In the console logs for the docker container filtering for housekeeper messages I get the following series of logs:

    228:20240505:183845.418 executing housekeeper
    228:20240505:185118.586 housekeeper [deleted 1067398 hist/trends, 0 items/triggers, 190 events, 101 problems, 0 sessions, 0 alarms, 419 audit, 0 records in 753.150501 sec, idle for 1 hour(s)]
    228:20240505:195118.829 executing housekeeper
    228:20240505:200403.568 housekeeper [deleted 1084587 hist/trends, 0 items/triggers, 226 events, 111 problems, 0 sessions, 0 alarms, 423 audit, 0 records in 764.715343 sec, idle for 1 hour(s)]
    228:20240505:210403.688 executing housekeeper
    228:20240505:211637.206 housekeeper [deleted 1220714 hist/trends, 0 items/triggers, 190 events, 123 problems, 1 sessions, 0 alarms, 427 audit, 0 records in 753.510739 sec, idle for 1 hour(s)]
    228:20240505:221637.315 executing housekeeper
    228:20240505:222812.405 housekeeper [deleted 1089658 hist/trends, 0 items/triggers, 192 events, 123 problems, 0 sessions, 0 alarms, 421 audit, 0 records in 695.079947 sec, idle for 1 hour(s)]
    228:20240505:232812.775 executing housekeeper
    228:20240505:233945.800 housekeeper [deleted 1070700 hist/trends, 0 items/triggers, 208 events, 103 problems, 0 sessions, 0 alarms, 415 audit, 0 records in 692.984654 sec, idle for 1 hour(s)]
    228:20240506:003945.912 executing housekeeper
    228:20240506:005138.805 housekeeper [deleted 1088674 hist/trends, 0 items/triggers, 191 events, 106 problems, 0 sessions, 0 alarms, 414 audit, 0 records in 712.853857 sec, idle for 1 hour(s)]
    228:20240506:015138.988 executing housekeeper
    228:20240506:020622.184 housekeeper [deleted 1410367 hist/trends, 0 items/triggers, 189 events, 171 problems, 0 sessions, 0 alarms, 577 audit, 0 records in 883.179396 sec, idle for 1 hour(s)]
    228:20240506:030622.342 executing housekeeper
    228:20240506:031846.211 housekeeper [deleted 1276286 hist/trends, 0 items/triggers, 201 events, 124 problems, 0 sessions, 0 alarms, 489 audit, 0 records in 743.849082 sec, idle for 1 hour(s)]
    228:20240506:041846.496 executing housekeeper
    228:20240506:043009.738 housekeeper [deleted 1090764 hist/trends, 0 items/triggers, 196 events, 90 problems, 0 sessions, 0 alarms, 420 audit, 0 records in 683.220500 sec, idle for 1 hour(s)]
    228:20240506:053009.969 executing housekeeper
    228:20240506:054120.307 housekeeper [deleted 1068219 hist/trends, 0 items/triggers, 181 events, 93 problems, 0 sessions, 0 alarms, 414 audit, 0 records in 670.329319 sec, idle for 1 hour(s)]
    228:20240506:064120.512 executing housekeeper
    228:20240506:065247.412 housekeeper [deleted 1065975 hist/trends, 0 items/triggers, 177 events, 82 problems, 0 sessions, 0 alarms, 413 audit, 0 records in 686.888725 sec, idle for 1 hour(s)]

    It looks like the housekeeper process is executing rapidly. I believe I have to adjust the housekeeping settings in the config and then restart my Docker container; however, I can't figure out in which direction to adjust them. What should my first steps be towards resolving this issue? I've also heard I should switch to a partitioned DB or enable TimescaleDB, but I'd like to look at the settings that appear to be commented out first.

    Thanks in advance for any help provided

    regards,

    Jason
    Last edited by Jason-TELUS; 07-05-2024, 17:19.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2




    All these parameters are explained in the docs.
    Your alarm is caused by the housekeeper's running time: each run takes over 10 minutes. But it seems to be doing its job. Is it all within that little VM, or is your DB separate?
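
To check the run times against the 10 minutes mentioned above, the housekeeper summary lines can be parsed with a short script. This is only a sketch: the regular expression is written against the log format quoted in this thread, and the 600-second threshold is an assumption taken from the reply, not from the trigger definition.

```python
import re

# Matches the housekeeper summary line as quoted in this thread.
# The pattern is an assumption based on those samples only.
SUMMARY = re.compile(
    r"housekeeper \[deleted (?P<rows>\d+) hist/trends,.*?"
    r"in (?P<secs>\d+\.\d+) sec"
)

def parse_runs(log_text):
    """Return (hist/trends rows deleted, run time in seconds) per summary line."""
    runs = []
    for line in log_text.splitlines():
        m = SUMMARY.search(line)
        if m:
            runs.append((int(m.group("rows")), float(m.group("secs"))))
    return runs

# Two runs copied from the first post.
SAMPLE = """\
228:20240505:183845.418 executing housekeeper
228:20240505:185118.586 housekeeper [deleted 1067398 hist/trends, 0 items/triggers, 190 events, 101 problems, 0 sessions, 0 alarms, 419 audit, 0 records in 753.150501 sec, idle for 1 hour(s)]
228:20240506:053009.969 executing housekeeper
228:20240506:054120.307 housekeeper [deleted 1068219 hist/trends, 0 items/triggers, 181 events, 93 problems, 0 sessions, 0 alarms, 414 audit, 0 records in 670.329319 sec, idle for 1 hour(s)]
"""

THRESHOLD_SECS = 600  # the "10 minutes" from the reply (assumed threshold)

for rows, secs in parse_runs(SAMPLE):
    flag = "over" if secs > THRESHOLD_SECS else "under"
    print(f"{rows} rows in {secs:.0f} s ({rows / secs:.0f} rows/s), {flag} 10 min")
```

Feeding the full container log into `parse_runs` instead of `SAMPLE` would show whether every run stays above the threshold or only some of them.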

    If you switch to partitions and TimescaleDB, your housekeeper no longer needs to clean up old data; the DB drops it automatically. Your housekeeper stats may then look like this:
    Code:
    /usr/sbin/zabbix_server: housekeeper [deleted 0 hist/trends, 0 items/triggers, 2491 events, 8 sessions, 0 alarms, 1947 audit items, 0 autoreg_host, 0 records in 10.692918 sec, idle for 1 hour(s)]


    • Jason-TELUS
      Junior Member
      • Jan 2024
      • 9

      #3
      Thanks for the information, I have some reading to do. Everything is running in the VM; each component runs in its own Docker container:

      Containers running:
      zabbix-web-nginx-pgsql <- web front end
      zabbix-server <- zabbix server
      zabbix-snmptraps <- not sure what this one does, but we don't use snmptraps
      zabbix-agent <- monitoring agent
      postgres-server <- DB

      With maxhousekeeperdelete=5000, the default, each log entry looks like it's finding 1M+ entries to delete. Does this mean it's falling behind? Does this VM need more CPUs to allow for a larger maxhousekeeperdelete value?


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        It reports how much it has already deleted, so there is no issue and no need to change the value. I just think you have too little power for that DB: queries run too long. DB performance is the most important thing for Zabbix.
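
A rough back-of-the-envelope check of the "falling behind" question, using only figures from the first post. This is a sketch under stated assumptions: it treats the ~225 values per second as the rate of new rows, ignores that trends and history are counted together in the log, and ignores the items stored without history, so the ratio is indicative at best.

```python
# Figures from the first post; the steady-state interpretation is an assumption.
VALUES_PER_SEC = 225       # "~225 values per second"
IDLE_SECS = 3600           # "idle for 1 hour(s)"
RUN_SECS = 753.150501      # first housekeeper run in the quoted log
ROWS_DELETED = 1067398     # hist/trends rows deleted in that run

def cycle_summary(values_per_sec, idle_secs, run_secs, rows_deleted):
    """Compare rows deleted in one housekeeper cycle with values received in it."""
    cycle_secs = idle_secs + run_secs          # one full idle + run period
    ingested = values_per_sec * cycle_secs     # values received in that period
    return {
        "cycle_secs": cycle_secs,
        "ingested": ingested,
        "delete_rate": rows_deleted / run_secs,        # rows deleted per second
        "busy_fraction": run_secs / cycle_secs,        # share of the cycle spent deleting
        "deleted_to_ingested": rows_deleted / ingested,
    }

summary = cycle_summary(VALUES_PER_SEC, IDLE_SECS, RUN_SECS, ROWS_DELETED)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

If the deleted-to-ingested ratio stays near or above 1 from cycle to cycle, the housekeeper is removing roughly as much as arrives, which is consistent with the reply above that the count is what was deleted rather than a backlog.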
