Housekeeper not keeping up

  • StefanK
    Junior Member
    • Jan 2023
    • 22

    #1

    Housekeeper not keeping up

    Hi All

    We have quite a large Zabbix server and we are having an issue where the housekeeper is not cleaning up quickly enough; the housekeeper process can take up to 12 hours per run.
    Attached is the utilisation graph.
    Any tips to speed up this process?

    [Attachment: image.png]
  • MickeyPM
    Junior Member
    • Apr 2020
    • 13

    #2
    Hi tim.mooney
    I see you commented on a similar issue earlier on. Would you have any advice on this?

    [Attachment: image.png]

    We are running:
    Linux: Red Hat 8.8
    12 CPUs and 64 GB RAM
    PostgreSQL 13
    PostgreSQL and the web frontend are on the same server

    We have even tried increasing the number of pollers, which I now see is a bad idea.

    Config file values:
    VMwareCacheSize=128M
    CacheSize=8G
    HistoryCacheSize=1G
    HistoryIndexCacheSize=1G
    TrendCacheSize=1G
    ValueCacheSize=256M

    StartPollers=300
    StartAgentPollers=2
    MaxConcurrentChecksPerPoller=500
    StartIPMIPollers=1
    StartPollersUnreachable=20
    StartHistoryPollers=10
    StartTrappers=25
    StartVMwareCollectors=5
    StartSNMPTrapper=1
    Timeout=30
    UnreachablePeriod=60
    UnavailableDelay=60
    UnreachableDelay=60
    LogSlowQueries=60000
    StatsAllowedIP=127.0.0.1
    StartReportWriters=1
    WebServiceURL=http://localhost:10053/report

    Most of the hosts are connected directly to the Zabbix server, and 3 Zabbix proxy servers are used for a few of the hosts.
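
    For anyone else digging into this: a quick way to see which tables dominate the database size, and whether a deletion backlog has built up, is something along these lines (a rough sketch assuming the standard Zabbix schema on PostgreSQL; adjust names to your setup):

        -- largest tables, including indexes and TOAST data
        SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size
        FROM pg_catalog.pg_statio_user_tables
        ORDER BY pg_total_relation_size(relid) DESC
        LIMIT 10;

        -- housekeeper tasks still queued for items that were already deleted
        SELECT tablename, count(*) FROM housekeeper GROUP BY tablename;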


    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4811

      #3
      MickeyPM, what is your issue? Same as the topic starter's? Housekeeper not doing its job? You should say so... Currently no one can understand your issue.



      • MickeyPM
        MickeyPM commented
        Hi @cyber
        I was just expanding on and giving more detail about the issue my colleague StefanK posted.

        The database recently grew to 1.2 TB, so we dumped the database excluding the history and trends.
        The new database was 175 GB, and now, a month later, we are back at 500+ GB and still growing.

        The housekeeper is just not keeping up.
        The fact that there is only a single housekeeper running that deletes sequentially seems to be the bottleneck.
        It seems we need a housekeeper that can be configured to spawn processes per template or host group or something like that.
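
        Roughly speaking, the single housekeeper process walks through the items one by one and issues per-item deletes of this shape (an illustrative sketch, not the exact queries Zabbix runs; the itemid and timestamp are made up):

            -- expired history rows for a single item
            DELETE FROM history
            WHERE itemid = 12345            -- hypothetical itemid
              AND clock < 1700000000;       -- "now" minus the configured retention

        With a large item count, one sequential process doing this can easily run for hours.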
    • jhboricua
      Senior Member
      • Dec 2021
      • 113

      #4
      StefanK, you likely need to partition your database and set up a scheduled task to drop the old partitions from the database instead of relying on the housekeeper process.

      If your Zabbix database is MySQL/MariaDB, see my GitLab project on Zabbix MySQL Partitioning, which contains a Python script to achieve this. It is based on a Zabbix blog post on the subject.
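
      For example, with the history tables range-partitioned by month, the scheduled job essentially comes down to statements like this (a sketch assuming MySQL/MariaDB and a partition naming scheme such as pYYYYMM; the script in the project handles the details):

          ALTER TABLE history DROP PARTITION p202401;  -- removes a whole month of history in one fast operation

      Dropping a partition is close to a file removal, so it avoids the row-by-row deletes the housekeeper would otherwise have to perform.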


      • jhboricua
        Senior Member
        • Dec 2021
        • 113

        #5
        MickeyPM, how much history and trends data are you keeping in the housekeeper configuration?

        It is well known that after you hit a certain number of hosts/items/vps, the housekeeper process starts to be impacted. Partitioning the database and managing those partitions in the DBMS is the only real solution. You're using PostgreSQL, so there are several options available for you to do that. Using Postgres with TimescaleDB has been supported by Zabbix for a while. It has the additional advantage of being able to enable compression, which greatly reduces the database storage usage.
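
        For reference, the TimescaleDB route looks roughly like this on the raw history/trends tables (a minimal sketch only; Zabbix ships its own timescaledb.sql script and the retention/compression settings are then managed from the frontend):

            -- turn the plain table into a hypertable chunked by the integer clock column (1 day per chunk)
            SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);

            -- enable native compression, segmented per item
            ALTER TABLE history SET (timescaledb.compress, timescaledb.compress_segmentby = 'itemid');

        Old chunks can then be compressed or dropped wholesale instead of being deleted row by row.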


        • MickeyPM
          Junior Member
          • Apr 2020
          • 13

          #6
          Just an update....
          We ultimately had to migrate the server to SSDs, as the 15k RPM disks could not manage the throughput of the incoming data plus the housekeeper processes.
          The housekeeper took nearly 3 months to clean up the "extra" data, and our database is finally stable; the number of records in the database is now at a constant growth/purge rate.
          We still need to do a full vacuum to recover the excess space the database is taking up.
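
          For reference, that cleanup is something along the lines of the following, which takes an exclusive lock and rewrites the table, so it needs a maintenance window and enough free disk for the copy:

              VACUUM (FULL, VERBOSE) history;

          A plain VACUUM only marks the dead space as reusable inside the files; VACUUM FULL is what actually returns it to the filesystem.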
          Thanks for all the input.
          Last edited by MickeyPM; 21-11-2024, 10:33.

