Zabbix Housekeeper busy 4h+

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Burpees
    Junior Member
    • Nov 2017
    • 8

    #1

    Zabbix Housekeeper busy 4h+

    Hello,

I have a problem with the housekeeper. The housekeeper has been 100% busy for more than 4 hours. A week earlier the housekeeper ran in just 40 min; then suddenly it took more than 4 hours.
I browsed this forum and made a couple of changes: enabled innodb_file_per_table, increased the InnoDB buffer pool size, log size, I/O capacity, etc.

Zabbix and MySQL are running on one SSD.

    Config files are in attachment.

    Zabbix params:
    Hosts: 214
    Items: 17628
    Triggers: 2020
    NVPS: 237.74

All items keep history for 7 days and trends for 90 days. The shortest data-gathering interval is 60 s.

    DB Size:
history_uint - ~170,784,784 rows, 17.5 GB
history - ~117,511,276 rows, 11.2 GB
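
As a quick sanity check, the row counts above can be compared against what the quoted NVPS should produce over the 7-day history retention. This is only a rough estimate (it assumes the NVPS figure is sustained and that every value lands in the two history tables listed):

```python
# Rough check: how many history rows should ~238 NVPS produce in 7 days?
# All figures are taken from the post above.
nvps = 237.74                 # new values per second
history_days = 7
seconds_per_day = 86400

expected_rows = nvps * seconds_per_day * history_days
observed_rows = 170_784_784 + 117_511_276   # history_uint + history

print(f"expected ~ {expected_rows:,.0f} rows")
print(f"observed ~ {observed_rows:,} rows")
print(f"ratio    ~ {observed_rows / expected_rows:.1f}x")
```

The observed count is roughly double the expected one, which would be consistent with the housekeeper falling behind on deletions rather than the data volume itself growing.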

    Server params
    VMWare machine
    CPU: 3 Cores
    RAM: 32GB
    HDD: 150GB / SSD

    Thank You very much.
    Last edited by Burpees; 26-11-2017, 08:50.
  • trikke76
    Member
    Zabbix Certified Trainer

    • Apr 2013
    • 42

    #2
I would remove the MySQL password from your attached file.


MaxHousekeeperDelete=50 is too low; the default is 5000. Keeping it as low as 50 will make the housekeeper runtime issue go away, but your DB will not get cleaned up and will only grow.
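
For reference, these are the relevant parameters in zabbix_server.conf (the values shown are the stock defaults, not a tuned recommendation):

```
# /etc/zabbix/zabbix_server.conf (excerpt)

# Maximum rows deleted per table in one housekeeper cycle.
# 0 means no limit; the default is 5000. A value as low as 50
# caps each cycle at 50 rows and lets old data pile up.
MaxHousekeeperDelete=5000

# How often the housekeeper runs, in hours (default 1).
HousekeepingFrequency=1
```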

It's always best to run the DB on its own disk, or better still on RAID 10.

Did the housekeeper go from 40 min to 4 h after you made the changes?

I'm no MySQL expert, but if the housekeeper runs for an hour or more, you have performance issues.

In your case, since it happened suddenly, you need to find out what changed.
Are you sure the disk is still OK?

Have you added more items?


    • Burpees
      Junior Member
      • Nov 2017
      • 8

      #3
      Thank you

The housekeeper went from 40 min to 4 h before I made those changes. I didn't add any items.
I will check whether the SSD is still OK.


      • trikke76
        Member
        Zabbix Certified Trainer

        • Apr 2013
        • 42

        #4
Another possibility is that you have a fresh installation, and the housekeeper is only now starting to clean up the old data from the 90-day trends, and the installation cannot cope with the I/O.

But this setup is not that big, so that seems unlikely to me. I have never installed everything on a single SSD; it will also depend on the performance of your SSD. Look at the specs of your disk for its maximum IOPS, and monitor your disk activity with Zabbix.
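
Monitoring the disk with Zabbix can be done with the agent's built-in disk item keys. The device name `sda` below is an assumption; substitute whatever your SSD appears as:

```
# Zabbix agent item keys for raw disk activity (Linux agent):
vfs.dev.read[sda,ops]     # read operations per second
vfs.dev.write[sda,ops]    # write operations per second
vfs.dev.read[sda,sps]     # sectors read per second
vfs.dev.write[sda,sps]    # sectors written per second
```

Comparing the ops figures against the SSD's rated IOPS will show whether the housekeeper's deletes are saturating the disk.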


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
You need to stop wasting time deleting old data with DELETE queries, and start using history and trends table partitioning.
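
A minimal sketch of the range-partitioning approach commonly used for MySQL-backed Zabbix is below. The partition names, boundary dates, and daily granularity are illustrative only; in practice, community-maintained scripts or stored procedures create and drop partitions on a schedule:

```sql
-- Illustrative only: daily range partitions on history, keyed on the
-- 'clock' column (a Unix timestamp).
ALTER TABLE history PARTITION BY RANGE (clock) (
    PARTITION p2017_11_26 VALUES LESS THAN (UNIX_TIMESTAMP('2017-11-27 00:00:00')),
    PARTITION p2017_11_27 VALUES LESS THAN (UNIX_TIMESTAMP('2017-11-28 00:00:00'))
);

-- Retiring a day of data then becomes a near-instant metadata operation
-- instead of millions of row DELETEs:
ALTER TABLE history DROP PARTITION p2017_11_26;
```

With this in place, the housekeeper's history/trends cleanup is disabled and retention is enforced by dropping the oldest partitions.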
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • trikke76
            Member
            Zabbix Certified Trainer

            • Apr 2013
            • 42

            #6
It's a solution, but it will introduce other issues: the housekeeper does more than clean up history and trends.

Partitioning is a good solution for large setups with heavy DB usage.

For 200 hosts it should not be a problem; I run the housekeeper on systems with 1500 hosts without issues.


            • Linwood
              Senior Member
              • Dec 2013
              • 398

              #7
1) Review whether you need as many items as you are saving.

2) Review whether you need history (vs. trends) for as long as you are keeping it.

3) Review whether you need trends for as long as you are keeping them, for each item.

Fundamentally, keep data that you need for research and that is actionable; anything more just takes longer to delete.

Beyond reducing volume, there's partitioning (as mentioned, it has issues, but it is by far the best big-system solution; it requires good DB knowledge).

And finally, try MaxHousekeeperDelete=0. Emphasis on "try", and watch it. I find it works nicely, and big deletes work much better than small chunks, but it might also hang things up while it happens, so use it with that caveat. Generally, "as big as does not cause other problems" is the best answer; small is very slow.
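
For clarity, that suggestion is a one-line change in zabbix_server.conf:

```
# zabbix_server.conf: remove the per-cycle delete cap entirely.
# 0 means "no limit" -- each housekeeper cycle issues one large DELETE
# per table instead of many small capped ones. Watch for long locks.
MaxHousekeeperDelete=0
```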


              • kloczek
                Senior Member
                • Jun 2006
                • 1771

                #8
                Originally posted by trikke76
                it's a solution but it will introduce other issue housekeeper does more then cleaning up history and trends
No, it will not.
Partitioning is a fully supported way of maintaining history and trends data, supported even under paid Zabbix support.
Simply put, instead of wasting almost the same number of IOs on deleting the oldest data (about the same as was spent adding that data), it comes down to dropping the oldest partitions, at only a few IOs per operation.
You have simply hit your hardware's IO bandwidth on adding new data plus deleting old data. Try to think about using Zabbix at the scale of millions of items and triggers and tens of thousands of NVPS. Reaching such a size is not possible while still using DELETE queries.

Introducing partitioning will also lower your average IO/s: when updating the history and trends tables, the index B-tree depth grows with log(n), where n is the number of records held in a single DB file. If you use, for example, daily history partitions, which will be a few tens of times smaller than the full table, then with your current system resources (CPU, memory, IO bandwidth) you will be able to handle a several times larger flow of monitoring data than now (all without more resources).
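
The logarithmic-depth claim can be made concrete with a back-of-the-envelope estimate. The fan-out of 100 keys per node below is an illustrative assumption, not a measured InnoDB figure:

```python
import math

def btree_depth(rows, fanout=100):
    """Rough B-tree depth estimate: ceil(log_fanout(rows)), at least 1."""
    return max(1, math.ceil(math.log(rows, fanout)))

full_table = 170_784_784           # rows in history_uint (from the post)
daily_partition = full_table // 7  # roughly one day of data per partition

# The daily partition's index is one level shallower, so every insert
# and range scan touches fewer pages.
print(btree_depth(full_table))       # depth over the whole table
print(btree_depth(daily_partition))  # depth within one daily partition
```

One level may sound small, but at this fan-out each level is a factor of ~100 in size, and the shallower tree also stays hotter in the buffer pool.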


                • trikke76
                  Member
                  Zabbix Certified Trainer

                  • Apr 2013
                  • 42

                  #9
Yes, it does.

The housekeeper also cleans up:

                  - internal data
                  - discovery data
                  - auto registration data
                  - audit data
                  - .....


Depending on the version, it is possible to disable the housekeeper for just trend and history data and keep it running for all the rest.

Besides that, following a guide is easy, but as Linwood mentioned, you need good database knowledge for when things go wrong.

I agree that it has big advantages: dropping a table is certainly much quicker than running queries, and especially for large setups it is probably the only solution.


                  https://www.zabbix.com/documentation...tion/general?s[]=housekeeper
                  Last edited by trikke76; 27-11-2017, 21:44.


                  • kloczek
                    Senior Member
                    • Jun 2006
                    • 1771

                    #10
Originally posted by trikke76
besides that following a guide is easy but like mentioned by Linwood you need good database knowledge for when things go wrong.
As long as it was possible to initialize the partitioning and complete the first cycle, I cannot imagine what could possibly go wrong after that.

Originally posted by trikke76
i agree that it has big advantages and dropping a table is certainly much quicker then runnin queries and esp for large setups probably even the only solution.
I'm using HK with partitioning even at the scale of my own laptop, where I have a full demo stack with server and proxy so I can experiment with Zabbix when I'm on the move, or use it as a demo platform when talking with someone about Zabbix. Even at a scale of about 1k items and fewer than 20 NVPS, I can observe lower IO bandwidth than without partitioning.

Saving IO bandwidth is not the only benefit of using partitioned tables.
A second reason is the predictability of disk space: in the tables with the highest data flow, such as history, trends, audit and events, no old records are deleted row by row, so no gaps are created in the physical files.
A third possible reason to use partitioning is an easier way to archive the oldest data: export the old partitions before dropping them, and import them into some other SQL engine.

IMO partitioning should be the standard OOTB method of database content maintenance. I fully understand why it has still not been introduced, but I think that sooner or later it will be the only method of DB content housekeeping.
Implementing HK over, for example, hourly created partitions of the proxy_history table on a Zabbix proxy could be a kind of playground for implementing it in the server as well.
It could even be done with the SQLite backend: create a new SQLite file for the proxy_history table data every hour.
                    Last edited by kloczek; 28-11-2017, 01:02.

