[Help]Zabbix housekeeper processes more than 75% busy

  • Navern
    Member
    • May 2013
    • 33

    #1

    [Help]Zabbix housekeeper processes more than 75% busy

    Hi all,

    I have this issue: the Zabbix housekeeper process is loaded heavily, up to 100%, every 1.5 hours. It doesn't impact overall performance, but it bothers me. I have no idea how to debug this issue. Could someone please help?

    Should I adjust the values below?

    in zabbix_server.conf:
    HousekeepingFrequency=1
    MaxHousekeeperDelete=500
  • bobyboy
    Junior Member
    • Nov 2012
    • 25

    #2
    Originally posted by Navern
    Hi all,

    I have this issue zabbix housekeeper process every 1,5 hour is loaded heavily to 100%. It doesn't impact overall performance but bothers me. I've got no idea how to debug this issue. Could please someone help?

    Should i adjust values bellow?

    in zabbix_server.conf:
    HousekeepingFrequency=1
    MaxHousekeeperDelete=500
    Hi,
    Try a lower MaxHousekeeperDelete.
    You can test MaxHousekeeperDelete=100 to begin with.


    • Navern
      Member
      • May 2013
      • 33

      #3
      Originally posted by bobyboy
      Hi,
      Try with low MaxHousekeeperDelete.
      You can test MaxHousekeeperDelete=100 for begin.
      Could you please help me understand: MaxHousekeeperDelete is the maximum number of values that will be deleted in a single housekeeper run, am I right?


      • Navern
        Member
        • May 2013
        • 33

        #4
        grep "housekeeper" zabbix_server.log:
        1344:20131124:104753.960 executing housekeeper
        1344:20131124:111658.782 housekeeper deleted: 175526 records from history and trends, 0 records of deleted items, 175 events, 0 alerts, 0 sessions
        1344:20131124:121659.057 executing housekeeper
        1344:20131124:124536.088 housekeeper deleted: 182336 records from history and trends, 0 records of deleted items, 181 events, 0 alerts, 0 sessions
        1344:20131124:134536.342 executing housekeeper
        1344:20131124:141255.380 housekeeper deleted: 179016 records from history and trends, 0 records of deleted items, 176 events, 0 alerts, 0 sessions
        1344:20131124:151255.647 executing housekeeper
        1344:20131124:155638.981 housekeeper deleted: 173487 records from history and trends, 0 records of deleted items, 174 events, 0 alerts, 0 sessions
        1344:20131124:165639.231 executing housekeeper
        1344:20131124:175225.445 housekeeper deleted: 211218 records from history and trends, 0 records of deleted items, 208 events, 0 alerts, 0 sessions
        1344:20131124:185225.718 executing housekeeper
        1344:20131124:195118.563 housekeeper deleted: 236710 records from history and trends, 0 records of deleted items, 232 events, 0 alerts, 0 sessions
        1344:20131124:205118.809 executing housekeeper
        1344:20131124:215211.230 housekeeper deleted: 242652 records from history and trends, 0 records of deleted items, 241 events, 0 alerts, 0 sessions
        1344:20131124:225211.486 executing housekeeper
        1344:20131125:000326.579 housekeeper deleted: 246727 records from history and trends, 0 records of deleted items, 241 events, 0 alerts, 0 sessions
        1344:20131125:010326.832 executing housekeeper
        1344:20131125:021037.428 housekeeper deleted: 268913 records from history and trends, 0 records of deleted items, 261 events, 0 alerts, 0 sessions
        1344:20131125:031037.682 executing housekeeper
        1344:20131125:041548.682 housekeeper deleted: 259535 records from history and trends, 0 records of deleted items, 261 events, 0 alerts, 0 sessions
        1344:20131125:051548.932 executing housekeeper
        1344:20131125:062120.729 housekeeper deleted: 253951 records from history and trends, 0 records of deleted items, 253 events, 0 alerts, 0 sessions
        1344:20131125:072121.004 executing housekeeper
        1344:20131125:083058.396 housekeeper deleted: 256212 records from history and trends, 0 records of deleted items, 251 events, 0 alerts, 0 sessions
        1344:20131125:093058.656 executing housekeeper
        1344:20131125:105508.711 housekeeper deleted: 264363 records from history and trends, 0 records of deleted items, 260 events, 0 alerts, 0 sessions
        1344:20131125:115508.970 executing housekeeper
        1344:20131125:131919.205 housekeeper deleted: 293672 records from history and trends, 0 records of deleted items, 288 events, 0 alerts, 0 sessions
        1344:20131125:141919.453 executing housekeeper
        1344:20131125:154512.536 housekeeper deleted: 295273 records from history and trends, 0 records of deleted items, 288 events, 0 alerts, 0 sessions
        1344:20131125:164512.802 executing housekeeper
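The log above can be summarized per cycle to see how long each run takes and how fast rows are deleted. The following is a minimal sketch, assuming the log line layout shown in this post (pid:date:time.millis followed by the message); the function names are made up for illustration and are not part of Zabbix.

```python
import re
from datetime import datetime

# Matches lines such as "1344:20131124:104753.960 executing housekeeper".
LINE_RE = re.compile(r"^\s*\d+:(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
DELETED_RE = re.compile(r"housekeeper deleted: (\d+) records from history and trends")


def parse_ts(date, time, millis):
    """Build a datetime from the date, time and millisecond fields of a log line."""
    base = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return base.replace(microsecond=int(millis) * 1000)


def housekeeper_cycles(lines):
    """Return (start, end, seconds, history_rows) for each completed cycle."""
    cycles = []
    start = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = parse_ts(m.group(1), m.group(2), m.group(3))
        msg = m.group(4)
        if msg.startswith("executing housekeeper"):
            start = ts
        elif start is not None:
            d = DELETED_RE.search(msg)
            if d:
                seconds = (ts - start).total_seconds()
                cycles.append((start, ts, seconds, int(d.group(1))))
                start = None
    return cycles


# First lines of the log above; the last cycle has no "deleted" line yet.
sample = [
    "1344:20131124:104753.960 executing housekeeper",
    "1344:20131124:111658.782 housekeeper deleted: 175526 records from history and trends, 0 records of deleted items, 175 events, 0 alerts, 0 sessions",
    "1344:20131124:121659.057 executing housekeeper",
]

for start, end, seconds, rows in housekeeper_cycles(sample):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: "
          f"{seconds / 60:.1f} min, {rows} rows, {rows / seconds:.0f} rows/s")
```

In this log each new cycle starts about one hour after the previous one finished, which matches HousekeepingFrequency=1, so the gap between two starts is the run time plus one hour.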


        • bobyboy
          Junior Member
          • Nov 2012
          • 25

          #5
          Is your Zabbix housekeeper process OK now?

          From the documentation:
          ### Option: MaxHousekeeperDelete
          # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
          # [housekeeperid], [tablename], [field], [value].
          # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
          # will be deleted per one task in one housekeeping cycle.
          # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
          # If set to 0 then no limit is used at all. In this case you must know what you are doing!


          • Navern
            Member
            • May 2013
            • 33

            #6
            Originally posted by bobyboy
            Now your zabbix housekeeper process is ok ?

            In documentation :
            ### Option: MaxHousekeeperDelete
            # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
            # [housekeeperid], [tablename], [field], [value].
            # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
            # will be deleted per one task in one housekeeping cycle.
            # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
            # If set to 0 then no limit is used at all. In this case you must know what you are doing!
            I know, I've checked the documentation. It just isn't clear enough to me, because there is plenty of contradictory information on the Zabbix forum.

            I can't change this value before I have understood it completely.

            I've got one more alert: Zabbix history syncer processes more than 75% busy.

            I believe it's connected to the housekeeper issue. Where can I read about this in more depth?

            ADDED:
            I also don't understand the value of 100%. Is this a CPU percentage, or a share of the Zabbix processes?
            Last edited by Navern; 25-11-2013, 15:30. Reason: added one question


            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Originally posted by Navern
              grep "housekeeper" zabbix_server.log:
              1344:20131124:104753.960 executing housekeeper
              1344:20131124:111658.782 housekeeper deleted: 175526 records from history and trends, 0 records of deleted items, 175 events, 0 alerts, 0 sessions
              That is fine.
              The Zabbix server housekeeper does its deletes in a few stages:
              - in the first stage it deletes from the history* and trends* tables using the clock key, and it deletes ALL item data older than specified in the "Keep history" param,
              - in the second stage it deletes the rows belonging to deleted items and deleted hosts (Zabbix does not delete all this data at the moment you click delete; instead it adds the item IDs to the 'housekeeper' table),
              - at the end it deletes rows from the events, acknowledgements and alarms tables.

              The limit specified in MaxHousekeeperDelete is used only in the second stage.
              You have "0 records of deleted items" because the housekeeper table was empty beforehand (you had not deleted anything recently).
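The first two stages described above can be illustrated with a toy model. This is only a sketch of the behavior as explained in this post, not Zabbix code: the function name, the in-memory row list and the single retention period for all items are assumptions made for the example, and the third stage (events and related tables) is left out.

```python
def housekeeping_cycle(rows, now, keep_seconds, deleted_itemids, max_delete):
    """Toy model of one housekeeper cycle.

    rows is a list of (itemid, clock) tuples standing in for history rows.
    Returns (remaining_rows, expired_count, deleted_item_count).
    """
    # Stage 1: remove every row older than the retention period, with no cap.
    cutoff = now - keep_seconds
    kept = [r for r in rows if r[1] >= cutoff]
    expired = len(rows) - len(kept)

    # Stage 2: for each queued task (one per deleted item in this model),
    # remove at most max_delete rows; 0 means no limit at all.
    removed = 0
    for itemid in deleted_itemids:
        budget = max_delete if max_delete > 0 else None
        remaining = []
        for r in kept:
            if r[0] == itemid and (budget is None or budget > 0):
                removed += 1
                if budget is not None:
                    budget -= 1
            else:
                remaining.append(r)
        kept = remaining
    return kept, expired, removed


# Two items with ten rows each (clock 0..9); item 2 was deleted in the GUI.
rows = [(itemid, clock) for itemid in (1, 2) for clock in range(10)]
remaining, expired, removed = housekeeping_cycle(
    rows, now=10, keep_seconds=5, deleted_itemids=[2], max_delete=3
)
print(expired, removed, len(remaining))
```

In this model the limit only slows down the cleanup of deleted items; the rows that expire by age are all removed in the same cycle regardless of the limit.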
              http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
              https://kloczek.wordpress.com/
              zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
              My zabbix templates https://github.com/kloczek/zabbix-templates


              • Navern
                Member
                • May 2013
                • 33

                #8
                Originally posted by kloczek
                That is fine.
                Zabbix server housekeeper is doing all deletes in few stages:
                - in first is deleting from history* and trends* tables using clock key and it deletes ALL data from items older than specified in "Keep history" param,
                - in second stage is deleting rows of items of deleted items and deleted hosts (zabbix does not deletes all these data just when you click on delete but it adds all these items ids to 'housekeeper' table).
                - at the end it deletes items from events, acknowledgements, alarms tables

                Limit specified in MaxHousekeeperDelete is used only in second stage.
                You have "0 records of deleted items" because before you had empty housekeeper table (you did not delete anything last time).
                Interesting information. So this value affects only data that was deleted manually via the GUI?

                Could you please link me to the part of the documentation where I can read more about the housekeeper and how it deletes data?


                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Originally posted by Navern
                  Interesting information. So this value impacts only data which was deleted manually via GUI?

                  Could you please link to me part of documentation where i can read more about housekeeper and how it deletes data?
                  There is no documentation about this... "Use the source, Luke."

                  In Zabbix 1.8/2.0/2.2, at the end of src/zabbix_server/housekeeper/housekeeper.c in the source code, you have main_housekeeper_loop(). That function calls housekeeping_cleanup() (which is in the same file), and CONFIG_MAX_HOUSEKEEPER_DELETE is used only in housekeeping_cleanup().

                  All HK deletes are done as sequential delete queries (in a single housekeeper subprocess). The housekeeper process is started from src/zabbix_server/server.c:

                  Code:
                          else if (server_num <= (server_count += CONFIG_HOUSEKEEPER_FORKS))
                          {
                                  INIT_SERVER(ZBX_PROCESS_TYPE_HOUSEKEEPER, CONFIG_HOUSEKEEPER_FORKS);
                  
                                  main_housekeeper_loop();
                          }
                  but CONFIG_HOUSEKEEPER_FORKS is always 1, and the main_housekeeper_loop() code is not prepared to run multiple main_housekeeper_loop() instances in parallel.

                  BTW, it looks like there are a few cases where delete operations in the GUI do not put delete tasks for the unused items' history* and trends* rows into the housekeeper table. For example, when someone waiting on a long delete reloads the page before the PHP code has added the new rows to the housekeeper table, or when someone renames a host.

                  In cases like this, the only way to remove these lost rows is to start partitioning the tables. Usually, by the time you realize that most of your history* and trends* tables contain garbage, you are doomed. Why? Because your environment of monitored boxes is big enough to make the initial partitioning very long (a few hours or more) just for the ALTER that moves all data into the initial partition. If at that point you also don't have enough disk space to store a whole copy of the biggest table, you are doomed without a HW upgrade, i.e. more storage space.
                  IMO 1: All Zabbix PostgreSQL/MySQL tables should be defined in the default/dist schemas with an initial partition. This change should be added ASAP and will not affect anything.

                  IMO 2: The current Zabbix housekeeping code should be removed and/or adapted to use ONLY partitioned tables (in the case of PostgreSQL/MySQL; I don't know how it is with DB2 and Oracle, but the same problem probably exists there).

                  The current HK has yet another very painful side effect on proxies, because all HK on a proxy is done in a single delete query using the clock key on the proxy_history table. If you have enough boxes/items behind the proxy, the housekeeper will be your MAIN bottleneck. At first you will be forced to keep history on the proxy for the shortest possible period. Next you will probably start looking at partitioning the proxy_history table on an hourly basis.

                  Using partitioning and handling the deletion of old data straight from the server/proxy code by dropping the oldest partition would make Zabbix much simpler and much better from a scalability point of view.

                  There is yet another argument in favor of partitions. With MySQL, deleting rows from tables in some scenarios makes the database files even bigger, and the DB files do not shrink. If you have reached the stage where you realize that your DB is only growing, you have less and less free disk space, and you cannot schedule a longer Zabbix downtime to optimize the tables, the only way is to set up a MySQL slave and optimize the tables on the slave, swapping master and slave from time to time.

                  Switching Zabbix to partitioning would make it much more predictable (from the point of view of storage capacity planning as well).

                  In big Zabbix environments a housekeeper cycle takes hours or days, but dropping the oldest partitions takes less than a minute. From this point of view the whole current HK code is a biiiig waste of disk space, IOs and power.

                  PS. I'm at the stage where the history*/trends* tables contain more than 50% garbage (three months ago I started cleaning up the mess in a more than two-year-old Zabbix 1.8 setup in quite a big environment). Today I'm going to swap the master and slave DBs to use the freshly optimized table files.

                  Does anyone know whether it is possible to apply the initial partitioning on the slave and continue syncing data from the master, and then swap master and slave with minimal DB backend downtime? (So far I'm only guessing that the answer is negative.)

                  • Navern
                    Member
                    • May 2013
                    • 33

                    #10
                    Wow, it's amazing. Thanks for the answer. It helped me understand more (I will dig into the source code myself then).

                    It's a pity that there is no instant solution for this housekeeper issue, like "increase the number of pollers".

                    I have only one question left: what does "housekeeper process 100% busy" mean? How does Zabbix determine that it's 100%, and 100% of what?

                    I will look into the code myself for the answer, but maybe someone knows.


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by Navern
                      I have only one question left: what does "housekeeper process 100% busy" means? How zabbix clarify that it's 100% and 100% of what value?
                      Because there is always only one HK process, 100% busy means that the HK is active and doing its normal job.

                      If the number of items/hosts is very low, the HK cycle will be very short. The next cycle will begin in HousekeepingFrequency hours, which is why it is possible to observe, for example, a value lower than 100%.
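To make the percentage concrete, here is a small sketch that treats "busy" as the share of a measurement window during which the single housekeeper process is working. The function name and the window lengths are assumptions for illustration; the 1745-second run time is taken from the first cycle in the log earlier in this thread.

```python
def busy_percent(busy_intervals, window_start, window_end):
    """Percentage of [window_start, window_end) covered by busy intervals.

    Intervals are (start, end) pairs in seconds and are assumed not to overlap.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have positive length")
    busy = 0.0
    for start, end in busy_intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / window


# One cycle that runs for about 29 minutes, followed by a one-hour sleep.
cycle = [(0, 1745)]

# A 10-minute window that falls entirely inside the run.
print(busy_percent(cycle, 0, 600))
# The whole run-plus-sleep period (1745 s busy + 3600 s idle).
print(round(busy_percent(cycle, 0, 5345), 1))
```

A short window that lies inside a long cycle reads 100%, while the same cycle averaged over run plus sleep reads much lower, which is why a long-running housekeeper trips a "more than 75% busy" trigger even though it is idle most of the day.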

                      • tchjts1
                        Senior Member
                        • May 2008
                        • 1605

                        #12
                        Originally posted by Navern
                        I have only one question left: what does "housekeeper process 100% busy" means? How zabbix clarify that it's 100% and 100% of what value?
                        On a side note, in case you haven't found the graphs for the Zabbix internal items yet, they are part of Template App Zabbix Server and can really help you optimize your configuration.

                        Here is a screenshot of the graphs of a one day period for my setup. If you've already found them, that's cool. I think a fair number of folks don't even know they exist, or don't bother to use the information in them to their benefit.

                        • Navern
                          Member
                          • May 2013
                          • 33

                          #13
                          Originally posted by tchjts1
                          On a sidenote, in case you haven't found the graphs for Zabbix internal items yet, they are part of Template App Zabbix Server and can really help you optimize your configurations.

                          Here is a screenshot of the graphs of a one day period for my setup. If you've already found them, that's cool. I think a fair number of folks don't even know they exist, or don't bother to use the information in them to their benefit.
                          Hi, thanks for the note. I have already found these graphs and they did help me, yes, but the problem still isn't solved.

                          Currently my graph:
                          [Attached image: zabbix_internal_load.jpg, 87.1 KB]


                          • tchjts1
                            Senior Member
                            • May 2008
                            • 1605

                            #14
                            What about the other 2 graphs, like the ones I show in my post above?
                            Looking at the graph you provided, you need to do some tuning and selectively allocate more resources to your Zabbix cache settings, and I would imagine also to your pollers.

                            While these settings won't stop the housekeeper from hitting 100%, you will still see a smoother-running Zabbix setup.

                            (Edit) It also appears that your housekeeper process is running excessively long, in some cases on the above graph for 2 hours straight.
                            For me, the ideal setup was to leave the housekeeper parameters at their defaults and run it once per hour. In that case, it runs for about 10 minutes every hour.
                            Last edited by tchjts1; 27-11-2013, 16:28.


                            • pdwalker
                              Senior Member
                              • Dec 2005
                              • 166

                              #15
                              Originally posted by kloczek
                              "use source Luke"

                              In zabbix 1.8/2.0/2.2 in source code in
                              *bump*

                              I was searching for "zabbix housekeeper meaning of records of deleted items" and I found your reply answered my question.

                              thanks.

                              - Paul
