Ad Widget

Collapse

Housekeeper problems - manual cleanup?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • grep
    Junior Member
    • Mar 2009
    • 2

    #1

    Housekeeper problems - manual cleanup?

    We're having some problems with the housekeeper process not being able to keep up and I'm wondering if there is an easy way to "manually" do the housekeeping.

    Awhile back, the housekeeper was causing a tremendous amount of load on our DB server. At the time we were running zabbix 1.6.5 with postgres 8.3 on the backend, and it turned out that someone had done a mass update of several hosts, unlinking and re-linking some rather large templates. This caused the housekeeper to schedule a bunch of slow running deletes, purging a years worth of data for a 1,000 or more items.

    We disabled the housekeeper and eventually purged the housekeeper table manually, but here is where we went wrong - no one ever re-enabled the housekeeper. I just noticed the other day as part of my work to upgrade to 1.8.3. Our postgres DB is now over 400GB and the housekeeper table has 1,700 rows in it.

    I turned it back on, but after several days of a DB server load ranging from 20 to over 40, there is still 1,700 rows in the table so either the housekeeper isn't working or the process is just going to run forever. I tried starting the zabbix_server with no other workers (so no pollers, trappers, etc) to let just the housekeeper run, but it still doesn't seem to make any progress.

    I can purge data easily enough, but what about the trend data? Is there a way to do that part manually, or is that most likely what zabbix is choking on? If I just purge all the data older than, say 90 days (and the housekeeper table) will zabbix have any problems with that? Other ideas?

    Thanks,

    -poul
  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #2
    I have a similar problem with our Oracle backend. It seems that the Housekeeper is just slow. We're thinking about going to a partitioned table schema so we can just drop old tables once they're outdated.

    Comment

    • qix
      Senior Member
      Zabbix Certified SpecialistZabbix Certified Professional
      • Oct 2006
      • 423

      #3
      I havn't tried this, so consider it an experiment.
      Maybe you could use a one ore more 'spare' machines running zabbix server with ONLY the housekeeper enabled and let them talk to the database.

      The combined cpu power should sort it out eventually, provided the DB server can keep up.

      I'm not sure this is supported/dangerous though...
      With kind regards,

      Raymond

      Comment

      • untergeek
        Senior Member
        Zabbix Certified Specialist
        • Jun 2009
        • 512

        #4
        It's probably not possible to do what you suggest.

        Far better to make a cron script that makes queries direct to the database.

        The question is, what exactly is the housekeeper doing? Can we do a blanket statement (in our case) if we're not keeping any history for longer than 30 days to simply purge anything from HISTORY_(fill in the blank) with a timestamp of SYSDATE-30(days) ? I'm sure the regular housekeeper does checks to see when items are set to expire and then only deletes those, but if we've configured all of our items to keep no more than 30 days, wouldn't the simple sql statement method save processor cycles (and then I could disable the housekeeper)?

        Comment

        • qix
          Senior Member
          Zabbix Certified SpecialistZabbix Certified Professional
          • Oct 2006
          • 423

          #5
          We'll....the manual stated that load can be divided over multiple Zabbix servers.
          It was in the manual since at least version 1.1.3 (that was before DM) so I suspect you can run different Zabbix processes on multiple hosts that use the same DB...However, I couldn't find a reference to it in the current manual

          Personally, I wouldn't know what SQL queries to run.
          I'd be to much afraid to break the DB consistency
          Last edited by qix; 16-12-2010, 10:03.
          With kind regards,

          Raymond

          Comment

          • untergeek
            Senior Member
            Zabbix Certified Specialist
            • Jun 2009
            • 512

            #6
            I stand corrected. That notwithstanding, we are moving to disable housekeeping and put all of our items on a strict 30 day history/365 day trend map. We will then be doing some simpler sql statements to purge items older than 30 days and trends older than 365 days.

            Barring that, we're going to partitioned tables and dropping the old tables after they're older than a month, e.g. on December 1 you could delete October's table.

            Housekeeping is a nightmare for the database server if you're keeping items for a variety of time periods. Much less pain to keep all for a fixed period and do simplified cleanup.

            For example:

            Code:
            delete from alerts where clock<SYSDATE-365 days (must be unix timestamp)
            
            delete from history where clock < SYSDATE-31 days (must be unix timestamp value)
            delete from history_uint where clock < SYSDATE-31 days (must be unix timestamp value)
            delete from history_str where clock < SYSDATE-31 days (must be unix timestamp value)
            delete from history_text where clock < SYSDATE-31 days (must be unix timestamp value)
            delete from history_log where clock < SYSDATE-31 days (must be unix timestamp value)
            
            delete from trends where clock < SYSDATE-365 days (must be unix timestamp value)
            delete from trends_uint where clock < SYSDATE-365 days (must be unix timestamp value)
            And the tricky one is this one, as you need to select from one table and then use the results in 2 other queries:
            Code:
            select eventid from events where clock<SYSDATE-365 days (must be unix timestamp value)
            delete from acknowledges where eventid=(the above list)
            delete from events where eventid=(the above list)
            These were adapted from housekeeper.c in the src/zabbix_server/housekeeping source. I simply removed the code to figure out an item's "expiration" as we have it uniformly set. It should dramatically reduce the number of queries needed for housekeeping.

            Comment

            • qix
              Senior Member
              Zabbix Certified SpecialistZabbix Certified Professional
              • Oct 2006
              • 423

              #7
              Good luck Untergeek, let us know how it goes...I'm fairly curious
              With kind regards,

              Raymond

              Comment

              • mushero
                Senior Member
                • May 2010
                • 101

                #8
                Direct purge query for history

                We've been experimenting with this query - it purges all trends older than the time for that item. You can run this daily, weekly, etc. or in phases (by limiting itemid or templates or hosts, etc. but you must join through item to trends, since its only index is on itemid).

                DELETE trends t
                FROM trends t JOIN items i ON i.itemid = t.itemid
                WHERE clock < (unix_timestamp() - trends * 3600 * 24)
                AND i.status = 0 /* Don't waste big trend lookups on useless items */

                You can add another criteria to only do a block at a time:
                WHERE i.itemid > 11111 AND i.itemid < 22222

                You also have to do table trends_uint to purge that.

                Comment

                • julian.fernandez
                  Junior Member
                  • Dec 2010
                  • 5

                  #9
                  New on zabbix and linux

                  Hello i need some help. I need delete the history and trends from zabbix´s database. I need the last 60 days of history and trends . The housekeeping is activate but i does not works !! no record has been deleted.
                  Sorry but if anybody has got an sql sentence i will appreciate your help.


                  Thanks to everybody !!!
                  Last edited by julian.fernandez; 01-04-2011, 11:23.

                  Comment

                  • mushero
                    Senior Member
                    • May 2010
                    • 101

                    #10
                    Julian,

                    See my query in the prior post that should work for trends (which are the big ones) - it's slow if you have lots of items, but should work to purge your data.

                    Also suggest you research why the main housekeeping system is not working for you, as it should.

                    Comment

                    • julian.fernandez
                      Junior Member
                      • Dec 2010
                      • 5

                      #11
                      Hello mushero the big ones tables on my system are history and history_uint. I was looking your sql-sentence but these not for history and history_uint tables.
                      May be this sentence could it be right... but i don't understand the "sysdate-365" in the field clock. Clock field is integer!!! any idea ?
                      Thanks in advance .

                      delete from alerts where clock<SYSDATE-365 days (must be unix timestamp)

                      delete from history where clock < SYSDATE-31 days (must be unix timestamp value)
                      delete from history_uint where clock < SYSDATE-31 days (must be unix timestamp value)
                      delete from history_str where clock < SYSDATE-31 days (must be unix timestamp value)
                      delete from history_text where clock < SYSDATE-31 days (must be unix timestamp value)
                      delete from history_log where clock < SYSDATE-31 days (must be unix timestamp value)

                      delete from trends where clock < SYSDATE-365 days (must be unix timestamp value)
                      delete from trends_uint where clock < SYSDATE-365 days (must be unix timestamp value)

                      Comment

                      • mushero
                        Senior Member
                        • May 2010
                        • 101

                        #12
                        Sorry, been a while and didn't look closely; here is something that works, I think, not only does some items at a time (<22250), and yes, can use for any trend/history table (there are many by type):

                        DELETE trends t
                        FROM trends t JOIN items i ON i.itemid = t.itemid
                        WHERE i.itemid < 22250 /* To do only some */
                        AND i.status = 0 /* Only Active ones */
                        AND clock < (unix_timestamp() - trends * 3600 * 24) /* Use configured trends length */
                        AND i.itemid > 0 /* Needed to get DB to use index on trends - not sure what this means*/

                        Comment

                        • julian.fernandez
                          Junior Member
                          • Dec 2010
                          • 5

                          #13
                          housekeeper not runs

                          I don't know what is going wrong on my zabbix server configuration. I have uncomment "DisableHousekeeping=0" and with this line uncomment ii should works !!! any Idea what could be wrong ?.

                          Thanks in advance

                          # This parameter must be between 0 and 255
                          #StartIPMIPollers=0

                          # Number of pre-forked instances of pollers for unreachable hosts
                          # Default value is 1
                          # This parameter must be between 0 and 255
                          #StartPollersUnreachable=1

                          # Number of pre-forked instances of trappers
                          # Default value is 5
                          # This parameter must be between 0 and 255
                          #StartTrappers=5

                          # Number of pre-forked instances of ICMP pingers
                          # Default value is 1
                          # This parameter must be between 0 and 255
                          #StartPingers=1

                          # Number of pre-forked instances of discoverers
                          # Default value is 1
                          # This parameter must be between 0 and 255
                          #StartDiscoverers=1

                          # Number of pre-forked instances of HTTP pollers
                          # Default value is 1
                          # This parameter must be between 0 and 255
                          #StartHTTPPollers=1

                          # Listen port for trapper. Default port number is 10051. This parameter
                          # must be between 1024 and 32767

                          #ListenPort=10051

                          # Source IP address for outgoing connections
                          #SourceIP=

                          # Listen interface for trapper. Trapper will listen on all network interfaces
                          # if this parameter is missing.

                          #ListenIP=127.0.0.1

                          # How often ZABBIX will perform housekeeping procedure
                          # (in hours)
                          # Default value is 1 hour
                          # Housekeeping is removing unnecessary information from
                          # tables history, alert, and alarms
                          # This parameter must be between 1 and 24

                          HousekeepingFrequency=1

                          # How often ZABBIX will try to send unsent alerts
                          # (in seconds)
                          # Default value is 30 seconds
                          SenderFrequency=30

                          # Uncomment this line to disable housekeeping procedure
                          DisableHousekeeping=0

                          # Specifies debug level
                          # 0 - debug is not created
                          # 1 - critical information
                          # 2 - error information
                          # 3 - warnings (default)
                          # 4 - for debugging (produces lots of information)

                          Comment

                          • untergeek
                            Senior Member
                            Zabbix Certified Specialist
                            • Jun 2009
                            • 512

                            #14
                            Originally posted by julian.fernandez
                            I don't know what is going wrong on my zabbix server configuration. I have uncomment "DisableHousekeeping=0" and with this line uncomment ii should works !!! any Idea what could be wrong ?.
                            Are you trying to enable or disable housekeeping? If you're trying to disable housekeeping, that should be set to 1 (true).

                            Comment

                            • julian.fernandez
                              Junior Member
                              • Dec 2010
                              • 5

                              #15
                              Thanks for your reply but i need to enable housekeeper but it does not work. I need to delete the history before last 60 days ? Any mysql script to run and do it manually ? Maybe i must to restart the zabbix server after have done modification in the configuration file ?
                              Thanks in advance
                              Last edited by julian.fernandez; 26-04-2011, 11:11.

                              Comment

                              Working...