PATCH: reduce server load during housekeeper runs

  • Tom Hutter
    Junior Member
    • Feb 2012
    • 7

    #1

    PATCH: reduce server load during housekeeper runs

    Hi everybody,

    I recently switched my database from MySQL to PostgreSQL and have suffered from heavy load during housekeeper runs ever since. It seems MySQL is much better at deleting rows than PostgreSQL.

    During every housekeeper run I saw a heavy load peak, mainly generated by disk I/O, and my monitoring (zabbix - don't know if you guys know this ;-) ) went red. I should mention that my zabbix server runs in a VM and the server disks are certainly not the fastest. I looked at the housekeeper code and found that it does a lot of deletes in every run, especially in "housekeeping_history_and_trends". I decided to give the server some rest after deleting a bunch of rows, before deleting the next. This flattens the load during housekeeper runs: the runs take longer, but the load peak is lower.

    There are two new configurable variables which you can add to "/etc/zabbix/zabbix_server.conf":

    # threshold: how many deletes before housekeeper sleeps
    HistoryAndTrendsDeleteBeforeSleep=1000

    # seconds of sleep between deletes
    HistoryAndTrendsSleepBetweenDeletes=10

    By default both are set to 0, which disables the sleeps and keeps the existing behaviour.

    During housekeeper runs you then see lines like this in zabbix_server.log (log level WARNING (3)):

    deleted 1002 > 1000 entries, housekeeper is sleeping 10 sec

    Sleeping 10 seconds after every 1000 or more deletes keeps my server load below 1. Sleeping 10 seconds after every 3000 or more deletes gives a load of about 1.5. I haven't tested other values, as I am fine with 1000/10.

    I don't have to worry that a housekeeper run takes so long that another one starts and the two collide. The housekeeper runs in an endless for loop and, after each run, waits for the time configured with CONFIG_HOUSEKEEPING_FREQUENCY.
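The accounting described above can be replayed without a database to get a feel for how many pauses a run would add. This is only a minimal sketch: `simulate_throttle`, `struct throttle_stats` and the per-item counts are hypothetical names and numbers for illustration, not part of the patch or of Zabbix.

```c
#include <assert.h>
#include <stddef.h>

/* Result of replaying one housekeeper pass through the throttle. */
struct throttle_stats {
    long deleted;        /* total rows deleted in the pass */
    long sleeps;         /* how many times the housekeeper would pause */
    long sleep_seconds;  /* total time the pauses add to the pass */
};

/*
 * Replay the patch's counting logic on made-up numbers.
 * per_item[i] is a hypothetical number of rows deleted for item i
 * (all history/trends tables together).
 * threshold and pause_sec stand for HistoryAndTrendsDeleteBeforeSleep and
 * HistoryAndTrendsSleepBetweenDeletes; a threshold of 0 disables pausing.
 */
struct throttle_stats simulate_throttle(const long *per_item, size_t n_items,
                                        long threshold, long pause_sec)
{
    struct throttle_stats st = {0, 0, 0};
    long since_sleep = 0;

    assert(threshold >= 0 && pause_sec >= 0);

    for (size_t i = 0; i < n_items; i++) {
        st.deleted += per_item[i];
        since_sleep += per_item[i];

        /* Same condition as in the patch: pause once the threshold is reached. */
        if (threshold > 0 && since_sleep >= threshold) {
            st.sleeps++;
            st.sleep_seconds += pause_sec;  /* the patch calls sleep() here */
            since_sleep = 0;
        }
    }
    return st;
}
```

Because the counter is only checked once per item, a pause can trigger somewhat above the threshold, which is why the log line shows 1002 rather than exactly 1000.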

    And finally the patch:

    diff --git a/src/zabbix_server/housekeeper/housekeeper.c src/zabbix_server/housekeeper/housekeeper.c
    index e021eaa..aade82a 100644
    --- a/src/zabbix_server/housekeeper/housekeeper.c
    +++ src/zabbix_server/housekeeper/housekeeper.c
    @@ -319,7 +319,9 @@ static int housekeeping_history_and_trends(int now)
    DB_RESULT result;
    DB_ROW row;

    - int deleted = 0;
    + long deleted = 0;
    + long count = 0;
    + long deleted_since_sleep = 0;

    zabbix_log( LOG_LEVEL_DEBUG, "In housekeeping_history_and_trends(%d)",
    now);
    @@ -332,13 +334,26 @@ static int housekeeping_history_and_trends(int now)
    item.history=atoi(row[1]);
    item.trends=atoi(row[2]);

    - deleted += delete_history("history", item.itemid, item.history, now);
    - deleted += delete_history("history_uint", item.itemid, item.history, now);
    - deleted += delete_history("history_str", item.itemid, item.history, now);
    - deleted += delete_history("history_text", item.itemid, item.history, now);
    - deleted += delete_history("history_log", item.itemid, item.history, now);
    - deleted += delete_history("trends", item.itemid, item.trends, now);
    - deleted += delete_history("trends_uint", item.itemid, item.trends, now);
    + count = delete_history("history", item.itemid, item.history, now);
    + count += delete_history("history_uint", item.itemid, item.history, now);
    + count += delete_history("history_str", item.itemid, item.history, now);
    + count += delete_history("history_text", item.itemid, item.history, now);
    + count += delete_history("history_log", item.itemid, item.history, now);
    + count += delete_history("trends", item.itemid, item.trends, now);
    + count += delete_history("trends_uint", item.itemid, item.trends, now);
    + deleted += count;
    + deleted_since_sleep += count;
    + if( CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP > 0 &&
    + deleted_since_sleep >= CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP )
    + {
    + zabbix_log( LOG_LEVEL_WARNING, "deleted %ld > %d entries, housekeeper is sleeping %d sec",
    + deleted_since_sleep,
    + CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP,
    + CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES
    + );
    + sleep(CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES);
    + deleted_since_sleep = 0;
    + }
    }
    DBfree_result(result);
    return deleted;
    diff --git a/src/zabbix_server/housekeeper/housekeeper.h src/zabbix_server/housekeeper/housekeeper.h
    index 64f3be8..9523a5e 100644
    --- a/src/zabbix_server/housekeeper/housekeeper.h
    +++ src/zabbix_server/housekeeper/housekeeper.h
    @@ -23,6 +23,8 @@
    extern int CONFIG_DISABLE_HOUSEKEEPING;
    extern int CONFIG_HOUSEKEEPING_FREQUENCY;
    extern int CONFIG_MAX_HOUSEKEEPER_DELETE;
    +extern int CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP;
    +extern int CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES;

    int main_housekeeper_loop();

    diff --git a/src/zabbix_server/server.c src/zabbix_server/server.c
    index ea8baec..b4b6c2a 100644
    --- a/src/zabbix_server/server.c
    +++ src/zabbix_server/server.c
    @@ -130,6 +130,8 @@ int CONFIG_TRAPPER_TIMEOUT = ZABBIX_TRAPPER_TIMEOUT;
    /*int CONFIG_NOTIMEWAIT =0;*/
    int CONFIG_HOUSEKEEPING_FREQUENCY = 1;
    int CONFIG_MAX_HOUSEKEEPER_DELETE = 500; /* applies for every separate field value */
    +int CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP = 0;
    +int CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES = 0;
    int CONFIG_SENDER_FREQUENCY = 30;
    int CONFIG_DBSYNCER_FORKS = 1;
    int CONFIG_DBSYNCER_FREQUENCY = 5;
    @@ -217,6 +219,8 @@ void init_config(void)
    {"CacheUpdateFrequency",&CONFIG_DBCONFIG_FREQUENCY,0,TYPE_INT,PARM_OPT,1,3600},
    {"HousekeepingFrequency",&CONFIG_HOUSEKEEPING_FREQUENCY,0,TYPE_INT,PARM_OPT,1,24},
    {"MaxHousekeeperDelete",&CONFIG_MAX_HOUSEKEEPER_DELETE,0,TYPE_INT,PARM_OPT,0,1000000},
    + {"HistoryAndTrendsDeleteBeforeSleep",&CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP,0,TYPE_INT,PARM_OPT,0,1000000},
    + {"HistoryAndTrendsSleepBetweenDeletes",&CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES,0,TYPE_INT,PARM_OPT,0,100},
    {"SenderFrequency",&CONFIG_SENDER_FREQUENCY,0,TYPE_INT,PARM_OPT,5,3600},
    {"TmpDir",&CONFIG_TMPDIR,0,TYPE_STRING,PARM_OPT,0,0},
    {"FpingLocation",&CONFIG_FPING_LOCATION,0,TYPE_STRING,PARM_OPT,0,0},

    Apply this patch in the main directory of the zabbix source. I wrote the patch for the Debian version of zabbix-1.8.2, as I am using Debian squeeze. I haven't tested whether the code applies without warnings to the current zabbix version. I am definitely no C guru and assume you guys have a better way to implement the solution. Feel free to change the code as you like, if you find the idea worthy :-)

    Cheers

    Tom
    Last edited by Tom Hutter; 09-09-2012, 11:04. Reason: added patch as attachment
  • thijz
    Junior Member
    • Aug 2013
    • 12

    #2
    has this patch been implemented ?

    grtz
    Thijs

    • kloczek
      Senior Member
      • Jun 2006
      • 1771

      #3
      Originally posted by thijz
      has this patch been implemented ?

      grtz
      Thijs
      No, and it probably never will be.
      If you have issues with housekeeping, you should start thinking about partitioning the history* and trends* tables instead of wasting I/Os on deleting old data.
      This is the only way to keep this operation under control as the number of monitored items grows.
      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
      https://kloczek.wordpress.com/
      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
      My zabbix templates https://github.com/kloczek/zabbix-templates

      • thijz
        Junior Member
        • Aug 2013
        • 12

        #4
        ok, thnx for the reply

        so keeping the number of history/trend days down is the way to go ?

        grtz
        Thijs

        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by thijz
          ok, thnx for the reply

          so keeping the number of history/trend days down is the way to go ?

          grtz
          Thijs
          No. Partitioning is the technique that lets you spend only a few I/Os each day on dropping the oldest daily data, by simply deleting the file that holds that DB table partition (the same applies monthly for trends, no matter how many items you have to maintain), instead of spending a much greater number of I/Os that grows almost linearly with the number of items monitored in your zabbix.
          Because each day the DB starts using new, empty partitions and nothing is deleted from these files, a daily partition's DB file stops growing as soon as the DB engine stops writing new data to it.
          Remember that the number of I/Os the DB engine performs correlates linearly with the DB file size and not with the number of records in those files. So writing new data to DB files without deleting anything gives you the workload with the lowest possible number of I/Os for a given number of inserts.
          The Zabbix DB is a typical warehouse database. In a system that uses partitioning to store new data and drop the oldest data, the typical I/O pattern looks like a saw when the number of inserts/s is constant.
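To make the "drop the oldest daily partition" step concrete, here is a minimal sketch that builds the statement for the partition that has just fallen out of the retention window. Everything specific in it is an assumption and not something stated in this thread: the function name `build_drop_partition_sql`, daily partitions named `pYYYY_MM_DD` by UTC day, and MySQL's `ALTER TABLE ... DROP PARTITION` syntax (PostgreSQL handles partitions differently).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Write "ALTER TABLE <table> DROP PARTITION pYYYY_MM_DD" into buf, where the
 * date is the UTC day that lies keep_days before 'now'.
 * The partition naming scheme is a hypothetical convention for this sketch.
 * Returns the number of characters written, or -1 on error or truncation.
 */
int build_drop_partition_sql(char *buf, size_t buflen, const char *table,
                             time_t now, int keep_days)
{
    char name[16];
    time_t cutoff;
    struct tm *day;
    int n;

    assert(keep_days >= 0);
    if (table == NULL || strlen(table) == 0)
        return -1;

    /* Step back keep_days whole days and format that day as a partition name. */
    cutoff = now - (time_t)keep_days * 86400;
    day = gmtime(&cutoff);
    if (day == NULL || strftime(name, sizeof(name), "p%Y_%m_%d", day) == 0)
        return -1;

    n = snprintf(buf, buflen, "ALTER TABLE %s DROP PARTITION %s", table, name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The point of the sketch is that the work per day is one statement per table, regardless of how many items are monitored, whereas the housekeeper issues deletes per item.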
          • thijz
            Junior Member
            • Aug 2013
            • 12

            #6
            i just had a look at partitioning on https://www.zabbix.org/wiki/Docs/howto/mysql_partition

            looks like i'll be busy for quite a while :-(

            i'm also still on zabbix v2.0.13
            should i upgrade before i partition ?

            thanks for your help, i appreciate it !

            grtz
            Thijs

            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Originally posted by thijz
              should i upgrade before i partition ?
              Partitioning first.
              After partitioning the history* and trends* tables you will have:
              • trends data in the smallest possible files
              • the same for the history* tables once they have been fully rotated, and in addition all other garbage will have been dropped
              • the time spent upgrading the DB layout to 2.2 and to 2.4 will be the lowest possible
              • with history* and trends* DB files optimized by partitioning, you will have more disk space for all the necessary ALTER TABLE queries


              If you have a slave DB, the best way is: stop the slave -> back up the slave (binary, not a text dump) -> partition the tables on the slave -> sync all new data from the master -> promote the slave with partitions as the new master.
              With this scenario, no matter how big your database is, the possible DB downtime is a matter of seconds, and no matter what goes wrong during partitioning (something may always go wrong, especially if you have not done this before), your master will not be affected and you will be able to repeat the partitioning until you get it right.
              If your DB is big enough, you have no slave, and you must guarantee continuous monitoring, then as step 0 you should set up a slave DB.