Hi everybody,
I recently switched my database from MySQL to PostgreSQL and have suffered from heavy load during housekeeper runs ever since. It seems MySQL is much better at deleting rows than PostgreSQL.
During every housekeeper run I saw a heavy load peak, mainly caused by disk I/O, and my monitoring (Zabbix - don't know if you guys know it ;-) ) went red. I should mention that my Zabbix server runs in a VM and the server's disks are certainly not the fastest. I looked at the housekeeper code and found that it does a lot of deletes in every run, especially in "housekeeping_history_and_trends". So I decided to give the server some rest after deleting a batch of rows, before deleting the next one. This flattens the load during housekeeper runs: the runs take longer, but the load peak is lower.
The patch adds two new configuration parameters, which you can set in "/etc/zabbix/zabbix_server.conf":
# threshold: how many deleted rows before the housekeeper sleeps
HistoryAndTrendsDeleteBeforeSleep=1000
# seconds of sleep between deletes
HistoryAndTrendsSleepBetweenDeletes=10
By default both are set to 0, which disables the sleeps and keeps the existing behaviour.
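The on/off rule can be restated as a small C predicate. This is a sketch that mirrors the `if` condition in the patch, not code taken from Zabbix; the name `should_sleep` is hypothetical. Note that only HistoryAndTrendsDeleteBeforeSleep decides whether throttling happens: with a positive threshold and HistoryAndTrendsSleepBetweenDeletes=0, the patch still writes the log line and calls sleep(0).

```c
#include <assert.h>

/* Hypothetical helper restating the patch's condition:
 * throttling is active only when the threshold is positive, and the sleep
 * fires once the counter reaches (not only exceeds) the threshold. */
static int should_sleep(long deleted_since_sleep, int delete_before_sleep)
{
    assert(delete_before_sleep >= 0);  /* the config table allows 0..1000000 */
    return delete_before_sleep > 0 && deleted_since_sleep >= delete_before_sleep;
}
```

For example, should_sleep(1002, 1000) and should_sleep(1000, 1000) are true, while should_sleep(50000, 0) is false because a zero threshold disables the check.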
During housekeeper runs you will then see lines like this in zabbix_server.log (at log level WARNING (3)):
deleted 1002 > 1000 entries, housekeeper is sleeping 10 sec
Sleeping 10 seconds after every 1000 or more deleted rows keeps my server load below 1. With a threshold of 3000 rows and the same 10-second sleep, the load is about 1.5. I haven't tested other values, as 1000/10 works fine for me.
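To judge what a given pair of values costs in run time, the patch's counter can be replayed offline. This is a sketch under assumptions: the per-item counts in the example are made up, and count_sleeps/added_seconds are hypothetical helpers, not Zabbix functions. Because the check happens once per item, after all seven history/trends tables have been purged for that item, the counter can overshoot the threshold (hence 1002 in the log line), so one large item triggers only a single sleep no matter how many rows it removes.

```c
#include <assert.h>
#include <stddef.h>

/* Replays the patch's counter over per-item deletion counts (each entry is
 * the sum over the seven history/trends tables for one item) and returns
 * how many times the housekeeper would sleep. */
static long count_sleeps(const long *per_item, size_t n, long delete_before_sleep)
{
    long deleted_since_sleep = 0;
    long sleeps = 0;

    assert(delete_before_sleep >= 0);
    for (size_t i = 0; i < n; i++)
    {
        deleted_since_sleep += per_item[i];
        if (delete_before_sleep > 0 && deleted_since_sleep >= delete_before_sleep)
        {
            sleeps++;
            deleted_since_sleep = 0;  /* counter resets after each sleep */
        }
    }
    return sleeps;
}

/* Extra wall-clock seconds the throttling adds to one housekeeper run. */
static long added_seconds(long sleeps, long sleep_between_deletes)
{
    return sleeps * sleep_between_deletes;
}
```

For the sample counts {400, 602, 999, 1, 3000} and a threshold of 1000, the replay yields 3 sleeps, i.e. 30 extra seconds at 10 s per sleep. As a rough upper bound, N deleted rows cause at most N / threshold sleeps, since every sleep needs at least a threshold's worth of rows since the previous one.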
I don't have to worry that a housekeeper run could take so long that the next one starts and they collide: the housekeeper runs in an endless for loop and only waits for the time configured with CONFIG_HOUSEKEEPING_FREQUENCY after a run has finished.
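Why runs cannot overlap can be shown with a small simulation on a virtual clock. This is a sketch, not the real main_housekeeper_loop(): `simulate_passes` is a hypothetical helper, the pass durations are invented, and HousekeepingFrequency is assumed to be in hours (its 1..24 range in the config table suggests this).

```c
#include <assert.h>
#include <stddef.h>

#define SEC_PER_HOUR 3600L

/* Simulates a sequential housekeeper loop: each pass runs to completion,
 * then the process waits frequency_hours before starting the next pass.
 * durations[i] is how long pass i takes, including any throttling sleeps. */
static void simulate_passes(const long *durations, size_t n, long frequency_hours,
                            long *starts, long *ends)
{
    long clock = 0;

    assert(frequency_hours >= 1);  /* the config table allows 1..24 */
    for (size_t i = 0; i < n; i++)
    {
        starts[i] = clock;
        clock += durations[i];                    /* the pass itself */
        ends[i] = clock;
        clock += frequency_hours * SEC_PER_HOUR;  /* wait begins after the pass */
    }
}
```

With durations {600, 4200, 900} seconds and a frequency of 1 hour, the second pass runs from 4200 s to 8400 s, longer than the frequency itself, and the third pass still starts only at 12000 s, a full hour after the second one ended.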
And finally the patch:
diff --git a/src/zabbix_server/housekeeper/housekeeper.c src/zabbix_server/housekeeper/housekeeper.c
index e021eaa..aade82a 100644
--- a/src/zabbix_server/housekeeper/housekeeper.c
+++ src/zabbix_server/housekeeper/housekeeper.c
@@ -319,7 +319,9 @@ static int housekeeping_history_and_trends(int now)
DB_RESULT result;
DB_ROW row;
- int deleted = 0;
+ long deleted = 0;
+ long count = 0;
+ long deleted_since_sleep = 0;
zabbix_log( LOG_LEVEL_DEBUG, "In housekeeping_history_and_trends(%d)",
now);
@@ -332,13 +334,26 @@ static int housekeeping_history_and_trends(int now)
item.history=atoi(row[1]);
item.trends=atoi(row[2]);
- deleted += delete_history("history", item.itemid, item.history, now);
- deleted += delete_history("history_uint", item.itemid, item.history, now);
- deleted += delete_history("history_str", item.itemid, item.history, now);
- deleted += delete_history("history_text", item.itemid, item.history, now);
- deleted += delete_history("history_log", item.itemid, item.history, now);
- deleted += delete_history("trends", item.itemid, item.trends, now);
- deleted += delete_history("trends_uint", item.itemid, item.trends, now);
+ count = delete_history("history", item.itemid, item.history, now);
+ count += delete_history("history_uint", item.itemid, item.history, now);
+ count += delete_history("history_str", item.itemid, item.history, now);
+ count += delete_history("history_text", item.itemid, item.history, now);
+ count += delete_history("history_log", item.itemid, item.history, now);
+ count += delete_history("trends", item.itemid, item.trends, now);
+ count += delete_history("trends_uint", item.itemid, item.trends, now);
+ deleted += count;
+ deleted_since_sleep += count;
+ if( CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP > 0 &&
+ deleted_since_sleep >= CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP )
+ {
+ zabbix_log( LOG_LEVEL_WARNING, "deleted %ld > %d entries, housekeeper is sleeping %d sec",
+ deleted_since_sleep,
+ CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP,
+ CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES
+ );
+ sleep(CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES);
+ deleted_since_sleep = 0;
+ }
}
DBfree_result(result);
return deleted;
diff --git a/src/zabbix_server/housekeeper/housekeeper.h src/zabbix_server/housekeeper/housekeeper.h
index 64f3be8..9523a5e 100644
--- a/src/zabbix_server/housekeeper/housekeeper.h
+++ src/zabbix_server/housekeeper/housekeeper.h
@@ -23,6 +23,8 @@
extern int CONFIG_DISABLE_HOUSEKEEPING;
extern int CONFIG_HOUSEKEEPING_FREQUENCY;
extern int CONFIG_MAX_HOUSEKEEPER_DELETE;
+extern int CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP;
+extern int CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES;
int main_housekeeper_loop();
diff --git a/src/zabbix_server/server.c src/zabbix_server/server.c
index ea8baec..b4b6c2a 100644
--- a/src/zabbix_server/server.c
+++ src/zabbix_server/server.c
@@ -130,6 +130,8 @@ int CONFIG_TRAPPER_TIMEOUT = ZABBIX_TRAPPER_TIMEOUT;
/*int CONFIG_NOTIMEWAIT =0;*/
int CONFIG_HOUSEKEEPING_FREQUENCY = 1;
int CONFIG_MAX_HOUSEKEEPER_DELETE = 500; /* applies for every separate field value */
+int CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP = 0;
+int CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES = 0;
int CONFIG_SENDER_FREQUENCY = 30;
int CONFIG_DBSYNCER_FORKS = 1;
int CONFIG_DBSYNCER_FREQUENCY = 5;
@@ -217,6 +219,8 @@ void init_config(void)
{"CacheUpdateFrequency",&CONFIG_DBCONFIG_FREQUENCY,0,TYPE_INT,PARM_OPT,1,3600},
{"HousekeepingFrequency",&CONFIG_HOUSEKEEPING_FREQUENCY,0,TYPE_INT,PARM_OPT,1,24},
{"MaxHousekeeperDelete",&CONFIG_MAX_HOUSEKEEPER_DELETE,0,TYPE_INT,PARM_OPT,0,1000000},
+ {"HistoryAndTrendsDeleteBeforeSleep",&CONFIG_HISTORY_AND_TRENDS_DELETE_BEFORE_SLEEP,0,TYPE_INT,PARM_OPT,0,1000000},
+ {"HistoryAndTrendsSleepBetweenDeletes",&CONFIG_HISTORY_AND_TRENDS_SLEEP_BETWEEN_DELETES,0,TYPE_INT,PARM_OPT,0,100},
{"SenderFrequency",&CONFIG_SENDER_FREQUENCY,0,TYPE_INT,PARM_OPT,5,3600},
{"TmpDir",&CONFIG_TMPDIR,0,TYPE_STRING,PARM_OPT,0,0},
{"FpingLocation",&CONFIG_FPING_LOCATION,0,TYPE_STRING,PARM_OPT,0,0},
Apply this patch from the top-level directory of the Zabbix source tree. I wrote it against the Debian version of zabbix-1.8.2, as I am using Debian squeeze. I haven't tested whether it applies cleanly to the current Zabbix version. I am definitely no C guru and assume you guys have a better way to implement the solution. Feel free to change the code as you like, if you find the idea worthwhile :-)
Cheers
Tom