I'm running zabbix 1.6.4 with postgresql 8.3 and we have a lot of items, but we also have a dedicated 4-way DB server with lots of RAM.
Starting a week or so ago, the load on the DB server (which is separate from the zabbix server) went from an average of 3 to an average of 20. I enabled logging of slow queries on postgres and saw a bunch of thing like:
delete from history where oid in (select oid from history where itemid=36638 limit 500)
And these requests were taking 500s to execute. I guessed that these were from the housekeeper, so I disabled the housekeeper and sure enough the load on the server went *way* down, to something more like 2.
I looked at the housekeeper table, and none of the itemid's that it is deleting from the history table exist in the items table. This led me to suspect that the housekeeper was cleaning out the history of a bunch of deleted items (since they no longer had entries in the items table).
I wrote a script to purge the history* tables of all the itemids in the housekeeper table and then purged the housekeeper table and restarted. This allowed me to get the system deletes over with as quickly as possible. Afterwards, the system ran great for about 24 hours, and then the load shot back up into the 20's.
What is surprising is that there is now 345 rows in the housekeeper table (59 distinct values that are all itemids). Once again, none of these itemids exist in the items table, or any other table I can find. And we haven't deleted 59 items in the last few days. Oh, wait... as I was typing this I was just informed that one of the other admins unlinked/linked some templates. Would this cause zabbix to purge the entire history of an item?
One related question - it sounds like the housekeeper issues some vacuum commands itself, is that right? If so, since I'm running pgsql 8.3 and I have autovacuum and autoanalyze enabled, should I disable them in favor of the housekeeper? Or can I disable this behavior in the housekeeper?
Thanks!
-poul
Starting a week or so ago, the load on the DB server (which is separate from the zabbix server) went from an average of 3 to an average of 20. I enabled logging of slow queries on postgres and saw a bunch of thing like:
delete from history where oid in (select oid from history where itemid=36638 limit 500)
And these requests were taking 500s to execute. I guessed that these were from the housekeeper, so I disabled the housekeeper and sure enough the load on the server went *way* down, to something more like 2.
I looked at the housekeeper table, and none of the itemid's that it is deleting from the history table exist in the items table. This led me to suspect that the housekeeper was cleaning out the history of a bunch of deleted items (since they no longer had entries in the items table).
I wrote a script to purge the history* tables of all the itemids in the housekeeper table and then purged the housekeeper table and restarted. This allowed me to get the system deletes over with as quickly as possible. Afterwards, the system ran great for about 24 hours, and then the load shot back up into the 20's.
What is surprising is that there is now 345 rows in the housekeeper table (59 distinct values that are all itemids). Once again, none of these itemids exist in the items table, or any other table I can find. And we haven't deleted 59 items in the last few days. Oh, wait... as I was typing this I was just informed that one of the other admins unlinked/linked some templates. Would this cause zabbix to purge the entire history of an item?
One related question - it sounds like the housekeeper issues some vacuum commands itself, is that right? If so, since I'm running pgsql 8.3 and I have autovacuum and autoanalyze enabled, should I disable them in favor of the housekeeper? Or can I disable this behavior in the housekeeper?
Thanks!
-poul
Comment