I've seen a few posts and threads in my searches that talk about the housekeeper. I've also read excerpts from the "Mastering Zabbix" book that appear to suggest building your own housekeeper and partitioning your history tables.
Unfortunately I have inherited the system and some of the changes are too large to implement at this moment in time.
As can be seen from the logs below, our housekeeper runs rather smoothly until it doesn't.
The final entry in this log is what I wish to discuss. The number of items being deleted is far higher than usual, so the run takes FAR too long and causes data sender alerts on a handful of proxies.
We have these sorts of blips once every few weeks. We recently upgraded our SQL nodes to 256GB RAM (allocating 128GB to innodb_buffer_pool_size).
8260:20190919:042619.060 executing housekeeper
8260:20190919:043813.359 housekeeper [deleted 7681562 hist/trends, 1459 items, 20394 events, 204 problems, 577 sessions, 0 alarms, 0 audit items in 714.295476 sec, idle for 1 hour(s)]
8260:20190919:053813.965 executing housekeeper
8260:20190919:055009.401 housekeeper [deleted 7623380 hist/trends, 863 items, 20508 events, 389 problems, 572 sessions, 0 alarms, 0 audit items in 715.433000 sec, idle for 1 hour(s)]
8260:20190919:065009.993 executing housekeeper
8260:20190919:070128.881 housekeeper [deleted 7575788 hist/trends, 1726 items, 20706 events, 327 problems, 562 sessions, 0 alarms, 0 audit items in 676.944836 sec, idle for 1 hour(s)]
8260:20190919:080129.483 executing housekeeper
8260:20190919:081352.589 housekeeper [deleted 7640403 hist/trends, 450 items, 20362 events, 241 problems, 631 sessions, 0 alarms, 13 audit items in 743.103124 sec, idle for 1 hour(s)]
8260:20190919:091353.178 executing housekeeper
8260:20190919:092529.328 housekeeper [deleted 7637417 hist/trends, 91465 items, 20613 events, 236 problems, 793 sessions, 0 alarms, 23 audit items in 696.146451 sec, idle for 1 hour(s)]
8260:20190919:102529.936 executing housekeeper
8260:20190919:103643.242 housekeeper [deleted 7546426 hist/trends, 10608 items, 20810 events, 388 problems, 983 sessions, 0 alarms, 67 audit items in 673.302025 sec, idle for 1 hour(s)]
8260:20190919:113643.850 executing housekeeper
8260:20190919:114831.042 housekeeper [deleted 7516240 hist/trends, 6866 items, 21463 events, 335 problems, 1137 sessions, 0 alarms, 10 audit items in 707.188745 sec, idle for 1 hour(s)]
8260:20190919:124831.636 executing housekeeper
8260:20190919:130144.826 housekeeper [deleted 7572087 hist/trends, 1271089 items, 20080 events, 450 problems, 1256 sessions, 0 alarms, 16 audit items in 793.186734 sec, idle for 1 hour(s)]
8260:20190919:140145.433 executing housekeeper
8260:20190919:143158.362 housekeeper [deleted 7825667 hist/trends, 12851685 items, 20598 events, 265 problems, 1012 sessions, 0 alarms, 22 audit items in 1812.925236 sec, idle for 1 hour(s)]
As far as I'm aware, hist/trends is the actual data being deleted. So what are "items", and why is it trying to delete 12 million of them?
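To make it easier to see when the "items" count jumps, here is a minimal Python sketch that parses housekeeper summary lines in the format shown above and flags runs whose count is well above the median. The function names and the 10x threshold are my own assumptions for illustration, not anything provided by Zabbix.

```python
import re
from statistics import median

# Matches the housekeeper summary lines in the format pasted above.
LINE_RE = re.compile(
    r"(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"housekeeper \[deleted (?P<body>.+) in (?P<secs>[\d.]+) sec"
)


def parse_housekeeper_line(line):
    """Return a dict for a summary line, or None for any other log line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    counts = {}
    # The body looks like "7681562 hist/trends, 1459 items, ..., 0 audit items".
    for part in m.group("body").split(", "):
        number, label = part.split(" ", 1)
        counts[label] = int(number)
    return {
        "timestamp": m.group("date") + " " + m.group("time"),
        "seconds": float(m.group("secs")),
        "counts": counts,
    }


def flag_spikes(records, key="items", factor=10):
    """Return records whose counts[key] exceeds factor times the median.

    The factor of 10 is an arbitrary illustrative threshold.
    """
    values = [r["counts"][key] for r in records]
    if not values:
        return []
    threshold = factor * median(values)
    return [r for r in records if r["counts"][key] > threshold]


def spikes_from_lines(lines, key="items", factor=10):
    """Parse an iterable of log lines and return the flagged records."""
    records = [r for r in map(parse_housekeeper_line, lines) if r is not None]
    return flag_spikes(records, key=key, factor=factor)
```

Feeding the server log through spikes_from_lines (for example, with the lines read from wherever your zabbix_server log lives) should pick out the runs where "items" balloons, which can then be compared against the run duration in "seconds".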