Disk I/O is overloaded on zabbix.server.com

  • ctp.waffles
    Junior Member
    • Nov 2017
    • 2

    #1

    Hello,
    I have a problem with these errors appearing often:

    Disk I/O is overloaded on zabbix.server.com
    Zabbix housekeeper processes more than 75% busy

    My hardware should be more than enough for this. Running iotop, I was able to see that the following process takes up all my I/O:

    mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

    Is there any way to solve this problem?
  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #2
    There is a lot behind the question. The first thing to look toward is whether you really have a problem. Housekeeper is generally either 100% or 0%, and having it periodically saturate is not unusual since it's one process (at least as far as I know).

    If the disk is truly overloaded over time, your item queues will grow (i.e. the server is unable to keep up saving new data). Check that first; if the queue growth is not significant, you do not really have an issue.
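
    Beyond watching Administration → Queue in the frontend, one way to track the queue continuously is with Zabbix's internal items on the server host. A minimal sketch (the default delay threshold and exact key syntax are worth double-checking against your Zabbix version's docs):

    ```
    # Internal item keys, configured as items on the Zabbix server host:
    zabbix[queue]        # count of items delayed beyond the default threshold
    zabbix[queue,10m]    # count of items delayed by 10 minutes or more
    ```

    Graphing one of these over a few days makes it obvious whether housekeeper bursts are actually causing sustained backlog or just brief spikes.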

    Zabbix has a design "feature" in how historic data accumulates and is purged. Used in a straightforward, out-of-the-box way, it will not work well over time in even modest environments. Suggestions, in no particular order:

    1) Read about partitioning and using it as a faster way to purge old data (there are tons of discussions throughout here and in blogs). This is best suited to people very comfortable with their database's features and methods. It also imposes some restrictions on data retention (e.g. aligning retention for all items). But it is by far the fastest way to do housekeeping for a given database update rate.
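
    As an illustrative sketch only (partition names here are made up, and the real guides cover all the history_* and trends* tables plus automated rotation via events or cron):

    ```sql
    -- Range-partition the history table on its unix-timestamp column (clock).
    ALTER TABLE history PARTITION BY RANGE (clock) (
        PARTITION p2017_11 VALUES LESS THAN (UNIX_TIMESTAMP('2017-12-01 00:00:00')),
        PARTITION p2017_12 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-01 00:00:00')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Purging a month of old data then becomes a near-instant metadata
    -- operation instead of a huge row-by-row DELETE:
    ALTER TABLE history DROP PARTITION p2017_11;
    ```

    This is why partitioning forces aligned retention: you drop whole time ranges for all items at once, not per-item history windows.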

    2) Only keep actionable data; most templates tend to measure everything it is possible to record, and save it for long periods. A huge percentage of that data is not actionable, or even where it may be, is only useful for recent time, not long periods. Review templates and stop keeping data you do not need.

    3) Record data less frequently. I've seen network managers insist on getting alerts for an outage within 10 seconds when their own time to respond is more like hours. Consider whether high-poll-rate data is actually useful to you, and reduce poll rates.

    4) Keep history (vs. trends) for less time. Many items you may want to record do not need historical information at the full poll rate, so let them roll over to trend data sooner.

    5) Do you need trend data at all for some items? Some incident data, e.g. certain failure types, is rare and immediately acted upon. Do you need trend data that amounts to 365 days of "service is up" recorded? Consistency triggers may fall into that category; for example, I parse AD output to see if there are Windows servers not in Zabbix -- I really need that alert only once, I do not need a history of "no missing servers".

    6) Consider using a housekeeping delete limit of zero. I have not tried this on MySQL, but I find it works nicely on PostgreSQL, and it lets housekeeping run really large deletes instead of breaking them into pieces. I also run housekeeping every hour. Neither of these is recommended by Zabbix so far as I know, but experimenting with frequency and max deletes may find a sweet spot for you (it's never as fast as dropping a partition, of course).
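
    For reference, the two knobs described above correspond to parameters in zabbix_server.conf (values shown match the hourly / no-limit setup described, as an illustration rather than a recommendation):

    ```
    # zabbix_server.conf
    HousekeepingFrequency=1     # run the housekeeper every 1 hour
    MaxHousekeeperDelete=0      # 0 = no limit on rows deleted per housekeeping task
    ```

    The server needs a restart for these to take effect, and as noted, test the no-limit setting carefully on MySQL before relying on it.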

    I'd wager most people who just took templates out of the box and ran with them could eliminate 90% of their data without noticing. That reduces disk load tremendously, often solves performance issues without more advanced techniques, and keeps your data limited to useful data -- when researching something, you feel less like you are drinking from a fire hose.


    • ctp.waffles
      Junior Member
      • Nov 2017
      • 2

      #3
      I was able to solve the problem by upgrading the database to MariaDB and modifying innodb_buffer_pool_size; now it works like a charm. Thanks Linwood for the help!
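
      For anyone landing here later: the change described is a my.cnf adjustment along these lines (the 4G figure is only an example; size it to your actual RAM):

      ```
      # /etc/mysql/my.cnf
      [mysqld]
      innodb_buffer_pool_size = 4G   # commonly ~50-70% of RAM on a dedicated DB host
      ```

      A larger buffer pool keeps more of the hot history/index pages in memory, which cuts exactly the kind of read I/O the iotop output above was showing.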
