Disk I/O is overloaded on zabbix.server.com

  • ctp.waffles
    Junior Member
    • Nov 2017
    • 2

    #1

    Hello,
    I have a problem with these errors appearing often:

    Disk I/O is overloaded on zabbix.server.com
    Zabbix housekeeper processes more than 75% busy

    My hardware should be more than enough for this. Running iotop, I was able to see that the following process takes up all my I/O:

    mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

    Is there any way to solve this problem?
  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #2
    There is a lot behind the question. The first thing to look toward is whether you really have a problem. Housekeeper is generally either 100% or 0%, and having it periodically saturate is not unusual since it's one process (at least as far as I know).

    If the disk is truly overloaded over time, your item queues will grow (i.e. the server is unable to keep up saving new data). Check that first; if the queue growth is not significant, you do not really have an issue.
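
    Beyond watching Administration → Queue in the frontend, one way to track the queue continuously is with Zabbix's internal items on the server host. A minimal sketch (the default delay threshold and exact key syntax are worth double-checking against your Zabbix version's docs):

    ```
    # Internal item keys, configured as items on the Zabbix server host:
    zabbix[queue]        # count of items delayed beyond the default threshold
    zabbix[queue,10m]    # count of items delayed by 10 minutes or more
    ```

    Graphing one of these over a few days makes it obvious whether housekeeper bursts are actually causing sustained backlog or just brief spikes.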

    Zabbix has a design "feature" in how historic data accumulates and is purged. Used in a straightforward, out-of-the-box way, it will not work well over time in even modest environments. Suggestions, in no particular order:

    1) Read about partitioning and using it as a faster way to purge old data (there are tons of discussions throughout here and in blogs). This is best suited to people very comfortable with their database's features and methods. It also imposes some restrictions on data retention (e.g. aligning retention for all items). But it is by far the fastest way to do housekeeping for a given database update rate.
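
    As an illustrative sketch only (partition names here are made up, and the real guides cover all the history_* and trends* tables plus automated rotation via events or cron):

    ```sql
    -- Range-partition the history table on its unix-timestamp column (clock).
    ALTER TABLE history PARTITION BY RANGE (clock) (
        PARTITION p2017_11 VALUES LESS THAN (UNIX_TIMESTAMP('2017-12-01 00:00:00')),
        PARTITION p2017_12 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-01 00:00:00')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Purging a month of old data then becomes a near-instant metadata
    -- operation instead of a huge row-by-row DELETE:
    ALTER TABLE history DROP PARTITION p2017_11;
    ```

    This is why partitioning forces aligned retention: you drop whole time ranges for all items at once, not per-item history windows.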

    2) Only keep actionable data; most templates tend to measure everything it is possible to record, and save it for long periods. A huge percentage of that data is not actionable, or even where it may be, is only useful for recent time, not long periods. Review templates and stop keeping data you do not need.

    3) Record data less frequently. I've seen network managers insist on getting alerts for an outage within 10 seconds when their own time to respond is more like hours. Consider whether high-poll-rate data is actually useful to you, and reduce poll rates.

    4) Keep history (vs. trends) for less time. Many items you may want to record do not need historical information at the full poll rate, so let them roll over to trend data sooner.

    5) Do you need trend data at all for some items? Some incident data, e.g. certain failure types, is rare and immediately acted upon. Do you need trend data that amounts to 365 days of "service is up" recorded? Consistency triggers may fall into that category; for example, I parse AD output to see if there are Windows servers not in Zabbix -- I really need that alert only once, I do not need a history of "no missing servers".

    6) Consider using a housekeeping delete limit of zero. I have not tried this on MySQL, but I find it works nicely on PostgreSQL, and it lets housekeeping run really large deletes instead of breaking them into pieces. I also run housekeeping every hour. Neither of these is recommended by Zabbix so far as I know, but experimenting with frequency and max deletes may find a sweet spot for you (it's never as fast as dropping a partition, of course).
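
    For reference, the two knobs described above correspond to parameters in zabbix_server.conf (values shown match the hourly / no-limit setup described, as an illustration rather than a recommendation):

    ```
    # zabbix_server.conf
    HousekeepingFrequency=1     # run the housekeeper every 1 hour
    MaxHousekeeperDelete=0      # 0 = no limit on rows deleted per housekeeping task
    ```

    The server needs a restart for these to take effect, and as noted, test the no-limit setting carefully on MySQL before relying on it.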

    I'd wager most people who just took templates out of the box and ran with them could eliminate 90% of their data without noticing. That reduces disk load tremendously, often solves performance issues without more advanced techniques, and keeps your data limited to useful data -- when researching something, you feel less like you are drinking from a fire hose.


    • ctp.waffles
      Junior Member
      • Nov 2017
      • 2

      #3
      I was able to solve the problem by upgrading the database to MariaDB and modifying innodb_buffer_pool_size; now it works like a charm. Thanks Linwood for the help!
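
      For anyone landing here later: the change described is a my.cnf adjustment along these lines (the 4G figure is only an example; size it to your actual RAM):

      ```
      # /etc/mysql/my.cnf
      [mysqld]
      innodb_buffer_pool_size = 4G   # commonly ~50-70% of RAM on a dedicated DB host
      ```

      A larger buffer pool keeps more of the hot history/index pages in memory, which cuts exactly the kind of read I/O the iotop output above was showing.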
