Ad Widget

Collapse

DataBase - 100% cpu Utilization when searching Problems

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ZabbixUser-151
    Junior Member
    • Nov 2022
    • 3

    #1

    DataBase - 100% cpu Utilization when searching Problems

    Zabbix server: 5.4.6
    Mariadb: 10.5.11

    Hi all and thank you for your time in advance,


    We are facing the following case, when a user tries to search problems from the UI, Monitoring->Problems->Problems without filtering the timeframe from “Age less than”, the cpu utilization of the database is peaked.
    In order for this behavior to stop we need to restart the php-fpm service.

    From our investigation, we noticed that from the above task, the application searches on the table "problem", which has almost 850.000 entries.
    These problems are unresolved but frozen, because the corresponding trigger is disabled.
    Also, we have similar unresolved cases for the table “events”, when we search through Monitoring->Problems->History.
    We noticed that, when a problem is recovered, the following columns are filled in the problem table: r_eventid, r_clock, r_ns and userid.
    I would like to know which is the best way to safely delete or clean the database from these entries, to delete these entries from the problem table or to update the above columns on the pending problems.
    Another case is to enable the trigger back and manual close the open problems, but we do not want to do that because the system is overload.

    Thanks again for your time

  • ZabbixUser-151
    Junior Member
    • Nov 2022
    • 3

    #2
    *******Update

    Instead of hard deleting the rows directly on DB from the table problem, i updated the rows periodically based on time by using the column clock like below:

    update problem
    set r_eventid = XXXXXXX, r_clock = XXXXXXXXX, r_ns = XXXXXXXX, userid = X
    where clock < XXXXXXXXX and (name like "%XXXXXXX%" or name like "%XXXXXXXXX%");

    After each update, i manually triggered the housekeeper by running the below:

    zabbix_server -c /path/to/confinguration/zabbix_server.conf -R housekeeper_execute

    Until now, no issues noticed.

    Comment

    • cyber
      Senior Member
      Zabbix Certified SpecialistZabbix Certified Professional
      • Dec 2006
      • 4807

      #3
      That same thing seems to affect "Problem" widget, when "history" view is used. There is no option to set time restrictions and if your host and hostgroup selection has a lot of problems in the past... your DB query will run a long time... Add to that multiple users using same dashboard with that widget, and you get a disaster in DB performance..

      Comment

      • ZabbixUser-151
        Junior Member
        • Nov 2022
        • 3

        #4
        Hi Cyber,

        From my investigation, i noticed that on Problems when you select History View, the system looks into table Events and not into Problem. i do know if the widget works the same way.

        Before the cleanup, when i was trying to view the results by selecting Monitoring->Problems->History(without filter), i didn't have a problem.
        But yes, in the scenario you described above i believe also that the system will be overloaded while the number of users looking in that graph and/or the size of the table is increasing.

        What i noticed in my case was that when i chose Monitoring->Problems->Problems(without filter), the first select

        SELECT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severit y FROM problem p WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND p.r_eventid IS NULL ORDER BY p.eventid DESC LIMIT 1001;​

        was heavy but it was served, but the system afterwards performed similar, like the below queries

        SELECT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severit y FROM problem p WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND p.r_eventid IS NULL AND p.eventid<='9674956' ORDER BY p.eventid DESC LIMIT 1001
        SELECT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severit y FROM problem p WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND p.r_eventid IS NULL AND p.eventid<='9779399' ORDER BY p.eventid DESC LIMIT 1001
        SELECT p.eventid,p.objectid,p.clock,p.ns,p.name,p.severit y FROM problem p WHERE p.source='0' AND p.object='0' AND NOT EXISTS (SELECT NULL FROM event_suppress es WHERE es.eventid=p.eventid) AND p.r_eventid IS NULL AND p.eventid<='9892131' ORDER BY p.eventid DESC LIMIT 1001

        which was taking for ever.

        I do not know, if regardless the huge amount of rows, in my case there was also hanged problem rows in the Database.

        Comment

        Working...