Utilization of configuration syncer processes over 75% - Upgrade from 6.0.0 to 6.0.1

  • Robert N.
    Junior Member
    • Feb 2022
    • 28

    #1

    Hello,

    I upgraded my environment from Zabbix 6.0.0 to 6.0.1 and now get this message: "Utilization of configuration syncer processes over 75%". Before the upgrade, CPU utilization was 0.6%; now it is 34%. I'm using 3 proxies.
    There is nothing unusual in the logs on any of the servers.

    Proxy config:

    StartPollers=100
    StartPollersUnreachable=50
    StartPingers=50
    StartTrappers=10
    StartDiscoverers=15
    StartHTTPPollers=5
    CacheSize=128M
    HistoryCacheSize=64M
    HistoryIndexCacheSize=32M



    [Attachment: cpu.png]

    The Zabbix server configuration is:


    StartPollers=100
    StartPollersUnreachable=50
    StartPingers=50
    StartTrappers=1
    StartDiscoverers=15
    StartPreprocessors=15
    StartHTTPPollers=5
    StartAlerters=5
    StartTimers=2
    StartEscalators=2
    CacheSize=128M
    HistoryCacheSize=64M
    HistoryIndexCacheSize=32M
    TrendCacheSize=32M
    ValueCacheSize=256M

    MariaDB 10.6.5 config:

    [mysqld]
    max_connections = 404
    innodb_buffer_pool_size = 2G

    innodb-log-file-size = 128M
    innodb-log-buffer-size = 128M
    innodb-file-per-table = 1
    innodb_buffer_pool_instances = 8
    innodb_old_blocks_time = 1000
    innodb_stats_on_metadata = off
    innodb-flush-method = O_DIRECT
    innodb-log-files-in-group = 2
    innodb-flush-log-at-trx-commit = 2

    tmp-table-size = 96M
    max-heap-table-size = 96M
    open_files_limit = 65535
    max_connect_errors = 1000000
    connect_timeout = 60
    wait_timeout = 28800


    Could you please help?

    Thanks

  • Robert N.
    Junior Member
    • Feb 2022
    • 28

    #2
    I found this in the log:

    slow query: 269.173074 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"

    • Robert N.
      Junior Member
      • Feb 2022
      • 28

      #3
      I tried a clean installation of Zabbix 6.0.0 in VirtualBox and then just updated to 6.0.1, with the same result: CPU usage is high.

      LOG:

      slow query: 6.562371 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
      959:20220302:120651.271 slow query: 6.847315 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"
      1149:20220302:120731.278 slow query: 6.004122 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
      959:20220302:120758.117 slow query: 6.466183 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"
      1148:20220302:120831.388 slow query: 6.045933 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
      959:20220302:120905.547 slow query: 7.064507 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"
      1149:20220302:120931.865 slow query: 6.443288 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
      959:20220302:121013.227 slow query: 7.342888 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"
      1148:20220302:121031.473 slow query: 5.992450 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
      959:20220302:121120.439 slow query: 6.837490 sec, "select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t,hosts h,items i,functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)"
      1149:20220302:121131.676 slow query: 6.128351 sec, "select t.triggerid,t.description,t.expression,t.status,t.type,t.priority,t.comments,t.url,t.recovery_expression,t.recovery_mode,t.correlation_mode,t.correlation_tag,t.manual_close,t.opdata,t.discover,t.event_name from triggers t where t.triggerid in (select distinct tg.triggerid from triggers tg,functions f,items i,item_discovery id where tg.triggerid=f.triggerid and f.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=39807)"
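
The log excerpt above alternates between two statements. A minimal sketch for tallying such lines per statement, assuming the line format shown in this thread; the function name `slow_query_summary` and the log path in the usage comment are illustrative assumptions, not part of Zabbix:

```shell
#!/bin/sh
# Tally "slow query" lines read on stdin (hypothetical helper, not a Zabbix tool).
# Assumes the format quoted above: [pid:date:time] slow query: <seconds> sec, "<sql>"
slow_query_summary() {
    awk '
        /slow query: / {
            secs = 0
            # The duration is the field that follows the literal "query:".
            for (i = 1; i < NF; i++)
                if ($i == "query:") { secs = $(i + 1) + 0; break }
            # Group by a fragment that distinguishes the two statements in the excerpt.
            kind = ($0 ~ /from trigger_depends d/) ? "trigger_depends" : "other"
            count[kind]++
            total[kind] += secs
            if (secs > max[kind]) max[kind] = secs
        }
        END {
            for (k in count)
                printf "%s count=%d max=%.3f avg=%.3f\n", k, count[k], max[k], total[k] / count[k]
        }
    '
}

# Example (log path is an assumption; adjust to your LogFile setting):
#   slow_query_summary < /var/log/zabbix/zabbix_server.log
```

A steadily growing count for the `trigger_depends` group would suggest the configuration syncer is re-running the same statement on every sync cycle.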

      • Robert N.
        Junior Member
        • Feb 2022
        • 28

        #4
        The Zabbix server does not collect any data. This query killed my Zabbix.

        [Attachment: query.png]

        • Robert N.
          Junior Member
          • Feb 2022
          • 28

          #5
          Solution: I killed this query in the DB and we are back on track ;-)

          mysql -uroot -pzabbix -e "show processlist"
          mysql -uroot -pzabbix -e "kill 450"

          In my case the slow query's ID was 450. It was just a "select" query, so the DB does not crash.
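
Rather than reading the ID off the `show processlist` output by eye, the rows can be filtered. A sketch, assuming the usual column order of `show full processlist` (Id, User, Host, db, Command, Time, State, Info); the helper name `long_running_ids` and the 60-second threshold are hypothetical. As post #6 reports, the server may simply issue the statement again, so this is a stopgap at best.

```shell
#!/bin/sh
# Print the Ids of statements that have run for at least $1 seconds and
# mention trigger_depends (hypothetical helper). Reads tab-separated
# "show full processlist" rows on stdin, without a header line.
long_running_ids() {
    awk -F '\t' -v min="$1" '
        $5 == "Query" && $6 + 0 >= min + 0 && $8 ~ /trigger_depends/ { print $1 }
    '
}

# Example (credentials and threshold are placeholders):
#   mysql -uroot -p -B -N -e "show full processlist" | long_running_ids 60
```

Each printed Id could then be passed to `kill` as in the commands above.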



          [Attachment: cpu2.png]
          Last edited by Robert N.; 03-03-2022, 11:41.

          • mcodo
            Junior Member
            • Mar 2022
            • 4

            #6
            Has anyone had any luck fixing this?
            I'm trying to kill the select process, but it just comes back after a few seconds, and the CPU goes back to 100%.

            • naboo
              Junior Member
              • Apr 2022
              • 1

              #7
              I have the same problem with Zabbix 6.0.3 and MariaDB 10.7.3, with about 30 hosts and 1200+ triggers. After I deleted all hosts, it returned to normal.
              [Attachment: 20220427153847.png]

              • tim.mooney
                tim.mooney commented
                MariaDB 10.7 is not yet listed as supported. If you have an option for something in the 10.6 or 10.5 series, that may be worth trying.

                Is this a new install, or an upgrade from a previous version?
            • stevefxp
              Senior Member
              • Aug 2020
              • 168

              #8
              I am having the same issue and this is a clean install with nothing added to it. I am running MariaDB 10.6.7.

              • mcodo
                Junior Member
                • Mar 2022
                • 4

                #9
                The SELECT that is causing the wait and CPU load seems to be:
                select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t, hosts h,items i, functions f where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1);

                I think this is a poor way to write the query, compared with using explicit JOINs between the tables, but there might be a reason for it?

                Anyway, if I change the order of the tables in the query, it runs a lot faster.
                Like this (just move functions f):
                select distinct d.triggerid_down,d.triggerid_up from trigger_depends d,triggers t, functions f, hosts h,items i where t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1);

                But I don't know how to apply this in the system without getting it from an updated version.

                Anyone?
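
To check whether the table order changes the plan, the two variants can be run through EXPLAIN side by side. A sketch that only prints the two statements (nothing is executed until the output is piped to the mysql client); the helper name `explain_variant` and the database name `zabbix` in the usage note are assumptions:

```shell
#!/bin/sh
# Shared WHERE clause, copied from the statement quoted above.
WHERE_CLAUSE='t.triggerid=d.triggerid_down and t.flags<>2 and h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=d.triggerid_down and h.status in (0,1)'

# Print an EXPLAIN for the statement with the given FROM list (hypothetical helper).
explain_variant() {
    printf 'explain select distinct d.triggerid_down,d.triggerid_up from %s where %s;\n' "$1" "$WHERE_CLAUSE"
}

explain_variant 'trigger_depends d,triggers t,hosts h,items i,functions f'   # original order
explain_variant 'trigger_depends d,triggers t,functions f,hosts h,items i'   # functions moved
```

Saved under a name of your choice and piped into the client, for example `sh explain.sh | mysql -uroot -p zabbix`, this shows the join order the optimizer picks for each variant, which may differ from the order in which the tables are written.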

                • vso
                  Zabbix developer
                  • Aug 2016
                  • 190

                  #10
                  Thank you for your report. Can you confirm that changing the table order in the query helps? I was not able to reproduce the issue locally. Could you please share your my.cnf and your MariaDB version? Does the issue also occur with MySQL? Please also see https://support.zabbix.com/browse/ZBX-20936
                  Last edited by vso; 11-05-2022, 16:33.

                  • mcodo
                    Junior Member
                    • Mar 2022
                    • 4

                    #11
                    Hi vso
                    Thank you for looking into this :-)

                    We are running on MariaDB ver. 10.6.7
                    Server version: 10.6.7-MariaDB-1:10.6.7+maria~focal mariadb.org binary distribution

                    You will find our .cnf file attached.

                    One strange thing is that we have two identical installations of Zabbix server, running on identical VMs on the same hardware.
                    Both servers are installed from scratch with Zabbix 6.0.2. (No upgrade)
                    One is running fine, and the other is having this issue with 100% CPU load caused by the select mentioned above.
                    The select never finishes running.
                    I can kill the select, but it just starts up again, and that's probably expected.

                    When I run the select manually in the mysql shell, it behaves the same way.
                    But when I change the table order in the select, as I mentioned in the post above, it finishes in about a second on the problem server.

                    This always worked fine when we used MySQL server.
                    Last edited by mcodo; 11-05-2022, 19:52.

                    • vso
                      Zabbix developer
                      • Aug 2016
                      • 190

                      #12
                      Unfortunately, I had no luck reproducing the issue. Could you please check whether the patch from https://support.zabbix.com/browse/ZBX-20936 solves it?

                      • mcodo
                        mcodo commented
                        Hi, thanks
                        I am a little unsure how to run this patch :-o
                        Can you give me a command or guide on how to do it?
                    • vso
                      Zabbix developer
                      • Aug 2016
                      • 190

                      #13
                      To patch:
                      patch -p1 -i ZBX-20936-test.diff

                      To compile:
                      https://www.zabbix.com/documentation...lation/install
                      Alternatively, you can check the following query by executing it manually:
                      Code:
                      select triggerid_down,triggerid_up from trigger_depends;

                      • mcodo
                        Junior Member
                        • Mar 2022
                        • 4

                        #14
                        vso, sorry for the late reply here.
                        We installed via apt, so we can't compile a new build on this one.
                        But we have found that if we set optimizer_prune_level to 0, the problem goes away.
                        The only issue we can see is that the hosts view becomes a bit slower.

                        Code:
                        set optimizer_prune_level=0;
                        I'm not sure if this has any other effect on the system, but we will try this for a while, until there is an updated binary with a fix.

                        Thanks :-)
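
As written, a plain `set optimizer_prune_level=0;` changes the variable only for the session that runs it, so the Zabbix server's own connections would be unaffected. `SET GLOBAL optimizer_prune_level=0;` applies to connections opened afterwards, and a config entry keeps the value across MariaDB restarts. A sketch of the latter; the helper name `write_prune_level_conf` and the drop-in path in the usage comment are assumptions about a Debian/Ubuntu-style layout:

```shell
#!/bin/sh
# Write a drop-in config file that pins optimizer_prune_level to 0 (hypothetical helper).
# $1: destination file inside a directory that my.cnf includes.
write_prune_level_conf() {
    printf '[mysqld]\noptimizer_prune_level = 0\n' > "$1"
}

# Example (path is an assumption; run as root, then restart MariaDB and the Zabbix server):
#   write_prune_level_conf /etc/mysql/mariadb.conf.d/99-zabbix-optimizer.cnf
# To apply without restarting MariaDB (affects new connections only):
#   mysql -uroot -p -e "SET GLOBAL optimizer_prune_level=0"
```

Because the setting is server-wide, it also affects every other query plan, which may be related to the slower hosts view mentioned above.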

                        • mcflurry
                          Member
                          • Jun 2022
                          • 32

                          #15
                          Same issue here on 6.0.9 with just 111 hosts. I'm increasing caches and pollers in the config file...
