Zabbix not writing data to PostgreSQL DB, queue growing

  • user.zabbix
    Junior Member
    • Feb 2020
    • 25

    #16
    bulk processing


    • Hamardaban
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • May 2019
      • 2713

      #17
      That's what I meant - whether this attribute is set on the SNMP interfaces.

      For diagnostics, try increasing the logging level of the running server
      Code:
      zabbix_server -R log_level_increase=poller
      And after a while, check the log for any errors or problems.
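      For example, something along these lines (the log path here is only the usual package default and may differ on your system):
      Code:
      tail -f /var/log/zabbix/zabbix_server.log | grep -iE 'error|cannot|failed|slow query|unreachable'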

      From the zabbix_server and psql config excerpts it follows that the connection to the database goes over the IP stack to the localhost address... Maybe it makes sense to switch to communicating via a Unix socket?
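      A minimal sketch of the relevant zabbix_server.conf lines, assuming the packaged config path and a default local PostgreSQL socket (for PostgreSQL, an empty DBHost makes the server connect via the Unix socket instead of TCP; the DBName/DBUser values are placeholders):
      Code:
      # /etc/zabbix/zabbix_server.conf (fragment)
      # empty DBHost = connect to PostgreSQL over the local Unix socket
      DBHost=
      DBName=zabbix
      DBUser=zabbix
      Note that pg_hba.conf must also allow a "local" connection for the zabbix user, and zabbix_server has to be restarted afterwards.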

      PS: I will never believe that there are no errors in the log with a queue like that! I'm sure there are reports of hosts being unreachable.
      And by the way, on what information is the conclusion about insufficient database performance based?
      Last edited by Hamardaban; 25-02-2020, 14:28.


      • user.zabbix
        Junior Member
        • Feb 2020
        • 25

        #18

        There are no terrible errors.
        Last edited by user.zabbix; 25-02-2020, 17:23.


        • user.zabbix
          Junior Member
          • Feb 2020
          • 25

          #19
          LOG FILE: https://drive.google.com/open?id=1nh...hhH8CRd-bsIl81

          Last edited by user.zabbix; 25-02-2020, 21:01.


          • user.zabbix
            Junior Member
            • Feb 2020
            • 25

            #20
            I ran:
            Code:
            /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_decrease="unreachable poller"
            /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_decrease="poller"
            /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_decrease="discoverer"

            The "unreachable poller", "poller" and "discoverer" processes show a high busy percentage, but there is no CPU load.

            LOG FILE: 200Mb https://drive.google.com/open?id=1QS...C57p7kORAa7vEO

            Please help me find the bottleneck,
            because I can't see it :-(
            Last edited by user.zabbix; 25-02-2020, 22:23.


            • Hamardaban
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional
              • May 2019
              • 2713

              #21
              If TimescaleDB is used - disable housekeeping!
              Indeed, there are slow queries - that is not good.
              And the worst part is the messages "unreachable poller #XX [got Y values in Z sec...". That's what needs to be dealt with... I advise you to disable debugging and enable it for only one poller, then look into its errors more deeply (see the example below).
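              For example (poller number 3 is just an illustration - pick any one process that appears in the log):
              Code:
              # raise the debug level for a single poller process only
              /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_increase="poller,3"
              # and lower it again once enough has been collected
              /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_decrease="poller,3"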

              UPDATE: don't disable HK - it works nicely with TimescaleDB!
              Last edited by Hamardaban; 30-11-2020, 07:43.


              • Hamardaban
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • May 2019
                • 2713

                #22
                I tried to execute on my system the query that runs slowly on yours, but I could not, because the table schema is different! You mentioned updating the system to 4.4.5 - so what version is it now?
                And the general recommendation is to upgrade to the current branch - 4.4....
                Last edited by Hamardaban; 26-02-2020, 11:32.


                • user.zabbix
                  Junior Member
                  • Feb 2020
                  • 25

                  #23
                  Our slow query is described in the BUG report.

                  In our case this query returns 683679 rows.
                  Last edited by user.zabbix; 26-02-2020, 15:15.


                  • user.zabbix
                    Junior Member
                    • Feb 2020
                    • 25

                    #24
                    "You had a message about updating the system to 4.4.5 - and now what version is it?" - already upgraded to 4.4.6 :-(

                    zabbix=> explain analyze
                    zabbix-> select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.ipmi_sensor,i.delay,i.trapper_hosts,i.logtimefmt,i.params,ir.state,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,i.snmpv3_authprotocol,i.snmpv3_privprotocol,i.snmpv3_contextname,ir.lastlogsize,ir.mtime,i.history,i.trends,i.inventory_link,i.valuemapid,i.units,ir.error,i.jmx_endpoint,i.master_itemid,i.timeout,i.url,i.query_fields,i.posts,i.status_codes,i.follow_redirects,i.post_type,i.http_proxy,i.headers,i.retrieve_mode,i.request_method,i.output_format,i.ssl_cert_file,i.ssl_key_file,i.ssl_key_password,i.verify_peer,i.verify_host,i.allow_traps,i.templateid,id.parent_itemid from items i inner join hosts h on i.hostid=h.hostid left join item_discovery id on i.itemid=id.itemid join item_rtdata ir on i.itemid=ir.itemid where h.status in (0,1) and i.flags<>2;
                    QUERY PLAN
                    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    Hash Join  (cost=110.35..108569.35 rows=595249 width=252) (actual time=0.755..2047.580 rows=683653 loops=1)
                      Hash Cond: (i.hostid = h.hostid)
                      ->  Merge Left Join  (cost=1.54..106733.79 rows=656006 width=252) (actual time=0.061..1757.813 rows=683653 loops=1)
                            Merge Cond: (i.itemid = id.itemid)
                            ->  Merge Join  (cost=1.11..69969.33 rows=656006 width=244) (actual time=0.034..1023.994 rows=683653 loops=1)
                                  Merge Cond: (i.itemid = ir.itemid)
                                  ->  Index Scan using items_pkey on items i  (cost=0.42..43460.18 rows=686284 width=230) (actual time=0.013..357.460 rows=685703 loops=1)
                                        Filter: (flags <> 2)
                                        Rows Removed by Filter: 33354
                                  ->  Index Scan using item_rtdata_pkey on item_rtdata ir  (cost=0.42..17714.01 rows=689695 width=22) (actual time=0.007..132.160 rows=683653 loops=1)
                            ->  Index Only Scan using item_discovery_1 on item_discovery id  (cost=0.42..29648.88 rows=704873 width=16) (actual time=0.024..357.422 rows=704873 loops=1)
                                  Heap Fetches: 699022
                      ->  Hash  (cost=91.30..91.30 rows=1401 width=8) (actual time=0.670..0.670 rows=1401 loops=1)
                            Buckets: 2048  Batches: 1  Memory Usage: 71kB
                            ->  Seq Scan on hosts h  (cost=0.00..91.30 rows=1401 width=8) (actual time=0.011..0.429 rows=1401 loops=1)
                                  Filter: (status = ANY ('{0,1}'::integer[]))
                                  Rows Removed by Filter: 143
                    Planning Time: 2.355 ms
                    Execution Time: 2074.530 ms


                    • user.zabbix
                      Junior Member
                      • Feb 2020
                      • 25

                      #25
                      We use only active SNMP checks.


                      • user.zabbix
                        Junior Member
                        • Feb 2020
                        • 25

                        #26
                        Number of processed not supported values per second (26.02.2020 15:42:58): 130.4489

                        Not supported items: 59020
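                        The breakdown by error text can be pulled straight from the database, e.g. (a sketch, assuming - as in the 4.4 schema from the plan above - that item_rtdata.state=1 marks not-supported items):
                        Code:
                        # group the not-supported items by their error message
                        psql -U zabbix -d zabbix -c "select ir.error, count(*)
                          from items i join item_rtdata ir on i.itemid=ir.itemid
                          where ir.state=1
                          group by ir.error order by count(*) desc limit 20;"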
                        Last edited by user.zabbix; 26-02-2020, 15:50.


                        • user.zabbix
                          Junior Member
                          • Feb 2020
                          • 25

                          #27
                          The Zabbix queue grows with items that are waiting for a refresh -- and what should be done with these?


                          • Hamardaban
                            Senior Member
                            Zabbix Certified Specialist, Zabbix Certified Professional
                            • May 2019
                            • 2713

                            #28
                            The queue for SNMP requests is the difference between the number of values requested for data items and the number of responses received. A large queue = a lot of data requested, little received.
                            Why is that? That's what we're trying to figure out..... The network may be poorly designed, or the network stack may be overloaded on the server itself.
                            Here are the specific steps that I advise you to take:
                            1) switch to working with the database via a socket
                            2) disable housekeeping
                            3) increase the number of poller and unreachable poller processes (see the sketch below)
                            4) try moving some devices to work through a proxy
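                            For point 3, a minimal zabbix_server.conf sketch (the numbers are only illustrative assumptions - raise them gradually and watch the internal process-busy graphs; point 1 is the DBHost= change shown in #17):
                            Code:
                            # /etc/zabbix/zabbix_server.conf (fragment)
                            # illustrative values only - tune to your host count
                            StartPollers=150
                            StartUnreachablePollers=30
                            A restart of zabbix_server is required for the new process counts to take effect.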


                            • user.zabbix
                              Junior Member
                              • Feb 2020
                              • 25

                              #29

                              Thanks, we will do the tuning.


                              • mbuyukkarakas
                                Member
                                • Aug 2014
                                • 49

                                #30
                                Hello all,

                                A few months ago I installed a large Zabbix setup (3 application servers with Zabbix v5.0, 1 database server with PostgreSQL v11, and 3 proxies).
                                The system generally runs at about 500 nvps and the proxies at roughly 70-80 nvps each.
                                Unfortunately I have almost the same problem: the proxy queues are growing and I can't find the reason.
                                Each of the proxies shows at least 1000 to 2000 waiting items.


                                Attached Files

