Zabbix history syncer process high load

  • zabbix_user2012
    Junior Member
    • Aug 2012
    • 13

    #1

    Zabbix history syncer process high load

    Hi all,

    Hoping someone can help with this issue. I did a search in the forums for "history syncer" and did not find anything helpful.
    We have started to see a strange issue with one of the Zabbix internal processes. The Zabbix history syncer process has started to go above 50% busy and at times spikes all the way to 100%. When that happens, the Zabbix queue backs up badly. Meanwhile, the Zabbix server and the Zabbix DB server are barely breaking a sweat in terms of load average and memory usage. They are blade servers with 16 CPU cores and 16/32 GB of RAM.

    We tried bumping up the cache sizes and trying different poller start counts. It has helped, but not as much as we want.

    We are running Zabbix 2.0.3 and PostgreSQL 9.1.4.

    Attached are the zabbix_server and PostgreSQL configuration files.

    Thanks for reading!
  • zabbix_user2012
    Junior Member
    • Aug 2012
    • 13

    #2
    Can anybody help?


    • lbl@unoc.dk
      Junior Member
      • Oct 2012
      • 6

      #3
      I'm taking a shot in the dark here.

      Without really knowing what I'm talking about, I would try setting:

      StartDBSyncers=32

      And see if something changes. You might have to raise the number of allowed connections/sessions on the database to support the extra connections.

      If this doesn't work, you should monitor your disk IOPS (I/O operations per second). Maybe they are maxing out, in which case you will need faster disks on your database server.

      /lbl
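
As a rough illustration of the connection budget behind this suggestion, the sketch below adds up worker counts and compares the total with a database connection limit. It assumes, as a simplification, that each Zabbix server worker process may hold one database connection. Only StartDBSyncers=32 comes from the post above; the other counts, the limit of 100, and the `reserved` margin are illustrative values.

```python
def estimated_db_connections(process_counts):
    """Upper-bound estimate: one DB connection per worker process (assumption)."""
    return sum(process_counts.values())


def has_headroom(process_counts, max_connections, reserved=10):
    """True if the DB limit covers all workers plus `reserved` spare
    connections for the frontend and admin sessions (hypothetical margin)."""
    return estimated_db_connections(process_counts) + reserved <= max_connections


# Illustrative counts; only StartDBSyncers=32 is taken from the post above.
counts = {"StartDBSyncers": 32, "StartPollers": 50, "StartTrappers": 10}
needed = estimated_db_connections(counts)
print(needed, has_headroom(counts, max_connections=100))
```

If the check fails, either the database connection limit has to go up or the worker counts have to come down before raising the syncer count is safe.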


      • tsalle
        Member
        Zabbix Certified Specialist
        • Oct 2012
        • 79

        #4
        Same problem here

        Hi,

        I've got the same problem here!
        We're using Zabbix 1.8.11 and, since we reached ~500 vps (values per second), the history syncer process is 100% busy almost all the time.

        I tried setting StartDBSyncers to 100, but after that all "insert into history_uint" queries were very slow!
        So I tried tuning MySQL with the help of tuning-primer.sh, but the problem remains.

        I set the "LogSlowQueries" parameter to 60000 (1 min) in the Zabbix server conf, and now I see a lot of "UPDATE items set lastclock=..." queries in the Zabbix server log.

        I don't know what to do to speed up my database insert/update queries, so if anyone has an idea, please share.


        ---
        Zabbix Server 1.8.11
        MySQL server 5.5 on remote server with history_uint daily partitioned (HouseKeeper disabled)


        • dimi
          Junior Member
          Zabbix Certified Trainer

          • Nov 2004
          • 13

          #5
          same problem

          Hi
          I have run into the same issue. I haven't been able to improve things by changing the DB syncer count (currently StartDBSyncers=4). Has anyone done more testing? I have attached my Zabbix performance screen; it is easy to see the effect of the history syncer on the Zabbix queue.

          Thanks in advance for any help


          [Attached image: zabbix_history_syncer.jpg (89.4 KB)]


          • zabbix_user2012
            Junior Member
            • Aug 2012
            • 13

            #6
            Right now we are preparing a change to partition the DB. Hopefully that will eliminate the slow queries. As for the queue backup and the busy Zabbix internal processes, we have no idea. We have tweaked all the settings in the Zabbix config file with minimal effect.


            • fpaternot
              Member
              Zabbix Certified Specialist
              • Feb 2013
              • 52

              #7
              Did that help?

              I just hit the same issue after adding 4 routers as Zabbix hosts.


              • alanxge
                Junior Member
                • Aug 2013
                • 2

                #8
                Running into the same issue also

                We are still running 1.8.7 and are experimenting with 2.x now. The issue we are having is with 1.8.7.

                We are running into the same issue with Zabbix: the history syncer process pegs at 100% and the history write cache drops slowly from 100% all the way to 0%.

                When the history syncer process pegs at 100%, no more data seems to go into the history table, and Zabbix complains about slow queries inserting into the history table.

                This only seems to happen when the history table grows above roughly 30 million rows on our server.

                The current theory is that as the table grows, inserting into the history table gets slower and the history syncer processes can no longer keep up with the poller processes (the data gathering processes). That causes the syncer process to peg at 100% and the history cache to be used up.

                Could someone share their experiences and fixes, so that we can either confirm or invalidate this theory?

                Thanks a lot.
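
The theory above can be put into numbers with a small back-of-the-envelope sketch: if the syncers insert fewer values per second than the pollers deliver, the difference accumulates in the history write cache until it is full. All rates and the cache capacity below are hypothetical; `seconds_until_cache_full` is an illustrative helper, not part of Zabbix.

```python
def seconds_until_cache_full(free_slots, inflow_vps, drain_vps):
    """Time until the history write cache is exhausted.

    free_slots  -- number of values the cache can still hold (hypothetical unit)
    inflow_vps  -- values per second delivered by the pollers
    drain_vps   -- values per second the history syncers manage to insert
    Returns None when the syncers keep up and the cache never fills.
    """
    backlog_rate = inflow_vps - drain_vps
    if backlog_rate <= 0:
        return None
    return free_slots / backlog_rate


# Illustrative numbers only: inserts slow from 600 to 400 values/s as the table grows.
keeping_up = seconds_until_cache_full(100_000, inflow_vps=500, drain_vps=600)
falling_behind = seconds_until_cache_full(100_000, inflow_vps=500, drain_vps=400)
print(keeping_up, falling_behind)
```

Under this model a larger cache only delays the stall; the backlog stops growing only when the insert rate is back above the collection rate.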


                • fpaternot
                  Member
                  Zabbix Certified Specialist
                  • Feb 2013
                  • 52

                  #9
                  Well, I got that fixed, and I had to do a few things for it to work OK. I'll list them in no particular order, but all of them helped me with the issue. I'm running 2.0.4 on the server, 2.0.7 on the proxies, and 2.0.6 on most of the agents.

                  - Collect everything through a proxy. It doesn't matter if you think you don't need one (a remote proxy, for example); it helps greatly with server/DB load.
                  - Use a low number of concurrent DB connections (I'm using 8).
                  - Keep history time minimal (I'm using 7 days, and 456 for trends)!!! <- huge impact
                  - Use good cache settings, enough for a couple of hours of operation without the DB running.
                  - Enable housekeeping or partition your database.


                  To enable my housekeeping process, I had to drop the whole history_* tables, as they were too large for it to complete (it ran for two weeks before I did this).

                  Everything is running OK now. The DB size dropped from 200+G (and growing) to 40G (stable), using very few resources so far. Response times dropped greatly as well.


                  Hope it helps others


                  • alanxge
                    Junior Member
                    • Aug 2013
                    • 2

                    #10
                    fpaternot, thanks a lot for the tips.

                    We will be testing different Zabbix configurations, especially using proxies and separating the DB from the data gathering jobs.


                    • harmonica
                      Senior Member
                      • Jan 2009
                      • 251

                      #11
                      Hi all,

                      Hoping someone can help me with this issue. We have started to encounter a weird issue with the Zabbix internal processes. The Zabbix history syncer process has started to go above 40%, and we notice that when that happens there are some errors in zabbix_server.log:

                      30179:20140118:200410.463 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199425;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199426;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199427;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199428;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199429;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199430;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199431;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199432;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199433;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199434;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199435;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199436;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199437;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199438;
                      update items set description='The total number of octets transmitted out of the
                      interface, including framing characters. This object is a 32-bit version.' where itemid=199439;
                      ]

                      When the housekeeper process starts running and goes to 100%, all the other internal processes increase to 50-60% and remain in this state for hours. Then the queue begins to grow and access to the frontend takes too long. Restarting the zabbix-server process is not a solution, and we have to reboot the server. Our configuration:

                      zabbix_server.conf
                      StartPollers=50
                      StartPollersUnreachable=50
                      StartTrappers=10
                      StartPingers=25
                      CacheSize=128M
                      StartDBSyncers=8
                      HistoryCacheSize=16M
                      TrendCacheSize=8M
                      HistoryTextCacheSize=32M

                      Number of hosts: 706
                      Number of items: 33071
                      Values per second: 175.31


                      Regards,
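
To give a sense of scale for these figures, the sketch below estimates how many history rows accumulate at the reported 175.31 values per second. It assumes a steady collection rate and ignores trends; the 7-day retention matches the value mentioned earlier in the thread, while 30 and 90 days are illustrative.

```python
SECONDS_PER_DAY = 86_400


def history_rows(values_per_second, retention_days):
    """Approximate rows kept across the history tables for a retention period."""
    return round(values_per_second * SECONDS_PER_DAY * retention_days)


vps = 175.31  # "Values per second" reported in the post above
for days in (7, 30, 90):
    print(f"{days:>3} days -> {history_rows(vps, days):,} rows")
```

At this rate a single day already adds roughly 15 million rows, which is why the retention period has such a strong effect on table size.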


                      • tsalle
                        Member
                        Zabbix Certified Specialist
                        • Oct 2012
                        • 79

                        #12
                        Originally posted by harmonica
                        When the housekeeper process starts running and goes to 100%, all the other internal processes increase to 50-60% and remain in this state for hours. Then the queue begins to grow and access to the frontend takes too long.
                        Hi,

                        You should use MySQL partitioning and disable the housekeeper process.
                        See http://zabbixzone.com/zabbix/partitioning-tables/ for help on table partitioning.
                        If you're using Zabbix 2+, you should update the queries to match the new table schema.

                        If your tables are large, the ALTER TABLE query can take a very (very, very) long time.
                        The best way to use partitioning is to create partitioned tables at the initial Zabbix installation.

                        Hope this will help.

                        Thierry.
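
A minimal sketch of what daily range partitioning on the `clock` column could look like, written as a generator for the MySQL statement rather than something that touches a database. It is not the script from the linked article. It assumes the table stores Unix time in an integer `clock` column, uses UTC day boundaries, and names partitions `pYYYYMMDD`; the naming scheme and both helper functions are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def daily_partitions(first_day, days):
    """Yield (name, upper_bound) pairs; each partition holds rows with
    clock < upper_bound, i.e. up to the end of that UTC day."""
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        next_midnight = datetime(day.year, day.month, day.day,
                                 tzinfo=timezone.utc) + timedelta(days=1)
        yield f"p{day:%Y%m%d}", int(next_midnight.timestamp())


def partition_ddl(table, first_day, days):
    """Build an ALTER TABLE statement that range-partitions `table` by clock."""
    clauses = ",\n".join(
        f"  PARTITION {name} VALUES LESS THAN ({bound})"
        for name, bound in daily_partitions(first_day, days)
    )
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n{clauses}\n);"


ddl = partition_ddl("history_uint", date(2014, 1, 20), 3)
print(ddl)
```

Old data can then be removed by dropping a whole partition instead of deleting rows one by one, which is the reason the housekeeper can be disabled for these tables.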


                        • harmonica
                          Senior Member
                          • Jan 2009
                          • 251

                          #13
                          Originally posted by tsalle
                          Hi,

                          You should use MySQL partitioning and disable the housekeeper process.
                          See http://zabbixzone.com/zabbix/partitioning-tables/ for help on table partitioning.
                          If you're using Zabbix 2+, you should update the queries to match the new table schema.

                          If your tables are large, the ALTER TABLE query can take a very (very, very) long time.
                          The best way to use partitioning is to create partitioned tables at the initial Zabbix installation.

                          Hope this will help.

                          Thierry.
                          Hi Thierry,

                          The housekeeper process takes less than 4 minutes to clear about 600,000 records from history and trends:

                          2640:20140120:133738.810 executing housekeeper
                          2640:20140120:134057.259 housekeeper deleted: 617768 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions

                          When the history syncer process goes above 40% after the errors in zabbix_server.log, the next housekeeper cycle starts running and goes to 100%, and all the other internal processes increase to 50-60% and remain in this state for hours.

                          Regards
                          Last edited by harmonica; 20-01-2014, 16:26.
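
The two housekeeper log lines quoted above are enough to work out the deletion rate. The sketch below parses their timestamps and the deleted-row count; it assumes the `PID:YYYYMMDD:HHMMSS.mmm` prefix format seen in those lines, and `log_time` is an illustrative helper rather than a Zabbix tool.

```python
import re
from datetime import datetime

LOG_LINES = [
    "2640:20140120:133738.810 executing housekeeper",
    "2640:20140120:134057.259 housekeeper deleted: 617768 records from history and trends, "
    "0 records of deleted items, 0 events, 0 alerts, 0 sessions",
]


def log_time(line):
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a zabbix_server.log line."""
    stamp = line.split(" ", 1)[0].split(":", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d:%H%M%S.%f")


elapsed = (log_time(LOG_LINES[1]) - log_time(LOG_LINES[0])).total_seconds()
deleted = int(re.search(r"deleted: (\d+) records", LOG_LINES[1]).group(1))
print(f"{deleted} rows in {elapsed:.1f} s = {deleted / elapsed:.0f} rows/s")
```

This only measures the window between the two log lines (a little over three minutes); it says nothing about how long the other processes stay busy afterwards.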


                          • fpaternot
                            Member
                            Zabbix Certified Specialist
                            • Feb 2013
                            • 52

                            #14
                            Harmonica,

                            I hit a similar problem just a while ago. At that time, I had my tmptable (my.cnf) running in RAM with a max size of just 2G. Since I moved it to disk I have lost some performance, but it's far more stable now.

                            Could you check that?


                            • harmonica
                              Senior Member
                              • Jan 2009
                              • 251

                              #15
                              Originally posted by fpaternot
                              Harmonica,

                              I hit a similar problem just a while ago. At that time, I had my tmptable (my.cnf) running in RAM with a max size of just 2G. Since I moved it to disk I have lost some performance, but it's far more stable now.

                              Could you check that?
                              Hi,

                              Actually our variable is set to tmp_table_size = 32M
