[Zabbix 2.2.3] History Syncer hangs using 100%

  • lynggaard
    Junior Member
    • Sep 2012
    • 11

    #1

    Hi

    I am having a problem with Zabbix 2.2.3: it doesn't appear to be processing all values because one history syncer is hanging at 100% CPU.

    It looks similar to https://support.zabbix.com/browse/ZBX-7725 in symptoms but I am not sure how to troubleshoot it.

    My Setup is
    Zabbix 2.2.3 on RHEL 6 with around 80 Values per second.
    Oracle 11R2 on RHEL 6 as backend

    My process output looks like this:
    zabbix 16120 16092 0 16:42 ? 00:00:00 zabbix_server: history syncer #4 [synced 1 items in 0.006565 sec, syncing history]
    zabbix 16121 16092 99 16:42 ? 01:54:52 zabbix_server: history syncer #5 [synced 0 items in 0.000029 sec, syncing history]
    zabbix 16122 16092 0 16:42 ? 00:00:00 zabbix_server: history syncer #6 [synced 0 items in 0.001868 sec, syncing history]

    As you can see, one of the history syncers is consuming 100% CPU while all the others are sitting at 0% CPU and syncing history.
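A minimal sketch of how the busy syncer could be picked out of such output automatically, assuming the column layout of the lines above matches `ps -ef` (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function name `busy_syncers` and the 90% threshold are illustrative choices, not part of Zabbix.

```python
import re

# Matches one history syncer line in the assumed `ps -ef` layout.
PS_LINE = re.compile(
    r"^\s*(?P<user>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<cpu>\d+)\s+"
    r"(?P<stime>\S+)\s+(?P<tty>\S+)\s+(?P<time>\S+)\s+"
    r"zabbix_server: history syncer #(?P<num>\d+)\s+\[(?P<status>.*)\]\s*$"
)


def busy_syncers(ps_output, cpu_threshold=90):
    """Return (syncer number, PID, CPU %, CPU time) for syncers at or above the threshold."""
    busy = []
    for line in ps_output.splitlines():
        m = PS_LINE.match(line)
        if m and int(m.group("cpu")) >= cpu_threshold:
            busy.append(
                (int(m.group("num")), int(m.group("pid")),
                 int(m.group("cpu")), m.group("time"))
            )
    return busy


# The three lines quoted in the post above.
SAMPLE = """
zabbix 16120 16092 0 16:42 ? 00:00:00 zabbix_server: history syncer #4 [synced 1 items in 0.006565 sec, syncing history]
zabbix 16121 16092 99 16:42 ? 01:54:52 zabbix_server: history syncer #5 [synced 0 items in 0.000029 sec, syncing history]
zabbix 16122 16092 0 16:42 ? 00:00:00 zabbix_server: history syncer #6 [synced 0 items in 0.001868 sec, syncing history]
"""

print(busy_syncers(SAMPLE))
```

With the PID in hand, the matching lines in a DebugLevel=4 server log can be isolated for that one process.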

    I have checked the queue, and it appears that the same 200+ values are stuck in it, some of them for several hours, i.e. the oldest entries are not being processed. The queue depth doesn't seem to increase at 80 values a second, but that is most likely because the agents are reporting connection failures.

    I have checked the database and there is no load on it at all. Load average shows 0.1 or so.

    I have tried increasing the various cache sizes from default to 256M, and the DBsyncers up to 32 and now back to 16, but it was also a problem with the default value of 4.

    Does anyone have pointers on how to overcome this issue, or on what to look for in the logs? (I have enabled debug level 4 and slow query logging.)
  • pupkin.ivan
    Member
    • Aug 2013
    • 51

    #2
    I have the same problem and cannot find a solution. Any ideas?

    • ingus.vilnis
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Mar 2014
      • 908

      #3
      Hello!

      What do you see in Monitoring -> Graphs -> Choose your Zabbix server -> Zabbix internal process busy?

      Is the history syncer at 100% busy too?

      Judging by your post counts on the forum, you are no beginners with Zabbix, so please don't take my suggestion as offensive (I am simply troubleshooting from the very beginning), but are you sure you uncommented the setting lines or wrote new ones in the zabbix_server.conf file correctly, and restarted the zabbix_server process correctly afterwards?

      Does the zabbix_server.log show anything suspicious?

      Best Regards,
      Ingus

      • pupkin.ivan
        Member
        • Aug 2013
        • 51

        #4
        I wrote up my whole investigation in this thread: https://www.zabbix.com/forum/showthread.php?p=150770

        Just now, at 10:00, the "timer" process again stopped the entire Zabbix service, with a zabbix_server history syncer process consuming 100% CPU:

        The last line in the log is at 6:41.

        Sorry for my bad English.

        • ingus.vilnis
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Mar 2014
          • 908

          #5
          Hi,

          I briefly checked your research process but I can't see the problem yet.

          What value do you have for the StartTimers= parameter in your zabbix_server.conf file?

          Best Regards,
          Ingus

          • pupkin.ivan
            Member
            • Aug 2013
            • 51

            #6
            My zabbix_server.conf contents, without comments:
            ListenPort=10051
            LogFile=/var/log/zabbix/zabbix_server.log
            LogFileSize=100
            DebugLevel=4
            PidFile=/var/run/zabbix/zabbix_server.pid
            DBHost=localhost
            DBName=zabbix
            DBUser=zabbix
            DBPassword=pass
            DBSocket=/var/run/mysqld/mysqld.sock
            StartPollers=150
            StartPollersUnreachable=50
            StartTrappers=5
            StartPingers=50
            StartDiscoverers=20
            StartHTTPPollers=20
            MaxHousekeeperDelete=5000
            CacheSize=64M
            StartDBSyncers=50
            HistoryCacheSize=64M
            TrendCacheSize=64M
            HistoryTextCacheSize=64M
            ValueCacheSize=32M
            AlertScriptsPath=/usr/lib/zabbix/alertscripts
            ExternalScripts=/usr/lib/zabbix/externalscripts
            FpingLocation=/usr/bin/fping
            Fping6Location=/usr/bin/fping6
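The listing above has no StartTimers line, which is what the earlier question about uncommented settings is getting at. A minimal sketch of checking a config file for active (uncommented) parameters, assuming the plain `Key=Value` format with `#` comment lines seen in this thread. The helper name `active_settings` and the shortened `SAMPLE` text are hypothetical, and the dict keeps only the last value for a repeated key, which is a simplification.

```python
def active_settings(conf_text):
    """Return {parameter: value} for every uncommented Key=Value line."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out parameters.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Shortened, illustrative excerpt; a real check would read zabbix_server.conf.
SAMPLE = """\
# StartTimers=1
StartPollers=150
StartDBSyncers=50
ValueCacheSize=32M
"""

conf = active_settings(SAMPLE)
print("StartTimers" in conf)   # a commented-out line does not count as set
print(conf["StartDBSyncers"])
```

A parameter that only appears behind a `#` is not active, so the server runs with its built-in default for it.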

            • ingus.vilnis
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Mar 2014
              • 908

              #7
              Try to add the following line for a start:
              Code:
              StartTimers=10

              • LenR
                Senior Member
                • Sep 2009
                • 1005

                #8
                Here are my thoughts....

                When we have to restart our Zabbix server, it seems to take a while to "catch up"; the history syncers stay at 100% until then. This started to deteriorate as we added more items and triggers. After some hardware changes (this is a test lab) it got to the point where it would never catch up.

                I noticed many MySQL processes running selects from history_uint and wondered why they weren't using the value cache. I started disabling triggers yesterday and it caught up overnight.

                I think there is a race condition between the history sync and things checking the value cache. From the doc, it says if the value isn't cached, it will go to the database. If zabbix is looking for a current value, but history sync is behind, that value won't be in the cache. This will create MANY selects for history which will delay history sync even more.
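The hypothesis in this post can be illustrated with a toy model. This is only a sketch of the reasoning, not Zabbix's actual value cache: the class `ToyValueCache`, the function `run`, and the integer "clock" are all invented for illustration, and the model only counts lookups that fall through to the database.

```python
class ToyValueCache:
    """Toy model only: tracks the newest timestamp visible in the cache per item."""

    def __init__(self):
        self.newest = {}      # itemid -> newest clock visible in the cache
        self.db_selects = 0   # lookups that fell through to the database

    def sync(self, itemid, clock):
        # The history syncer has flushed this value, so it becomes visible.
        self.newest[itemid] = max(clock, self.newest.get(itemid, clock))

    def lookup(self, itemid, wanted_clock):
        # A trigger evaluation wants a value at least as new as wanted_clock.
        if self.newest.get(itemid, -1) >= wanted_clock:
            return "cache"
        self.db_selects += 1
        return "database"


def run(syncer_lag, evaluations=5):
    """Evaluate one trigger per tick while the syncer trails by syncer_lag ticks."""
    cache = ToyValueCache()
    for now in range(1, evaluations + 1):
        if now - syncer_lag >= 1:
            cache.sync(1, now - syncer_lag)
        cache.lookup(1, now)
    return cache.db_selects


print(run(syncer_lag=0))  # syncer keeps up
print(run(syncer_lag=3))  # syncer is behind
```

In this model any lag at all turns every evaluation into a database select, which is the feedback loop the post suspects: more selects slow the syncer further, so the lag never closes.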

                This environment:

                Zabbix 2.2.3, 7000 hosts, 1M items, 144K triggers, 37K nvps.
                ESX 5 hosted RHEL 6.5, 20 proc, 64G ram, server & mysql on same VM, web and 7 proxies on other VM's.

                • pupkin.ivan
                  Member
                  • Aug 2013
                  • 51

                  #9
                  For 2.5 days after setting "StartTimers=10" the service worked fine. Today at 14:30 the service crashed again, but the timer process was working fine: https://www.zabbix.com/forum/showthr...889#post150889

                  • ingus.vilnis
                    Senior Member
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Mar 2014
                    • 908

                    #10
                    Hi!

                    Please check your server log. The log you attached on the RU forum starts at 14:43, but your crash was at 14:30, so it contains no useful information about the crash.

                    The situation in the screenshots could be better, but I don't have many ideas about them right now.

                    Best Regards,
                    Ingus

                    • ingus.vilnis
                      Senior Member
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Mar 2014
                      • 908

                      #11
                      Not directly related to the crash, but a suggestion: add more pollers and a larger value cache in zabbix_server.conf.

                      Code:
                      StartPollers=180
                      ValueCacheSize=64M

                      • pupkin.ivan
                        Member
                        • Aug 2013
                        • 51

                        #12
                        Some new data:

                        Today at 17:25 we had another crash.

                        Detailed log for the bad history syncer process: https://dl.dropboxusercontent.com/u/...d15413.log.bz2

                        The latest query runs successfully in MySQL: https://dl.dropboxusercontent.com/u/...last_query.txt

                        Bad data from the service tables (mysqldump -u zabbix -p zabbix services services_links services_times service_alarms >services.sql): https://dl.dropboxusercontent.com/u/...rvices.sql.bz2

                        • pupkin.ivan
                          Member
                          • Aug 2013
                          • 51

                          #13
                          My problem is https://support.zabbix.com/browse/ZBX-8181

                          • ingus.vilnis
                            Senior Member
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Mar 2014
                            • 908

                            #14
                            Originally posted by pupkin.ivan
                            Hi!

                            Regarding ZBX-8181, it seems that only an update to the latest 2.2.4rc2 will help you (http://www.zabbix.com/download.php).

                            Or you can wait until the stable 2.2.4 release becomes available.

                            Best Regards,
                            Ingus
