Daylight Savings Time Zabbix Crash

  • ssims1
    Junior Member
    • Nov 2016
    • 6

    #1

    Daylight Savings Time Zabbix Crash

    Did anyone experience an issue on Nov 6 with Zabbix handling DST? For some reason, as soon as the hour rolled back, our Zabbix server began sending false-positive alerts for some 65000+ items, causing a major backlog in the queue and generating thousands of emails to all IT departments. Needless to say, it was quite a major headache. Any ideas on what caused this or how to fix it?
    I reviewed every log available and no warnings or errors appear on either the mysql database server or the zabbix application server.
    Versions of what we use are below:
    zabbix-agent.x86_64 3.0.5-1.el7 zabbix
    zabbix-get.x86_64 3.0.5-1.el7 zabbix
    zabbix-server-mysql.x86_64 3.0.5-1.el7 zabbix
    zabbix-web.noarch 3.0.5-1.el7 zabbix
    zabbix-web-mysql.noarch 3.0.5-1.el7 zabbix
    mysql-community-client.x86_64 5.7.16-1.el7 mysql57-community
    mysql-community-common.x86_64 5.7.16-1.el7 mysql57-community
    mysql-community-libs.x86_64 5.7.16-1.el7 mysql57-community
    mysql-community-libs-compat.x86_64 5.7.16-1.el7 mysql57-community
    mysql-community-server.x86_64 5.7.16-1.el7 mysql57-community
  • nick0909
    Member
    • Apr 2013
    • 73

    #2
    No problems here. Zabbix server 3.2.1 with zabbix proxies on 3.0.4, most agents are 3.2, with some 3.0 and 2.4 still hanging around.


    • ssims1
      Junior Member
      • Nov 2016
      • 6

      #3
      The large majority of our agents are 3.0.*, while our 4 proxy servers are 3.0.4 along with the Zabbix server. As soon as DST kicked in, the Zabbix server was unable to communicate with its own Zabbix agent and internal checks, along with everything else that it monitors.


      • glebs.ivanovskis
        Senior Member
        • Jul 2015
        • 237

        #4
        Log monitoring on Windows + NTFS?


        • ssims1
          Junior Member
          • Nov 2016
          • 6

          #5
          Windows monitoring includes windows 2008-2016 servers, sql server, etc. Checks include disk space used/free, etc.
          Environment is a mix of windows/linux vm's and physical servers.
          Our zabbix app server, database server, and proxy servers all run CentOS Linux release 7.2.1511 (Core). The same ntp timing sources are used by all servers in our environment.


          • glebs.ivanovskis
            Senior Member
            • Jul 2015
            • 237

            #6
            And what kind of triggers were firing? Can you give an example expression?


            • glebs.ivanovskis
              Senior Member
              • Jul 2015
              • 237

              #7
              During DST transitions Windows messes with the hardware clock (https://blogs.msdn.microsoft.com/old...02-00/?p=37983), which can screw up localtime->unixtime and unixtime->localtime conversions, which in turn can delay active checks for an hour and make nodata() go crazy.
              Last edited by glebs.ivanovskis; 08-11-2016, 11:21.
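
The conversion problem mentioned above can be illustrated with a minimal sketch. It is not Zabbix code: it only shows that, during the hour that repeats when clocks roll back, one local wall-clock time maps to two Unix timestamps. The time zone rule (a US Eastern-style POSIX TZ string) and the helper name local_to_unix are assumptions chosen for illustration, and time.tzset() is available on Unix only.

```python
import os
import time

# Assumption: a US Eastern-style zone written as a POSIX TZ rule, so the
# example does not depend on the time zone configured on the host.
os.environ["TZ"] = "EST5EDT,M3.2.0,M11.1.0"
time.tzset()  # Unix only


def local_to_unix(is_dst):
    """Convert local 2016-11-06 01:30:00 to Unix time (hypothetical helper).

    is_dst says which of the two passes through 01:30 is meant:
    1 = still on daylight time, 0 = already back on standard time.
    The weekday and day-of-year fields are ignored by mktime.
    """
    return int(time.mktime((2016, 11, 6, 1, 30, 0, 0, 0, is_dst)))


first_pass = local_to_unix(1)   # 01:30 before the rollback
second_pass = local_to_unix(0)  # 01:30 after the rollback
gap_seconds = second_pass - first_pass

print(first_pass, second_pass, gap_seconds)
```

A conversion that picks the wrong pass is off by the size of that gap, which is consistent with the one-hour delay described in the post.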


              • ssims1
                Junior Member
                • Nov 2016
                • 6

                #8
                So our active checks worked fine, but for 45 minutes after DST we had no data.
                Triggers that got fired off would be like the example below:
                Zabbix agent on *server_name* is unreachable for 5 minutes: PROBLEM Last value: 1
                Normally I would think that the connection was lost between the mysql database and the zabbix server, but there's nothing in the logs that shows any events correlating to this happening. Housekeeping occurred twice on the database due to the rollback in time, not sure if this is where the issue may lie.

                Basically, for 45 minutes after DST, Zabbix was unable to collect data from everything, including its own agent and internal checks, and fired off the corresponding triggers/actions based on the failures.

                Since advanced logging was not turned on, I'm concerned that the issue won't be found and may recur during the next DST change.


                • ssims1
                  Junior Member
                  • Nov 2016
                  • 6

                  #9
                  DST false alerts issue

                  Once again, the same issue occurred during the daylight saving time change. Hundreds of false alerts cropped up as soon as the hour rolled back.
                  Is there a fix for this in a newer version of zabbix? We currently run Zabbix server 3.0.4.


                  • kloczek
                    Senior Member
                    • Jun 2006
                    • 1771

                    #10
                    Originally posted by ssims1
                    Once again, the same issue occurred during the daylight saving time change. Hundreds of false alerts cropped up as soon as the hour rolled back.
                    Is there a fix for this in a newer version of zabbix? We currently run Zabbix server 3.0.4.
                    Nothing in Zabbix is using DST.
                    All Unixes should use UTC + TZ settings (Time Zone).
                    On daylight saving time changes, UTC still stays the same.
                    Whatever happened in your case, it was quite possibly caused by manual manipulation of the local time settings.

                    No one can help you unless you describe in detail what happened in your case.
                    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                    https://kloczek.wordpress.com/
                    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                    My zabbix templates https://github.com/kloczek/zabbix-templates
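
The point that UTC is unaffected by the time change can be sketched as follows. This is an illustration only, not anything taken from Zabbix: the POSIX TZ rule and the chosen start timestamp (2016-11-06 05:30:00 UTC) are assumptions, and time.tzset() is Unix only. Unix time keeps increasing across the rollback while the local wall clock steps back.

```python
import os
import time

# Assumption: a US Eastern-style zone expressed as a POSIX TZ rule.
os.environ["TZ"] = "EST5EDT,M3.2.0,M11.1.0"
time.tzset()  # Unix only

# Four Unix timestamps, 20 minutes apart, straddling the rollback.
start = 1478410200  # 2016-11-06 05:30:00 UTC
unix_times = [start + 20 * 60 * step for step in range(4)]

# The same instants rendered as local wall-clock time with the zone name.
local_labels = [
    time.strftime("%H:%M %Z", time.localtime(ts)) for ts in unix_times
]

for ts, label in zip(unix_times, local_labels):
    print(ts, label)
```

The Unix timestamps are strictly increasing, yet the third local label is earlier on the clock face than the second. Anything that compares or schedules by local time instead of Unix time can therefore see time appear to go backwards.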


                    • ssims1
                      Junior Member
                      • Nov 2016
                      • 6

                      #11
                      All Unixes should use UTC + TZ settings (Time Zone).

                      I described the issue in detail in my earlier posts. There was no response. However, what you posted may help.

                      So for all Zabbix servers, proxies, and databases, the Linux OS they run on needs to be using the UTC time zone, correct?


                      • kloczek
                        Senior Member
                        • Jun 2006
                        • 1771

                        #12
                        Originally posted by ssims1
                        I described the issue in detail in my earlier posts. There was no response. However, what you posted may help.

                        So for all Zabbix servers, proxies, and databases, the Linux OS they run on needs to be using the UTC time zone, correct?
                        Sorry, but I don't see any details which could be used to identify your issue.
                        Please provide the OS used on the server, DB backend and monitored hosts, and details about the issue.


                        • kaspars.mednis
                          Senior Member
                          Zabbix Certified Trainer
                          Zabbix Certified Specialist
                          Zabbix Certified Professional
                          • Oct 2017
                          • 349

                          #13
                          It's a known bug and will be fixed; there is a ZBX bug report.



                          Regards,
                          Kaspars


                          • Rhobro
                            Junior Member
                            • Jun 2017
                            • 9

                            #14
                            I have a similar problem.

                            I also thought that everything was out of date because my Zabbix agents were triggering "no connect".

                            I checked that and saw that only Zabbix agent ping and system time were throwing triggers; the agent was actually still sending data.


                            The problem was that a few items, mostly but not only agent ping and system time, had timestamps several hours ahead of the current time.

                            Does your "no data" problem apply to every item?
                            Please check the "last check" date.


                            If your problem really is the same as mine, this will confirm that this bug also happened to us.
                            According to Kaspars, this won't be resolved in 3.4.4, I guess.
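
The "last check" comparison suggested above could be scripted along these lines. This is a rough sketch under assumptions: the function name, the tolerance, and the sample data are hypothetical, and the dictionaries only imitate the name and lastclock fields of items as returned by the Zabbix API. No API call is made here.

```python
import time


def items_ahead_of_now(items, now=None, tolerance_seconds=60):
    """Return names of items whose last check lies in the future.

    items: iterable of dicts with "name" and "lastclock" (Unix time,
           given as int or numeric string). Hypothetical input shape.
    now:   reference Unix time; defaults to the current time.
    """
    if now is None:
        now = time.time()
    return [
        item["name"]
        for item in items
        if int(item["lastclock"]) - now > tolerance_seconds
    ]


# Made-up sample data for illustration only.
reference_now = 1478413800
sample_items = [
    {"name": "Agent ping", "lastclock": str(reference_now + 3 * 3600)},
    {"name": "System time", "lastclock": str(reference_now + 2 * 3600)},
    {"name": "Free disk space", "lastclock": str(reference_now - 30)},
]

suspicious = items_ahead_of_now(sample_items, now=reference_now)
print(suspicious)
```

If only a handful of items show up in such a list while the rest have recent last-check times, that matches the pattern described in this post rather than a total loss of data collection.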
