Ad Widget

Collapse

Zabbix Upgrade 3.2 -> 3.4.2: Several problems/Item values have wrong timestamps

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Rhobro
    Junior Member
    • Jun 2017
    • 9

    #1

    Zabbix Upgrade 3.2 -> 3.4.2: Several problems/Item values have wrong timestamps

    I recently upgrade my zabbix-server from version 3.2 to version 3.4 using the zabbix debian repositories.

    At first glance, everything looked nice after the database was converted (which took longer than I wanted to stay in my office)

    But we slowly started to notice that the performance was getting worse and worse.

    Then several hosts were throwing "System Time Desync" Triggers.
    After checking how this could happen, I saw that almost every itemvalue had a timestamp that was at least 2 hours ahead in the future.

    Now, this applies to every server.
    We have about 700 different hosts in Zabbix and my zabbix-server logfile stays silent now.
    Until the actual time is ahead of the wrong timestamp again.

    I am using MariaDb as MySQL database with InnoDB as Engine.
    In my zabbix-server.log File I noticed a lot of "slow query" warnings.

    I dont really know why this is happening, but I really would like to use 3.4 as it adds some neat features like the new dashboard or acknowledge-actions.

    I dont really know what I should do now, my last resort is to roll back to a snapshot I made right before the upgrade




    This is a system-time value we are monitoring on one of our servers.
    Its a pretty good example, because you can see that at 9:39, the last check value was set to 11:39.

    It was 10:40 at the time this screenshot was taken.
    Zabbix Server Time is as it should be.
    Database Time also gives the right now().
    Host Time is obviously correct aswell.
    Last edited by Rhobro; 06-11-2017, 11:11.
  • Rhobro
    Junior Member
    • Jun 2017
    • 9

    #2
    I thought the problem was gone now, but apparently after the last weekend it started all over again.

    The zabbix Queue (I guess that's how many items the system is processing right now) is also really big again.
    Neither RAM or CPU are at their peak.
    Last edited by Rhobro; 01-11-2017, 15:21.

    Comment

    • kaspars.mednis
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified SpecialistZabbix Certified Professional
      • Oct 2017
      • 349

      #3
      Here is some simple configuration troubleshooter to look first

      its for 1.8, but the same graphs, paramaters are for 3.4

      does all your graphs look ok? Maybe some caches are full or some processes 100% busy ?

      Regards,
      Kaspars
      Last edited by kaspars.mednis; 01-11-2017, 15:30.

      Comment

      • Palmertree
        Senior Member
        • Sep 2005
        • 746

        #4
        Make sure you have the following set in your /etc/my.cnf file to prevent slow queries for history.

        # Optimizer Switches
        optimizer_switch='index_condition_pushdown=off'

        Comment

        • Rhobro
          Junior Member
          • Jun 2017
          • 9

          #5
          Originally posted by kaspars.mednis
          Here is some simple configuration troubleshooter to look first

          its for 1.8, but the same graphs, paramaters are for 3.4

          does all your graphs look ok? Maybe some caches are full or some processes 100% busy ?

          Regards,
          Kaspars




          Green=Processed items
          Red=Zabbix Queue Length


          IMO, these graphs don't look that bad
          Cache-Usage is also okay, maximum at 20% usage.
          The server has 32GB ram, theres plenty of room if zabbix directly utilizes it

          I still think it could be a problem with my MariaDB installation, introduced with the new upgrade

          At the Moment, its 09:58 and Zabbix already has monitored values for 11:07.
          We're already calling it the "time machine", funny as it is - we can't properly monitor our network if random items are stuck at a certain value

          Originally posted by Palmertree
          # Optimizer Switches
          optimizer_switch='index_condition_pushdown=off'
          How exactly do I benefit from this?
          Last edited by Rhobro; 06-11-2017, 11:03.

          Comment

          • kaspars.mednis
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified SpecialistZabbix Certified Professional
            • Oct 2017
            • 349

            #6
            Your zabbix performance graphs looks almost perfect !

            After checking how this could happen, I almost every itemvalue had a timestamp that was at least 2 hours ahead in the future.
            All your values are 2 hours ahead ?
            The time on zabbix server/zabbix DB is OK ?

            do you have correct timezone in php config for zabbix-web?
            Code:
            PHP option "date.timezone"
            Regards,
            Kaspats

            Comment

            • Rhobro
              Junior Member
              • Jun 2017
              • 9

              #7
              Originally posted by kaspars.mednis
              All your values are 2 hours ahead ?
              The time on zabbix server/zabbix DB is OK ?
              I've already confirmed that every server/service has the correct time configured.

              The future values are not just a display error, they are also wrong inside the database.

              I already had an expert look at the problem, though he could just verify the problem.

              The output is a bit dated now, but the problem is the same.

              Code:
              MariaDB [zabbix]> select FROM_UNIXTIME(clock) as t,value from history_uint where itemid=652974 order by clock desc limit 5;
              +---------------------+-------------+
              | t                   | value       |
              +---------------------+-------------+
              | 2017-10-18 17:13:49 | 34358743040 |
              | 2017-10-18 15:12:54 | 34358743040 |
              | 2017-10-18 15:12:04 | 34358743040 |
              | 2017-10-18 15:11:14 | 34358743040 |
              | 2017-10-18 15:10:24 | 34358743040 |
              +---------------------+-------------+
              5 rows in set (0.01 sec)
              
              MariaDB [zabbix]> select now();
              +---------------------+
              | now()               |
              +---------------------+
              | 2017-10-18 16:04:19 |
              +---------------------+
              1 row in set (0.00 sec)

              I've said that random values get monitored with the wrong timestamp, but after checking again right now, I noticed that its almost always "system.localtime" and "agent.ping".
              Last edited by Rhobro; 06-11-2017, 11:37.

              Comment

              • Rhobro
                Junior Member
                • Jun 2017
                • 9

                #8
                I think that the item queue is getting processed way to slow.

                Queued On-Trigger-Actions seem to be processing very slow aswell.


                I've looked into my log files to find -anything- and stumbled upon this:

                Code:
                1534:20171108:113645.378 slow query: 7228.560935 sec, "select ip.item_preprocid,ip.itemid,ip.step,ip.type,ip.params from item_discovery id join item_preproc ip on id.itemid=ip.itemid where (id.parent_itemid between 558518 and 558529 or id.parent_itemid in (592501,598951,599474))"
                Is this seriously a 2 hour long query?


                From what I've read, Switching optimizer_switch='index_condition_pushdown=off' would only makes it worse

                Comment

                • kaspars.mednis
                  Senior Member
                  Zabbix Certified Trainer
                  Zabbix Certified SpecialistZabbix Certified Professional
                  • Oct 2017
                  • 349

                  #9
                  If the item queue is getting processed way to slow, there must be some indications in Zabbix internal processes busy graph.

                  The history syncer process is responsible for syncing history tables, with too much values waiting to be written in DB, it will go very busy and the history cache will go full (there is another graph in zabbix which shows all caches)

                  P.S

                  Are all your tables using INNODB ? in some rare cases there can be some tables left from previous upgrades with MyISAM engine

                  Regards,
                  Kaspars

                  Comment

                  • Rhobro
                    Junior Member
                    • Jun 2017
                    • 9

                    #10
                    In an attempt to make Zabbix faster, we had an specialist perform a major overhaul of our zabbix server.
                    The whole database was also created from scratch again, with every table using InnoDB
                    This happened in early 2017.

                    I went further than 1 day with this graph now

                    Comment

                    • Rhobro
                      Junior Member
                      • Jun 2017
                      • 9

                      #11
                      Is it recommended to report this as a bug?

                      This problem still persists, sometimes every item on every server has its lastcheck value 2 hours ahead.

                      Comment

                      Working...