Zabbix agent ping timestamp

  • jangi
    Junior Member
    • May 2016
    • 16

    #1

    Zabbix agent ping timestamp

    Short:
    The timestamp used for the clock field on the agent.ping item is the clock on the Zabbix Agent host, not the timestamp that it was received on the server. This is an issue because not all clients can be trusted to have an accurate clock. We monitor hundreds of 3rd party systems where we have no control over that. Several customers block NTP on their firewall. Clocks drift. Etc.

    Long:
    Ever since upgrading to Zabbix 4.0 I've had issues where some hosts start firing the unavailable trigger, constantly flapping between Problem and OK. Until today it seemed totally random: sometimes it was hosts on older versions of the agent, sometimes hosts with 4.0 agents, different OSes, etc. There is never anything in the agent log file to indicate an issue communicating with the server. I should point out that we use active agents exclusively. Anyway, sick of the thousands of inaccurate emails that suddenly flood my inbox, I was determined to find out why this was happening. After much debug log reading and digging around in MySQL while data was coming in, I realized that the clock field on the latest agent.ping item in the history_uint table was a timestamp about 5 minutes old, even though it had just arrived seconds earlier. Sure enough, the system clock on the client machine was off by five minutes. Unfortunately NTP is not an option, so I manually corrected the clock and hope it will stay correct.

    I'm guessing this is because we're using active agents? I don't have any passive agents to compare to. I'm not sure why this issue seemed to start happening with the upgrade to 4.0; was the timestamp source changed?
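
The diagnosis in this post comes down to comparing the timestamp stored with a value against the moment the server actually received it. A minimal sketch of that comparison, assuming you can collect pairs of (stored clock, server receive time) yourself; the function name and sample numbers are hypothetical and not part of Zabbix:

```python
from statistics import median


def estimate_skew(samples):
    """Estimate how far an agent's clock lags the server, in seconds.

    `samples` is an iterable of (agent_clock, server_received) pairs in
    Unix seconds. A positive result means the agent clock is behind.
    The median keeps one delayed delivery from distorting the estimate.
    """
    return median(received - clock for clock, received in samples)


# Hypothetical samples: values stamped by the agent, then noted on arrival.
samples = [
    (1_700_000_000, 1_700_000_301),
    (1_700_000_060, 1_700_000_362),
    (1_700_000_120, 1_700_000_421),
]
skew_seconds = estimate_skew(samples)
```

The result also includes network and processing delay, so it is only a rough figure; it is enough to tell a few seconds of latency from a clock that is minutes off.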
  • jangi
    Junior Member
    • May 2016
    • 16

    #2
    Yikes... so apparently I should've done more research before posting this. After reading through https://support.zabbix.com/browse/ZBX-15301 I understand the problem.

    Whatever fixes/band-aids were applied in 4.0.2/4.0.4 for the various related issues discussed in that thread and others, my underlying issue still exists. I'd like to make a few points (from my very subjective point of view):

    1) While I understand the requirement that proxies need to have valid synchronized time, this is an unreasonable requirement for active agents.
    2) The main reason to use active agents in place of passive has to do with networking; it's sometimes very difficult (100% of the time in our case) for the server to be able to poll hosts. They are all behind 3rd party firewalls. We need an agent to "call home".
    3) Even if we accept that all items will have their values timestamped by the agent and its potentially wrong local clock (which I definitely do not), the primary way to tell if an active agent is online is agent.ping. If you want a trigger to alert you when a host is offline for more than 5 minutes, how can you do that reliably when the data and the server are using two different clocks?
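
To make point 3 concrete, here is a toy model of an "offline for more than 5 minutes" check. It is only a sketch under stated assumptions: it is not how Zabbix evaluates triggers, and the class, function, and parameter names are hypothetical. It just shows what happens when the age of the last ping is computed from the agent's clock rather than from the server's receive time.

```python
from dataclasses import dataclass


@dataclass
class PingValue:
    agent_clock: int  # Unix time stamped by the agent host's local clock
    received_at: int  # Unix time on the server when the value arrived


def looks_offline(last: PingValue, server_now: int, window: int = 300,
                  trust_agent_clock: bool = True) -> bool:
    """Toy check: is the newest ping older than `window` seconds?"""
    stamp = last.agent_clock if trust_agent_clock else last.received_at
    return server_now - stamp > window


# The agent clock runs 5 minutes slow; the ping arrives 2 seconds after sending.
ping = PingValue(agent_clock=1_000_000 - 300, received_at=1_000_002)
server_now = 1_000_010

flagged_by_agent_clock = looks_offline(ping, server_now, trust_agent_clock=True)
flagged_by_receive_time = looks_offline(ping, server_now, trust_agent_clock=False)
```

In this model the host that pinged 8 seconds ago is flagged as offline when the agent's timestamp is trusted, and is not flagged when the server's receive time is used.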


    • dimir
      Zabbix developer
      • Apr 2011
      • 1080

      #3
      I think what you are looking for is this: https://support.zabbix.com/browse/ZBX-12957

      where the following change was introduced:

      "Zabbix server will no longer adjust value timestamps in cases when Zabbix proxy/active agent/sender time differs from Zabbix server time."

      And that introduced a regression when monitoring the time difference using fuzzytime() and active agent:


      • jangi
        Junior Member
        • May 2016
        • 16

        #4
        Hi dimir,

        Thanks for the response. I'm not sure if you were responding to my first or second post. I do understand the change that was introduced, and that was one of the topics I read as well. Is there a current open issue or feature request for getting this changed in a future release? As I said, requiring proxies to have valid time I can understand; they are servers with potentially thousands of hosts/items to check. And I also understand from a development perspective that active agents and proxies are similar.

        But I really don't believe it is reasonable to expect agent hosts to have valid time. First, that just isn't always the case and not always under our control. Second, what if something goes wrong? Internal NTP server goes down, someone fat-fingers a firewall policy. Maybe it's intentional; a development server where they constantly change the time when debugging a product. It is not intuitive (from a user perspective) to have passive and active agents function differently. The whole separate item templates thing has been a thorn in our side for years. Not having remote command capability stinks sometimes. But the timestamp issue really makes active agents a 2nd class citizen. I'm actually considering installing ssh with the agent and establishing a permanent ssh tunnel just so passive checks would work. Which seems insane.

        I'm not suggesting that Zabbix tries to be NTP... just that it also records the server timestamp on received data and that we're able to trigger on that instead of the host's time.

        James


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by jangi
          Short:
          The timestamp used for the clock field on the agent.ping item is the clock on the Zabbix Agent host, not the timestamp that it was received on the server. This is an issue because not all clients can be trusted to have an accurate clock. We monitor hundreds of 3rd party systems where we have no control over that. Several customers block NTP on their firewall. Clocks drift. Etc.
          Control question: do you want the local time on the host (with the running agent) when the value was sampled, or the exact time at which the value arrived at the server?
          In some scenarios the difference between those two timestamps can be up to 24h (when a proxy collects data from the agents, it can be disconnected from the server, and the proxy can hold that data for up to 24h before it starts discarding the oldest values).
          You can build or buy your own GPS-based time synchronisation source (stratum class) to serve reference time in an isolated network. The cheapest devices should be available for around $100.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • jangi
            Junior Member
            • May 2016
            • 16

            #6
            I agree... I said it is completely reasonable to require proxies to have accurate time. My issue is specifically with active agents (whether they are communicating with a proxy or a server).


            • dimir
              Zabbix developer
              • Apr 2011
              • 1080

              #7
              Your last line states the most important thing: Zabbix should not act like NTP, adjusting time. It is a very challenging task and it should be done by a separate instance. We had long discussions about this topic and the decision was as follows: Zabbix should stop trying to do time adjustments because it creates more issues than it solves. For users who depend on accurate time, the only reasonable solution is to ensure system clock synchronization on all the involved hosts. Believe me, this is the best approach. I don't believe we'll ever bring this feature back because it turned out to be the wrong one.


              • jangi
                Junior Member
                • May 2016
                • 16

                #8
                I'm not trying to beat a dead horse... I just feel like maybe we're still miscommunicating somehow.

                1) I don't see how recording a received time in a separate column and allowing users to choose what to pay attention to is acting like NTP at all.
                2) Imagine if a server logged invalid login attempts using the client's time instead of its own time... that would be crazy. Proxy->Server is one thing, but trusting client clocks... I don't know what else I can say. There should not be a difference in time-stamping values from an active client vs. a passive one.

                If there is no way Zabbix will consider implementing this... then maybe I can suggest a different feature entirely:

                Create an item (maybe host.lastcontact?) that contains the timestamp of the last communication from the host. That way we can do a .nodata(5m) and get an accurate host offline trigger, regardless of client time. That's the biggest use case anyway.
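
The idea above can be sketched as a small server-side tracker. This is a hypothetical illustration only: `LastContact` and its methods are made-up names, not an existing Zabbix item or API. The one property it is meant to show is that every timestamp comes from the server's clock, so the client's clock never enters the comparison.

```python
import time
from typing import Callable, Dict


class LastContact:
    """Tracks when the server last heard from each host (server clock only)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock                # injectable so it can be tested
        self._seen: Dict[str, float] = {}  # host name -> server receive time

    def record(self, host: str) -> None:
        """Call whenever any data arrives from `host`."""
        self._seen[host] = self._clock()

    def is_offline(self, host: str, window: float = 300.0) -> bool:
        """True if the host was never seen or has been silent longer than `window`."""
        last = self._seen.get(host)
        return last is None or self._clock() - last > window
```

With a 300-second window this behaves like the 5-minute offline check described above, however wrong the host's own clock is.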

                Thanks,
                James


                • dimir
                  Zabbix developer
                  • Apr 2011
                  • 1080

                  #9
                  Right, this looks similar to what the proxy has (zabbix[proxy,,lastaccess]). Sounds reasonable; I'd create such a feature request. It shouldn't be hard to implement.


                  • jonybat
                    Junior Member
                    • May 2019
                    • 11

                    #10
                    Originally posted by dimir
                    Your last line states the most important thing: Zabbix should not act like NTP, adjusting time. It is a very challenging task and it should be done by a separate instance. We had long discussions about this topic and the decision was as follows: Zabbix should stop trying to do time adjustments because it creates more issues than it solves. For users who depend on accurate time, the only reasonable solution is to ensure system clock synchronization on all the involved hosts. Believe me, this is the best approach. I don't believe we'll ever bring this feature back because it turned out to be the wrong one.
                    I'm not going to discuss whether or not this should have been done, because I don't know the real reasons behind it. From my point of view and my experience, changing the behavior created issues that didn't exist before.

                    I can live with setting agent ping to passive checks, because most of my environments allow it, but that's not the only problem. The real problem for me is that trusting the active check's timestamp creates other issues if the timestamp is in the future:
                    - No other data from those active items will be logged until the server "catches up" with the wrong timestamp
                    - Data logging resumes if the agent on the device with the wrong timestamp is restarted after the clock has been corrected, but the "last check" timestamp in the server will remain in the future
                    - If any problem was generated from the values received in that first contact with the wrong timestamp, the events will stay active (even if in resolved state) until the server "catches up" with the wrong timestamp

                    I agree it is reasonable to require that the servers and proxies have accurate time. Asking that of hundreds or thousands of devices, especially when most of them don't have an RTC, might be a different story.
