Short:
The timestamp stored in the clock field of the agent.ping item comes from the clock on the Zabbix Agent host, not from the time the value was received on the server. This is a problem because not all clients can be trusted to have an accurate clock. We monitor hundreds of 3rd party systems where we have no control over that. Several customers block NTP on their firewall. Clocks drift. Etc.
Long:
Ever since upgrading to Zabbix 4.0 I've had issues where some hosts start firing the unavailable trigger, constantly flapping between Problem and Ok. Until today it seemed totally random; sometimes it was hosts on older versions of the agent, sometimes hosts with 4.0 agents, different OSes, etc. There is never anything in the agent log file to indicate an issue communicating with the server. I should point out that we use Active agents exclusively.
Anyway, sick of the thousands of inaccurate emails that suddenly flood my inbox, I was determined to find out why this was happening. After much debug log reading and digging around in MySQL while data was coming in, I realized that the clock field on the latest agent.ping value in the history_uint table was a timestamp about 5 minutes old, even though the value had arrived just seconds earlier. Sure enough, the system clock on the client machine was off by five minutes. Unfortunately NTP is not an option, so I manually corrected the clock and hope it will stay that way.
I'm guessing this is because we're using active agents? I don't have any passive agents to compare to. I'm not sure why this issue seemed to start happening with the upgrade to 4.0; was the timestamp source changed?
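To make the symptom concrete, here is a minimal sketch of why a slow client clock could make a staleness check flap. It assumes the unavailable trigger behaves like a "no data in the last 5 minutes" check that compares the value's clock field against server time. The function names, the 5-minute window, and the example date are my own assumptions for illustration, not Zabbix code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration only; this is not Zabbix code.


def clock_skew(item_clock: datetime, received_at: datetime) -> timedelta:
    """How far the value's clock field lags behind the server receive time."""
    return received_at - item_clock


def looks_unavailable(item_clock: datetime, now: datetime,
                      window: timedelta = timedelta(minutes=5)) -> bool:
    """Assumed staleness rule: the newest value's clock is outside the window."""
    return now - item_clock >= window


# Arbitrary example instant standing in for "server time right now".
server_now = datetime(2018, 11, 1, 12, 0, 0, tzinfo=timezone.utc)

# The value really arrived 2 seconds ago ...
received_at = server_now - timedelta(seconds=2)
# ... but the agent stamped it with a clock that runs five minutes slow.
agent_stamp = received_at - timedelta(minutes=5)

skew = clock_skew(agent_stamp, server_now)
stale_by_agent_clock = looks_unavailable(agent_stamp, server_now)
stale_by_receive_time = looks_unavailable(received_at, server_now)

print("skew:", skew)
print("stale if judged by agent clock:", stale_by_agent_clock)
print("stale if judged by receive time:", stale_by_receive_time)
```

With a skew that sits right at the window size, small variations in arrival time push the value in and out of the window, which would match the flapping I'm seeing.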

(When a proxy collects data from the agents, it can be disconnected from the server, and the proxy can hold that data for up to 24h before it starts discarding the oldest values.)