
**kloczek** · 24-04-2018, 21:44

First of all: the precision of sampled time is limited to 1 s when the time is sampled through the Zabbix agent. Why?
Because that is the minimum resolution of the timestamps in agent items' history, and nothing changes that as long as you use Zabbix agent or Zabbix agent (active) items.
Second, the precision is worse with passive (Zabbix agent) items, because sampling is initiated on the proxy/server: the proxy/server poller connects to the agent, samples the data, and returns it to the server.
If you want higher precision for the sampled local time, you can get it, but only with trapper items, because with those it is possible to pass not only the time in seconds but also the fractional part in nanoseconds.
Once you have both the sampled local time and the time sampled over (S)NTP, you can subtract one from the other to measure the time skew.
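To make that subtraction concrete, here is a minimal Python sketch, assuming both samples arrive as (seconds, nanoseconds) pairs, which the post says trapper items can carry. The `Stamp` type and function names are hypothetical, and how the two samples are collected is left out.

```python
from dataclasses import dataclass

NS_PER_S = 1_000_000_000


@dataclass(frozen=True)
class Stamp:
    """Hypothetical timestamp: whole seconds plus a nanosecond part (0..999999999)."""
    sec: int
    ns: int


def skew_ns(local: Stamp, reference: Stamp) -> int:
    """Signed skew in nanoseconds; positive means the local clock is ahead of the reference."""
    return (local.sec - reference.sec) * NS_PER_S + (local.ns - reference.ns)


def skew_seconds(local: Stamp, reference: Stamp) -> float:
    """The same skew expressed in (floating point) seconds."""
    return skew_ns(local, reference) / NS_PER_S


# Usage: the local clock read 0.3 s later than the (S)NTP reference.
local = Stamp(sec=1524606275, ns=200_000_000)
ntp = Stamp(sec=1524606274, ns=900_000_000)
print(skew_seconds(local, ntp))
```

Doing the arithmetic in integer nanoseconds first matters: a float holding a full Unix epoch time only resolves a few hundred nanoseconds, so converting each sample to float before subtracting would throw away part of the precision trapper items are meant to provide.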

However, this approach is still a bit overcomplicated, because anyone who cares about proper (S)NTP time synchronization is really interested in how far local time is shifted relative to some stratum source.
Another point: as long as a process on the monitored system periodically checks and syncs the time, what you really need to monitor is not the time, or the shift of local time against some (S)NTP server, but that process: is it running or not? This is the simplest way to guarantee that the time stays as close as possible to the time set by the local (S)NTP server.
In other words, you don't need to monitor the time, but the process.

If you want to measure this shift, you can use an NTP client to read it and store it in a Zabbix item.

**gessel** · 24-04-2018, 23:33

Hi Kloczek, this isn't quite what I'm looking for. A few problems can happen that make tracking the offset helpful:

- VMs have really, really bad time sync, massive drift. Windows' regular sync rate is far too infrequent. I use a script to check every hour.

- We're in a location where the outbound network is unreliable and can get routed in very different ways and occasionally is subject to some pretty gnarly flap. This makes sync to remote servers unreliable - we sync to a server on the LAN (as one would) and that syncs to global stratum. However, that server itself isn't perfectly reliable and... in fact... is running virtualized as well. That means it can drift and, if the outside network goes out, drift rather meaningfully. When the NTP server does sync, the correction can be meaningful, and then when the local hosts sync to the now discontinuously corrected local NTP server, they become discontinuous with each other until all can sync.

- this can then create some pretty random errors in logins. The errors are on the servers that ended up being out of sync with the management server (generally). Knowing that the management server or NTP server had a string of failed syncs would predict coming problems, but not tell me which of the servers are out of sync, or indeed if they actually are.

- but if I can compare the clock of each host to either Zabbix's clock (including the NTP server) or directly to the NTP server's clock, then I know which one is off and which one should be manually intervened with.

I don't need msec accuracy. A second or two works. But "too" much (poorly defined in the protocol) and problems arise.

And sure, using "fuzzytime(5)" is more or less sufficient to identify servers with problems, but I can't track drift with it.

Thing is, the DB has "last check time" and the matching "local time" right there. I can write an external SQL query to return the difference with a little bit of fuss and head scratching, but the data is so tantalizingly at hand it would be lovely if there were a simple command to render it.
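A rough sketch of that external query, written in Python against an in-memory SQLite table shaped like Zabbix's `history_uint` (itemid, clock, value, ns), where unsigned items such as `system.localtime` are stored. Assumptions: `clock` is treated as the server's collection time, so the result also contains polling latency; the item IDs and rows are invented for illustration; a real setup would point the same SQL at its MySQL or PostgreSQL database.

```python
import sqlite3

# Drift = host's reported epoch time minus the timestamp stored with the value.
DRIFT_SQL = """
    SELECT clock, value - clock AS drift_s
    FROM history_uint
    WHERE itemid = ?
    ORDER BY clock
"""


def drift_series(conn: sqlite3.Connection, itemid: int) -> list:
    """Return (clock, drift in seconds) rows; positive drift means the host clock is ahead."""
    return conn.execute(DRIFT_SQL, (itemid,)).fetchall()


# In-memory stand-in for the Zabbix history_uint table (rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_uint (itemid INTEGER, clock INTEGER, value INTEGER, ns INTEGER)"
)
conn.executemany(
    "INSERT INTO history_uint VALUES (?, ?, ?, ?)",
    [
        (10001, 1524600000, 1524600002, 0),  # host 2 s ahead
        (10001, 1524600060, 1524600063, 0),  # drifting further ahead
        (10002, 1524600000, 1524599990, 0),  # another host, 10 s behind
    ],
)

for clock, drift in drift_series(conn, 10001):
    print(clock, drift)
```

On MySQL, `value` is an unsigned BIGINT, so `value - clock` raises an out-of-range error whenever the host is behind; casting first (`CAST(value AS SIGNED) - clock`) avoids that there.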

I suspect, given your history, that if there were such a thing you'd know about it and your answer suggests there is not so "system.localtime[utc].fuzzytime(5)=0" will have to suffice.

**kloczek** · 25-04-2018, 09:53

Originally posted by gessel

Hi Kloczek, this isn't quite what I'm looking for. A few problems can happen that make tracking the offset helpful:

- VMs have really, really bad time sync, massive drift. Windows' regular sync rate is far too infrequent. I use a script to check every hour.

In such cases it is always possible to have the guest systems' syscalls for reading the current time forwarded to the host system.
The details of how to do this differ depending on the kind of virtualisation you are using.

With this, you may not need to run ntpd in each guest system.
On Linux systems with systemd, an NTP client is now even integrated into systemd.

- We're in a location where the outbound network is unreliable and can get routed in very different ways and occasionally is subject to some pretty gnarly flap.

You can set up an NTP server on your local network and use it as the reference.
Using a GPS signal receiver you can quite cheaply build your own time source, even a stratum 3 one.

You can buy a GPS USB dongle really cheaply and use it with your local NTP server.

[..]

- this can then create some pretty random errors in logins. The errors are on the servers that ended up being out of sync with the management server (generally). Knowing that the management server or NTP server had a string of failed syncs would predict coming problems, but not tell me which of the servers are out of sync, or indeed if they actually are.

- but if I can compare the clock of each host to either Zabbix's clock (including the NTP server) or directly to the NTP server's clock, then I know which one is off and which one should be manually intervened with.

Nevertheless, your job is not to monitor the time offset but to keep the local time on all systems as closely synced as possible.
Reporting the time offset is not your business task; having everything synced correctly is what you must care about.
So again: you should monitor the bits that are responsible for that synchronization.
You may additionally monitor the time offset to confirm that synchronization generally works as it should on all systems, but it is not necessary.
IMO, using the output of the ntpq/ntpstat command directly would be easier and more straightforward than sampling the time on each system and using system.localtime[utc].fuzzytime in a trigger.
Using fuzzytime() assumes that your time source is on the Zabbix server, which is not always useful.

**gessel** · 25-04-2018, 11:59

Hi Kloczek,

Hmm... not sure it is quite appropriate to say what the job is. Tracking drift helps track down and potentially resolve the cause. This is where data helps a lot. Zabbix is a great tool for collecting system data and this data has been essential in resolving quite a few vexing problems, some hardware, some software, some configuration. In this case, there is a hypothesis about login failures and having accurate clock drift data can help identify and resolve that issue. There's kind of this conceit that experts sometimes develop that they know the One True Path, when perhaps the full story is a little more nuanced.

Sure, I can write a script to ssh in and return local time on the various OSes in place, compare that directly with the NTP server, call that as an external script, etc. But the zabbix client already grabs that item out of all the OSes we're using and all manner of different OSes can be tracked in a unified manner with very little effort.

If you search on how to track clock skew between servers, you'll see this is not a unique request, and drift on VMs is a well-known problem. So while I appreciate the suggestions for mechanisms to verify that the NTP processes are running, or for running a GPS time sync (we do have one on the network, BTW, but it is isolated to another time-of-flight correlation task with much, much higher accuracy requirements than +/- a few seconds), I'm asking for advice on how to solve a particular problem: getting clock skew data accurate to +/- 1 second (ish) stored as an item in the database. I suspect others would find this useful too. The usual answer is to use "fuzzytime", which is pretty close. The comparison fuzzytime already does internally, before reducing it to 1/0, is actually all this task needs. If fuzzytime could be called as a calculated item and had a units parameter like (seconds,s), where "seconds" is an integer test that returns the current binary output and "s" returns the floating-point difference, that'd totally solve my need.
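For comparison, a small Python model of the two behaviours discussed here. `fuzzytime` mirrors how the trigger function is documented: 1 when the item value, read as a Unix timestamp, is within T seconds of the server's time, else 0. `signed_skew` is a hypothetical stand-in for the numeric variant requested above; it is not a Zabbix function.

```python
def fuzzytime(item_value: int, server_now: int, tolerance_s: int) -> int:
    """Model of the trigger function: 1 if |item_value - server_now| <= tolerance_s, else 0."""
    return 1 if abs(item_value - server_now) <= tolerance_s else 0


def signed_skew(item_value: int, server_now: int) -> int:
    """Hypothetical numeric variant: seconds the host clock is ahead (+) or behind (-)."""
    return item_value - server_now


now = 1524700000
for host_clock in (now + 3, now - 7):
    # offset, binary trigger result, signed value suitable for graphing
    print(host_clock - now, fuzzytime(host_clock, now, 5), signed_skew(host_clock, now))
```

The binary result only says which side of the threshold a host is on; the signed value is what a drift graph would need.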

Again, thanks for the advice. The answer is no different (so far) from others provided: trigger using fuzzytime and get a binary "problem/OK" output within a specified absolute value delta, remembering that internally the comparison is +/- one second. If you need numerical drift data or more accuracy, the following links may be helpful (though having Windows machines, as always, complicates issues).

https://unix.stackexchange.com/quest...ly-submit-jobs
https://superuser.com/questions/4087...-linux-servers

**kloczek** · 25-04-2018, 13:00

Originally posted by gessel

Hi Kloczek,

Hmm... not sure it is quite appropriate to say what the job is. Tracking drift helps track down and potentially resolve the cause.

Yes, that is obvious. The only issue is that you are doing this with the Zabbix server's time as the reference. Usually systems sync their time over (S)NTP, so the local (S)NTP server's time should be used as the reference.
Unless your (S)NTP server runs on the same system as the Zabbix server, you are adding an additional error to every system you measure.
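The extra error is simple arithmetic. A sketch, assuming both offsets are measured against the same (S)NTP reference (the function names are illustrative): what the Zabbix server sees is the difference of the two offsets, so the server's own offset leaks into every host's reading.

```python
def apparent_offset(host_vs_ntp: float, server_vs_ntp: float) -> float:
    """Offset of a host as measured against the Zabbix server's clock."""
    return host_vs_ntp - server_vs_ntp


def corrected_offset(apparent: float, server_vs_ntp: float) -> float:
    """Recover the host's offset against NTP, if the server's own offset is known."""
    return apparent + server_vs_ntp


# Host is 2 s ahead of NTP, Zabbix server is 1 s behind NTP.
seen = apparent_offset(2.0, -1.0)
print(seen)                          # the server reports 3.0 s
print(corrected_offset(seen, -1.0))  # back to 2.0 s
```

This is also why separately watching the Zabbix server's own offset helps: every host reading shifts by that same amount.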

Sure, I can write a script to ssh in and return local time on the various OSes in place, compare that directly with the NTP server, call that as an external script, etc. But the zabbix client already grabs that item out of all the OSes we're using and all manner of different OSes can be tracked in a unified manner with very little effort.

Executing ntpq and extracting the exact part of the output you need (using sed, for example) does not need a script; it can be done in a short one-liner.
You can use the system.run[] key to execute such a one-liner without writing scripts and distributing them to all systems.
As long as you already have EnableRemoteCommands=1 in the agent settings, you can set up the whole NTP drift monitoring by changing only the Zabbix monitoring configuration.

Example: a long time ago I was asked to add monitoring of network retransmissions, so I added this to my OS Linux template:

Name: NET::segments retransmitted
Type: Zabbix agent (active)
Key: system.run["/bin/netstat -s|/bin/sed -n 's/\( *\)\(.*\) segments retransmitted*/\2/ p'"]
Type of information: Numeric (float)

(It is a float because the item has "Change per second" in its preprocessing.)
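Why float: "Change per second" divides the difference between two consecutive counter values by the time between them, which rarely comes out whole. A minimal Python sketch of that step, plus the same extraction the sed expression performs. The sample text imitates `netstat -s` output and is made up; the regex also accepts the "retransmited" spelling that some older net-tools versions print, which is an assumption worth checking on your own hosts.

```python
import re
from typing import Optional, Tuple

# Same idea as the sed expression: capture the number before "segments retransmitted".
RETRANS = re.compile(r"^\s*(\d+) segments retransmitt?ed\b", re.MULTILINE)


def segments_retransmitted(netstat_output: str) -> Optional[int]:
    """Return the retransmitted-segments counter, or None if the line is absent."""
    match = RETRANS.search(netstat_output)
    return int(match.group(1)) if match else None


def change_per_second(prev: Tuple[float, int], curr: Tuple[float, int]) -> float:
    """(value delta) / (time delta) for two (timestamp, counter) samples."""
    (t0, v0), (t1, v1) = prev, curr
    return (v1 - v0) / (t1 - t0)


sample_t0 = "Tcp:\n    1200 active connections openings\n    40 segments retransmitted\n"
sample_t1 = "Tcp:\n    1250 active connections openings\n    100 segments retransmitted\n"

v0 = segments_retransmitted(sample_t0)
v1 = segments_retransmitted(sample_t1)
print(change_per_second((100.0, v0), (160.0, v1)))
```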

**gessel** · 25-04-2018, 16:26

OMG, that's awesome. I hadn't messed with active triggers yet.

I agree that using the zabbix host as a reference is slightly suboptimal. But I can set a trigger at a higher priority, say, if that host drifts.

Here's the Windows command (fill in the address as appropriate):
>w32tm /stripchart /dataonly /samples:1 /computer:10.100.50.150
Tracking 10.100.50.150 [10.100.50.150:123].
Collecting 1 samples.
The current time is 4/25/2018 4:14:35 PM.
16:14:35, +05.3312011s
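If the Windows output is collected as-is (for instance through system.run[]), the offset still has to be pulled out of the last line. A Python sketch of that parsing, assuming the `HH:MM:SS, ±N.Ns` data-line format shown above; other locales may print a different time or decimal format, and the sign convention of the offset should be checked against the w32tm documentation before relying on it.

```python
import re
from typing import Optional

# Matches a data line such as "16:14:35, +05.3312011s".
W32TM_LINE = re.compile(r"^\d{1,2}:\d{2}:\d{2}, ([+-]\d+(?:\.\d+)?)s\s*$", re.MULTILINE)


def w32tm_offset_seconds(output: str) -> Optional[float]:
    """Return the offset from the first data line, or None (e.g. when w32tm printed an error)."""
    match = W32TM_LINE.search(output)
    return float(match.group(1)) if match else None


sample = """Tracking 10.100.50.150 [10.100.50.150:123].
Collecting 1 samples.
The current time is 4/25/2018 4:14:35 PM.
16:14:35, +05.3312011s
"""
print(w32tm_offset_seconds(sample))
```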

For Linux:
$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+pfSense.pmcam   62.201.225.9     3 u  729 1024  377    0.135    2.705   1.356
*time.iqnet.com  62.201.214.162   2 u  119 1024  357    6.629   -0.041   7.755
+node62.ia64.org 105.100.222.35   2 u  244 1024  377  185.534    8.559  15.684
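The offset of the currently selected peer (the row marked `*`) is usually the number worth storing. A Python sketch, assuming the default ten-column `ntpq -p` layout shown above, with the offset in milliseconds in the ninth column; long host names can wrap onto a second line, which this does not handle (`ntpq -pn`, with numeric addresses, makes that less likely). On the host itself, a one-liner along the lines of `ntpq -pn | awk '/^\*/ {print $9}'` should do the same job, though it is untested here.

```python
from typing import Optional

# 0-based index of "offset" in: remote refid st t when poll reach delay offset jitter
NTPQ_OFFSET_COLUMN = 8


def system_peer_offset_ms(ntpq_output: str) -> Optional[float]:
    """Offset (ms) of the peer ntpd has selected, i.e. the row starting with '*'."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line.split()
            if len(fields) >= 10:
                return float(fields[NTPQ_OFFSET_COLUMN])
    return None  # no selected peer: the host is not synchronized


sample = """     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+pfSense.pmcam   62.201.225.9     3 u  729 1024  377    0.135    2.705   1.356
*time.iqnet.com  62.201.214.162   2 u  119 1024  357    6.629   -0.041   7.755
+node62.ia64.org 105.100.222.35   2 u  244 1024  377  185.534    8.559  15.684
"""
print(system_peer_offset_ms(sample))
```

Returning None when no row is marked `*` keeps "not synchronized" distinct from "synchronized with zero offset", which a trigger can then treat separately.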

Executing the one-line script on the remote host to extract the drift is cool on Linux/BSD machines, but harder on Windows (especially Win7; the server versions might have some utilities).

For Linux/BSD hosts, though, one can install clockdiff (on Debian/Ubuntu: $ sudo apt-get install iputils-clockdiff) and then run:
$ clockdiff 10.100.50.150
.
host=10.100.50.150 rtt=750(187)ms/0ms delta=1ms/1ms Wed Apr 25 17:10:35 2018

from the Zabbix host (obviously using the Zabbix host as the reference). This doesn't work against Windows machines, though.
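If clockdiff's output is captured on the Zabbix host (for example by an external check), the two numbers after `delta=` are what would go into an item. A Python sketch that only parses the line format shown above; what each of the two delta values means, and their sign, should be confirmed in the clockdiff man page rather than taken from this sketch.

```python
import re
from typing import Optional, Tuple

DELTA = re.compile(r"delta=(-?\d+)ms/(-?\d+)ms")


def clockdiff_deltas_ms(output: str) -> Optional[Tuple[int, int]]:
    """Return the two delta values (ms) from a clockdiff result line, or None."""
    match = DELTA.search(output)
    return (int(match.group(1)), int(match.group(2))) if match else None


sample = ".\nhost=10.100.50.150 rtt=750(187)ms/0ms delta=1ms/1ms Wed Apr 25 17:10:35 2018\n"
print(clockdiff_deltas_ms(sample))
```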

Seems doable...

**kloczek** · 25-04-2018, 18:26

Originally posted by gessel

executing the one line script on the remote host to extract the drift is cool on Linux/BSD machines, but harder on windows (especially Win7, server might have some utilities).

I'm 100% sure you will find a Cygwin package with ntpq, so the same method can be used even on Windows.

**kloczek** · 25-04-2018, 18:34

Originally posted by gessel

OMG, that's awesome. I hadn't messed with active triggers yet.

system.run[] is an agent key; it doesn't matter whether it is used in an active or a passive item.
https://www.zabbix.com/documentation...s/zabbix_agent
It is only a "coincidence" that my setup uses only active agents.

**gessel** · 25-04-2018, 23:01

Oh, you're right, of course. I was kinda hoping to avoid doing anything special on the clients, but Cygwin is a viable solution. Meinberg seems to have some nice tools too. I'll try with the simple fuzzytime check for now; if it is sufficient, I'm OK with it.

Graphing Clock Drift: how to compare system.localtime to server time? (!fuzzytime)