We needed to track the time skew/drift of our servers relative to our master clocks. I'm sure there's still room for improvement, but this is what we've done so far. I hadn't found any other discussion of this, particularly for Windows, so I thought I'd share it.
I created one user parameter for the Linux clients and another for the Windows clients (the Windows one was a lot trickier for me); both are listed further down.
Then I created the matching items in the Linux and Windows Zabbix templates, with type set to "Numeric (float)" and Units of "ms".
Note that the Windows version returns the offset in seconds, not milliseconds, so I set a custom multiplier of 1000 in that item definition so that both platforms are tracking the same units. I also can't vouch for how new a version of Windows needs to be for this to work; I've only tested on Windows 7 and Server 2003 so far. I'm also planning to update the Windows UserParameter to let Zabbix dictate which master clock to check against, but I haven't gotten around to that yet.
On the Linux side, I'm tracking a few additional statistics at the moment, though I haven't decided whether they are useful in practice. My thought is to have some triggers throw warnings if the number of peers drops too low or the stratum of the master peer goes too high. I'm also recording the IP of the currently selected peer, mostly out of curiosity about how often some hosts might be flapping between different peers.
Also note that currently for Linux hosts I'm making the check return 1000 if there is no current selected peer. I haven't made up my mind if I like that yet. I originally did it just so that during periods when a peer is being selected (such as just after a reboot) the check doesn't return as not supported. I may still change my mind about that and just make the check return nothing if there is no peer.
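For comparison, here's a rough sketch of the "return nothing" alternative for the ntp.client.offset filter. It's run against canned lines rather than a live ntpd; the function name `parse_offset_or_empty` and the sample lines (addresses, values) are made up for illustration. It isn't what I'm running, just what the other option might look like.

```shell
#!/bin/sh
# Variant of the offset filter with no default value: nothing is printed
# unless a line carries the "*" (selected peer) tally mark.
parse_offset_or_empty() {
    awk '$1 ~ /^\*/ { print $9; exit }'
}

# Hypothetical ntpq -pn lines; addresses and numbers are made up.
# A selected peer is present, so this prints -0.153
printf '%s\n' '*192.0.2.10 .GPS. 1 u 33 64 377 0.412 -0.153 0.021' | parse_offset_or_empty

# No selected peer, so this prints nothing at all
printf '%s\n' ' 192.0.2.12 .INIT. 16 u - 64 0 0.000 0.000 0.000' | parse_offset_or_empty
```

The trade-off is the one described above: with empty output the item can show up as not supported while a peer is being selected, whereas the 1000 default keeps the item alive but records an offset that isn't real.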
We've been collecting this data on Linux for a couple of weeks now, and so far it seems reliable and accurate. The Windows collection just started a couple of days ago and I'm still keeping an eye on it. It did immediately show me which Windows servers weren't using NTP to keep time, though, as the drift on servers using default Windows Domain syncing was over a couple hundred milliseconds every few hours. It's already working much better than fuzzytime, which was frequently giving me false positives that I couldn't figure out. And it has the added bonus of letting me have graphs of the offsets over time.
For the Linux clients I created this user parameter:
Code:
UserParameter=ntp.client.offset,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }'
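To show what that awk filter actually picks out, here's a small sketch that runs the same awk program against a canned sample of `ntpq -pn` output instead of a live daemon. The sample lines (addresses, values) and the function names `sample_ntpq` and `parse_offset` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample of `ntpq -pn` output; addresses and values are made up.
sample_ntpq() {
cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.412   -0.153   0.021
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.530    0.287   0.044
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
EOF
}

# Same awk program as the UserParameter: start from the 1000 default and take
# field 9 (the offset column, in ms) from the line marked with "*".
parse_offset() {
    awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }'
}

# Selected peer present: prints -0.153
sample_ntpq | parse_offset

# Selected peer line filtered out: prints the 1000 default
sample_ntpq | grep -v '^\*' | parse_offset
```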
And for the Windows clients:
Code:
UserParameter=ntp.client.offset,powershell.exe -command "$timeoffset = &w32tm /stripchart /dataonly /computer:masterclock.local.domain /samples:1; $timeoffset[3].split(' ')[1].TrimEnd('s')"
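The PowerShell one-liner is harder to read, so here's the same parsing logic sketched in shell/awk against a canned sample, purely as an illustration (this is not something to run on the Windows hosts). The sample text is my assumption of what `w32tm /stripchart /dataonly /samples:1` prints; the address, timestamps, offset value, and the function names are made up. The sketch takes the fourth line (index 3 in the PowerShell array), grabs the second space-separated field, strips the trailing "s", and then applies the same factor of 1000 that the item's custom multiplier applies.

```shell
#!/bin/sh
# Assumed shape of w32tm /stripchart /dataonly output; all values are made up.
sample_w32tm() {
cat <<'EOF'
Tracking masterclock.local.domain [192.0.2.10:123].
Collecting 1 samples.
The current time is 6/1/2012 10:15:30 AM.
10:15:30, +00.0123456s
EOF
}

# Line 4 holds the sample; field 2 is the offset in seconds with an "s" suffix.
# Strip the suffix and convert to milliseconds.
offset_ms() {
    awk 'NR == 4 { v = $2; sub(/s$/, "", v); printf "%.4f\n", v * 1000 }'
}

# Prints 12.3456 (0.0123456 s expressed in ms)
sample_w32tm | offset_ms
```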
The additional Linux statistics mentioned above use these user parameters:
Code:
UserParameter=ntp.peer.stratum,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { stratum=99 } $1 ~ /\*/ { stratum=$3 } END { print stratum }'
UserParameter=ntp.peer.count,/usr/sbin/ntpq -pn | grep -E -c '^\*|^\+'
UserParameter=ntp.peer.ipaddr,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { peer="NONE" } $1 ~ /\*/ { peer=substr($1,2) } END { print peer }'
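As a rough illustration of the trigger idea (warn when the peer count drops too low or the selected peer's stratum goes too high), here's a shell sketch that applies the same count and stratum filters to canned `ntpq -pn` output and compares them against thresholds. This is not Zabbix trigger syntax, only the logic; the thresholds, the sample data, and the function names (`sample_ntpq`, `check_ntp_health`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample of `ntpq -pn` output; addresses and values are made up.
sample_ntpq() {
cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.412   -0.153   0.021
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.530    0.287   0.044
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
EOF
}

# Same filters as the ntp.peer.count and ntp.peer.stratum user parameters.
peer_count()   { grep -E -c '^\*|^\+'; }
peer_stratum() { awk 'BEGIN { stratum=99 } $1 ~ /\*/ { stratum=$3 } END { print stratum }'; }

# Usage: <ntpq output> | check_ntp_health MIN_PEERS MAX_STRATUM
check_ntp_health() {
    data=$(cat)
    count=$(printf '%s\n' "$data" | peer_count)
    stratum=$(printf '%s\n' "$data" | peer_stratum)
    if [ "$count" -lt "$1" ]; then
        echo "WARN: only $count usable peers"
    elif [ "$stratum" -gt "$2" ]; then
        echo "WARN: selected peer is stratum $stratum"
    else
        echo "OK"
    fi
}

# Two usable peers (the "*" and "+" lines) and a stratum 1 selected peer:
sample_ntpq | check_ntp_health 2 3   # prints OK
sample_ntpq | check_ntp_health 3 3   # prints WARN: only 2 usable peers
```

Note that the stratum default of 99 means a host with no selected peer would also trip the stratum check, which may or may not be what you want.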