Monitoring time offset on Linux and Windows clients

    We needed to track the time skew/drift of our servers relative to our master clocks. I'm sure there's still room for improvement, but this is what we've done so far. I hadn't found any other discussion of this, particularly for Windows, so I thought I'd share it.

    For the Linux clients I created this user parameter:
    Code:
    UserParameter=ntp.client.offset,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }'
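    To see what that awk filter picks out, here is a minimal sketch that runs the same filter over captured output instead of a live `ntpq -pn`. The peer lines and the helper names `sample_peers` and `parse_offset` are made up for illustration; only the awk program is taken from the parameter above.

```shell
#!/bin/sh
# Illustrative `ntpq -pn` output (made-up values, not from a real host).
# Columns: remote refid st t when poll reach delay offset jitter
sample_peers() {
cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+192.0.2.10      .GPS.            1 u   33   64  377    0.412    0.057   0.021
*192.0.2.11      .GPS.            1 u   12   64  377    0.398   -0.123   0.018
 192.0.2.12      192.0.2.11       2 u   45   64  377    0.655    1.204   0.102
EOF
}

# Same filter as the UserParameter: start from the fallback value 1000, then
# take column 9 (offset, in milliseconds) from the line whose first field
# carries the '*' tally mark of the currently selected peer.
parse_offset() {
    awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }'
}

sample_peers | parse_offset
```

    With the sample above, the filter prints the offset of the `*` peer; with no `*` line in the input it falls back to 1000.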
    And for the Windows clients (this one was a lot trickier for me):
    Code:
    UserParameter=ntp.client.offset,powershell.exe -command "$timeoffset = &w32tm /stripchart /dataonly /computer:masterclock.local.domain /samples:1; $timeoffset[3].split(' ')[1].TrimEnd('s')"
    Then I created the matching items in the Linux and Windows Zabbix templates, with type set to "Numeric (float)" and Units of "ms".

    Note that the Windows version returns the offset in seconds, not milliseconds, so I have a custom multiplier of 1000 in that item definition so that both platforms are tracking the same units. I also can't vouch for how new a version of Windows needs to be for this to work; I've only tested on Windows 7 and Server 2003 so far. I'm also planning to update the Windows UserParameter to let Zabbix dictate which master clock to check against, but I haven't gotten around to that yet.

    On the Linux side, I'm actually tracking a few additional statistics at the moment, though I haven't decided if they are practically useful. My thought is to have some triggers throw warnings if the number of peers drops too low or the stratum of the master peer goes too high. I'm also recording the IP of the currently selected peer, more out of curiosity about how often some hosts might be flapping between different peers.

    Also note that for Linux hosts the check currently returns 1000 if there is no selected peer. I haven't made up my mind if I like that yet. I originally did it so that during periods when a peer is still being selected (such as just after a reboot) the check doesn't come back as not supported. I may still change my mind about that and make the check return nothing if there is no peer.
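    The "return nothing" alternative mentioned above could look roughly like this. It is only a sketch: `parse_offset_or_empty` is a hypothetical helper name, the input lines are made up, and how the agent treats empty output should be checked against your Zabbix version.

```shell
#!/bin/sh
# Variant of the offset filter with no fallback value: print the offset of the
# selected ('*') peer, or print nothing at all when no peer is selected.
parse_offset_or_empty() {
    awk '$1 ~ /^\*/ { print $9 }'
}

# Made-up billboard line with a selected peer: prints its offset column.
printf '%s\n' '*192.0.2.11 .GPS. 1 u 12 64 377 0.398 -0.123 0.018' | parse_offset_or_empty

# Made-up billboard line with only a candidate ('+') peer: prints nothing.
printf '%s\n' '+192.0.2.10 .GPS. 1 u 33 64 377 0.412 0.057 0.021' | parse_offset_or_empty
```

    The trade-off is the one described in the post: a sentinel such as 1000 keeps the item supported but shows up as a fake spike in graphs, while empty output avoids the spike at the cost of gaps or a not-supported state.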

    Code:
    UserParameter=ntp.peer.stratum,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { stratum=99 } $1 ~ /\*/ { stratum=$3 } END { print stratum }'
    
    UserParameter=ntp.peer.count,/usr/sbin/ntpq -pn | grep -E -c '^\*|^\+'
    
    UserParameter=ntp.peer.ipaddr,/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { peer="NONE" } $1 ~ /\*/ { peer=substr($1,2) } END { print peer }'
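    A quick way to sanity-check these three filters without touching a live agent is to feed them captured output. The sketch below uses made-up peer lines and hypothetical helper names (`sample_peers`, `peer_stratum`, `peer_count`, `peer_ipaddr`). The address helper assumes the intended fallback is the literal string NONE, so it quotes the default and strips the tally character only from a real peer.

```shell
#!/bin/sh
# Made-up `ntpq -pn` output for illustration only.
sample_peers() {
cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+192.0.2.10      .GPS.            1 u   33   64  377    0.412    0.057   0.021
*192.0.2.11      .GPS.            1 u   12   64  377    0.398   -0.123   0.018
 192.0.2.12      192.0.2.11       2 u   45   64  377    0.655    1.204   0.102
EOF
}

# Stratum (column 3) of the selected peer, or 99 when none is selected.
peer_stratum() {
    awk 'BEGIN { stratum=99 } $1 ~ /^\*/ { stratum=$3 } END { print stratum }'
}

# Number of usable peers: lines starting with '*' (selected) or '+' (candidate).
peer_count() {
    grep -E -c '^\*|^\+'
}

# Address of the selected peer without its leading '*', or NONE.
peer_ipaddr() {
    awk 'BEGIN { peer="NONE" } $1 ~ /^\*/ { peer=substr($1,2) } END { print peer }'
}

sample_peers | peer_stratum
sample_peers | peer_count
sample_peers | peer_ipaddr
```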
    We've been collecting this data on Linux for a couple of weeks now, and so far it seems reliable and accurate. The Windows collection started just a couple of days ago and I'm still keeping an eye on it. It did immediately alert me to which Windows servers weren't using NTP to keep time, though, as the drift on servers using default Windows Domain syncing was over a couple hundred milliseconds every few hours. It's already working much better than fuzzytime, which was frequently giving me false positives that I couldn't figure out. And it has the added bonus of letting me have graphs of the offsets over time.

    #2
    I use this trigger:

    Name: Clock skew too high on {HOST.HOST}
    Expression: {Template Windows:system.localtime.fuzzytime(600)}=0

      #3
      Yes, the fuzzytime trigger is fine if you're not trying to keep high precision time. Many of our servers are involved in high frequency trading so a drift of even 1 second is too much for us.

      Also, fuzzytime compares the client to the Zabbix server. Our preference is to track the offset between clients and the master time clocks. Even though the Zabbix server is also syncing to the master clocks, it's just a Linux server, so it is not as accurate as the clocks.

      Also the fuzzytime trigger doesn't let you keep historical charts of the offset measurements.

        #4
        Collecting in ms from Windows

        I think I found a way to collect in milliseconds from Windows, so you don't need to have Zabbix modify the value (and you can use the same item for both operating systems):

        Code:
        UserParameter=ntp.client.offset,powershell.exe -command "$timeoffset = &w32tm /stripchart /dataonly /computer:master-time-server.fqdn /samples:1; [float]$timeoffset[3].split(' ')[1].TrimEnd('s').TrimStart('+') * 1000"
        At least it appears to be working on my system. I am neither a Zabbix expert nor a PowerShell expert, so YMMV.
        Last edited by StarDestroyer; 18-07-2014, 19:30.

          #5
          Originally posted by mikesphar:
          Also the fuzzytime trigger doesn't let you keep historical charts of the offset measurements.
          Does anyone know a way of measuring the actual offsets? I mean not triggering a problem above some threshold (fuzzytime), but recording the time offset values over time.
