Checking for zabbix host time and the zabbix server time.

  • ericko
    Junior Member
    • Aug 2010
    • 5

    #1

    Checking for zabbix host time and the zabbix server time.

    Hey Guys,

    I currently have a trigger where if the time on one of my hosts is off by more than 5 seconds it triggers an alarm.

    I've accomplished it by using the following:
    {hostname:system.localtime.fuzzytime(5)}=0

    The item is checked once every 60 seconds.

    I was wondering if it's possible to expand this trigger.

    Instead of alarming straight away when system.localtime is off by more than 5 seconds, I would like the alarm to trigger only after the check has failed three times in a row.

    Is this possible to do?
  • pdwalker
    Senior Member
    • Dec 2005
    • 166

    #2
    Do you still need an answer to this?

    • alexarean
      Junior Member
      • Jan 2012
      • 5

      #3
      I think yes mate

      • pdwalker
        Senior Member
        • Dec 2005
        • 166

        #4
        Ok, me and my big mouth...

        (skip down to the part starting with ===> if you want to get to the final answer rather than how I meandered my way to the solution)

        Right, to do what you want you will need to use calculated items.

        Basically, a calculated item allows you to use the zabbix trigger functions to define items that can be stored in the database.

        So if I want to have a trigger that depends on the last three values of fuzzytime, I am going to have to create a calculated item to store those values so I can run my trigger against that data.

        So, I first tried defining this item:

        Code:
        type: calculated
        formula: fuzzytime(system.localtime,X)
        where X is the parameter to fuzzytime: the largest time delta you'll accept. The system will store the value (1 = ok, 0 = delta too large) from the fuzzytime function for system.localtime.

        In principle it should work, but for some reason, it does not work perfectly. Even on servers that were within a second of each other with a tolerance of 3 minutes, on a system that checked the time every minute, I'd get about a 40% failure rate. Obviously not ideal and would appear to be a bug, or a lack of understanding on my part.

        Attempt 2:

        Code:
        type: calculated
        formula: last(system.localtime)-last(<zabbix server name>:system.localtime)
        Type of Information: Numeric (float)
        This item will return the difference in unix time, in seconds, between the host system (first entry) and the second system (the zabbix server, or whatever server you decide should be the reference system).

        Well, after letting over 200 servers collect data, I find out that this doesn't seem to work well either. I am getting time differences of upwards of +/- 300 seconds. Hmm..

        Ok, so two ways that should have worked have failed, so this needed some more investigation.

        It turns out that the server I was examining pulled the system time every 60 seconds, but most of my servers were pulling it every 5 minutes.

        Ah, I see the problem now. Both formulas use the last value collected from each host, and with a five minute collection interval the two values can be up to 5 minutes apart. That means my first calculated item would have worked if I had collected the data at a shorter interval, or used a larger tolerance in my fuzzytime evaluation.
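        To make the sampling problem concrete, here is a rough Python sketch (not Zabbix code; the function name and the sample ages are made up for illustration). It assumes each stored value is simply the clock reading at the moment it was collected, and shows how two perfectly synced clocks can still appear hundreds of seconds apart when the two "last" values were collected at different moments.

```python
def apparent_offset(true_offset, host_sample_age, ref_sample_age):
    """Difference between two stored 'last' clock readings, in seconds.

    true_offset:      real clock difference (host minus reference)
    host_sample_age:  seconds since the host value was collected
    ref_sample_age:   seconds since the reference value was collected
    All names here are hypothetical, for illustration only.
    """
    # Each stored value is a clock reading frozen at collection time,
    # so an older sample reads lower by exactly its age.
    host_value = true_offset - host_sample_age
    ref_value = -ref_sample_age
    return host_value - ref_value


# Clocks perfectly in sync, host polled every 5 minutes, reference just polled:
print(apparent_offset(0, 300, 0))   # -300
# Same clocks, but now the reference sample is the stale one:
print(apparent_offset(0, 0, 300))   # 300
# With a 60 second interval the worst case shrinks to about a minute:
print(apparent_offset(0, 60, 0))    # -60
```

        So under these assumptions the error is bounded by the collection interval, which is why the tolerance has to be at least as large as the interval.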

        ===> final answer

        Create a new calculated item with the following values (assuming I collect the system time every 60 seconds):

        Code:
        type: calculated
        key: system.localtime.fuzzytime
        formula: fuzzytime(system.localtime,65)
        Update Interval: 60
        Once I have this, I can create the trigger you ask for - to trigger an alarm if the times differ by too much for three consecutive checks.

        Code:
        {Zabbix Agent:system.localtime.fuzzytime.avg(3m)}<0.3
        Now, I've not tested this enough to know if this trigger is going to create lots of noise, but I might suggest making the trigger check for 4 failed checks over a 5 minute interval like so (assuming data collection every 60 seconds for system.localtime):

        Code:
        {Zabbix Agent:system.localtime.fuzzytime.avg(5m)}<0.21
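        If the thresholds look arbitrary, this small Python sketch (again not Zabbix code, and the function name is mine) shows the arithmetic. It assumes both triggers average the stored 0/1 fuzzytime values, and that the window holds exactly one value per minute, which may not always be true in practice depending on collection timing.

```python
def fires(samples, threshold):
    """Return True if the mean of the 0/1 fuzzytime samples is below threshold.

    samples:   list of stored values, 1 = clock ok, 0 = delta too large
    threshold: the number on the right-hand side of the trigger
    """
    return sum(samples) / len(samples) < threshold


# Three-minute window, threshold 0.3: only three failures in a row fire.
print(fires([0, 0, 0], 0.3))         # True  (mean 0.0)
print(fires([0, 0, 1], 0.3))         # False (mean 0.33)

# Five-minute window, threshold 0.21: at least four failures out of five.
print(fires([0, 0, 0, 0, 1], 0.21))  # True  (mean 0.2)
print(fires([0, 0, 0, 1, 1], 0.21))  # False (mean 0.4)
```

        The threshold just needs to sit between the highest mean that should fire and the lowest mean that should not.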
        As to your request for checking whether the time is out by more than 5 seconds, I cannot do that with this trigger, as my resolution is only as fine as 60 seconds. In other words, my servers have to be out of sync by more than 60 seconds for it to show up as a problem. I chose 65 as my fuzzytime parameter because the error in the times could be up to 60 seconds, depending on when the two values are collected.

        If I want a 5 second resolution, I'll need to collect the data every 5 seconds (and use a fuzzytime parameter of 10), which I think is too much overhead just for watching the clocks.

        Last thoughts

        Code:
        {Zabbix Agent:system.localtime.fuzzytime(5)}=0
        Using this as a trigger has worked well for me. I do not seem to be getting many false positives as all my servers run ntp.

        Anyway, enough rambling. Give this a try, or stick with the simple trigger and see how that works out for you.

        Have fun!
