Fuzzytime() false alerts

  • mushero
    Senior Member
    • May 2010
    • 101

    #1

    Zabbix 1.8.3, thousands of hosts, no proxy involved in this issue.

    We use fuzzytime() in our core template for all servers. It works well, with this trigger: {srv-xxx-php4:system.localtime.fuzzytime(120)}=0

    Now we've added a new customer with 30 servers, and most of them are getting false alerts on this trigger, always lasting 5 minutes (our item refresh interval).

    It can't be the Zabbix server's own clock, or we'd have errors on all hosts (thousands).

    So it feels like it gets bad data for one cycle and then is okay again. But when I check the data, it all looks wonderful (see attached image): we get new data every 299-302 seconds, and each value is 299-302 seconds later than the previous one, as it should be. Nothing is missing, graphs over time show no non-linearity, etc.

    The trigger fired at 00:02 (the item marked in the list below), so the question is: why?

    Timestamp Value
    2013.Aug.20 00:37:50 1376930270
    2013.Aug.20 00:32:49 1376929969
    2013.Aug.20 00:27:52 1376929672
    2013.Aug.20 00:22:50 1376929370
    2013.Aug.20 00:17:50 1376929070
    2013.Aug.20 00:12:52 1376928772
    2013.Aug.20 00:07:51 1376928471 <- OK here
    2013.Aug.20 00:02:53 1376928173 <- Alerted here
    2013.Aug.19 23:57:52 1376927872 <- OK here
    2013.Aug.19 23:52:51 1376927571 <- Alerted here
    2013.Aug.19 23:47:50 1376927270
    2013.Aug.19 23:42:58 1376926978
    2013.Aug.19 23:37:53 1376926673
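A minimal sketch for checking the listing above: it converts each received timestamp to epoch seconds and subtracts it from the reported system.localtime value. The post does not state the display time zone, so UTC+8 is an assumption here, and the names `ASSUMED_TZ`, `ROWS`, and `offsets` are hypothetical helpers, not part of Zabbix.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the frontend shows timestamps in UTC+8 (not stated in the post).
ASSUMED_TZ = timezone(timedelta(hours=8))
MONTHS = {"Aug": 8}  # only the month that appears in the listing

# Received timestamp (as shown in Latest Data) and system.localtime value.
ROWS = """
2013.Aug.20 00:37:50 1376930270
2013.Aug.20 00:32:49 1376929969
2013.Aug.20 00:27:52 1376929672
2013.Aug.20 00:22:50 1376929370
2013.Aug.20 00:17:50 1376929070
2013.Aug.20 00:12:52 1376928772
2013.Aug.20 00:07:51 1376928471
2013.Aug.20 00:02:53 1376928173
2013.Aug.19 23:57:52 1376927872
2013.Aug.19 23:52:51 1376927571
2013.Aug.19 23:47:50 1376927270
2013.Aug.19 23:42:58 1376926978
2013.Aug.19 23:37:53 1376926673
"""


def offsets(rows: str):
    """Return (timestamp text, value minus received time in seconds) per row."""
    result = []
    for line in rows.strip().splitlines():
        date, clock, value = line.split()
        year, month, day = date.split(".")
        hh, mm, ss = (int(part) for part in clock.split(":"))
        received = datetime(int(year), MONTHS[month], int(day),
                            hh, mm, ss, tzinfo=ASSUMED_TZ)
        result.append((f"{date} {clock}", int(value) - int(received.timestamp())))
    return result


if __name__ == "__main__":
    for stamp, diff in offsets(ROWS):
        status = "within 120 s" if abs(diff) <= 120 else "OUTSIDE 120 s"
        print(stamp, diff, status)
```

Under the UTC+8 assumption every row, including the two marked as alerted, has an offset of 0 seconds, so the stored values alone do not explain a fuzzytime(120) failure.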

    Thanks !

    (Also, it seems I can't upload attachments because I'm over quota, though I only have 96K of attachments in total.)
  • Heilig
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Mar 2013
    • 366

    #2
    Are the servers monitored through a Zabbix proxy or directly?

    You need to understand how Zabbix does the comparison, and check what it actually compares. For the first, you can use the documentation and https://support.zabbix.com/browse/ZBX-4500. For the second, you can use the macros {ITEM.VALUE<1-9>} and {TIME} (https://www.zabbix.com/documentation...ed_by_location) and then compare the values to find out whether there really is a difference or not.

    • mushero
      Senior Member
      • May 2010
      • 101

      #3
      Originally posted by Heilig
      Are the servers monitored through a Zabbix proxy or directly?
      No proxy.

      You need to understand how Zabbix does the comparison, and check what it actually compares. For the first, you can use the documentation and https://support.zabbix.com/browse/ZBX-4500. For the second, you can use the macros {ITEM.VALUE<1-9>} and {TIME} (https://www.zabbix.com/documentation...ed_by_location) and then compare the values to find out whether there really is a difference or not.
      Yes, I'm familiar with that feature, which is why I showed the list that includes the times Zabbix received the data in Latest Data. Correct me if I'm wrong, but the timestamp is when Zabbix receives the value, and that is when the trigger checks it. You can see the two never differ by more than a couple of seconds, yet a 120-second trigger is alerting.

      And it happens only on this one customer's servers, not on any others, which is odd. We've seen a few fuzzytime issues before, but I think those were related to proxies or other problems. This one recurs consistently every day or two on these servers, so it's very odd.

      • Heilig
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Mar 2013
        • 366

        #4
        Interesting. )
        Does the problem happen on the same servers at the same time?
        Note this phrase from ZBX-4500:
        "... AT THE TIME TRIGGERS ARE PROCESSED".
        Check the graphs from "Template App Zabbix Server"; perhaps they show bursts at the same time.

        • mushero
          Senior Member
          • May 2010
          • 101

          #5
          But isn't the value processed when it's received, or very close to it? I can't imagine the server is more than 120 seconds late processing this data when the received time in the DB is exact.

          And if the server were very busy or its clock far off, we'd see this on all servers, not just this customer's.

          • Heilig
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Mar 2013
            • 366

            #6
            Yes, you are right. Do you use notifications? Can you spend a little time on testing? Include the two macros {ITEM.VALUE} and {TIME} in the email message generated by this trigger.

            • mushero
              Senior Member
              • May 2010
              • 101

              #7
              Originally posted by Heilig
              Yes, you are right. Do you use notifications? Can you spend a little time on testing? Include the two macros {ITEM.VALUE} and {TIME} in the email message generated by this trigger.
              We don't, but we can try this. The problem has happened less often lately and may be due to the server being very busy, but notifying on it is a good way to see the real issue (though I'd think we could see it in the event). Thanks.

              • mushero
                Senior Member
                • May 2010
                • 101

                #8
                Seems I may understand why.

                Can someone tell me WHEN triggers are evaluated? We always assume it's when values change, but where does it happen? You'd think it's when new values arrive at the Zabbix server, but I'm now thinking this is not true.

                Looking at the code, it seems the poller process just stores the item data in the cache, and the cache sync/flush process is what actually evaluates the functions and sets trigger values.

                In our case, with nearly 100,000 triggers and 600 NVPS, I think this process is falling behind, especially during housekeeping, so my fuzzytime evaluations are delayed and the trigger fires.

                Am I wrong? Can someone tell me how it really works, i.e. the path from receiving agent data to evaluating a trigger with fuzzytime?
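The hypothesis above can be illustrated with a simplified model. This is a sketch, not Zabbix source code: it assumes fuzzytime() compares the item value with the server clock at the moment the trigger is evaluated (as the ZBX-4500 phrase quoted earlier in the thread suggests) and that the limit is inclusive. The function names and the delay values are hypothetical.

```python
def fuzzytime(item_value: int, evaluated_at: int, sec: int) -> int:
    """Simplified model: 1 if the value is within `sec` seconds of the
    server clock at evaluation time, otherwise 0 (assumed inclusive limit)."""
    return 1 if abs(evaluated_at - item_value) <= sec else 0


def trigger_fires(item_value: int, received_at: int,
                  processing_delay: int, sec: int = 120) -> bool:
    """Model the trigger {host:system.localtime.fuzzytime(120)}=0 when the
    evaluation happens `processing_delay` seconds after the value arrives."""
    evaluated_at = received_at + processing_delay
    return fuzzytime(item_value, evaluated_at, sec) == 0


if __name__ == "__main__":
    value = 1376928173        # the 00:02:53 sample from the first post
    received_at = 1376928173  # assumption: agent and server clocks agree
    for delay in (0, 2, 120, 121, 300):  # hypothetical delays in seconds
        print(f"delay={delay:>3}s fires={trigger_fires(value, received_at, delay)}")
```

In this model a perfectly synchronized host still trips the trigger once evaluation lags the received time by more than 120 seconds, which would match alerts that clear on the next 5-minute sample.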
