Zabbix aggregate checks displaying old data -- am I doing it wrong?

  • Stephen Wood
    Member
    • Feb 2012
    • 43

    #1

    Zabbix aggregate checks displaying old data -- am I doing it wrong?

    I have many aggregate checks, like this example:

    grpavg["linux","system.cpu.load[,avg1]","last","0"]

    This item is on a pseudo host called "Aggregate". It successfully displays an average of every host in the group. However, if a host becomes unavailable or goes offline, that host's last value continues to be included in the aggregate check. This is particularly bad because a host usually goes offline precisely because its load was very high.

    Because that host's last value was high and the host is no longer online, the aggregate average stays artificially high.

    Is there a way to not include stale data from hosts that are offline? Unfortunately because of a limitation with Discovery Rules I can't add a rule to automatically remove the host since it's not on a subnet I can scan.
  • safpsr
    Member
    • Aug 2007
    • 70

    #2
    Hi,

    I have exactly the same problem.
    Is this normal? Is there a solution to this problem?

    Thanks for your help

    Comment

    • Stephen Wood
      Member
      • Feb 2012
      • 43

      #3
      As far as I know there is no solution. This was very frustrating to me, and was a huge hindrance to rolling out Zabbix in a cloud environment. Zabbix works great for static hosts, but the minute you start getting a revolving host group things get sucky.

      Here's how I handled it: for all checks that rely on aggregation I instead use the zabbix_sender utility to send the data. Each host sends its value to a single item on one host. On the graphs, I just display the sum of the values received in the last N seconds. This way the data from old, dead hosts isn't displayed as live data.

      If you would like more help doing this I can give directions.
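
The push side of the workaround described above could be sketched roughly as follows. This is only an illustration, not the poster's actual script: the server address `zabbix.example.com` and the item key `cluster.cpu.load` are hypothetical placeholders, and the target host "Aggregate" is assumed to have a matching trapper item. Only the standard `zabbix_sender` options `-z`, `-s`, `-k` and `-o` are used.

```python
import shlex
import subprocess


def build_sender_command(server, target_host, key, value):
    """Build the zabbix_sender argument list for sending one value."""
    return [
        "zabbix_sender",
        "-z", server,        # Zabbix server (or proxy) address
        "-s", target_host,   # host name as configured in Zabbix
        "-k", key,           # key of the trapper item on that host
        "-o", str(value),    # the value to send
    ]


def send_value(server, target_host, key, value):
    """Run zabbix_sender; raises CalledProcessError if it exits non-zero."""
    cmd = build_sender_command(server, target_host, key, value)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; print the command instead of running it.
    cmd = build_sender_command(
        "zabbix.example.com", "Aggregate", "cluster.cpu.load", 0.75
    )
    print(shlex.join(cmd))
```

Each monitored host would run something like `send_value(...)` on a schedule (cron, for instance), so a host that dies simply stops contributing new points.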

      Comment

      • safpsr
        Member
        • Aug 2007
        • 70

        #4
        Thank you for your answer.

        If I understand correctly, if you have 3 groups, for example, there are 3 items. Each host uses zabbix_sender to send a value under the name of its group's item. This means that there are several values (one per host) in the same period (60 sec.) on one item. How do you display the sum of all these values for a period in a graph?

        Thank you.

        Comment

        • Stephen Wood
          Member
          • Feb 2012
          • 43

          #5
          You create a separate item as a "calculated" type, and set it as a sum of every point within 60 seconds. That's the item you use to graph all of your aggregate data.
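
To see why a windowed sum hides dead hosts, here is a small sketch of the arithmetic only, not of Zabbix itself. It assumes every host pushes one value per 60-second period to the same item; the timestamps and values are made up for illustration.

```python
def window_sum(points, now, window=60):
    """Sum the values whose timestamp lies in (now - window, now].

    points is an iterable of (unix_timestamp, value) pairs, as they
    might accumulate on a single trapper item fed by many hosts.
    """
    return sum(value for ts, value in points if now - window < ts <= now)


if __name__ == "__main__":
    now = 1030
    points = [
        (1000, 0.5),  # host A, recent
        (1010, 1.5),  # host B, recent
        (940, 9.0),   # host C, last report before it died
    ]
    # Host C's old value falls outside the 60 s window and is ignored.
    print(window_sum(points, now))
```

The window has to match the sending interval: if it is shorter, live hosts are missed; if it is longer, some hosts are counted twice.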

          Comment

          • nicolasgoudard
            Junior Member
            • Mar 2021
            • 27

            #6
            UP, 9 years later, I have the same problem. I have multiple hosts in a cluster group.
            If a host is not available (down), the aggregate calculation returns an absurd value, because for Zabbix "last" means the most recent value it collected, however old that value is. But for a given timestamp, if the value has not been collected, the host should normally contribute 0.

            grpsum["cluster","system.cpu.num","last","0"]
            For example, if a host in the "cluster" group has not been online since 8:00 a.m., the CPU count it contributes at 11:00 a.m. should be zero, not 32, which was the valid CPU count before 8:00 a.m. But I get 32, because the last value Zabbix collected was 32 at 7:59 a.m.

            Can I achieve this in Zabbix, or do I have to write an external script that loops over all machines in the cluster via ssh, sums the processors, sends the total with zabbix_sender, and then picks the value up with a Zabbix trapper item?
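
The behaviour asked for here, where a host whose last value is too old contributes 0, can be written down as a small sketch. This is not a Zabbix feature or API; it only illustrates the intended arithmetic, which an external script would have to implement itself. The host names, timestamps and the 120-second `max_age` cutoff are assumptions.

```python
def fresh_group_sum(last_values, now, max_age=120):
    """Sum each host's last value, skipping hosts whose value is stale.

    last_values maps host name -> (unix_timestamp, value), where the
    timestamp is when that value was collected.
    """
    return sum(
        value
        for ts, value in last_values.values()
        if now - ts <= max_age  # stale hosts contribute nothing
    )


if __name__ == "__main__":
    now = 11 * 3600  # 11:00, in seconds since midnight for simplicity
    last_values = {
        "node1": (now - 30, 32),             # collected 30 s ago
        "node2": (7 * 3600 + 59 * 60, 32),   # last collected at 07:59
    }
    naive = sum(value for _, value in last_values.values())
    print(naive, fresh_group_sum(last_values, now))
```

With these sample numbers the naive sum counts both hosts, while the freshness-aware sum counts only node1.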

            Comment
