Aggregate checks - unreliable data output

  • wdk
    Junior Member
    • Jun 2018
    • 5

    #1

    Hi all,

    I'm running into a problem with aggregate items not reliably showing the requested data.
    The Zabbix server is at version 3.0.4, running on CentOS 7.3.1611.

    I have a group of Citrix servers and a group of VMware View servers, and I need a graph of the total active sessions for each solution.
    The servers are grouped in a Citrix host group and a VMware host group. All servers have an "RDP" template attached that gathers the current number of active sessions on each server. This is done through the Zabbix agent with the following item key:
    Code:
    perf_counter["\Terminal Services\Active Sessions"]
    I then created Zabbix aggregate items with the following keys for the Citrix and View servers:
    Code:
    grpsum["Citrix servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]
    grpsum["View RDS Servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]
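The nested quoting in these keys is easy to get wrong by hand. Below is a minimal sketch of how such a key could be assembled, assuming that an embedded double quote inside a quoted parameter is escaped as `\"` and that backslashes are otherwise left as-is (which is what the keys above do). The helper names `quote_param` and `aggregate_key` are made up for illustration and are not part of Zabbix.

```python
def quote_param(value: str) -> str:
    """Wrap a key parameter in double quotes, escaping embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def aggregate_key(group_func: str, host_group: str, item_key: str,
                  item_func: str = "last", time_period: str = "0") -> str:
    """Build an aggregate key such as grpsum["group","key",last,0]."""
    return "{}[{},{},{},{}]".format(
        group_func, quote_param(host_group), quote_param(item_key),
        item_func, time_period)


# The agent item key from the RDP template, written as a raw string so the
# backslashes in the counter path are kept literally.
ITEM_KEY = r'perf_counter["\Terminal Services\Active Sessions"]'

for group in ("Citrix servers", "View RDS Servers"):
    print(aggregate_key("grpsum", group, ITEM_KEY))
```

The two printed lines should match the keys quoted in the post character for character, which makes this a quick way to rule out a quoting mistake.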
    In the Zabbix server log I can see the aggregate items flip between supported and not supported, with the following messages:
    Code:
    12028:20180612:113703.354 item "HOSTNAME1_REDACTED:grpsum["View RDS Servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]" became not supported: No items for key "perf_counter["\Terminal Services\Active Sessions"]" in group(s) "View RDS Servers"
    12027:20180612:115218.484 item "HOSTNAME2_REDACTED:grpsum["View RDS Servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]" became supported
    The items become supported at random times and then become unsupported again after a while. No changes were made in the meantime, and there is no apparent cause for the state changes.
    The result is broken graphs across the hosts. Some graphs show no aggregate data; others show aggregate data for a few minutes and then stop.
    Attached is a screenshot of the same aggregate data on 2 different hosts, showing the random start/stop of data collection and display.
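
To see how often each item flips, the state changes can be pulled out of the server log. This is a rough sketch only: the regular expression is modelled on the two log lines quoted above and may need adjusting for other message variants, and the function names are made up for illustration. In practice the lines would be read from the server log file rather than from the embedded sample.

```python
import re
from collections import Counter

# Pattern modelled on the two log lines quoted in the post:
# PID:YYYYMMDD:HHMMSS.mmm item "HOST:KEY" became [not ]supported
STATE_CHANGE = re.compile(
    r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.\d{3} '
    r'item "(?P<host>[^:"]+):(?P<key>.+?)" became '
    r'(?P<state>not supported|supported)'
)


def parse_state_changes(lines):
    """Yield (date, time, host, state) for every state-change line."""
    for line in lines:
        match = STATE_CHANGE.match(line)
        if match:
            yield (match["date"], match["time"], match["host"], match["state"])


def flips_per_host(lines):
    """Count how many state changes each host's aggregate item logged."""
    return Counter(host for _, _, host, _ in parse_state_changes(lines))


# The two lines from the post, kept verbatim in a raw string.
SAMPLE = r'''12028:20180612:113703.354 item "HOSTNAME1_REDACTED:grpsum["View RDS Servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]" became not supported: No items for key "perf_counter["\Terminal Services\Active Sessions"]" in group(s) "View RDS Servers"
 12027:20180612:115218.484 item "HOSTNAME2_REDACTED:grpsum["View RDS Servers","perf_counter[\"\Terminal Services\Active Sessions\"]",last,0]" became supported'''.splitlines()

for change in parse_state_changes(SAMPLE):
    print(change)
print(dict(flips_per_host(SAMPLE)))
```

Lining the timestamps up against other server events (restarts, configuration cache reloads) may help narrow down what triggers the flips.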
  • aigars.kadikis
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Mar 2018
    • 208

    #2
    Hello,

    Please provide the Zabbix internal process health graph for a 1-day period:
    Monitoring -> Graphs
    Choose [Zabbix server]
    and select [Zabbix data gathering process busy]

    • aigars.kadikis
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Mar 2018
      • 208

      #3
      Are you using item type [Zabbix agent] or [Zabbix agent active] to collect metrics from the performance counter?
      Please use [Zabbix agent active] if it is not configured already.

      • wdk
        Junior Member
        • Jun 2018
        • 5

        #4
        Attached is the graph for 1 day. There seem to be quite a large number of unreachable pollers.
        The perf counter item uses the Zabbix agent (passive) for data collection, not Zabbix agent active. I assume I can safely change this to active?

        • aigars.kadikis
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Mar 2018
          • 208

          #5
          Yes, you can safely change the item to Zabbix agent active. This will improve performance a lot. Please share a graph after you migrate to active checks and have let the items gather metrics for a few hours.

          • wdk
            Junior Member
            • Jun 2018
            • 5

            #6
            Changing to active resulted in no more data coming in, so we're looking into that first. Is the active agent mandatory to get a reliable grpsum graph, or is it just beneficial?

            • aigars.kadikis
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional
              • Mar 2018
              • 208

              #7
              When using active checks, the 'Host name' in the host configuration must match the 'Hostname=' value in zabbix_agentd.conf exactly (1:1) on the host where you collect the metrics:
              [Image: active-mode.png]

              Please set the same Hostname on both sides and restart the agent; the metrics should then start coming in.
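
A quick way to check the match is to read the `Hostname=` line out of the agent config and compare it with the name shown in the frontend. The sketch below is only illustrative: the config text, the address and the host name `rds-host-01` are hypothetical, the helper names are made up, and the comparison is assumed to be exact and case-sensitive. It returns the first uncommented `Hostname=` value and does not handle duplicate entries.

```python
def agent_hostname(conf_text: str):
    """Return the first uncommented Hostname= value in the config text, or None."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == "Hostname":
            return value.strip()
    return None


def names_match(conf_text: str, frontend_host_name: str) -> bool:
    """True when the agent's Hostname equals the frontend 'Host name' exactly."""
    return agent_hostname(conf_text) == frontend_host_name


# Hypothetical zabbix_agentd.conf content, for illustration only.
SAMPLE_CONF = """\
# Hostname=old-name
Server=192.0.2.10
ServerActive=192.0.2.10
Hostname=rds-host-01
"""

print(agent_hostname(SAMPLE_CONF))
print(names_match(SAMPLE_CONF, "rds-host-01"))
print(names_match(SAMPLE_CONF, "RDS-HOST-01"))
```

A `None` result means no explicit `Hostname=` line was found, in which case the name the agent reports has to be checked some other way.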

              Regards,

              • aigars.kadikis
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Mar 2018
                • 208

                #8
                The active agent is not mandatory, but it is the recommended way to gather the metrics.
                One benefit of the active agent is that it can keep collecting metrics while the machine has a network outage. It pushes the buffered metrics to the server once the server is reachable again.
                Increase the 'BufferSize=' parameter in zabbix_agentd.conf if you want to make this buffer larger.
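
As a rough guide to sizing the buffer, the length of outage it can cover can be estimated from the update intervals of the active items. This is a back-of-the-envelope sketch under simplifying assumptions: each active item produces one value per update interval, and the buffer simply holds up to `BufferSize` values. The function name and the example numbers are illustrative, not taken from the thread.

```python
def buffer_coverage_seconds(buffer_size: int, update_intervals_s) -> float:
    """Estimate how many seconds of collected values fit into the agent buffer.

    buffer_size        -- the BufferSize value from zabbix_agentd.conf
    update_intervals_s -- update interval in seconds of each active item
    """
    values_per_second = sum(1.0 / interval for interval in update_intervals_s)
    return buffer_size / values_per_second


# Example: a single active item polled every 60 seconds, BufferSize=100.
print(round(buffer_coverage_seconds(100, [60])))

# Example: ten active items at 30 seconds each, same buffer.
print(round(buffer_coverage_seconds(100, [30] * 10)))
```

With more items or shorter intervals the same buffer covers a much shorter outage, so the setting is worth revisiting after moving a whole template to active checks.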

                Regards,

                • wdk
                  Junior Member
                  • Jun 2018
                  • 5

                  #9
                  Yesterday evening the zabbix-server service was restarted, and since then the graphs have been plotting correctly. I'll have to dig into the logs, but for now it seems to be working fine. This was before switching any hosts to the active agent.