Solaris 10 zabbix agent 'kstats' every second

  • troy
    Junior Member
    • Jul 2010
    • 1

    #1

    Solaris 10 zabbix agent 'kstats' every second

    zabbix_agentd 1.8.2 (collector process) appears to 'kstat' for CPU statistics every second:

    17413: open("/dev/kstat", O_RDONLY) = 6
    17413: ioctl(6, KSTAT_IOC_CHAIN_ID, 0x00000000) = 16180
    17413: ioctl(6, KSTAT_IOC_READ, "kstat_headers") Err#12 ENOMEM
    17413: ioctl(6, KSTAT_IOC_READ, "kstat_headers") = 16180
    17413: ioctl(6, KSTAT_IOC_READ, "cpu_stat0") = 16180
    17413: close(6) = 0

    This causes increased CPU usage.

    It occurs on all of the following:
    - The pre-compiled agent distributed with Zabbix for Solaris 10
    - The agent compiled using Sun Studio cc
    - The agent compiled using gcc

    Any assistance in reducing this CPU usage would be appreciated.

    cheers.
  • trikke
    Senior Member
    • Aug 2007
    • 140

    #2
    Hi troy,

    Did you get any assistance or a solution for this?
    I'm having the same problem, and the engineers are complaining that the Zabbix agent (particularly on Solaris Zones) is eating up their CPU!

    Greets
    Patrick


    • untergeek
      Senior Member
      Zabbix Certified Specialist
      • Jun 2009
      • 512

      #3
      I could be wrong, but I think this was one of the things they fixed in 1.8.3


      • trikke
        Senior Member
        • Aug 2007
        • 140

        #4
        Hi Troy,

        Nope, I'm on 1.8.3!
        I checked the source:
        Code:
        ZBX_THREAD_ENTRY(collector_thread, args)
        {
        	double	sec;
        
        	zabbix_log( LOG_LEVEL_INFORMATION, "zabbix_agentd collector started");
        
        	if (0 != init_cpu_collector(&(collector->cpus)))
        		close_cpu_collector(&(collector->cpus));
        
        	while(ZBX_IS_RUNNING())
        	{
        		sec = zbx_time();
        		if (CPU_COLLECTOR_STARTED(collector))
        			collect_cpustat(&(collector->cpus));
        #ifdef _WINDOWS
        		collect_perfstat();
        #endif /* _WINDOWS */
        
        		collect_stats_interfaces(&(collector->interfaces)); /* TODO */
        		collect_stats_diskdevices(&(collector->diskdevices)); /* TODO */
        #ifdef _AIX
        		collect_vmstat_data(&collector->vmstat);
        #endif
        
        		zbx_sleep(1);
        	}
        
        #ifdef _WINDOWS
        	close_perf_collector();
        #endif /* _WINDOWS */
        	if (CPU_COLLECTOR_STARTED(collector))
        		close_cpu_collector(&(collector->cpus));
        
        	zabbix_log( LOG_LEVEL_INFORMATION, "zabbix_agentd collector stopped");
        
        	ZBX_DO_EXIT();
        
        	zbx_thread_exit(0);
        }
        "zbx_sleep(1);" is still there!
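
The loop quoted above wakes every second and collects CPU statistics on every pass. A minimal sketch of one way to throttle only the CPU collection while the loop keeps its one-second sleep is given here. The helper `collection_due` and its parameters are hypothetical and are not part of the Zabbix source.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper, not part of the Zabbix source: returns 1 when at
 * least `interval` seconds have passed since `*last`, and records `now`
 * in `*last` in that case. Otherwise returns 0 and leaves `*last` alone. */
int collection_due(time_t now, time_t *last, int interval)
{
	assert(last != NULL && interval > 0);

	if (now - *last < interval)
		return 0;

	*last = now;
	return 1;
}
```

Inside the loop, the call to `collect_cpustat()` could then be guarded with something like `collection_due(time(NULL), &last_cpu, 5)`, where `last_cpu` is an assumed `time_t` variable local to the thread. Whether the agent's CPU averages still work with a longer interval is a separate question; a later post in this thread reports 0 values after a similar change.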


        • Age42
          Junior Member
          • Aug 2008
          • 6

          #5
          We're seeing this behaviour on our systems too. It's very apparent on the Niagara platform systems that have 128 cores: the Zabbix agent is constantly running kstat across all the cores:

          open("/dev/kstat", O_RDONLY) = 5
          ioctl(5, KSTAT_IOC_CHAIN_ID, 0x00000000) = 2536183
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") Err#12 ENOMEM
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") = 2536183
          ioctl(5, KSTAT_IOC_READ, "cpu_stat81") = 2536183
          close(5) = 0
          time() = 1290022738
          open("/dev/kstat", O_RDONLY) = 5
          ioctl(5, KSTAT_IOC_CHAIN_ID, 0x00000000) = 2536183
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") Err#12 ENOMEM
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") = 2536183
          ioctl(5, KSTAT_IOC_READ, "cpu_stat82") = 2536183
          close(5) = 0
          time() = 1290022738
          open("/dev/kstat", O_RDONLY) = 5
          ioctl(5, KSTAT_IOC_CHAIN_ID, 0x00000000) = 2536183
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") Err#12 ENOMEM
          ioctl(5, KSTAT_IOC_READ, "kstat_headers") = 2536183
          ioctl(5, KSTAT_IOC_READ, "cpu_stat83") = 2536183
          close(5) = 0
          time() = 1290022738
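
The trace above shows six calls on /dev/kstat for every CPU (open, KSTAT_IOC_CHAIN_ID, a failed and a successful "kstat_headers" read, one cpu_statN read, close). As a rough worked example of why this hurts on a 128-core machine, the sketch below counts the calls per pass for that pattern and for a hypothetical alternative in which one handle is opened once per pass and reused for every CPU. Both helper names are made up for illustration, and the alternative is an assumption about how the agent could behave, not a description of any Zabbix release.

```c
#include <assert.h>

/* Calls visible per CPU in the trace: open, KSTAT_IOC_CHAIN_ID, a failed
 * and a successful "kstat_headers" read, one cpu_statN read, close. */
enum { CALLS_PER_CPU_REOPEN = 6 };

/* Hypothetical helper: calls per pass when /dev/kstat is reopened
 * for every CPU, as in the trace. */
long calls_reopen_per_cpu(int ncpus)
{
	assert(ncpus > 0);
	return (long)ncpus * CALLS_PER_CPU_REOPEN;
}

/* Hypothetical helper: calls per pass if a single handle served all
 * CPUs. The five non-cpu_stat calls happen once, plus one read per CPU. */
long calls_shared_handle(int ncpus)
{
	assert(ncpus > 0);
	return (CALLS_PER_CPU_REOPEN - 1) + (long)ncpus;
}
```

For 128 CPUs, `calls_reopen_per_cpu(128)` gives 768 calls per pass and `calls_shared_handle(128)` gives 133. The count understates the difference if each "kstat_headers" read copies the whole kstat chain, which grows with the number of CPUs.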


          • untergeek
            Senior Member
            Zabbix Certified Specialist
            • Jun 2009
            • 512

            #6
            I think this is a MAJOR problem now, or it's a different problem.

            We have lots of Solaris servers, being a primarily Solaris house. When I use the passive (not active) agent to run a check, the Zabbix server runs its query against the db and then runs the request directly to the agent. The agent responds and the value is recorded in the db. This works perfectly, though there tend to be delays when the server running the agent is heavily taxed.

            We have recently tried to switch to using an Active agent configuration. This works flawlessly…as long as the server running the agent is not heavily taxed.

            For some reason, requests can get severely delayed, or even get no response at all. I have turned on Debug to 4 and watched and grepped and seen that items which I KNOW are set to send a new value every 30 seconds will go as long as 10 to 15 minutes between sending values. I do not know if this is a Solaris-only thing, but I am inclined to believe it is.

            Until this can be remedied, all of my critical transactions will need to be passive.


            • erikgreen
              Junior Member
              • Sep 2010
              • 9

              #7
              Same here

              Our Solaris 10 platforms with zones are using 2-4% CPU on the base hardware, which, although it seems minimal to me, is enough to irritate our users on those systems.

              We haven't finished buying our support contract yet, so I can't call up Zabbix and ask them for a fix. :\

              I suspect the Solaris agent needs a bit of re-coding. Does anyone have a tweak for it?

              Erik


              • ilikejam
                Junior Member
                • Oct 2008
                • 9

                #8
                Fixed?

                This is apparently fixed, though I don't know which version the fix will end up in.


                • abruptlydisconn
                  Junior Member
                  • Mar 2010
                  • 4

                  #9
                  "Resolution: Unresolved"

                  That's not "fixed".

                  In general, why is support for Solaris so crappy? ZFS filesystems can't be monitored for how much free space they have. CPU iowait, system, and nice time aren't supported. Most of the memory stats aren't supported. The Solaris template includes checks for /vmlinuz. WTF?


                  • untergeek
                    Senior Member
                    Zabbix Certified Specialist
                    • Jun 2009
                    • 512

                    #10
                    I altered the C code and recompiled. I now have a binary that runs kstats every 3600 seconds, but I get 0 values when I try to collect CPU stats.

                    The trade-off for now is that I have to collect CPU stats using UserParameters and scripts that pull the values from vmstat, but at least the agent isn't abusing the CPU.
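
One possible reading of the 0 values, offered as an assumption that is neither confirmed in this thread nor checked against the agent source: CPU utilisation is a rate derived from cumulative tick counters, so it needs two snapshots that both fall inside the averaging window. If the collector only samples once every 3600 seconds, a window of a few minutes may contain a single snapshot, and the delta is then zero. The sketch below illustrates that arithmetic. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of cumulative CPU tick counters. */
struct cpu_ticks {
	unsigned long long idle;
	unsigned long long busy;	/* user + system + wait, lumped together */
};

/* Percent busy between two snapshots. Returns 0.0 when no ticks elapsed,
 * for example when both arguments are the same snapshot. */
double busy_percent(const struct cpu_ticks *older, const struct cpu_ticks *newer)
{
	unsigned long long d_idle, d_busy, d_total;

	assert(older != NULL && newer != NULL);

	d_idle = newer->idle - older->idle;
	d_busy = newer->busy - older->busy;
	d_total = d_idle + d_busy;

	if (0 == d_total)
		return 0.0;

	return 100.0 * (double)d_busy / (double)d_total;
}
```

With snapshots of {idle 1000, busy 500} and {idle 1600, busy 900}, the result is 40 percent busy. Passing the same snapshot twice, which is what a window holding only one sample amounts to, gives 0.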

