system.cpu.util on solaris

  • bdo
    Junior Member
    • Oct 2005
    • 15

    #1

    system.cpu.util on solaris

    Hi,

    I'm using the Zabbix 1.1 agent on several Solaris 8 servers (single and multiple processor) and I get a strange result when collecting system.cpu.util items:

    The Zabbix agent sends a value for CPU activity, but the change between two consecutive values (the current and the previous one) does not reflect the real CPU activity: the delta between two values is about +/-0.001 %, even when the real CPU delta on the Solaris host is about 10 or 20 %!

    No special information in zabbix_agentd.log :

    .../...
    006136:20060608:123700 In check_security()
    006136:20060608:123700 Connection from [192.168.1.1]. Allowed servers [192.168.1.1]
    006136:20060608:123700 Before read()
    006136:20060608:123700 After read() 2 [23]
    006136:20060608:123700 Got line:system.cpu.util[,idle]
    006136:20060608:123700 Sending back:94.923921
    006138:20060608:123714 Sleeping for 60 seconds
    .../...
    006136:20060608:123800 Got line:system.cpu.util[,idle]
    006136:20060608:123800 Sending back:94.924001
    .../...
    006136:20060608:123900 Got line:system.cpu.util[,idle]
    006136:20060608:123901 Sending back:94.924314
    .../...
    006136:20060608:124000 Got line:system.cpu.util[,idle]
    006136:20060608:124000 Sending back:94.924397
    .../...
    006136:20060608:124100 Got line:system.cpu.util[,idle]
    006136:20060608:124100 Sending back:94.924657
    .../...

    The item is defined as below:
    Item 'Sol8:CPU Idle'
    Description : CPU Idle
    Type : ZabbixAgent
    Key : system.cpu.util[,idle]
    Type of information : Numeric (float)
    Units : %
    Use Multiplier : do not use
    Update interval : 60

    Of course, I have the same kind of problem with the kernel and wait modes.

    Does anyone else have the same problem?
    Is there something else I can try or test?

    Thanks a lot
    --bdo
  • Aaron
    Junior Member
    • May 2006
    • 16

    #2
    I have a similar, but different, problem on my Solaris 9 servers. My idle, user, kernel and wait data keeps increasing slightly each time it is sent. If I look at a 1 day graph, all 4 appear to be straight lines. If I check the 1 week graph, I can see they all angle up slightly. The data I get from Zabbix seems to have no relation to the actual usage of the server.

    I have been searching the forums for posts on system.cpu.util and it seems there are several questions, but very few answers.


    • Aaron
      Junior Member
      • May 2006
      • 16

      #3
      I am pretty sure I know what the problem with this is. I don't know how zabbix collects this data, but it appears to be the same data you get from mpstat. mpstat returns this data as an average from the time of the last reboot. Basically, the longer your server has been up, the less this data changes.

      I haven't written C code since 1998 so I did not even attempt to verify this in the code. The numerical correlation seems to be spot on though, so this is probably correct.

      I was toying with a UserParameter hack for this, but it seems like the sort of thing that really should be fixed in Zabbix. The data the agent is collecting seems useless in my opinion.

      The hack I am writing is basically a cron job that collects mpstat data and dumps it to a file in /tmp that Zabbix can quickly cat/sed/awk. I haven't worked out the period of the collection or of the check. It seems like there should be a better way, but I haven't thought of one yet.
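
To make the arithmetic behind this explanation concrete, here is a minimal C sketch of how a figure averaged since boot reacts to one more sampling interval. It is not Zabbix or mpstat code; the function and parameter names are made up for illustration, and it simply assumes the reported value is a time-weighted mean over the whole uptime.

```c
#include <assert.h>

/*
 * Time-weighted mean of a long-running average and one new interval.
 * All names are illustrative; nothing here is taken from Zabbix.
 */
double idle_pct_since_boot(double uptime_s, double avg_idle_pct,
                           double interval_s, double interval_idle_pct)
{
    assert(uptime_s >= 0.0 && interval_s >= 0.0);
    assert(uptime_s + interval_s > 0.0);
    return (avg_idle_pct * uptime_s + interval_idle_pct * interval_s)
           / (uptime_s + interval_s);
}

/* How far the since-boot figure moves after one more interval. */
double shift_after_interval(double uptime_s, double avg_idle_pct,
                            double interval_s, double interval_idle_pct)
{
    return idle_pct_since_boot(uptime_s, avg_idle_pct,
                               interval_s, interval_idle_pct) - avg_idle_pct;
}
```

Under these assumptions, a host that has been up for 30 days at an average of 95 % idle and then spends one 60-second interval at 75 % idle sees its since-boot figure move by less than 0.001 of a percentage point, which is the scale of change visible in the agent log in the first post.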


      • art
        Junior Member
        • Oct 2006
        • 12

        #4
        Had a quick look at zabbix-1.1.2/src/libs/zbxsysinfo/solaris/cpu.c

        It looks like the current SYSTEM_CPU_UTIL function uses kstat calls, which return CPU utilisation counters accumulated since the last reboot(?). Zabbix therefore shows the average CPU utilisation over a very long period instead of the current state of things. Although I'm not a C programmer (well, not a programmer at all), I patched cpu.c so that it now returns more meaningful data (more or less the same as sar does).

        Again, I'm not a programmer and there may be a more correct way to work with kstat data, but it works for us (tested on Solaris 9).

        Patch:


        79a80,81
        > unsigned long long cpu_val_s[4];
        > unsigned long long cpu_val_e[4];
        140c142
        < if (get_cpu_data(&cpu_val[CPU_I], &cpu_val[CPU_K], &cpu_val[CPU_U], &cpu_val[CPU_W]))
        ---
        > if (get_cpu_data(&cpu_val_s[CPU_I], &cpu_val_s[CPU_K], &cpu_val_s[CPU_U], &cpu_val_s[CPU_W]))
        141a144,149
        > sleep(5);
        > int q = get_cpu_data(&cpu_val_e[CPU_I], &cpu_val_e[CPU_K], &cpu_val_e[CPU_U], &cpu_val_e[CPU_W]);
        > cpu_val[CPU_I] = cpu_val_e[CPU_I] - cpu_val_s[CPU_I];
        > cpu_val[CPU_K] = cpu_val_e[CPU_K] - cpu_val_s[CPU_K];
        > cpu_val[CPU_U] = cpu_val_e[CPU_U] - cpu_val_s[CPU_U];
        > cpu_val[CPU_W] = cpu_val_e[CPU_W] - cpu_val_s[CPU_W];
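
The idea behind the patch (take two snapshots of the cumulative counters and work from their difference) can be sketched on its own, without kstat. The following is a standalone illustration, not the Zabbix source: the CPU_I/CPU_K/CPU_U/CPU_W names mirror the patch, but their values, the CPU_STATES constant and the function cpu_pct_from_snapshots are assumptions made for this example.

```c
#include <assert.h>

/* Index names mirror the patch; the values are assumed for this sketch. */
enum { CPU_I, CPU_K, CPU_U, CPU_W, CPU_STATES };

/*
 * Convert two snapshots of cumulative per-state tick counters into
 * percentages for the interval between them.
 * Returns 0 on success, -1 if a counter went backwards or no ticks elapsed.
 */
int cpu_pct_from_snapshots(const unsigned long long start[CPU_STATES],
                           const unsigned long long end[CPU_STATES],
                           double pct[CPU_STATES])
{
    unsigned long long delta[CPU_STATES];
    unsigned long long total = 0;
    int i;

    assert(start && end && pct);

    for (i = 0; i < CPU_STATES; i++) {
        if (end[i] < start[i])
            return -1;              /* counter reset or wrapped */
        delta[i] = end[i] - start[i];
        total += delta[i];
    }
    if (total == 0)
        return -1;                  /* snapshots taken too close together */

    for (i = 0; i < CPU_STATES; i++)
        pct[i] = 100.0 * (double)delta[i] / (double)total;
    return 0;
}
```

One design note: the patch as posted obtains its two snapshots with sleep(5) inside the request path, so each query appears to block for five seconds. A possible alternative, not shown here, would be to keep the previous snapshot between calls and diff against it.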
