system.cpu.load != cpu utilization?

  • uid0
    Junior Member
    • Aug 2013
    • 8

    #1

    system.cpu.load != cpu utilization?

    Hello,

    The value system.cpu.load is the system load average, correct?

    But a high load does not always mean that the server is overloaded, and as far as I can see Zabbix does not have a standard trigger for CPU utilization. Or am I missing something?

    To monitor CPU utilization I would need to make the following calculation:
    cpu idle + cpu iowait = % of CPU free

    I think getting the real idle value must look something like this:
    Code:
    {Template OS Linux:system.cpu.util[,idle].last(0) + system.cpu.util[,iowait].last(0)}>20
    From the calculated idle value we would know the actual CPU usage. A value of 20 would mean that only 20% of the CPU is free. Does anybody know how I can reverse this, so that it shows 80% used instead of 20% free? I am not a math genius.

    Is this correct? Can somebody provide the correct trigger expression to monitor the real CPU utilization?

    Thanks a lot!
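On the "reverse" question, a minimal sketch of the arithmetic in Python, assuming (as the post does) that idle and iowait together count as free CPU. The function name is hypothetical and not part of Zabbix.

```python
def cpu_used_percent(idle: float, iowait: float) -> float:
    """Return used CPU in percent, treating idle + iowait as free.

    Counting iowait as free is the assumption made in the post above.
    """
    free = idle + iowait
    if not 0.0 <= free <= 100.0:
        raise ValueError("idle + iowait must be between 0 and 100")
    return 100.0 - free


# 15% idle plus 5% iowait is 20% free, so 80% used.
print(cpu_used_percent(15.0, 5.0))  # 80.0
```

Put differently, alerting on "more than 80% used" is the same condition as idle + iowait dropping below 20.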
  • uid0
    Junior Member
    • Aug 2013
    • 8

    #2
    push, sorry!

    • natalia
      Senior Member
      • Apr 2013
      • 159

      #3
      Originally posted by uid0
      Can somebody provide the correct trigger expression to monitor the real CPU utilization?
      I am using:

      Trigger name: CPU utilization > 95% for 4 mins on {HOST.NAME}

      ({TRIGGER.VALUE}=0 & {Template OS Linux:system.cpu.util[,user].min(4m)}>95) | ({TRIGGER.VALUE}=1 & {Template OS Linux:system.cpu.util[,user].min(4m)}>80)
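The expression above is a hysteresis trigger: it fires when the 4-minute minimum goes above 95 and only clears once that minimum is no longer above 80. A small Python sketch of that state logic, assuming one evaluation per new value; the function and parameter names are made up for illustration.

```python
def next_trigger_state(in_problem: bool, min_user_4m: float,
                       fire_above: float = 95.0,
                       clear_at: float = 80.0) -> bool:
    """Mirror the two branches of the trigger expression.

    in_problem   -- current trigger state ({TRIGGER.VALUE}=1 means True)
    min_user_4m  -- minimum of system.cpu.util[,user] over the last 4 minutes
    """
    if in_problem:
        # Already in PROBLEM: stay there while the minimum is above 80.
        return min_user_4m > clear_at
    # Currently OK: only fire when the minimum is above 95.
    return min_user_4m > fire_above


state = False
history = []
for value in [90.0, 96.0, 85.0, 79.0, 90.0]:
    state = next_trigger_state(state, value)
    history.append(state)

print(history)  # [False, True, True, False, False]
```

The gap between 80 and 95 keeps the trigger from flapping when the value hovers around a single threshold. Note that the expression watches only the user component of CPU time.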

      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        Originally posted by uid0
        But a high load does not always mean that the server is overloaded, and as far as I can see Zabbix does not have a standard trigger for CPU utilization. Or am I missing something?
        CPU utilization has nothing to do with CPU load.
        CPU load is the length of the system run queue.
        The current CPU load is displayed in the "r" column of the vmstat command output, and it is an integer value.
        loadavg{1,5,15} are the average lengths of this queue over the last {1,5,15} minutes, and because they are averages, these numbers are floating-point values.
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates

        • karmukis
          Member
          • Aug 2014
          • 37

          #5
          I know this is an old post, but I'm having the following problem. Please bear with me, I'm sort of a newbie with Zabbix...

          I want to get the CPU load as a percentage. If you use "system.cpu.load", it gives you the average, and what I want is that value, but as a percentage.

          What can I use to get this data as a percentage, like when you run "htop" on the server?

          Thank you so much

          karmukis

          • jan.garaj
            Senior Member
            Zabbix Certified Specialist
            • Jan 2010
            • 506

            #6
            You want CPU utilization, not CPU load, if you want a percentage.
            Check the item key system.cpu.util in the manual.
            Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
            My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant

            • bbrendon
              Senior Member
              • Sep 2005
              • 870

              #7
              Telling someone what they want? That's not nice. You can get CPU load as a percentage. The basic idea is to take the load average divided by the # of CPUs.

              There are a few ways to do that in zabbix. Probably the easiest is to combine two items (one with load avg and one with # of CPUs) to calculate a 3rd item which would be the percentage.

              I can think of a few better ways (IMHO), but they would be much more involved.
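A rough sketch of that calculation in Python, only to show the arithmetic. The function name and the sample numbers are illustrative; in Zabbix the two inputs would come from a load item such as system.cpu.load and a CPU-count item such as system.cpu.num.

```python
def load_as_percent(load_avg: float, cpu_count: int) -> float:
    """Express a load average as a percentage of the CPU count."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count * 100.0


# Illustrative values: a load average of 2.0 on a 4-CPU machine.
print(load_as_percent(2.0, 4))  # 50.0
```

Unlike CPU utilization, this figure is not capped at 100: a load of 6.0 on 4 CPUs gives 150, which simply means more runnable tasks than CPUs.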
              Unofficial Zabbix Expert
              Blog, Corporate Site

              • jan.garaj
                Senior Member
                Zabbix Certified Specialist
                • Jan 2010
                • 506

                #8
                User first

                Let's fight :-P. I've written a custom implementation of CPU usage monitoring, so I feel fairly confident, but I'm open to new knowledge.

                The Zabbix manual also refers to the wiki:

                In UNIX computing, the system load is a measure of the amount of computational work that a computer system performs.
                How can you express that in %? What is 100%? I have never seen a monitoring system that shows load in %.

                What can I use to get this data, on porcentage? like when you execute an "htop" on the serve.
                What does htop show as a percentage? man htop -> processor usage

                => the user wants to see CPU usage (utilization)
                He doesn't understand the difference between load and utilization, but he gave htop as an example. That's the main reason why I recommended system.cpu.util - I listen to the user ;-)

                BTW1: Actually, the Zabbix agent is not able to provide the same value as htop; only avg1/avg5/avg15 values are available.
                BTW2:
                The basic idea is to take the load average divided by the # of CPUs.
                Zabbix has the item key system.cpu.load; instead of the parameter all, use percpu (total load divided by online CPU count).
                • karmukis
                  Member
                  • Aug 2014
                  • 37

                  #9
                  hahaha
                  OK, I'm really happy with all the help I'm getting, but going back to the monitoring side:
                  please give me some time to check everything you sent me and try to figure this out.

                  The issue is that on our servers, because of the number of requests and the resulting transactions, CPU utilization escalates to dangerous levels...

                  So far I have configured system.cpu.load to warn us when the check on "all" CPUs is above 5, because with "percpu" I'm not sure which value it compares against the warning threshold. I checked it, and when my CPU load was at 5, Zabbix was telling me it was within normal values. Then I ran htop and saw that, of the 8 CPUs, some were at 100% usage, some at 80%, and maybe 3 of them were at 20%. That's why I'm having such a headache!

                  This is what the trigger looks like right now.

                  {requester-a1:system.cpu.load[all,avg1].last(0)}>5

                  again, thank you so much.

                  • jan.garaj
                    Senior Member
                    Zabbix Certified Specialist
                    • Jan 2010
                    • 506

                    #10
                    IMHO the best trigger for you is
                    {requester-a1:system.cpu.load[percpu,avg1].last(0)}>1

                    The critical value for cpu.load[all,avg1] depends on the number of online CPUs. I have a server with load >20, but it has 24 CPUs, so the "normalized" CPU load (percpu) is 20/24 ≈ 0.83 and I don't have a problem. It would be a problem if I had load 20 on a 1-CPU device.

                    You are safe with load 5 on 8 CPUs. Also, your CPU utilization is not at 100% on all of your CPUs. I don't see any problem with your CPU metrics (load, utilization), so what is your problem?
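To make the comparison concrete, here is a short Python sketch that applies the ">1 per CPU" rule to the three cases mentioned in this thread. It only mimics the arithmetic described for percpu (total load divided by online CPU count); the function names are hypothetical.

```python
def percpu_load(total_load: float, online_cpus: int) -> float:
    """Total load divided by the number of online CPUs."""
    if online_cpus < 1:
        raise ValueError("online_cpus must be at least 1")
    return total_load / online_cpus


def trigger_fires(total_load: float, online_cpus: int,
                  threshold: float = 1.0) -> bool:
    """True when the per-CPU load is above the threshold."""
    return percpu_load(total_load, online_cpus) > threshold


# (total load, online CPUs) pairs taken from the posts above.
cases = [(5, 8), (20, 24), (20, 1)]
for load, cpus in cases:
    print(f"load {load} on {cpus} CPUs: "
          f"percpu={percpu_load(load, cpus):.3f} "
          f"fires={trigger_fires(load, cpus)}")
```

Only the last case, load 20 on a single CPU, is above the threshold.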
                    • bbrendon
                      Senior Member
                      • Sep 2005
                      • 870

                      #11
                      Wow. I've been using CPU load as a percentage for ages. I never knew zabbix added percpu on the load item! Wow.
                      • karmukis
                        Member
                        • Aug 2014
                        • 37

                        #12
                        Ok,
                        first of all... I'm not a he, I'm a she...
                        Then, I left the CPU load alarm running all weekend, and I'm happy to say it seems to be working fine, but the thresholds I set seem to be quite low, because it kept sending alarms about CPU load problems on the server I'm testing...
                        Checking the server itself, its normal "load average" values are around 1.95, 2, 3...
                        and very often the load escalates to 5 or 6...
                        So I'll keep testing.
                        I need to check how to apply different values for this alarm, because the request load on the servers is not very well distributed (thanks to Amazon ELB), and sometimes we have 5 servers doing nothing and 7 crying for help...

                        OK, again, thanks for everything, I'll keep you posted.
                        karmukis >> karina

                        • jan.garaj
                          Senior Member
                          Zabbix Certified Specialist
                          • Jan 2010
                          • 506

                          #13
                          OK Karina.

                          AWS is out of scope here, but some notes:
                          - ELB - do you use sticky sessions?
                          - AWS - also check CloudWatch CPU metrics; Zabbix CPU metrics are usually not the same as CloudWatch CPU metrics (hypervisor vs OS)
                          - check steal CPU time - for example, your VM can appear to consume 100% CPU time while 90% of it is actually "only" steal time
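For the steal-time note, a minimal Python sketch of how a per-state CPU breakdown can be derived from two samples of the aggregate "cpu" line in Linux /proc/stat. The sample lines are invented for illustration, and the helper names are hypothetical; Zabbix exposes the same kind of value through system.cpu.util with the steal state.

```python
# First eight counters of the "cpu" line in /proc/stat, in order.
# Guest counters are left out because guest time is already part of user/nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected a line starting with 'cpu'")
    values = [int(x) for x in parts[1:1 + len(FIELDS)]]
    if len(values) != len(FIELDS):
        raise ValueError("not enough counters on the cpu line")
    return dict(zip(FIELDS, values))


def cpu_breakdown(before: str, after: str) -> dict:
    """Percentage of time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Invented samples: most of the elapsed ticks went to steal.
sample_before = "cpu 1000 0 500 8000 100 0 0 400 0 0"
sample_after = "cpu 1100 0 550 8050 100 0 0 1200 0 0"

breakdown = cpu_breakdown(sample_before, sample_after)
print(breakdown["steal"], breakdown["idle"])  # 80.0 5.0
```

In this invented sample the VM is only 5% idle, yet 80% of the interval is steal time, which is the situation described in the last bullet.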