cpu.load to high on Windows


    I'm running Windows Server 2008 R2 as a VM on Citrix XenServer 5.6 SP2. To monitor the server I use
    Zabbix 1.8.8 and the Zabbix Windows agent 1.8.8 64-bit.

    The problem is that system.cpu.load[,avg1] is too high. The template is configured to trigger an alarm when the load goes
    over 5. The MS Exchange Server 2010 machine goes over 5 several times a day, sometimes up to 25. That's not normal, is it?


    Could it be that Windows sends wrong or multiplied data?
    #2
    Are you sure you should be using system.cpu.load?

    For my Windows servers I use:

    system.cpu.util[,,avg1]



      #3
      Well, the original template uses the CPU load, so I thought it would be
      best to use it as Zabbix released it.

      I will try cpu.util over the next few days.

      I am still interested to know why the CPU load is so high on Windows systems.

      Thanks for your hint.



        #4
        Hi.
        CPU load is the number of threads in a runnable state on the system.
        On the Windows agent, it's measured using an averaged "Processor Queue Length" perfmon counter.

        Normally on a uniprocessor it's supposed to be below 1 with occasional higher spikes, but I've seen very high numbers on some systems with very low real CPU usage, specifically on VMs...

        On Linux, it includes processes waiting for disk, and I guess it's the same for Windows.

        So all in all it's not a good CPU usage indicator, more an overall system health indicator...

        Ghoz
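
To illustrate the point in #4, here is a minimal Python sketch of how a load figure can come out of averaging periodic samples of a run-queue length. This is not the Zabbix agent's actual code. The one-sample-per-second interval, the 60-sample window, and the names `QueueLoadAverager` and `add_sample` are assumptions made purely for illustration.

```python
from collections import deque


class QueueLoadAverager:
    """Average the most recent queue-length samples (hypothetical helper)."""

    def __init__(self, window_samples=60):
        # Assumption: one sample per second, so 60 samples is about one minute.
        self.samples = deque(maxlen=window_samples)

    def add_sample(self, queue_length):
        # Old samples fall out automatically once the window is full.
        self.samples.append(queue_length)

    def average(self):
        # With no samples yet, report 0.0 rather than dividing by zero.
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


avg1 = QueueLoadAverager(window_samples=60)
# 50 samples with an empty queue, then a burst of 30 queued threads for 10 samples.
for q in [0] * 50 + [30] * 10:
    avg1.add_sample(q)
print(avg1.average())  # 5.0
```

In this made-up series, a ten-sample burst of 30 queued threads is enough to pull the one-minute average up to 5.0, which shows how short queue spikes can dominate an averaged load value.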



          #5
          Hi,

          I have exactly the same issue on several Windows servers, especially those running SQL Server (which is very CPU/memory intensive). The strange thing is that those triggers only started firing after I upgraded Zabbix from 1.8.13 to 2.0.0!



          Is there a solution for this issue, or have you just changed the item "Processor load" to "Processor util"?
          Last edited by pmsousa; 20-06-2012, 15:30. Reason: Added chart image



            #6
            The answer to my question...

            First, system.cpu.load is the CPU queue length, not the % Processor Time (whoever named that item should revise it...).

            From Wikipedia and another post on this forum I found the comparison "CPU Load vs CPU Utilization" (http://en.wikipedia.org/wiki/Load_(computing)).

            From Microsoft I found this document about "Observing Processor Queue Length".

            With this information it was easy to reproduce my Zabbix graph in Windows Performance Monitor, understand the data, and see why that damn trigger was firing like crazy!


            Problem solved, Zabbix template updated with new item for CPU Utilization and new graph added...



            If you need some help with this issue, send me a message...



              #7
              me too!

              I am also getting a lot of emails regarding the "processor load is too high" message on Windows VM guests.

              Please help me out with the solution.

              Thanks for your help



                #8
                Originally posted by vincecmic
                I am also getting a lot of emails regarding the "processor load is too high" message on Windows VM guests.

                Please help me out with the solution.

                Thanks for your help
                Hi Vince,

                Have you read the Wikipedia article I mentioned in my previous post?

                A first step is to understand, when talking about performance monitoring, that "% Processor Time" and "Processor Queue Length" are two different things. People often mistake "Processor Queue Length" for "% Processor Time", as I think the developer who made the Zabbix Windows template and named the item "system.cpu.load" did... Also, those metrics were developed 20 years ago for Windows NT and for physical machines, not virtual ones (VMware or Hyper-V).

                The "% Processor Time" counters in Windows are measurements derived using a sampling technique. The OS scheduler samples the state of the CPU once per system clock tick, driven by a high-priority timer-based interrupt.
                The "System\Processor Queue Length" counter in perfmon is an instantaneous counter that reflects the current number of Ready threads waiting in the OS scheduler queue.

                From Microsoft's Technet (http://technet.microsoft.com/en-us/l.../cc940375.aspx) you can see how "% Processor Time" and "System\Processor Queue Length" are used together to find saturated processors. There is also a note at the end with reference values for multiprocessor systems (again, physical servers).

                In a virtual environment you can't use the same trigger values, because virtual processors and queuing are handled differently from physical ones. The physical processor queues are shared by several virtual processors, and that distorts the values you use as a reference for Zabbix triggers.
                If I remember right, the value set for the trigger was 5, and if you read the note I mentioned, on a system with high CPU activity the expected range of processor queue length is 4 to 12, so the trigger must be set higher.

                This post is getting veeerrrryyyy biiiigggg and I'm expecting a remote assist from a software provider at any minute, so I must stop for now but...

                You can try a first experiment. Open perfmon on two Windows boxes (one physical, one virtual) and add the counters for "% Processor Time" and "System\Processor Queue Length". Let it run for a while and compare the graphs with the ones in the Technet article.

                I'll be back later...
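
The idea in #8 of reading "% Processor Time" and "System\Processor Queue Length" together, rather than alerting on the queue alone, could be sketched roughly as below. The function name `looks_saturated` and both default thresholds (80 % utilization, 2 queued threads per core) are illustrative assumptions, not values taken from the Technet article or from Zabbix.

```python
def looks_saturated(cpu_percent, queue_length, cores,
                    cpu_threshold=80.0, queue_per_core=2.0):
    """Return True only when utilization AND per-core queue length are both high.

    The thresholds are placeholder assumptions for illustration only.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    busy = cpu_percent >= cpu_threshold
    queued = (queue_length / cores) > queue_per_core
    return busy and queued


# Long queue but low utilization (the pattern described on VMs in this thread).
print(looks_saturated(15.0, 25, 4))   # False
# High utilization and a long queue per core.
print(looks_saturated(95.0, 25, 4))   # True
# High utilization but only one queued thread per core.
print(looks_saturated(95.0, 4, 4))    # False
```

Requiring both conditions means a long queue on a mostly idle guest does not raise an alarm by itself, which matches the symptom reported earlier in the thread.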



                  #9
                  Grrrrrrrrr

                  I've spent the last hour writing (editing) my previous post just to lose all the text I wrote after pressing the "Submit Reply"...

                  I'm going for a cigarette and think if I can manage to write all of that again!!!



                    #10
                    I finally moved to perf_counter["\238(_Total)\6"]; the results are easier for Microsoft admins to understand.
                    Last edited by emmanux; 08-11-2013, 21:40.



                      #11
                      If I understood the articles above correctly, the trigger must be
                      {serv:system.cpu.load[,avg1].last(0)}>5*{serv:system.cpu.num.last(0)}

                      Can anyone confirm that the correct trigger for cpu.load should look like cpu.num*5?



                        #12
                        Originally posted by uniken1
                        If I understood the articles above correctly, the trigger must be
                        {serv:system.cpu.load[,avg1].last(0)}>5*{serv:system.cpu.num.last(0)}

                        Can anyone confirm that the correct trigger for cpu.load should look like cpu.num*5?
                        I'm a bit puzzled by your question! cpu.num gives you the number of processor cores, so with the expression {serv:system.cpu.load[,avg1].last(0)}>5*{serv:system.cpu.num.last(0)} you are saying that the trigger fires when the last value of the average CPU load is greater than 5 times the number of cores of the server... Is this right?



                          #13
                          As I understand it, yes. With more cores, a longer queue can be considered normal behavior.
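
For anyone who wants to check the arithmetic of the per-core trigger from #11, here is a small Python sketch of the same comparison. The function name `load_trigger_fires` is hypothetical, and the factor of 5 per core is simply the value proposed in this thread, not a confirmed recommendation.

```python
def load_trigger_fires(load_avg1, num_cpus, per_core_limit=5):
    """Mirror of the expression:
    system.cpu.load[,avg1].last(0) > 5 * system.cpu.num.last(0)
    """
    return load_avg1 > per_core_limit * num_cpus


# Load of 25 on a 4-core guest: 25 > 20, so the trigger fires.
print(load_trigger_fires(25, 4))   # True
# Load of 12 on the same guest: 12 > 20 is false, no alarm.
print(load_trigger_fires(12, 4))   # False
# Load of 6 on a single core: 6 > 5, so the trigger fires.
print(load_trigger_fires(6, 1))    # True
```

With the original fixed threshold of 5, the second case would have raised an alarm. Scaling by the core count is what suppresses it.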



                            #14
                            Originally posted by emmanux
                            I finally moved to perf_counter["\238(_Total)\6"]; the results are easier for Microsoft admins to understand.
                            This worked well for me, thanks a lot



                              #15
                              Here is how I added the item and trigger for CPU utilization:

                              Item:
                              Name: Processor Utilization (1 min average)
                              Type: Zabbix agent
                              Key: system.cpu.util[,,avg1]
                              Type of Information: Numeric (float)
                              Units: %
                              Applications: CPU

                              Trigger:
                              Name: CPU utilization is too high on {HOST.NAME}
                              Expression: {Template OS Windows:system.cpu.util[,,avg1].avg(60)}>90
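
As a rough illustration of what the avg(60)>90 expression above checks, the Python sketch below averages the utilization values collected in the last 60 seconds and compares the result with 90. The helper names `avg_over_window` and `util_trigger_fires`, the sample data, and the exact window boundary handling are assumptions. Zabbix's own evaluation may differ in detail.

```python
def avg_over_window(samples, now, window_seconds=60):
    """Average the values whose timestamp lies within the last window.

    samples is a list of (timestamp_in_seconds, value) pairs.
    Returns None when no sample falls inside the window.
    """
    recent = [v for ts, v in samples if now - window_seconds <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def util_trigger_fires(samples, now, threshold=90.0):
    """True when the windowed average exceeds the threshold."""
    avg = avg_over_window(samples, now)
    return avg is not None and avg > threshold


# Sustained high utilization: the 50.0 at t=0 is outside the window.
sustained = [(0, 50.0), (100, 95.0), (130, 97.0), (160, 99.0)]
print(util_trigger_fires(sustained, now=160))   # True
# A single spike to 100 % is averaged down by the earlier values.
spike = [(100, 20.0), (130, 30.0), (160, 100.0)]
print(util_trigger_fires(spike, now=160))       # False
```

Averaging over a window rather than testing the last value keeps a single spike from firing the trigger.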
