CPU Utilization incorrect for subset of hosts

  • keitht
    Junior Member
    • Feb 2021
    • 12

    #1


    This issue appeared a while back and seems to be slowly spreading; it currently affects three of our hosts. They run CentOS 7 and are EC2 instances, as are other hosts that work as expected. Yesterday I updated one of them to the most recent Zabbix agent 2 version (6.4.2). When I run zabbix_get against the host with -k system.cpu.util, it returns a value around 0.07. Based on top and CloudWatch metrics, I expect a value between 40 and 60. I set DebugLevel to 5 for the agent and re-ran zabbix_get. I can see the request arriving, a few lines related to the exporter task, and the response being sent (with the value), but nothing that looks like an error.

    What else can I check?

    Thanks
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2

    I don't think there is a real error for the debug log to show you. The agent just requests the data from the host and sends back what it gets. If you expect very different numbers, dig into what exactly is being asked for and what it is you expect... system.cpu.util without any parameters returns the defaults, i.e. system.cpu.util[all,user]. Is that the same thing you see in top or in other metric gathering? Maybe those include something else?
    Just some food for thought: are you comparing apples to apples, or is there maybe an orange involved?

    I don't deny that there could also be a bug, but then it should be reproducible... Still, if other hosts work and one does not, I would look more towards those hosts than towards the agent.
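On the apples-to-apples question: system.cpu.util[all,user] covers user time only, while a figure read off top (a process's %CPU, or us plus sy in the header) may include system time and other states as well. The Python sketch below is an independent cross-check, not the agent's implementation. It assumes the Linux /proc/stat layout described in proc(5), where the aggregate `cpu` line lists counters in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, and computes each state's share of elapsed CPU time between two samples.

```python
"""Per-state CPU utilization from two /proc/stat samples (Linux only).

Sketch only: an independent cross-check, not the Zabbix agent's code.
"""
from typing import Dict

# Order of the counters on the aggregate "cpu" line, per proc(5).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

# guest/guest_nice are already included in user/nice on Linux, so only the
# first eight counters are summed to get the total elapsed CPU time.
COUNTED = FIELDS[:8]


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Turn 'cpu  u n s i ...' into {'user': u, 'nice': n, ...} (jiffies)."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def utilization(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between samples."""
    # Older kernels report fewer fields, so only use counters present in both.
    deltas = {f: after[f] - before[f]
              for f in COUNTED if f in before and f in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between the two samples")
    return {f: 100.0 * d / total for f, d in deltas.items()}


def read_sample(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())
```

On a Linux host, call read_sample() twice, about a second apart, and pass both results to utilization(); comparing the "user" entry with 100 minus "idle" shows how much of the load falls outside user time.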


    • keitht
      Junior Member
      • Feb 2021
      • 12

      #3

      I agree, but I'm not sure where to look next. Something is clearly returning 'bad' information for a CPU utilization request. The hosts behaving differently should share the same configuration as the ones behaving as expected: they perform the same functions and are redundant and/or testing servers for a project. The issue is not (yet?) affecting our production servers for this project. Three of the seven hosts that should share the same configuration are returning 'bad' information. The CPU utilization difference is 'in my face' because I have a CPU utilization chart on my primary dashboard, but I am left wondering whether other items are affected too.

      You mentioned system.cpu.util without parameters will default to [all,user]. Where would this default behavior be changed? That sounds like something I could check, if I knew where to check.

      Thanks


      • keitht
        Junior Member
        • Feb 2021
        • 12

        #4
        Originally posted by keitht

        You mentioned system.cpu.util without parameters will default to [all,user]. Where would this default behavior be changed? That sounds like something I could check, if I knew where to check.

        Thanks
        Well, I re-ran the zabbix_get command, this time adding [all,user] explicitly, and it (something) still returns bad information: the result is 0.09 while I expect around 45.0.
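Since [all,user] still returns 0.09, one way to narrow this down is to query every type for the same host and check that the percentages add up to roughly 100. If idle comes back near 100, the agent is seeing an idle CPU; if the total is far from 100, its readings are inconsistent with each other. The Python sketch below assumes zabbix_get is on the PATH and uses only its -s (host) and -k (item key) options; the 5-point tolerance is an arbitrary allowance for the queries not running at exactly the same moment.

```python
"""Query every system.cpu.util type via zabbix_get and sanity-check the sum.

Sketch only: the tolerance and host address are illustrative assumptions.
"""
import subprocess
from typing import Dict, List, Tuple

# Linux types from the item documentation; guest and guest_nice are skipped
# because Linux already counts that time inside user and nice.
LINUX_TYPES = ["user", "nice", "system", "idle", "iowait",
               "interrupt", "softirq", "steal"]


def zabbix_get_commands(host: str, cpu: str = "all") -> Dict[str, List[str]]:
    """One zabbix_get argument list per CPU state (-s host, -k item key)."""
    return {t: ["zabbix_get", "-s", host, "-k", f"system.cpu.util[{cpu},{t}]"]
            for t in LINUX_TYPES}


def query_all(host: str) -> Dict[str, float]:
    """Run each command and parse the numeric reply (needs zabbix_get)."""
    results = {}
    for t, argv in zabbix_get_commands(host).items():
        out = subprocess.run(argv, capture_output=True, text=True, check=True)
        results[t] = float(out.stdout.strip())
    return results


def check_total(values: Dict[str, float],
                tolerance: float = 5.0) -> Tuple[float, bool]:
    """Sum the per-type percentages; a consistent reading is close to 100."""
    total = sum(values[t] for t in LINUX_TYPES if t in values)
    return total, abs(total - 100.0) <= tolerance
```

For example, `total, ok = check_total(query_all("10.0.0.5"))`, where 10.0.0.5 is a placeholder for the affected host's address.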


        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4806

          #5
          Originally posted by keitht
          You mentioned system.cpu.util without parameters will default to [all,user]. Where would this default behavior be changed? That sounds like something I could check, if I knew where to check.
          Default behaviour can be changed by adding parameters:
          system.cpu.util[<cpu>,<type>,<mode>,<logical_or_physical>]
          CPU utilization percentage. Return value: Float.
          cpu - <CPU number> or all (default)
          type - possible values: user (default), idle, nice, system (default for Windows), iowait, interrupt, softirq, steal, guest (on Linux kernels 2.6.24 and above), guest_nice (on Linux kernels 2.6.33 and above). See also platform-specific details for this parameter.
          mode - possible values: avg1 (one-minute average, default), avg5, avg15
          logical_or_physical (since version 5.0.3; supported on AIX only) - possible values: logical (default), physical.
          On Windows the value is acquired using the Processor Time performance counter. Note that since Windows 8 its Task Manager shows CPU utilization based on the Processor Utility performance counter, while in previous versions it was the Processor Time counter.

          Example:
          => system.cpu.util[0,user,avg5]

          Old naming: system.cpu.idleX, system.cpu.niceX, system.cpu.systemX, system.cpu.userX
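The documented parameters can also be assembled programmatically, for example to generate keys for a batch of zabbix_get calls. The Python sketch below is a hypothetical helper, not part of Zabbix: it validates cpu, type and mode against the lists above, leaves out the AIX-only logical_or_physical parameter, and drops trailing parameters left at their defaults so the agent fills them in.

```python
"""Build a system.cpu.util item key from its documented parameters.

Hypothetical helper for illustration; not part of Zabbix itself.
"""
from typing import Optional, Union

TYPES = {"user", "idle", "nice", "system", "iowait", "interrupt",
         "softirq", "steal", "guest", "guest_nice"}
MODES = {"avg1", "avg5", "avg15"}


def cpu_util_key(cpu: Union[str, int] = "all",
                 type_: Optional[str] = None,
                 mode: Optional[str] = None) -> str:
    """Return e.g. 'system.cpu.util[0,user,avg5]'; None means 'use default'."""
    if cpu != "all" and not (isinstance(cpu, int) and cpu >= 0):
        raise ValueError("cpu must be 'all' or a non-negative CPU number")
    if type_ is not None and type_ not in TYPES:
        raise ValueError(f"unknown type: {type_}")
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    params = [str(cpu), type_ or "", mode or ""]
    # Drop trailing empty parameters so the agent applies its own defaults.
    # An empty parameter in the middle (e.g. [all,,avg15]) is assumed to
    # mean "default" as well.
    while params and params[-1] == "":
        params.pop()
    if params == ["all"]:
        return "system.cpu.util"  # all CPUs, default type and mode
    return "system.cpu.util[" + ",".join(params) + "]"
```

For instance, cpu_util_key(0, "user", "avg5") yields the example key shown above, system.cpu.util[0,user,avg5].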
