Zabbix agent on Windows: processor group limits view of CPU usage and number of CPUs

  • rsimons
    Junior Member
    • Oct 2014
    • 2

    #1

    Hi,

    We're running an HPE Synergy 480 Gen10 server with Intel Xeon Gold 6244 CPUs. In the BIOS, NUMA clustering and Sub-NUMA clustering are enabled. Two NUMA clusters are formed, each with half of the memory and 8 cores assigned (there are 16 cores in total in the system).
    We see that the Zabbix agent is allowed to run on only one NUMA node at a time and is assigned by Windows to processor group 0. The Zabbix agent reports almost zero CPU load, which is true for the NUMA node the agent is assigned to. The other NUMA node, however, is about 80% utilized (by Oracle), and Windows reports about 45% CPU load for the entire system. When the load on the NUMA node where Oracle runs reaches 100%, Oracle becomes slow, yet Zabbix still reports 0% CPU load, because only the cores in NUMA node 1 are busy while the cores in NUMA node 0 are idle.
    We can make the Zabbix agent run on NUMA node 1 by setting its processor affinity to processor group 1 instead of processor group 0; it then reports 100% CPU load for that NUMA node.

    When running a CPU discovery against the node from the Zabbix server, we see the following output, in which there are indeed 16 cores, but 8 of them are reported as 'offline':
    [root@rma02 ~]$ zabbix_get -s 10.25.254.53 -p 10050 -k system.cpu.discovery
    {"data":[{"{#CPU.NUMBER}":0,"{#CPU.STATUS}":"online"},{" {#C PU.NUMBER}":1,"{#CPU.STATUS}":"online"},{"{#CPU.NU MBER}":2,"{#CPU.STATUS}":"online"},{"{#CPU.NUMBER} ":3,"{#CPU.STATUS}":"online"},{"{#CPU.NUMBER}" :4," {#CPU.STATUS}":"online"},{"{#CPU.NUMBER}":5,"{#CPU .STATUS}":"online"},{"{#CPU.NUMBER}":6,"{#CPU.STAT US}":"online"},{"{#CPU.NUMBER}":7,"{#CPU.STATUS}" : "online"},{"{#CPU.NUMBER}":8,"{#CPU.STATUS}":" offl ine"},{"{#CPU.NUMBER}":9,"{#CPU.STATUS}":"offline " },{"{#CPU.NUMBER}":10,"{#CPU.STATUS}":"offline"}, { "{#CPU.NUMBER}":11,"{#CPU.STATUS}":"offline"}, {"{# CPU.NUMBER}":12,"{#CPU.STATUS}":"offline"},{"{#CPU .NUMBER}":13,"{#CPU.STATUS}":"offline"},{"{#CPU.NU MBER}":14,"{#CPU.STATUS}":"offline"},{"{#CPU.NUMBE R}":15,"{#CPU.STATUS}":"offline"}]}
    [root@rma02 ~]$
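    To see at a glance which CPUs the agent treats as online, the discovery payload above can be grouped by status. A minimal Python sketch: the function name `cpus_by_status` is hypothetical, and the JSON is the zabbix_get output from this post with the forum's line-wrap debris removed.

```python
import json
from collections import defaultdict


def cpus_by_status(raw):
    """Group the CPU numbers in a system.cpu.discovery payload by their status."""
    groups = defaultdict(list)
    for entry in json.loads(raw)["data"]:
        groups[entry["{#CPU.STATUS}"]].append(entry["{#CPU.NUMBER}"])
    return dict(groups)


# The zabbix_get output from the post (hypothetical variable name RAW).
RAW = (
    '{"data":['
    '{"{#CPU.NUMBER}":0,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":1,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":2,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":3,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":4,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":5,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":6,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":7,"{#CPU.STATUS}":"online"},'
    '{"{#CPU.NUMBER}":8,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":9,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":10,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":11,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":12,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":13,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":14,"{#CPU.STATUS}":"offline"},'
    '{"{#CPU.NUMBER}":15,"{#CPU.STATUS}":"offline"}'
    ']}'
)

groups = cpus_by_status(RAW)
print("online: ", groups["online"])   # [0, 1, 2, 3, 4, 5, 6, 7]
print("offline:", groups["offline"])  # [8, 9, 10, 11, 12, 13, 14, 15]
```

    The split is exactly the NUMA node boundary described above: CPUs 0-7 (processor group 0) online, CPUs 8-15 (processor group 1) offline.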

    By chance (we think), the Oracle processes all run on NUMA node 1, which holds all of the CPUs reported as 'offline' in the output above, and they are therefore not monitored by Zabbix.

    Is this the intended behavior of the Zabbix agent? (I can imagine it is, since reporting 50% overall load when a single NUMA node is at 100% also makes no sense.)

    How should we treat such a setup monitoring-wise? What happens now is that Zabbix reports 0% CPU load (correct from the Zabbix perspective, which only sees NUMA node 0), Oracle uses 100% CPU, has problems and is slow (correct from the Oracle perspective), and Windows reports 50% CPU load in Task Manager (correct from the perspective of the entire system).

    So basically I have three different values that are all correct... :-(
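    The three values only differ in which CPUs they average over. A small Python sketch with idealised, made-up utilization figures (not measurements from this server), assuming 8 CPUs per NUMA node numbered as in the discovery output, and assuming the whole-system figure is a plain average over all 16 CPUs:

```python
from statistics import mean

# Hypothetical per-CPU utilization in %, indexed by CPU number:
# CPUs 0-7 (NUMA node 0, processor group 0) idle,
# CPUs 8-15 (NUMA node 1, processor group 1) saturated by Oracle.
per_cpu = [0.0] * 8 + [100.0] * 8

agent_view = mean(per_cpu[:8])   # agent confined to processor group 0
oracle_view = mean(per_cpu[8:])  # the node Oracle is running on
system_view = mean(per_cpu)      # all 16 CPUs, as in Task Manager

print(f"Zabbix agent (group 0): {agent_view:.0f}%")   # 0%
print(f"Oracle node (group 1):  {oracle_view:.0f}%")  # 100%
print(f"Whole system:           {system_view:.0f}%")  # 50%
```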

    Advice appreciated!

    KR,

    Rob.
  • mprihodko
    Zabbix developer
    • Jun 2022
    • 4

    #2
    Hi,

    I am a Zabbix developer. A Zabbix user has registered a bug that appears to be the same issue as the one described in this post. See https://support.zabbix.com/browse/ZBX-20260.

    This bug seems to be a rare one. Additional information from Zabbix users would be appreciated; please see my questions in JIRA.

    Regards,
    Mihails

    • mprihodko
      Zabbix developer
      • Jun 2022
      • 4

      #3
      A possible fix is available for testing in https://support.zabbix.com/browse/ZB...comment-672534. We do not have such a setup at Zabbix, so we cannot test it ourselves.

      Everybody with this problem is welcome to test it.

      • mprihodko
        Zabbix developer
        • Jun 2022
        • 4

        #4
        The bugfix is ready and will be available in:
        • 6.4.0alpha1
        • 6.2.2rc1
        • 6.0.8rc1
        Last edited by mprihodko; 16-08-2022, 12:03.
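        To check whether a given agent version should include the fix, the release list above can be turned into a version comparison. A hypothetical Python helper (`parse_version` and `has_fix` are illustrative names), assuming that every later release on the same branch also carries the fix and that pre-releases order as alpha < beta < rc < final; branches not named in the post return None:

```python
import re
from typing import Optional

# First releases containing the fix, per the post above, keyed by (major, minor).
FIXED_IN = {(6, 0): "6.0.8rc1", (6, 2): "6.2.2rc1", (6, 4): "6.4.0alpha1"}

# Assumed ordering of pre-release stages within one patch release.
STAGE_RANK = {"alpha": 0, "beta": 1, "rc": 2, "": 3}

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(alpha|beta|rc)(\d+))?$")


def parse_version(version: str) -> tuple:
    """Turn e.g. '6.0.8rc1' into a tuple that sorts in release order."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, stage, stage_no = m.groups()
    return (int(major), int(minor), int(patch),
            STAGE_RANK[stage or ""], int(stage_no or 0))


def has_fix(version: str) -> Optional[bool]:
    """True/False for branches listed in the post, None for any other branch."""
    parsed = parse_version(version)
    first_fixed = FIXED_IN.get(parsed[:2])
    if first_fixed is None:
        return None
    return parsed >= parse_version(first_fixed)


for v in ("6.0.7", "6.0.8", "6.2.2rc1", "6.4.0", "5.0.30"):
    print(v, has_fix(v))
```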
