Hi,
I found this project, https://share.zabbix.com/cat-server-...-multiple-gpus, as a way to monitor multiple GPUs in one system.
I am struggling with how to add these as items in the Zabbix GUI. Following the instructions, I added these UserParameter= lines to zabbix_agentd.conf:
UserParameter=gpu.number,/usr/bin/nvidia-smi -L | /usr/bin/wc -l
UserParameter=gpu.discovery,/etc/zabbix/scripts/get_gpus_info.sh
UserParameter=gpu.fanspeed[*],nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.power[*],nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.temp[*],nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.utilization[*],nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.memfree[*],nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.memused[*],nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.memtotal[*],nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits -i $1 | tr -d "\n"
UserParameter=gpu.utilization.dec.min[*],nvidia-smi -q -d UTILIZATION -i $1 | grep -A 5 DEC | grep Min | tr -s ' ' | cut -d ' ' -f 4
UserParameter=gpu.utilization.dec.max[*],nvidia-smi -q -d UTILIZATION -i $1 | grep -A 5 DEC | grep Max | tr -s ' ' | cut -d ' ' -f 4
UserParameter=gpu.utilization.enc.min[*],nvidia-smi -q -d UTILIZATION -i $1 | grep -A 5 ENC | grep Min | tr -s ' ' | cut -d ' ' -f 4
UserParameter=gpu.utilization.enc.max[*],nvidia-smi -q -d UTILIZATION -i $1 | grep -A 5 ENC | grep Max | tr -s ' ' | cut -d ' ' -f 4
I am running into trouble with the flexible user parameters ([*]). The project includes a script that goes in /etc/zabbix/scripts and outputs each GPU's index and UUID like this:
{
"data":[
{"{#GPUINDEX}":"0", "{#GPUUUID}":"GPU-24af880d-7346-ed57-51s6-da821e905c12"}
]
}
That output is then supposed to be passed to the [*] parameters.
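The contents of get_gpus_info.sh are not shown in the post, so the following is only a hypothetical sketch of a discovery script that emits JSON in the same shape. It assumes `nvidia-smi --query-gpu=index,uuid --format=csv,noheader` prints one `index, uuid` line per GPU; the function name `gpus_to_lld` is made up for illustration and is not part of the project.

```shell
#!/bin/sh
# Hypothetical sketch of a GPU discovery script (NOT the project's actual get_gpus_info.sh).

# Read "index, uuid" CSV lines on stdin and print Zabbix low-level discovery JSON.
gpus_to_lld() {
    awk -F', *' '
        BEGIN { printf "{\"data\":[" }
        NF >= 2 {
            # Put a comma before every entry except the first.
            if (n++) printf ","
            printf "{\"{#GPUINDEX}\":\"%s\", \"{#GPUUUID}\":\"%s\"}", $1, $2
        }
        END { print "]}" }'
}

# Query the driver only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,uuid --format=csv,noheader | gpus_to_lld
fi
```

Keeping the formatting in a function that reads stdin means it can be checked with sample input on a machine without a GPU.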
I created a template for this in Zabbix and then tried to add the items above, such as gpu.number, gpu.discovery and gpu.fanspeed. How can I get these items to use the output of /etc/zabbix/scripts/get_gpus_info.sh? I tried adding them as gpu.fanspeed[*], with no luck. Any guidance is appreciated.
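For context: in Zabbix, a flexible key is normally filled in by low-level discovery. The template gets a discovery rule with the key gpu.discovery, and under it item prototypes with keys such as gpu.fanspeed[{#GPUINDEX}]; Zabbix then creates one real item per discovered GPU with the macro replaced (for example gpu.fanspeed[0]), and the agent substitutes that value for $1. A plain item needs a concrete value in the key; the literal [*] only belongs in the UserParameter definition. The shell function below, `expand_prototype`, is a hypothetical helper that merely imitates this substitution so the resulting keys are visible; it is not how Zabbix implements it, and its naive parsing assumes the simple JSON shape shown above.

```shell
#!/bin/sh
# Illustration only: imitate how an LLD item prototype key is expanded per GPU.
# expand_prototype is a hypothetical helper, not part of Zabbix.

# $1:    item prototype key containing {#GPUINDEX}
# stdin: discovery JSON in the shape produced by gpu.discovery
expand_prototype() {
    proto=$1
    # Naively pull every {#GPUINDEX} value out of the JSON (no real JSON parsing).
    grep -o '"{#GPUINDEX}":"[^"]*"' | cut -d '"' -f 4 |
    while read -r idx; do
        # Replace the macro with the discovered index, as LLD would.
        printf '%s\n' "$proto" | sed 's/[{]#GPUINDEX[}]/'"$idx"'/g'
    done
}
```

For example, piping the JSON from the question into `expand_prototype 'gpu.temp[{#GPUINDEX}]'` prints gpu.temp[0], the key Zabbix would create for the single discovered GPU.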