Hi,
I thought I'd share something I've done to capture and graph IO stats in Zabbix for disks on Linux (Red Hat/CentOS) using the device-mapper device files in /dev/mapper.
Before I start though, I'd like to credit Mucknet, whose excellent article on getting hard disk performance stats gave me the inspiration to go one step further (http://www.muck.net/?p=19).
My reason for wanting to do this is twofold:
1) Most stats collection tools report IO stats against the dm device files. The problem with this is that dm device files are not permanent and can change between reboots in some situations. I have an iSCSI cluster at home where I've seen this happen. So I need something I can rely on, and the /dev/mapper device files fit the bill nicely.
2) Whether using LVM or multipathd's "user friendly names" feature, the devices in /dev/mapper can be given names that match their intended purpose. I have some systems at work, for example, where the names I've used are:
asm-db, asm-dbp1, asm-fra, asm-frap1, control, home, ocr1, system-oracle, system-root, system-swap, usr-local, vote1
These are much more meaningful in a graph than dm-0, dm-1, etc.
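For illustration, on recent udev-based distributions the /dev/mapper entries are symlinks to the kernel's dm-N nodes, so you can see the mapping directly (on older releases they may be plain device nodes instead, in which case dmsetup ls shows the mapping):
Code:
readlink -f /dev/mapper/system-root   # e.g. /dev/dm-0
ls -l /dev/mapper                     # shows name -> ../dm-N symlinks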
There was also the issue of getting access to the full range of IO stats available for a disk and storing them in Zabbix. So this is what I've done.
First of all, save the script I've attached to this post (dz.txt) and make it executable. I put mine at /usr/local/bin/dz. This script takes two arguments: a device-mapper name and a stats field. The choices for the latter (cut from the script) are:
Code:
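# $4..${14} below are fields 4-14 of the matching /proc/diskstats line
# (assuming the script loads that line into the positional parameters).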
case $statName in
"read.ops") data=$4 ;; # Reads completed
"read.merged") data=$5 ;; # Reads merged
"read.sectors") data=$6 ;; # 512 byte sectors read
"read.ms") data=$7 ;; # milliseconds spent reading
"write.ops") data=$8 ;; # Writes completed
"write.merged") data=$9 ;; # Writes merged
"write.sectors") data=${10} ;; # 512 byte sectors written
"write.ms") data=${11} ;; # milliseconds spent writing
"io.active") data=${12} ;; # I/Os currently in progress
"io.ms") data=${13} ;; # milliseconds spent doing I/Os
"io.weight") data=${14} ;; # weighted # of milliseconds spent doing I/O
*) exit 1 ;; # unknown stat name
esac
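The attached dz.txt is the authoritative version, but here's a minimal sketch of how such a script can be put together (the readlink-based name resolution is my assumption, not necessarily how the attached script does it):
Code:
#!/bin/bash
# dz -- print one I/O statistic for a device-mapper device.
# Usage: dz <mapper-name> <stat-field>
# NOTE: an illustrative sketch, not the attached dz.txt.

mapperName=$1
statName=$2

# Resolve /dev/mapper/<name> to the kernel's dm-N name (assumes the
# /dev/mapper entries are symlinks, as on recent udev-based systems).
dmName=$(basename "$(readlink -f "/dev/mapper/$mapperName")")

# Pull the matching line out of /proc/diskstats; its fields are
# major, minor, name, then the eleven stat columns used below.
line=$(awk -v d="$dmName" '$3 == d' /proc/diskstats)
[ -n "$line" ] || exit 1
set -- $line

case $statName in
    "read.ops")  data=$4 ;;     # the remaining stat names map to
    "write.ops") data=$8 ;;     # $5..${14} exactly as in the
    "io.active") data=${12} ;;  # excerpt above
    *) exit 1 ;;
esac

echo "$data"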
Next, add the following lines to /etc/zabbix/zabbix_agentd.conf and restart the Zabbix Agent.
Code:
# Enhanced Disk IO stats
UserParameter=custom.vfs.dev.read.ops[*],/usr/local/bin/dz $1 read.ops
UserParameter=custom.vfs.dev.read.merged[*],/usr/local/bin/dz $1 read.merged
UserParameter=custom.vfs.dev.read.sectors[*],/usr/local/bin/dz $1 read.sectors
UserParameter=custom.vfs.dev.read.ms[*],/usr/local/bin/dz $1 read.ms
UserParameter=custom.vfs.dev.write.ops[*],/usr/local/bin/dz $1 write.ops
UserParameter=custom.vfs.dev.write.merged[*],/usr/local/bin/dz $1 write.merged
UserParameter=custom.vfs.dev.write.sectors[*],/usr/local/bin/dz $1 write.sectors
UserParameter=custom.vfs.dev.write.ms[*],/usr/local/bin/dz $1 write.ms
UserParameter=custom.vfs.dev.io.active[*],/usr/local/bin/dz $1 io.active
UserParameter=custom.vfs.dev.io.ms[*],/usr/local/bin/dz $1 io.ms
UserParameter=custom.vfs.dev.io.weight[*],/usr/local/bin/dz $1 io.weight
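Once the agent is restarted, you can sanity-check a key before importing anything, either locally with the agent's test mode or from the server with zabbix_get (host name and device name here are just examples):
Code:
zabbix_agentd -t 'custom.vfs.dev.read.ops[system-root]'
zabbix_get -s myhost -k 'custom.vfs.dev.read.ops[system-root]'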
Next, download the attached example XML file. Change the host, dns and ip fields to your own system, then change the device-mapper names for each item you wish to capture. Two are provided: system-root and system-swap.
Finally, import the XML file into Zabbix and wait 30-60 seconds for the stats to show up.
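If you want items for a different volume, a quick substitution on a copy of the file before importing it does the trick (the file name here is hypothetical; use whatever you saved the attachment as):
Code:
sed 's/system-root/usr-local/g' dz_items.xml > dz_items_usr-local.xml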
I've noticed in one or two cases, because /proc/diskstats reports running totals and Zabbix computes deltas between values, that sometimes the very first value Zabbix stores for an item is the raw total. I wish my systems really were capable of 43GB/s IO,
but it's an erroneous value, in which case just flush the stats history for that item and start afresh.
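As a sanity check on what the deltas should look like, you can compute a rate by hand from two /proc/diskstats samples (field 6 is 512-byte sectors read, matching $6 in the script; dm-0 is just an example device):
Code:
a=$(awk '$3 == "dm-0" {print $6}' /proc/diskstats)
sleep 10
b=$(awk '$3 == "dm-0" {print $6}' /proc/diskstats)
echo "$(( (b - a) * 512 / 10 )) bytes/sec read"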
Enjoy,
John McNulty