PATCH: percentage utilization for disks

  • lamont
    Member
    • Nov 2007
    • 89

    #1

    PATCH: percentage utilization for disks

Attached is a patch to add % utilization to disk monitoring. It extends the vfs.dev* metrics like this:

    vfs.dev.read[sda1,putil] - read utilization
    vfs.dev.write[sda1,putil] - write utilization
    vfs.dev[sda1,putil] - overall % utilization
    vfs.dev[sda1,sectors] - sum of wsect + rsect
    vfs.dev[sda1,operations] - sum of wio + rio

    I have concerns about the first two since the linux kernel actually sums ms spent over all the i/o operations in flight for those two, which can result in numbers like 14,000% utilization. The vfs.dev[sda1,putil] is *not* the result of summing the read and write values and is a different counter in the kernel which counts the number of ms where there has been at least one i/o in flight -- it appears to reliably max out at 99-100%.
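For reference, the kernel counter behind that overall figure can be sampled directly. Here's a rough Python sketch (my own illustration, not lamont's patch) that reads the io_ticks field — the "ms with at least one I/O in flight" counter — from /proc/diskstats and turns two samples into a utilization percentage. The field index assumes the standard post-2.6 /proc/diskstats layout:

```python
import time

def parse_io_ticks(line, dev):
    """Return io_ticks (ms with >=1 I/O in flight) from one /proc/diskstats line, or None."""
    fields = line.split()
    if len(fields) > 12 and fields[2] == dev:
        return int(fields[12])   # io_ticks: 10th stat field after the device name
    return None

def io_ticks_ms(dev, path="/proc/diskstats"):
    with open(path) as f:
        for line in f:
            v = parse_io_ticks(line, dev)
            if v is not None:
                return v
    raise ValueError(f"device {dev!r} not found")

def percent_util(dev, interval=1.0):
    t0, a = time.time(), io_ticks_ms(dev)
    time.sleep(interval)
    t1, b = time.time(), io_ticks_ms(dev)
    # ms busy / ms elapsed; stays at or below ~100% because io_ticks does
    # not sum time across concurrent requests the way ruse/wuse do
    return 100.0 * (b - a) / ((t1 - t0) * 1000.0)
```

That non-summing behavior is exactly why this counter tops out at 99-100% while deltas of ruse/wuse can report thousands of percent.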
    Attached Files
  • lamont
    Member
    • Nov 2007
    • 89

    #2
And here's a patch which is the same as the above, but removes vfs.dev.read/write[<dev>,putil] as probably being too confusing to end users. It only adds:

vfs.dev[sda1,putil] - % utilization
    vfs.dev[sda1,sectors] - rsect + wsect
    vfs.dev[sda1,operations] - rio + wio
    Attached Files


    • cpicton
      Member
      • Nov 2006
      • 35

      #3
      I currently have a custom script to get this and extra data

      I am monitoring average time for request (for read and write)

      rwait = ruse/rio

      wwait = wuse/wio

      This allows me to see when disk response is slowing unacceptably
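That arithmetic only makes sense on per-interval deltas, not raw counters. A tiny helper sketching the idea (not cpicton's actual script): feed it two successive (ruse, rio) or (wuse, wio) samples and it returns the average ms per request for that interval, guarding against intervals with no completed I/Os:

```python
def avg_wait(prev, curr):
    """Average ms per request over one polling interval.

    prev and curr are (use_ms, ops) counter pairs sampled from the
    kernel, e.g. (ruse, rio) for reads or (wuse, wio) for writes.
    """
    d_use = curr[0] - prev[0]
    d_ops = curr[1] - prev[1]
    if d_ops <= 0:          # no completed I/Os this interval, or a counter wrap
        return 0.0
    return d_use / d_ops
```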


      • lamont
        Member
        • Nov 2007
        • 89

        #4
        so, i sat down and went 'cool, i'll just implement that in the agent' and started working on it and divided ruse / rio and wuse / wio and then thought that through and realized i was just getting an average time since the machine booted or complete garbage based on when the counters have wrapped...

        so what i really want to do is be taking the actual instantaneous values computed from the deltas in zabbix:

        vfs.dev.write[sda1,use] / vfs.dev.write[sda1,operations]

        which means constructing items based on doing math on other items, which i don't believe zabbix supports...

        it'd be nice if the math you can do on triggers in zabbix was ported to graph values and to meta-items...

        anyway, i assume you've got a "background" process which is computing these deltas for you and the zabbix-agent is just grabbing the last value? i didn't see any links for what you're using for your agent config up there, although the explanation you've given of the io values was highly useful and confirms all the bogosity that i've seen in iostat so far...


        • cpicton
          Member
          • Nov 2006
          • 35

          #5
          Originally posted by lamont
          so, i sat down and went 'cool, i'll just implement that in the agent' and started working on it and divided ruse / rio and wuse / wio and then thought that through and realized i was just getting an average time since the machine booted or complete garbage based on when the counters have wrapped...

          so what i really want to do is be taking the actual instantaneous values computed from the deltas in zabbix:

          vfs.dev.write[sda1,use] / vfs.dev.write[sda1,operations]

          which means constructing items based on doing math on other items, which i don't believe zabbix supports...

          it'd be nice if the math you can do on triggers in zabbix was ported to graph values and to meta-items...
And this is a good use case where it would be useful

          anyway, i assume you've got a "background" process which is computing these deltas for you and the zabbix-agent is just grabbing the last value?
          Yes, I have a custom script on all my agent machines, which reads extra things that zabbix can't, and potentially stores the previous values in a persistent perl hash. I am computing the results based on the difference from last poll.

          Code:
          UserParameter=custom[*], /usr/local/scripts/zabbix/get-custom-value "$1" "$2" "$3" "$4" "$5" "$6"
          $1 is the name of a script to run on the system (in a known directory). All my agents pull a master copy of all scripts from my zabbix server on a scheduled basis.

          This allows me to quickly add support for different raid controllers, mysql, openvz beancounters, etc etc to all my agents.
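A minimal stand-in for that idea (the original is a Perl script with a persistent hash; the state-file path, key names, and return convention here are my own assumptions): keep the previous counter sample on disk so each poll from the agent can return a per-interval rate instead of a raw counter.

```python
# Hypothetical sketch of a get-custom-value-style helper, NOT the
# original script: persists the last (value, timestamp) per key in a
# JSON file and returns the rate of change since the previous poll.
import json
import os
import time

STATE = "/tmp/zabbix-custom-state.json"   # assumed location

def delta_per_second(key, value, state_path=STATE):
    """Return (value - previous) / elapsed for this key; 0.0 on first poll."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    now = time.time()
    rate = 0.0
    if key in state:
        prev_val, prev_t = state[key]
        if now > prev_t and value >= prev_val:   # skip counter wraps
            rate = (value - prev_val) / (now - prev_t)
    state[key] = [value, now]
    with open(state_path, "w") as f:
        json.dump(state, f)
    return rate
```

Printed from a script wired to a UserParameter like the one above, this gives the agent a delta-based value without needing item-on-item math in the server.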

