Smarter "Free disk space is less than 20% on volume XX"

  • alfsolli
    Junior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    • Aug 2010
    • 19

    #1

    Smarter "Free disk space is less than 20% on volume XX"

    Another alarm comes in: disk XX is below 20% capacity.
    Great. But how big is this disk? How fast is it filling up? We have no clue.

    So I made a different item + trigger to provide some more details.
    I'm not going to claim it's perfect, so if anyone has suggestions for improvement, please be my guest.

    Item created by LLD:
    Name: Hours til disk $1 runs out
    Type: calculated
    key: hours.to.disk.full[{#FSNAME}]
    Formula:
    last("vfs.fs.size[{#FSNAME},free]")/(last("vfs.fs.size[{#FSNAME},free]",0,3600)-(last("vfs.fs.size[{#FSNAME},free]")+1))
    Type of information: Numeric (float)
    Units: hours

    The rest (update interval etc.) is up to you.


    Then, Trigger created by LLD:

    Name: At this rate, {#FSNAME} will run out of space within {ITEM.LASTVALUE1} ({ITEM.VALUE3} left)

    Expression:
    ({Template OS Linux:hours.to.disk.full[{#FSNAME}].last(0)}<24 & {Template OS Linux:hours.to.disk.full[{#FSNAME}].last(0)}>0) | {Template OS Linux:vfs.fs.size[{#FSNAME},free].last(0)}=0

    Note that I have this item + trigger in the filesystem discovery rule under "Template OS Linux". Adjust the template name in the expression if yours differs.


    This item + trigger does the following:
    It compares the latest free-space value (in bytes) with the value from one hour ago.
    It divides the remaining disk space by the change in usage over that hour. This gives an approximate number of hours left until the disk is full.

    The trigger is set to fire if the number is less than 24 hours but greater than 0. (This avoids false alarms from a negative number, which happens when you free up space so that more is free now than one hour ago.)

    Note the part "free]")+1))" in the item formula. This is only to avoid division by zero, which would make the item unsupported for a while. One extra byte in the equation doesn't really matter in 2013, if you ask me.
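
    For anyone who wants to check the arithmetic outside Zabbix, here is a minimal Python sketch of the item formula and the trigger condition described above. The function names and the sample values are made up for illustration; only the formula itself comes from the item definition.

```python
def hours_until_full(free_now, free_hour_ago):
    """Mirror the calculated item: free / (free_1h_ago - (free + 1))."""
    # The +1 keeps the denominator from being zero when nothing changed.
    return free_now / (free_hour_ago - (free_now + 1))


def trigger_fires(hours_left, free_now, limit_hours=24):
    """Mirror the trigger: 0 < hours_left < limit, or the disk is already full."""
    return (0 < hours_left < limit_hours) or free_now == 0


GIB = 1024 ** 3

# Shrinking disk: 40 GiB free an hour ago, 36 GiB free now (roughly 9 hours left).
shrinking = hours_until_full(36 * GIB, 40 * GIB)
# Stable disk: same free space as an hour ago, so the result is negative.
stable = hours_until_full(36 * GIB, 36 * GIB)

print(round(shrinking, 2), trigger_fires(shrinking, 36 * GIB))
print(stable < 0, trigger_fires(stable, 36 * GIB))
```

    With these sample values the shrinking disk trips the trigger and the stable one does not, since an unchanged disk yields a negative result.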


    When an alarm is fired, you get something like this:

    Cobra
    At this rate, / will run out of space within 10.85 hours (32.79 GB left)

    The lower the diskspace is, the naggier this will become. But as long as the usage is stable (no matter how low), it will be silent.
    Last edited by alfsolli; 03-10-2013, 15:59.
  • chojin
    Member
    Zabbix Certified Specialist
    • Jul 2011
    • 64

    #2
    Nice idea!

    But I would propose a few small adjustments:
    - Make the item calculate seconds instead of hours. That way Zabbix converts it to human-readable text (like 5h 36m 15s) instead of something like 25.65 Khours:
    Formula:
    Code:
    last("vfs.fs.size[{#FSNAME},free]")/(last("vfs.fs.size[{#FSNAME},free]",0,3600)-(last("vfs.fs.size[{#FSNAME},free]")+1))*3600
    Units: s
    Of course, the trigger then has to check against <86400 instead of <24.

    - Make the trigger check the average value over 10 minutes or so, to prevent trigger flapping:
    Code:
    ({Template OS Linux:hours.to.disk.full[{#FSNAME}].avg(10m)}<86400 & {Template OS Linux:hours.to.disk.full[{#FSNAME}].avg(10m)}>0) | {Template OS Linux:vfs.fs.size[{#FSNAME},free].last(0)}=0
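
    To illustrate why the two adjustments help, here is a rough Python sketch, not Zabbix syntax. The function names and the sample values are hypothetical; the only things taken from the posts are the *3600 scaling, the 86400 threshold, and the idea of averaging recent values.

```python
from statistics import mean

DAY_SECONDS = 24 * 3600  # 86400, the threshold used in the trigger above


def seconds_until_full(free_now, free_hour_ago):
    """Same formula as the hours item, scaled by 3600 to give seconds."""
    return free_now / (free_hour_ago - (free_now + 1)) * 3600


def averaged_trigger(samples, free_now, limit=DAY_SECONDS):
    """Compare the mean of recent item values instead of only the last one."""
    avg = mean(samples)
    return (0 < avg < limit) or free_now == 0


# Hypothetical item values (seconds) from the last ten minutes, with one brief dip.
samples = [90000.0, 80000.0, 95000.0, 91000.0, 92000.0]

# A last()-style check evaluated at the moment of the dip would fire...
last_only = 0 < samples[1] < DAY_SECONDS
# ...while the averaged check stays quiet because the mean is above one day.
smoothed = averaged_trigger(samples, free_now=10 * 1024 ** 3)

print(last_only, smoothed)
```

    A single short burst of writes no longer flips the trigger on and off, at the cost of reacting a few minutes later to a real problem.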


    • alfsolli
      Junior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      • Aug 2010
      • 19

      #3
      thanks

      *3600, of course!
      I originally wanted it in seconds, for exactly the reasons you describe, but I couldn't figure out the basic math at the end of a very long and caffeinated day.

      Thanks a lot for your input. I'll adjust my code with your proposals.

      - Alf -


      • omitchell
        Junior Member
        • Oct 2013
        • 2

        #4
        Problem with stops?

        I implemented a variation of this item and trigger combo this afternoon, but found that I couldn't get the API to accept the trigger when there were full stops (periods) in the item key. Eventually, I got around it by using the following key instead:

        Code:
        hours_until_full[{#FSNAME}]
        I also use 96 hours as the timeframe rather than 24 in order to allow for issues occurring over weekends and public holidays.

        Thanks to alfsolli for the original inspiration!


        • Amun
          Junior Member
          • Nov 2013
          • 1

          #5


          As you can see above, I'm getting odd readings in the trigger list.

          This is my item formula:

          Code:
          last("vfs.fs.size[{#FSNAME},free]")/(last("vfs.fs.size[{#FSNAME},free]",0,86400)-(last("vfs.fs.size[{#FSNAME},free]")+1))*3600
          I was trying to compare the last received value with the one from a day ago; I believe this is where I am going wrong.

          and this is my trigger

          Code:
          ({Windows:hours.to.disk.full[{#FSNAME}].avg(10m)}<86400 & {Windows:hours.to.disk.full[{#FSNAME}].avg(10m)}>0) | {Windows:vfs.fs.size[{#FSNAME},free].last(0)}=0
          Can someone explain what's going on in the above? I am trying to compare the average change over a longer period, because I am getting triggers on filesystems that have ample space (over 500 GB).

          Thanks!


          • rido
            Junior Member
            • Nov 2013
            • 1

            #6
            Nice tips. I've implemented this; let's see how it works.
