[Linux] Monitoring for a read-only filesystem

  • Mojah
    Member
    • Apr 2010
    • 60

    #1

    [Linux] Monitoring for a read-only filesystem

    Hi,

    I'm wondering what the most efficient way is to test for a read-only filesystem with zabbix agent.

    Are there built-in commands that allow you to do so?

    I've thought about:
    - Monitoring read/write on the filesystem, and trigger if there are no more writes (comparing average of last 15 minutes with the last 60 minutes)
    - Writing an extra script to "touch" and "rm" files, and verify it that way

    I would prefer a method that allows you to do so straight from the default zabbix agent checks, instead of having to deploy extra scripts on each server.

    How have others overcome this?

    Regards,
    M.
  • marcel
    Senior Member
    Zabbix Certified Specialist
    • Oct 2010
    • 112

    #2
    monitoring read-only filesystem with read-writes?

    does not make much sense to me - can you please be more specific about your question?

    you can system.run[] a command - very simple and instant

    something like system.run["touch /tmp/zabbix.test && rm /tmp/zabbix.test"] or so
    Zabbix Certified Specialist for Large Environments since 12/2010

    • Mojah
      Member
      • Apr 2010
      • 60

      #3
      Originally posted by marcel
      monitoring read-only filesystem with read-writes?
      The idea would be that if it's read-only, no writes would occur. By comparing values from 15 minutes ago with those from 60 minutes ago (when there was write activity) you can "assume" writes have stopped for a reason.

      Originally posted by marcel
      does not make much sense to me - can you please be more specific about your question?

      you can system.run[] a command - very simple and instant

      something like system.run["touch /tmp/zabbix.test && rm /tmp/zabbix.test"] or so
      That's actually a clean choice, I had not thought about running "system.run" to verify it. I'll be testing that!

      • Mojah
        Member
        • Apr 2010
        • 60

        #4
        That _seemed_ an effective way of doing so, but if it's a read-only filesystem (or if the file is just missing when running the rm command), the item falls back to "unsupported", and no triggers seem to react.

        I ended up with the following, which hides any errors so they can't interfere:
        system.run["/bin/touch /tmp/test 1>/dev/null 2>/dev/null && /bin/rm /tmp/test 1>/dev/null 2>/dev/null && /bin/echo 1"]

        You can easily test it by changing the "test" filename in the rm command.

        So at this point, it does not seem viable. Or am I missing something?
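
        One way to keep the item from going "unsupported" is to make the command always print a value, whether the write succeeds or not. A minimal sketch, not tested against a real agent; the check_writable function and the probe file name are made up for illustration:

        ```shell
        # Prints 1 if a file can be created in the directory, 0 otherwise.
        # A value is always printed, so the check never returns empty output.
        check_writable() {
            dir="${1:-/tmp}"
            probe="$dir/.zabbix_rw_probe.$$"   # hypothetical probe file name
            if touch "$probe" 2>/dev/null; then
                rm -f "$probe" 2>/dev/null
                echo 1
            else
                echo 0
            fi
        }

        check_writable /tmp
        ```

        As a single item key, the same idea might look like system.run["touch /tmp/zabbix.test 2>/dev/null && rm -f /tmp/zabbix.test && echo 1 || echo 0"], with a trigger that fires when the last value is 0.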

        • ufocek
          Senior Member
          • Aug 2006
          • 161

          #5
          I found this issue http://www.zabbix.com/forum/showthread.php?p=76370

          • claytronic
            Member
            • Nov 2006
            • 52

            #6
            I was encountering the same problem due to some NFS storage problems and needed a way to find Ubuntu guests that had gone into a read-only failsafe. Here is the trigger I came up with, based on the previous examples.

            Code:
            {Template_Linux:system.run["/bin/touch /tmp/zabbix.test >/dev/null 2>&1 && /bin/rm /tmp/zabbix.test; echo $?"].nodata(300)}=1 | {Template_Linux:system.run["/bin/touch /tmp/zabbix.test >/dev/null 2>&1 && /bin/rm /tmp/zabbix.test; echo $?"].last(0)}=1

            • tim.mooney
              Senior Member
              • Dec 2012
              • 1427

              #7
              Originally posted by Mojah
              Hi,

              I'm wondering what the most efficient way is to test for a read-only filesystem with zabbix agent.
              We use custom scripts to monitor for read-only filesystems and also to monitor that the MD devices are healthy. The "checkro.sh" script takes a single argument (the mount point to check) and uses grep on /proc/mounts to verify it finds it with a "rw". It then echoes (outputs) either 0 or 1.

              You just need to place it on the clients (we package as part of a local monitoring-helpers package and distribute via puppet and a yum repo) and make certain that the zabbix_agentd.conf has an entry for it, such as

              Code:
              UserParameter=custom.checkro[*],/usr/local/libexec/monitoring/checkro.sh "$1"
              Your item key would be custom.checkro[/path/to/mountpoint]

              We also defined custom value maps so that we see "OK" or "Read-Only" when looking at the items.

              • gigatec
                Junior Member
                • Sep 2011
                • 15

                #8
                Is there any link where I can get the 'checkro.sh' Script?

                Thx & regards
                Stephan

                • vintagegamingsystems
                  Member
                  • Jun 2013
                  • 57

                  #9
                  Here is an example. Yours may look different...

                  Code:
                  #!/bin/bash
                  
                  #Author: Joshua Cagle
                  
                  # This script checks for filesystems that are read-only. It is
                  # used in conjunction with the Zabbix monitoring and alerting system.
                  # If the script prints 0, the filesystem is in read-only mode.
                  # This script uses a non-exhaustive case statement of potential mount paths,
                  # yet they are common in our environment.
                  
                  mountPoint=$1
                  
                  [ "$#" -eq 1 ] || { echo "usage: checkro.sh <mountPoint> "; exit 1; }
                  case "$1" in 
                  	/)
                  		regex="^rootfs\s/\s"
                  		;;
                  	/proc)
                  		regex="^proc\s/proc\s"
                  		;;
                  	/sys)
                  		regex="^sysfs\s/sys\s"
                  		;;
                  	/dev)
                  		regex="^devtmpfs\s/dev\s"
                  		;;
                  	/dev/pts)
                  		regex="^devpts\s/dev/pts\s"
                  		;;
                  	/dev/mapper/VolGroup00-LogVol00)
                  		regex="^/dev/mapper/VolGroup00-LogVol00\s/\s"
                  		;;
                  	/proc/bus/usb)
                  		regex="^/proc/bus/usb\s/proc/bus/usb\s"
                  		;;
                  	/boot)
                  		regex="^/dev/sda1\s/boot\s"
                  		;;
                  	/proc/sys/fs/binfmt_misc)
                  		regex="^none\s/proc/sys/fs/binfmt_misc\s"
                  		;;
                  	/var/lib/nfs/rpc_pipefs)
                  		regex="^/sunrpc\s/var/lib/nfs/rpc_pipefs\s"
                  		;;
                  	/proc/fs/nfsd)
                  		regex="^nfsd\s/proc/fs/nfsd\s"
                  		;;
                  	*)
                  		echo "Please enter a supported path."
                  		exit 128
                  		;;
                  esac
                  # Quote the regex so it stays a single argument; grep -q suppresses the output.
                  if grep "${regex}" /proc/mounts | grep -q "\srw"
                  	then
                  		echo "1"
                  	else
                  		echo "0"
                  fi
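
                  If maintaining the case statement is a burden, the mount point can be matched against the second field of /proc/mounts directly. A rough sketch under that assumption; the check_rw name and the optional second argument (an alternative mounts file, handy for testing) are not part of the script above:

                  ```shell
                  # Prints 1 if the mount point is mounted read-write, 0 otherwise
                  # (read-only or not mounted at all).
                  check_rw() {
                      mounts="${2:-/proc/mounts}"   # hypothetical override for testing
                      awk -v mp="$1" '
                          # Field 2 is the mount point, field 4 the comma-separated options.
                          # If a mount point is listed more than once, the last entry wins.
                          $2 == mp { state = ($4 ~ /^rw(,|$)/) ? 1 : 0 }
                          END { print state + 0 }
                      ' "$mounts"
                  }

                  check_rw "${1:-/}"
                  ```

                  This keeps the same convention as above: 1 means read-write, 0 means read-only. Mount points containing spaces are escaped in /proc/mounts and would need extra handling.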

                  • tobias.pal
                    Junior Member
                    • Jan 2015
                    • 1

                    #10
                    That's a bit complicated for me. I just want to know if any of the filesystems went read only. A very simple check is this:
                    Code:
                    fgrep -c ' ro,' /proc/mounts
                    But we can improve it a bit:
                    Code:
                    awk '$4 ~ "^ro(,|$)" {print $0}' /proc/mounts | wc -l
                    And if you need to filter out some filesystem types:
                    Code:
                    awk '$4 ~ "^ro(,|$)" && $3 !~ "(squashfs|iso9660)" {print $0}' /proc/mounts | wc -l
                    (I know I could count from awk and print the result, but it's much easier to figure out which filesystem has the problem if I can just copypaste the part before the pipe.)

                    • coreychristian
                      Senior Member
                      Zabbix Certified Specialist
                      • Jun 2012
                      • 159

                      #11
                      Saw this bumped so I thought I would toss in my 2 cents. We actually moved to using zabbix sender because we found that sometimes our log directory would go read only and crash the zabbix agent before the agent could report the filesystem as read only.

                      We also do a touch test, as we found that on some of our Linux VMs all file systems would go read-only, preventing the fstab from updating (we never verified /proc/mounts).

                      For triggers, we have a nodata trigger and last>0 trigger.


                      Code:
                      ##################################
                      # File System Read Only Check
                      ##################################
                      
                      #Build File List array
                      FSROEXTRACT=`cat /etc/fstab| egrep "ext" | grep -v "^#"| awk '{ print  $2 }'`
                      FSROLIST=(
                              $FSROEXTRACT
                              )
                      
                      #Check if each file system is writeable
                      count=0
                      FSROCHECKTOTAL=0
                      while [ "x${FSROLIST[count]}" != "x" ]
                      do
                              FSROCHECK=`touch ${FSROLIST[count]}/test.txt 2> /dev/null && { rm ${FSROLIST[count]}/test.txt 2> /dev/null; echo "0"; } || echo "1"`
                              FSROCHECKTOTAL=$(( $FSROCHECKTOTAL + $FSROCHECK ))
                              if [ "$FSROCHECK" == 1 ] && [ -d "${FSROLIST[count]}" ]; then
                                      echo $CURDATE ${FSROLIST[count]} is read only. >> /tmp/fsrocheck.log
                              fi
                              count=$(( $count + 1 ))
                      done
                      
                      if [ "$FSROCHECKTOTAL" == 0 ]; then
                              echo $CURDATE all file systems are writeable. >> /tmp/fsrocheck.log
                      fi
                      
                      #Send the fsrochecktotal to zabbix
                      /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -s $HOSTNAME -k filesystem.ro.check -o $FSROCHECKTOTAL
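
                      One thing to watch with a fixed name like test.txt is that the touch/rm pair would delete a real file of that name. A possible variation, only a sketch, uses mktemp for a unique probe file; the fs_write_failed name and the list of mount points are placeholders:

                      ```shell
                      # Prints 0 if a uniquely named probe file can be created in the
                      # directory, 1 otherwise (same 0/1 convention as the script above).
                      fs_write_failed() {
                          probe=$(mktemp "$1/.rocheck.XXXXXX" 2>/dev/null) || { echo 1; return; }
                          rm -f "$probe"
                          echo 0
                      }

                      # Sum the failures over a list of mount points (placeholder list).
                      total=0
                      for mp in /tmp /var/tmp; do
                          total=$(( total + $(fs_write_failed "$mp") ))
                      done
                      echo "$total"
                      ```

                      The total can then be passed to zabbix_sender in the same way as FSROCHECKTOTAL.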

                      • Bock
                        Junior Member
                        • Feb 2013
                        • 26

                        #12
                        A probable solution is
                        system.run["grep -c ' ro[ ,]' /proc/mounts"], then check whether the result is >0

                        • LenR
                          Senior Member
                          • Sep 2009
                          • 1005

                          #13
                          Since I'm in the history on this one :-)

                          In our case, when a vmware disk goes read-only, /proc/mounts does not reflect the status.

                          • coreychristian
                            Senior Member
                            Zabbix Certified Specialist
                            • Jun 2012
                            • 159

                            #14
                            Originally posted by LenR
                            Since I'm in the history on this one :-)

                            In our case, when a vmware disk goes read-only, /proc/mounts does not reflect the status.
                            Yup, that is why I wrote that script. Our issue was the same with VMware: all of our disks would go read-only, so the files never got updated to say the file systems were read-only.

                            We also had to put it into cron because the zabbix agent would sometimes die when it can't write to the log file.

                            While the script I wrote could likely be cleaned up a bit it does work fairly well.

                            I can export the items/triggers I have that go along with that script if you guys want.

                            • LenR
                              Senior Member
                              • Sep 2009
                              • 1005

                              #15
                              The underlying cause was pretty much a mystery. Our VMware environment has both Linux and Windows on the same hosts, and this problem didn't seem to affect Windows hosts.

                              And, as we have changed VMware infrastructure, it seems to have gone away. Our storage was iSCSI-based, what is yours?
