    linux software RAID monitoring

    Hello,

    Before going ahead and building the best regex in the world to monitor software RAID disks on Linux, I wanted to know whether any of you had already built such a command.

    Basically this means looking inside /proc/mdstat to see if any disk has failed.


    Regards,
    Ghislain.

    #2
    Shouldn't be hard, but it gets tricky if you want to monitor several arrays.

    Anyway, I'd suggest writing a script that does the check and produces some predictable output (like 0 - all OK; 1 - at least one array not OK), and then using that from Zabbix, instead of polluting the zabbix_agentd configuration file with a multi-line, ten-pipe command with awk scripting between the pipes... :-)
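    For example, a minimal sketch of that kind of wrapper (script path and key name are only placeholders):
    Code:
    #!/bin/bash
    # /etc/zabbix/bin/check_md.sh <md device> - prints 0 if no failed slot ("_")
    # shows up in /proc/mdstat for that device, 1 otherwise.
    if grep -A1 "^$1 " /proc/mdstat | grep -q _; then
        echo 1
    else
        echo 0
    fi
    with a matching agent entry such as:
    Code:
    UserParameter=custom.md.status[*],/etc/zabbix/bin/check_md.sh $1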

    Cheers,

      #3
      I personally use things like:
      Code:
      UserParameter=custom.md.md0,/etc/zabbix/bin/custom.md md0
      UserParameter=custom.md.md1,/etc/zabbix/bin/custom.md md1
      Where /etc/zabbix/bin/custom.md is a script that just:
      . does a system call to 'mdadm --detail /dev/$1'
      . cuts/greps the output to return the 'State : ' string.
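      A rough sketch of what such a script could look like (assuming the sudo rule for mdadm shown later in this thread):
      Code:
      #!/bin/bash
      # /etc/zabbix/bin/custom.md <md device>, e.g.: custom.md md0
      # Prints the array state reported by mdadm, e.g. "clean" or "clean, degraded".
      sudo /sbin/mdadm --detail "/dev/$1" | grep -i 'State :' | cut -d ':' -f 2 | sed 's/^ *//'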

      Hope this'll help.
      --
      LEM

        #4
        LEM, any chance you could post the full script you're using? I'm trying to piece one together and am having issues for some reason. If I run my command manually it works, but if I run it via a bash script I get errors.

        Thanks,

        Cameron

          #5
          UserParameter script used for md monitoring (sample)

          Here is what I use in zabbix_agentd.conf:
          Code:
          UserParameter=custom.raidstate.md0,/etc/zabbix/bin/custom.raidstate md0
          UserParameter=custom.raidstate.md1,/etc/zabbix/bin/custom.raidstate md1
          And here is the code for /etc/zabbix/bin/custom.raidstate :
          Code:
          #!/usr/bin/perl
          #
          #
          #sudo /sbin/mdadm --detail /dev/md0|grep -i "State :"|cut -d ":" -f 2
          #
          
          use strict;
          use warnings;
          
           # Array device to check, e.g. "md0" (passed in by the UserParameter).
           my $device = $ARGV[0];
           
           # Ask mdadm for the array details and keep only the value after "State :".
           my $return = `/usr/bin/sudo /sbin/mdadm --detail /dev/$device | grep -i "State :" | cut -d ":" -f 2`;
           
           # Strip the trailing newline and any spaces so we can compare the bare state string.
           chomp ($return);
           $return =~ s/\ //g;
           
           # 0 = array is clean, 1 = anything else (degraded, recovering, ...).
           if ( $return eq 'clean' ) {
             print "0";
           } else {
             print "1";
           }
          
          # - The End
          I use Numeric (float) to store this kind of value with no custom multiplier. For triggering, I use something like:
          Code:
          {MyHost:custom.raidstate.md0.last(0)}>0
           To be able to run mdadm --detail as the zabbix user, I use sudo with the following statements in the sudoers file:
          Code:
          # Cmnd alias specification
          Cmnd_Alias ZABBIXCMD = /sbin/mdadm --detail *
          # ZABBIX special privileges
          zabbix  ALL=NOPASSWD:   ZABBIXCMD
          Hope this'll help you.

          Cheers,
          Last edited by LEM; 28-06-2006, 10:42.
          --
          LEM

            #6
            That definitely helped. I don't know why I didn't think to use perl, but I was using bash and for whatever reason it wasn't working. I did get an error trying to set up the sudo command: when trying to run it as zabbix I received "permission denied" on /dev/md0. In the meantime I just have a cron job running as root and printing the status out to a file.

            Thanks for the tips.

            cameron

              #7
              If you want an alternative bash script, here's what I use:
              Code:
              #!/bin/bash
              # Usage: raid.sh <disk device name to check>
              # Ex:     ./raid.sh md0
              disk=$1
              temp=$(grep -A1 $disk /proc/mdstat | grep UU | wc -l)
              echo $temp
              Since mdstat in /proc keeps track of the RAID arrays, and prints UU if things are kosher, and either _U or U_ or even __ if things have gone really downhill, grepping for UU works. Do a line count (wc -l) on the result and you get a 1 or 0 response. My results are the opposite of LEM's, since a 1 for me is good, but a 1 for LEM is bad.
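              For reference, a healthy two-disk mirror looks roughly like this in /proc/mdstat (block counts and device names will obviously differ):
              Code:
              md0 : active raid1 sdb1[1] sda1[0]
                    1048512 blocks [2/2] [UU]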

              You know, now that I look at that script again, I could just make it one line and throw the script away.
              Code:
              UserParameter=mdstat[*],grep -A1 $1 /proc/mdstat | grep UU | wc -l
              Huh, that's even easier. Hell, there might be a way to make that a system.run command. Maybe it's time to take a look at my scripts and see what I've learned since I wrote them. Anyhow, just giving more options.

              Nate

                #8
                Originally posted by Nate Bell
                Since mdstat in /proc keeps track of the RAID arrays, and prints UU if things are kosher, and either _U or U_ or even __ if things have gone really downhill, grepping for UU works. Do a line count (wc -l) on the result and you get a 1 or 0 response. My results are the opposite of LEM's, since a 1 for me is good, but a 1 for LEM is bad.
                My output with 4 drives is [UUUU]. So even if one drive failed it could still pass your check, since [UUU_] still contains "UU", right?

                -cameron

                  #9
                  Ah, true, though you could just grep for UUUU and it would work.

                  How about trying this one on for size:
                  Code:
                  UserParameter=mdstat[*],grep -A1 $1 /proc/mdstat | tail -n1 | grep _ | wc -l
                  That one doesn't care how many drives you have, only that one or more of them has gone missing, and it even gives the same results LEM's script does.
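                  For example, on a degraded four-disk RAID5 the second status line would look something like this, so the grep for '_' returns 1:
                  Code:
                  md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0](F)
                        2097152 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]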

                  Nate
                  Last edited by Nate Bell; 28-06-2006, 21:59.

                    #10
                    Just a small revision

                    Change:
                    Code:
                    UserParameter=mdstat[*],grep -A1 $1 /proc/mdstat | tail -n1 | grep _ | wc -l
                    To:
                    Code:
                    UserParameter=mdstat[*],grep -A1 $1 /proc/mdstat | tail -n1 | grep -c _
                    The -c argument makes grep count the matching lines itself.

                    Actually, you can even remove the tail command, since (at least on my Linux systems) the underscore ('_') only occurs when a device has failed, and it does not appear on the first status line for the device:

                    Code:
                    UserParameter=mdstat[*],grep -A1 $1 /proc/mdstat | grep -c _

                      #11
                      I'm maintaining my own raidmon tool, which can easily be integrated with Zabbix. The tool is here: http://www.invoca.ch/pub/packages/raidmon/

                      In Zabbix, I have this config to monitor disks for zabbix-1.1.x:

                      Items (description / key / interval / history / trends / type):
                      RAID number of failed devices in arrays / system.run[raidmon status failed,wait] / 60 / 7 / 365 / ZABBIX agent
                      RAID number of syncing arrays / system.run[raidmon status syncing,wait] / 60 / 7 / 365 / ZABBIX agent
                      RAID number of arrays / system.run[raidmon status number,wait] / 60 / 7 / 365 / ZABBIX agent

                      Triggers (description / expression / severity):
                      RAID has failed devices in arrays on {HOSTNAME} / {Unix_t:system.run[raidmon status failed,wait].last(0)}>0 / High
                      RAID is syncing arrays on {HOSTNAME} / {Unix_t:system.run[raidmon status syncing,wait].last(0)}>0 / Average
                      RAID number of arrays has changed on {HOSTNAME} / {Unix_t:system.run[raidmon status number,wait].diff(0)}>0 / Information

                        #12
                        RAID monitoring using system.run

                        If you have EnableRemoteCommands set in your agents (WARNING: potential security issues involved), you could just use this item:

                        Description: Failed RAID devices
                        Key: system.run[cat /proc/mdstat | egrep '(U_|_U)' | wc -l]

                        Returns the number of failed RAID devices.
                        Returns zero if no failed RAID devices or no RAID devices at all.
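                        For reference, remote commands are enabled with a single line in zabbix_agentd.conf (mind the security warning above):
                        Code:
                        EnableRemoteCommands=1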

                          #13
                          Sadly none of the solutions presented here work for me.

                          I have found that the mdadm state sometimes returns 'dirty' while the array is still busy syncing data.

                          And the /proc/mdstat solutions do not work well for multiple devices.

                          It seems, however, that mdadm (run with --detail --test) returns a numeric exit code that can very easily be used:

                          0   The array is functioning normally.
                          1   The array has at least one failed device.
                          2   The array has multiple failed devices and hence is unusable (raid4 or raid5).
                          4   There was an error while trying to get information about the device.


                          Thus I used the following:

                          Code:
                          UserParameter=mdstat[*],sudo /sbin/mdadm --detail --test /dev/$1 >/dev/null 2>&1; echo $?
                          You will still need the sudo configuration mentioned earlier, added via visudo:

                          Code:
                          Cmnd_Alias ZABBIXCMD = /sbin/mdadm --detail *
                          # ZABBIX special privileges
                          zabbix  ALL=NOPASSWD:   ZABBIXCMD
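                          A matching trigger would then simply fire on any non-zero exit code, e.g. (host name is just a placeholder):

                          Code:
                          {MyHost:mdstat[md0].last(0)}>0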

                            #14
                             I had a little trouble implementing this monitor, because I'm a newbie (I've only been using Zabbix for a week).

                             So here is how I did it; maybe it will help someone.


                             In /etc/zabbix/zabbix_agentd.conf I added this:

                             Code:
                             # RAID check
                             UserParameter=custom.mdstat[*],cat /proc/mdstat | grep -c _

                             Then I added an item to the host with
                             Key: custom.mdstat[*]
                             and no particular settings.

                             Then I added a trigger with
                             Expression: {hostname:custom.mdstat[*].last(0)}>0


                             It works like a charm and it counts how many disks have failed.

                             I tried these settings in a three-mirror environment (/dev/md0 (sda1,sdb1), /dev/md1 (sda2,sdb2), /dev/md2 (sda3,sdb3)), put each mirror into a failed state in turn, and it worked...
                             I don't know if it works with more than 2 devices (for example RAID 5), but I suppose it does.

                            my 2 cents

                            Paolo

                              #15
                              Hi, ...

                              I just put the output of the RAID status into a file. After that I use the standard Zabbix checksum function on that file.

                              If there is any change in the RAID status, the checksum changes as well, and then I know something has changed and I have to go and check the status.

                              A trigger at high severity can also help to get notified when there is an error.

                              Btw, I don't have a software RAID on Linux; it is a hardware RAID in an HP machine.
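                              A sketch of what that could look like (file path and host name are placeholders; on older agents the checksum key may be named differently from vfs.file.cksum):

                              Code:
                              # Item key: agent-side checksum of the status file written by the cron job/script
                              vfs.file.cksum[/var/run/raid_status.txt]
                              # Trigger expression: fires whenever the checksum differs from the previous value
                              {MyHost:vfs.file.cksum[/var/run/raid_status.txt].diff(0)}>0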
