ZABBIX Forums  

  #1  
Old 24-05-2011, 15:08
mcree mcree is offline
Junior Member
 
Join Date: Apr 2008
Posts: 1
Post Monitoring LSI / Symbios MegaRAID SAS raid controller (found in several Dell servers)

Hello fellow zabbixers!

I thought it would be nice to have Zabbix notify me about RAID disk failures, so I've put together a small bash script that generates monitoring XML templates for LSI and Symbios MegaRAID controllers, using the MegaCli executable downloadable from www.lsi.com.

You should run the script on the host to be monitored since it only generates template items and notification triggers for existing drives. You must set the path to the MegaCli executable in the header of the script, then run it like:

Code:
bash confgen_zabbix_megacli.sh > megaraid_template.xml
On success you should see something like:
Code:
+ detecting adapters
+ found 1 adapter(s)
+ examining adapter 0
+ found disk: 32:0
+ found disk: 32:1
+ found disk: 32:2
+ done
And of course the file 'megaraid_template.xml' would contain the template generated for your configuration.

Don't forget to add the following line to your zabbix_agentd.conf and restart your agent:

Code:
UserParameter=megaraid[*],sudo $CMD -pdInfo -PhysDrv[$2:$3] -a$1 | grep '$4' | cut -f2 -d':' | cut -b2-
Here $CMD is a placeholder for the path to your copy of the MegaCli executable; substitute the real path in the line above. Also note that the command presumes the user running your agent is permitted to use 'sudo' non-interactively (e.g. the zabbix user is listed in the sudoers file).
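To see what the filter part of that UserParameter does, here is a minimal sketch that runs the same grep/cut chain over a hand-written sample instead of real MegaCli output. The function name megaraid_field and the sample text are assumptions for illustration only; actual -pdInfo output varies with controller and MegaCli version.

```shell
#!/bin/bash
# Hypothetical helper mirroring the filter stage of the UserParameter:
# keep the lines matching the field name, take what follows the first
# colon, and strip the single leading space.
megaraid_field() {
    grep "$1" | cut -f2 -d':' | cut -b2-
}

# Illustrative sample only, not captured from a real controller.
sample_pdinfo='Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Firmware state: Online, Spun Up'

echo "$(printf '%s\n' "$sample_pdinfo" | megaraid_field 'Firmware state')"
# prints: Online, Spun Up
```

Going by the UserParameter line, the four item arguments are adapter, enclosure, slot and field name, so an item key such as megaraid[0,32,0,Firmware state] would query adapter 0, enclosure 32, slot 0.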

Cheers:
Erno Rigo
http://rigo.info
Attached Files
File Type: zip templategen_zabbix_megaraid.zip (2.0 KB, 1970 views)
  #2  
Old 22-06-2011, 02:48
wdingus wdingus is offline
Junior Member
 
Join Date: Dec 2007
Posts: 7
Default

It's always interesting to see how someone else solved the same problem I did... Yours is more elegant but this works for us.

We have some servers with 32-bit OS installs and some with 64-bit. So I tried to make this universal and able to work on either. A zero returned is good, anything non-zero means an error of some kind. Could be an actual RAID error or a problem with the command. So a simple trigger to alert on non-zero values.

system.run[(/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL 2>&1 || /opt/MegaRAID/MegaCli/MegaCli -LDInfo -Lall -aALL 2>&1) | grep -i 'State\|Permission' | grep -v Optimal | wc -l]
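A rough sketch of how the filter stage of this item behaves, fed with hand-written lines rather than real MegaCli output. count_not_optimal is a hypothetical name, and the sample lines are assumptions about what -LDInfo prints; only the grep/wc chain is taken from the item key above.

```shell
#!/bin/bash
# Same filter chain as in the system.run item: keep State/Permission lines,
# drop the ones containing "Optimal", count what is left.
count_not_optimal() {
    grep -i 'State\|Permission' | grep -v Optimal | wc -l | tr -d ' '
}

# Hand-written samples (assumed layout), one healthy and one degraded.
healthy='Virtual Drive: 0 (Target Id: 0)
State               : Optimal'
degraded='Virtual Drive: 0 (Target Id: 0)
State               : Degraded'

echo "healthy:  $(printf '%s\n' "$healthy"  | count_not_optimal)"   # healthy:  0
echo "degraded: $(printf '%s\n' "$degraded" | count_not_optimal)"   # degraded: 1
```

A trigger on "last value is not 0" then fires for the degraded case and stays quiet for the healthy one.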
  #3  
Old 14-11-2012, 18:53
tttodorov tttodorov is offline
Junior Member
 
Join Date: Oct 2012
Location: Sofia, Bulgaria
Posts: 5
Question

Thanks, mcree, wdingus, for the solutions!

wdingus, I have a question on your solution:
I have never seen a failed drive with MegaCli. Under normal conditions MegaCli reports the state as:
# ---
...
State : Optimal
...
# ---
But what does MegaCli report in a failing state?
My concern is this: if it says "Not Optimal" in a failing state, then your "... | grep -v Optimal" will not detect the error. Have you ever tested your command in a failing environment?

Thanks again,
Todor

  #4  
Old 15-11-2012, 20:40
wdingus wdingus is offline
Junior Member
 
Join Date: Dec 2007
Posts: 7
Default

Good question... I'm pretty sure we have had success with this monitor, alerting us to failures. I don't have a specific example handy. How about looking into the possibilities this way though:

# strings /opt/MegaRAID/MegaCli/MegaCli64 | grep Optimal -B3
Offline
Partially Degraded
Degraded
Optimal

Looks like those are the 4 possible values following State: so the "grep -v Optimal" would hopefully be fine. We're just interested in being alerted to *anything* other than Optimal, human eyes can investigate further.
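If the substring match is still a worry, one possible variation (not something posted in this thread) is to drop only lines whose value is exactly Optimal, so that a hypothetical value such as "Not Optimal" would still be counted. The function name and the "State : ..." sample lines below are made up for the sketch.

```shell
#!/bin/bash
# Variation on the filter: a line is considered healthy only when the text
# after the colon is exactly "Optimal" (ignoring surrounding whitespace).
count_not_exactly_optimal() {
    grep -i 'State\|Permission' \
        | grep -Ev ':[[:space:]]*Optimal[[:space:]]*$' \
        | wc -l | tr -d ' '
}

# The four values found with "strings", plus the hypothetical "Not Optimal".
for state in 'Offline' 'Partially Degraded' 'Degraded' 'Optimal' 'Not Optimal'; do
    echo "$state -> $(printf 'State : %s\n' "$state" | count_not_exactly_optimal)"
done
```

Only the plain Optimal line yields 0 here; every other value, including the hypothetical one, yields 1.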
  #5  
Old 15-11-2012, 20:50
tttodorov tttodorov is offline
Junior Member
 
Join Date: Oct 2012
Location: Sofia, Bulgaria
Posts: 5
Wink

Yes, you are right about the "strings". These should be the possible values.
I like the solution and particularly the simplicity. Of course, one should be interested in *any* change in the RAID status and any change should be investigated. I am implementing this in our environment.
Thanks a lot,
Todor
  #6  
Old 20-11-2012, 00:17
geek74 geek74 is offline
Junior Member
 
Join Date: Aug 2012
Posts: 9
Default

Hi,

To monitor the hardware of my Dell servers under Ubuntu I am using OMSA to populate SNMP with all the hardware info. That way the Zabbix agent needs no sudo access, and there is no user parameter to add to the agent config.

If you want, I can share my Dell R720 SNMP template tomorrow.

Cheers.
  #7  
Old 20-11-2012, 02:09
tttodorov tttodorov is offline
Junior Member
 
Join Date: Oct 2012
Location: Sofia, Bulgaria
Posts: 5
Default

geek74, please share. We would be thankful for one more solution.
  #8  
Old 20-11-2012, 12:46
geek74 geek74 is offline
Junior Member
 
Join Date: Aug 2012
Posts: 9
Default

Hi,

When you have OMSA populating SNMP, you can use the attached template.
To make it work under Ubuntu 12.04 LTS, install OMSA from the Dell repository and apply the following fix: http://administratosphere.wordpress....-ubuntu-a-fix/

It needs a lot of value mappings to be human readable.

DellArrayDiskState
1 ⇒ ready
2 ⇒ failed
3 ⇒ online
4 ⇒ offline
6 ⇒ degraded
7 ⇒ recovering
11 ⇒ removed
13 ⇒ non-raid
15 ⇒ resynching
24 ⇒ rebuild
25 ⇒ noMedia
26 ⇒ formatting
28 ⇒ diagnostics
34 ⇒ predictiveFailure
35 ⇒ initializing
39 ⇒ foreign
40 ⇒ clear
41 ⇒ unsupported
53 ⇒ incompatible


DellBatteryState
1 ⇒ ready
2 ⇒ failed
6 ⇒ degraded
7 ⇒ reconditioning
9 ⇒ high
10 ⇒ low
12 ⇒ charging
21 ⇒ missing
36 ⇒ learning

DellLogDriveState
1 ⇒ ready
2 ⇒ failed
3 ⇒ online
4 ⇒ offline
6 ⇒ degraded
7 ⇒ verifying
15 ⇒ resynching
16 ⇒ regenerating
18 ⇒ failedRedundancy
24 ⇒ rebuilding
26 ⇒ formatting
32 ⇒ reconstructing
35 ⇒ initializing
36 ⇒ backgroundInit
52 ⇒ permanentlyDegraded

DellLogDriveType
1 ⇒ concatenated
2 ⇒ raid-0
3 ⇒ raid-1
4 ⇒ raid-2
5 ⇒ raid-3
6 ⇒ raid-4
7 ⇒ raid-5
8 ⇒ raid-6
9 ⇒ raid-7
10 ⇒ raid-10
11 ⇒ raid-30
12 ⇒ raid-50
13 ⇒ addSpares
14 ⇒ deleteLogical
15 ⇒ transformLogical
18 ⇒ raid-0-plus-1
19 ⇒ concatRaid-1
20 ⇒ concatRaid-5
21 ⇒ noRaid
22 ⇒ volume
23 ⇒ raidMorph
24 ⇒ raid-60
25 ⇒ cacheCade

Dell Open Manage System Status
1 ⇒ Other
2 ⇒ Unknown
3 ⇒ OK
4 ⇒ NonCritical
5 ⇒ Critical
6 ⇒ NonRecoverable

DellsDiskControllerState
1 ⇒ ready
2 ⇒ failed
3 ⇒ online
4 ⇒ offline
6 ⇒ degraded

DellStatus
1 ⇒ other
2 ⇒ unknown
3 ⇒ ok
4 ⇒ nonCritical
5 ⇒ critical
6 ⇒ nonRecoverable

DellStatusProbe
1 ⇒ other
2 ⇒ unknown
3 ⇒ ok
4 ⇒ nonCriticalUpper
5 ⇒ criticalUpper
6 ⇒ nonRecoverableUpper
7 ⇒ nonCriticalLower
8 ⇒ criticalLower
9 ⇒ nonRecoverableLower
10 ⇒ failed

DellStatusRedundancy
1 ⇒ other
2 ⇒ unknown
3 ⇒ full
4 ⇒ degraded
5 ⇒ lost
6 ⇒ notRedundant
7 ⇒ redundancyOffline

DellStorageGlobalStatus
1 ⇒ critical
2 ⇒ warning
3 ⇒ normal
4 ⇒ unknown
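
For a quick look at raw SNMP values outside of Zabbix, a mapping like the ones above can be sketched as a shell lookup. This only illustrates the DellStorageGlobalStatus table; the function names are hypothetical, and treating every value except 3 as a problem is an assumption, not something taken from the template.

```shell
#!/bin/bash
# Translate a DellStorageGlobalStatus code into its name (table above).
dell_storage_global_status() {
    case "$1" in
        1) echo "critical" ;;
        2) echo "warning" ;;
        3) echo "normal" ;;
        4) echo "unknown" ;;
        *) echo "unmapped($1)" ;;
    esac
}

# Assumption: anything other than 3 (normal) deserves attention.
is_storage_problem() {
    [ "$1" != "3" ]
}

for code in 1 2 3 4; do
    if is_storage_problem "$code"; then flag="PROBLEM"; else flag="ok"; fi
    echo "$code => $(dell_storage_global_status "$code") [$flag]"
done
```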


Please comment and post corrections if you find anything wrong.

Cheers
Attached Files
File Type: xml Template_SNMP_Dell.xml (61.6 KB, 838 views)
  #9  
Old 03-07-2013, 02:07
linuxsquad linuxsquad is offline
Junior Member
 
Join Date: Jul 2013
Location: chicago il
Posts: 12
Default template with discovery

First, thanks for this nice contribution to the community. I've used your template on our Fujitsu servers with LSI Megaraid.

A suggestion for you: there is already a script that generates an XML template. How about converting it to auto-discovery? It should be really easy, since you have most of the pieces in place already.
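
A possible starting point for that conversion, as a sketch only: Zabbix low-level discovery rules consume JSON wrapped in a "data" array of macro/value pairs, so the "enclosure:slot" pairs the generator already finds could be emitted in that shape. The function name and the {#ENC} / {#SLOT} macro names are made up here; nothing below comes from the attached script.

```shell
#!/bin/bash
# Hypothetical sketch: read "enclosure:slot" pairs (one per line) on stdin
# and print them as Zabbix low-level discovery JSON.
disks_to_lld() {
    local first=1 enc slot
    printf '{"data":['
    while IFS=: read -r enc slot; do
        [ -n "$enc" ] || continue              # skip empty lines
        [ "$first" -eq 1 ] || printf ','       # comma between entries
        printf '{"{#ENC}":"%s","{#SLOT}":"%s"}' "$enc" "$slot"
        first=0
    done
    printf ']}\n'
}

# The three disks from the example output in the first post.
printf '32:0\n32:1\n32:2\n' | disks_to_lld
```

Item and trigger prototypes in the template would then reference the same macros, for example as arguments of the megaraid[...] key.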

BTW, I am building an MD RAID (software) template with auto-discovery and borrowed a couple of ideas from you.

Thanks again

OB
  #10  
Old 12-07-2013, 21:35
vic vic is offline
Member
 
Join Date: Jul 2013
Posts: 48
Default

Not related to Zabbix, but perhaps this can be adapted. As is, these instructions show how to get MegaCli to send an email if the RAID array is degraded.

Configure LSI MegaRAID email alerts
Code:
cd /etc/cron.hourly
Once you are in the folder, use your favorite editor to create a new file called MegaRAIDcron. For the purpose of this guide, we are going to use nano.
Code:
nano MegaRAIDcron
In this file, we are going to place the following. Be sure to replace <REPLACE WITH EMAIL> with the email address that the alerts will be sent to.
Code:
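#!/bin/bash
# Collect the "Degraded" and "Failed" counters reported for all adapters.
cd /opt/MegaRAID/MegaCli || exit 1
./MegaCli64 -AdpAllInfo -aALL | grep "Degraded" > degraded.txt
./MegaCli64 -AdpAllInfo -aALL | grep "Failed" >> degraded.txt
# Send mail only if one of the collected lines contains a "1".
cat degraded.txt | grep "1" > /dev/null
if [[ $? -eq 0 ]];
then
# The address is quoted so the placeholder is not parsed as a redirection.
cat degraded.txt | mailx -s "Degraded RAID on $HOSTNAME" "<REPLACE WITH EMAIL>"
fi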
Save the changes to the file. Once the changes are made to the file, we need to assign execute permissions to the file.
Code:
chmod +x MegaRAIDcron
To test cron, we need to make one small change to the file. Change the following:
From
Code:
cat degraded.txt | grep "1" > /dev/null
To
Code:
cat degraded.txt | grep "0" > /dev/null
Save the changes and run the cron manually:
Code:
/etc/cron.hourly/MegaRAIDcron
If you have installed everything correctly, you should receive an email which shows the following:

Degraded : 0
Security Key Failed : No
Failed Disks : 0
Deny Force Failed : No

To change the cron from testing back to production use, change the 0 back to a 1 and you are set.
Again, the cron job will only send you an email if the array is degraded or a disk has failed. No news is good news.
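
One caveat with matching on the digit 1: a line such as "Degraded : 2" contains no "1" and would not trigger the mail. A stricter check, sketched here as an assumption rather than as part of the original guide, is to look for any counter whose value is a non-zero number. The function name is hypothetical; the healthy sample is the mail body shown above.

```shell
#!/bin/bash
# Succeeds (exit 0) when stdin has a "name : N" line with N > 0.
has_nonzero_counter() {
    grep -Eq ':[[:space:]]*[1-9][0-9]*[[:space:]]*$'
}

healthy='Degraded : 0
Security Key Failed : No
Failed Disks : 0
Deny Force Failed : No'

# Made-up failing sample: two degraded arrays, no failed disks.
degraded='Degraded : 2
Failed Disks : 0'

for name in healthy degraded; do
    if printf '%s\n' "${!name}" | has_nonzero_counter; then
        echo "$name: alert"
    else
        echo "$name: ok"
    fi
done
```

In the cron script this would stand in for the grep "1" test; the testing step would then need a different trick, such as temporarily inverting the condition.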

Last edited by vic; 13-07-2013 at 01:47.
All times are GMT +2.