Zabbix and Health Checks of Server Hardware

  • gglea
    Junior Member
    • Mar 2015
    • 1

    #1

    Zabbix and Health Checks of Server Hardware

    How (if at all) can Zabbix handle health checks of server hardware such as CPU, RAID volumes, power supplies, etc. without using SNMP traps? Is there a way for Zabbix to pull these checks from the devices using the Zabbix agent, for example to predict a hard disk failure? Can it work with Dell OpenManage to pull this information? Any info on this would be appreciated.


    Thanks
  • HaveDill
    Senior Member
    • Sep 2014
    • 103

    #2
    For my HP servers I found a Nagios VBS script that runs and reports back RAID status using HPACUCLI.


    So in the C:\Zabbix folder on each server I put the .vbs, then in the agentd.conf I added the custom key I want for this script:

    UserParameter = smartarray,cscript "C:\Zabbix\check_smartarray.vbs"

    Then on my Zabbix host I have an item that points to that key and checks my RAID status once a day. Triggers check the returned value against regular expressions.


    Part of my PowerShell rollout script also made sure that HPACUCLI existed on the remote machine (we are an HP house) and, if not, copied the proper files over so the VBS would run successfully.


    • aib
      Senior Member
      • Jan 2014
      • 1615

      #3
      Here are my five cents on RAID monitoring:
      zabbix_agentd.conf
      Code:
      UserParameter=raid.ad,sh /usr/lib/zabbix/externalscripts/aacraid.active_drives
      UserParameter=raid.cs,sh /usr/lib/zabbix/externalscripts/aacraid.controller_status
      UserParameter=raid.ld[*],sh /usr/lib/zabbix/externalscripts/aacraid.LD_status $1
      Code:
      # cat aacraid.active_drives
      #!/bin/sh
      sudo /usr/local/sbin/arcconf getconfig 1 PD | grep "State" | grep "Online"| wc -l
      Code:
      # cat aacraid.controller_status
      #!/bin/sh
      sudo /usr/local/sbin/arcconf getconfig 1 AD  | grep "Controller Status" | grep -vc Optimal
      Code:
      # cat aacraid.LD_status
      #!/bin/sh
      sudo /usr/local/sbin/arcconf getconfig 1 LD $1 | grep device | grep -v information
      Code:
      # arcconf getconfig 1 AD
      Controllers found: 1
      ----------------------------------------------------------------------
      Controller information
      ----------------------------------------------------------------------
         Controller Status                        : Optimal
         Channel description                      : SATA
         Controller Model                         : Adaptec 2810SA
         Controller Serial Number                 : C80489
         Physical Slot                            : 6
         Installed memory                         : 64 MB
         Copyback                                 : Disabled
         Background consistency check             : Enabled
         Automatic Failover                       : Enabled
         Stayawake period                         : Disabled
         Spinup limit internal drives             : 0
         Spinup limit external drives             : 0
         Defunct disk drive count                 : 0
         Logical devices/Failed/Degraded          : 3/0/0
         --------------------------------------------------------
         Controller Version Information
         --------------------------------------------------------
         BIOS                                     : 4.2-0 (8205)
         Firmware                                 : 4.2-0 (8205)
         Driver                                   : 1.2-0 (30300)
         Boot Flash                               : 0.0-0 (0)
         --------------------------------------------------------
         Controller Battery Information
         --------------------------------------------------------
         Status                                   : Not Installed
      
      
      Command completed successfully.
      Sincerely yours,
      Aleksey
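The "Logical devices/Failed/Degraded" line in the `arcconf getconfig 1 AD` output above can also be turned into a single number for a trigger. Below is a minimal sketch, not part of the original post: the function name `ld_problems` is made up for illustration, and it assumes the line keeps the `total/failed/degraded` layout shown in the sample output.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the thread).
# Reads `arcconf getconfig 1 AD` output on stdin and prints the number of
# failed plus degraded logical devices; 0 means all logical devices are fine.
ld_problems() {
    awk -F: '/Logical devices\/Failed\/Degraded/ {
        gsub(/ /, "", $2)      # " 3/0/0" becomes "3/0/0"
        split($2, n, "/")      # n[1]=total, n[2]=failed, n[3]=degraded
        print n[2] + n[3]
        found = 1
    }
    END { if (!found) print -1 }   # -1 signals that the line was not found
    '
}

# Possible use inside a script called from a UserParameter:
#   sudo /usr/local/sbin/arcconf getconfig 1 AD | ld_problems
```

A trigger can then fire on any value other than 0, which also catches the case where the expected line is missing from the output.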


      • aib
        Senior Member
        • Jan 2014
        • 1615

        #4
        and one more - about CPU temperature:
        zabbix_agentd.conf:
        Code:
        UserParameter=cpu.temp[*],/usr/lib/zabbix/externalscripts/cpu_t $1
        Code:
        # cat /usr/lib/zabbix/externalscripts/cpu_t
        #!/bin/bash
        
        echo $(date '+%Y%m%d:%H%M%S') $0 $1 =\> `/usr/bin/sensors | grep temp` >> /var/log/zabbix/cpu_t.log
        
        v=$(/usr/bin/sensors | awk -v k="$1" '/temp/ {
        if (k == "current")
        { split($2,a,".")
        split(a[1],b,"+")
        }
        else
        {
        if (k == "high")
        { split($5,a,".")
        split(a[1],b,"+")
        }
        }
        }
        END {print b[2]}')
        echo $(date '+%Y%m%d:%H%M%S') $0 $1 =\> $v >> /var/log/zabbix/cpu_t.log
        echo $v
        Code:
        # /usr/bin/sensors
        w83627hf-isa-0290
        Adapter: ISA adapter
        in0:         +1.46 V  (min =  +1.25 V, max =  +1.86 V)
        in1:         +3.33 V  (min =  +2.80 V, max =  +3.79 V)
        in2:         +3.31 V  (min =  +2.80 V, max =  +3.79 V)
        in3:         +2.93 V  (min =  +2.53 V, max =  +3.42 V)
        in4:         +3.15 V  (min =  +2.67 V, max =  +3.62 V)
        in5:         +0.66 V  (min =  +0.30 V, max =  +0.99 V)
        in6:         +1.78 V  (min =  +1.52 V, max =  +2.06 V)
        in7:         +3.23 V  (min =  +0.54 V, max =  +3.23 V)
        in8:         +1.17 V  (min =  +0.05 V, max =  +2.37 V)
        fan1:       5192 RPM  (min = 1424 RPM, div = 4)
        fan2:       5037 RPM  (min = 1424 RPM, div = 4)
        fan3:          0 RPM  (min = 1424 RPM, div = 4)  ALARM
        temp1:       +27.0°C  (high = +65.0°C, hyst = +60.0°C)  sensor = thermistor
        temp2:       +28.0°C  (high = +75.0°C, hyst = +70.0°C)  sensor = thermistor
        temp3:       +28.5°C  (high = +75.0°C, hyst = +70.0°C)  sensor = thermistor
        cpu0_vid:   +1.500 V
        beep_enable:enabled
        Sincerely yours,
        Aleksey
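Note that the script above walks over every `temp` line, so the value it prints comes from the last matching sensor (temp3 in the sample output). If a specific sensor is wanted, something along these lines might work. It is only a sketch: the function name `sensor_temp` and its two arguments are assumptions made for illustration, and it relies on the `sensors` line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions, not from the thread).
# Reads `sensors` output on stdin.
#   $1 = sensor label, e.g. temp1
#   $2 = "current" or "high"
# Prints the whole-degree value for that sensor.
sensor_temp() {
    awk -v label="$1" -v kind="$2" '
    $1 == (label ":") {
        line = $0
        if (kind == "high")
            sub(/.*high = /, "", line)   # keep the text after "high = "
        else
            sub(/^[^:]*:/, "", line)     # keep the text after "temp1:"
        gsub(/^[ +]+/, "", line)         # drop leading spaces and "+"
        print int(line)                  # int() keeps the leading number only
        exit
    }'
}

# Possible use:
#   /usr/bin/sensors | sensor_temp temp1 current
```

The label match is exact, so `temp1` does not accidentally pick up a sensor such as `temp10`.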


        • Raged23
          Junior Member
          • Mar 2015
          • 4

          #5
          OpenManage Dell Server - Windows

          I use the same method as HaveDill in a Windows environment.
          This allows you to use an Active Agent behind firewalls.
          With OpenManage installed you can use the omreport command to provide information on the RAID status.
          My setup is very basic, as I don't know enough to parse the information from omreport in a more complex way.
          zabbix_agentd.conf
          Code:
          UserParameter=raid.disks,%systemroot%\system32\windowspowershell\v1.0\powershell.exe -nologo C:\zabbix\zbx_raid_disks.ps1
          zbx_raid_disks.ps1
          Code:
          omreport storage vdisk controller=0 | ?{$_ -match "^status"} | %{$status=1}{if($_ -notlike "*OK*"){$status=0}}{$status}
          So now if your RAID status is OK, it returns 1. If a drive fails or there is any other problem, it returns 0.
          You can then make a template and trigger for raid.disks.
          I use this same method for intrusion, temperature, etc.
          I am working on a better way to parse the information to actually return values rather than just OK or Problem.
          This method was taken from a much older zabbix thread.
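For Linux hosts, the same 1/0 logic as the PowerShell one-liner above could be sketched in shell. This is an illustration only: the function name `raid_ok` is made up, and it assumes the omreport output has the status on lines that begin with "Status", which is what the PowerShell filter matches on.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the thread).
# Reads omreport output on stdin and mirrors the PowerShell logic:
# prints 1 if every line starting with "Status" contains "ok"
# (case-insensitive), otherwise prints 0.
raid_ok() {
    awk 'BEGIN { status = 1 }
         tolower($0) ~ /^status/ && tolower($0) !~ /ok/ { status = 0 }
         END { print status }'
}

# Possible use:
#   omreport storage vdisk controller=0 | raid_ok
```

Like the PowerShell version, this prints 1 when no "Status" line is present at all, so a failed or missing omreport would look healthy; a separate check for that case may be worth adding.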
