Aggregate checks question

  • Sean35
    Junior Member
    • Jan 2015
    • 19

    #1

    Hi all,

    I have a bit of a strange requirement which I'm looking for some advice on.

    I have a large number of HP servers (about 200 initially, but this could grow to as many as 2,000), and I need to detect and report on the power state of each one.

    I have a fair amount of scripting that pulls back various server metrics via IPMI (iLO). Here's a small script for just the power state:

    Code:
    #!/bin/bash
    
    # Setup some variables
    USER=user
    PASS=password
    IPMIPATH=/usr/sbin/ipmi-chassis
    SCRIPTTIMEOUT=5
    
    # Get the power state
    
    IPMIRESULT=$(timeout "$SCRIPTTIMEOUT" "$IPMIPATH" -D LAN2_0 -h "$1" -u "$USER" -p "$PASS" -l USER -W discretereading --get-status 2>/dev/null)
    
    if [ -z "$IPMIRESULT" ]
    	then
    		# Set a value of 2 if no result is received
    		echo 2
    	else
    		# Strip out just the power state line
    		SYSTEMPOWER=$(echo "$IPMIRESULT" | grep "System Power")
    		if [ -z "$SYSTEMPOWER" ]
    			then
    				# Set a value of 2 if the System Power status is not returned
    				echo 2
    			else
    				# Strip out just the power state
    				SYSTEMPOWERSTATE=$(echo "$SYSTEMPOWER" | awk '{ print $4 }')
    				if [ "$SYSTEMPOWERSTATE" = "on" ]
    					then
    						# Set a value of 1 for powered on
    						echo 1
    					else
    						# Set a value of 0 for powered off
    						echo 0
    				fi
    		fi
    fi

    It delivers the following:
    • 0 - powered off
    • 1 - powered on
    • 2 - error (no result, timeout (5s) or incorrect values)


    This is added as an external check on each monitored server.

    My struggle is in producing a count of servers in the off state, on state and error state.

    Aggregate checks don't seem to offer a count group function. Additionally, there doesn't seem to be a way of filtering the items in the aggregate key based on their values.

    I'm also wary that as this scales, the number of external scripts running may become an issue.
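
    To make the scaling concern concrete, here is a rough sketch of polling many hosts from a single process with a cap on concurrency, rather than one external check per host. It assumes the script above is saved as an executable named check_power.sh (a hypothetical name), that MAXJOBS is a sensible limit for the poller, and that xargs supports -P (a GNU/BSD extension). How the results would get back into the monitoring system is not shown.

```shell
#!/bin/bash

# Hypothetical path to the power-state script shown above.
CHECK=${CHECK:-./check_power.sh}
# Assumed upper bound on simultaneous IPMI queries.
MAXJOBS=${MAXJOBS:-20}

# Reads one host address per line on stdin and prints "host state" per line.
# Output order is not guaranteed because the checks run in parallel.
poll_hosts() {
    # Inside sh -c, $0 is the check script and $1 is the host address.
    xargs -P "$MAXJOBS" -I{} sh -c 'printf "%s %s\n" "$1" "$("$0" "$1")"' "$CHECK" {}
}
```

    For example, `poll_hosts < hosts.txt` would print one line per address listed in hosts.txt (also a hypothetical file).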

    Hoping someone out there has some advice for me please?

    Thanks,
    Sean
  • glebs.ivanovskis
    Senior Member
    • Jul 2015
    • 237

    #2
    You can achieve this by creating three calculated items on the host (basically, "is status 0?", "is status 1?", "is status 2?") all returning either 0 or 1 and then aggregating them.
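
    The logic behind this suggestion can be sketched outside Zabbix in a few lines of shell. This is only an illustration of the idea, not Zabbix configuration: is_status and tally_states are hypothetical helper names, and the input is assumed to be one status value (0, 1 or 2) per line, as produced by the script in the first post.

```shell
#!/bin/bash

# Plays the role of one calculated item: prints 1 if the status
# matches the wanted value, otherwise 0.
is_status() {
    if [ "$1" = "$2" ]; then echo 1; else echo 0; fi
}

# Plays the role of the aggregate: sums each 0/1 indicator across all hosts.
# Reads one status value per line on stdin.
tally_states() {
    local off=0 on=0 error=0 status
    while read -r status || [ -n "$status" ]; do
        off=$((off + $(is_status "$status" 0)))
        on=$((on + $(is_status "$status" 1)))
        error=$((error + $(is_status "$status" 2)))
    done
    echo "off=$off on=$on error=$error"
}
```

    Because each indicator is either 0 or 1, summing it over a group gives the same number a count would.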

    • Sean35
      Junior Member
      • Jan 2015
      • 19

      #3
      Hi Glebs,

      Thanks for the response.

      I'm not sure how that would (easily) scale to hundreds of servers?
