
[BETA RELEASE] IBM Tivoli Storage Manager


    Hello!
    This is the first release of the TSM monitoring script/template I've created for Zabbix. It has been tested on TSM 5.x and Zabbix 1.8.x. It's been doing what I need for some time now, and I finally got around to cleaning it up. There will be updates as time goes on, both from my end and as bugs/features get reported.


    NOTE: The script_update.zip is NOT required if you're not currently using the product!

    06.21.2012 - 1.0 BETA 2
    FEATURES
    • Currently 28 datapoints collected
    • 9 predefined triggers
    • 7 predefined graphs
    • Automated tape consolidation

    TODO
    • Clean up script (there's some nasty 'sed' and 'awk' going on in there)
    • Remove dependency on internal TSM scripts
    • Get alerts for failed backups. Right now the alerts are generated and sent to a ticketing system; Zabbix only knows that there WERE failed backups and how many, not which hosts failed, unless you use the ticketing system feature or add your own logic to the while loop for failed backups.

    Known Issues
    • A few of the functions are built upon a built-in TSM script (included in the download). This dependency will hopefully be removed in the next release.

    UPDATES:
    • (02/28/2012) - I've found an issue in the "no data received" trigger and have updated the download. The corrected trigger expression is:
      {Template_Tivoli_Storage_Manager:tsm.nodes.count.nodata(3600)}=1
    • (04/20/2012) - I found a coding mistake in the script: I was writing both the failed and the missed backup counts to the same variable. The key and item already exist in your template, so simply replacing the .sh file will suffice. If you're comfortable editing the script, modify the "tsm_failedjobs" function instead, changing "send_value tsm.jobs.missed" to "send_value tsm.jobs.failed".
    • (06/21/2012) - ToomasAas found some logic issues and has fixed them; the original download has been updated.

    Feedback welcome, my contact info is in the script, as are the instructions for installation/use.
    Attached Files
    Last edited by parabola; 21-06-2012, 16:49.

    #2
    Here are some screenshots:






      #3
      Hi parabola,

      My TSM admin skills are a bit rusty (it's been years) - so I'm working with our resident TSM admin to get things going. As our config is quite a bit different (we're on 6.3, have a library manager, etc) I'm going to start adding items one-by-one.

      I decided to start with the diskpools. How long are your volume_names? When I query mine, I get the full file path, 45 characters in one case. So the command:

      send_value tsm.pools."${disk:(-5)}" "$num"
      (the forum was rendering the ":(" in that expansion as an unhappy face)
      ends up with an item name that isn't "tsm.pools.disk1", and instead is a bit garbled.

      Perhaps the length is irrelevant, as I think I'm going to have to manually create the items to accurately reflect the disk names. I'm wondering if there should be some sort of "profiling" script that builds the template with the specific diskpools after querying the TSM server. It's something I'll ponder.
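
One way the key-naming part of such a "profiling" step could look is sketched below. It assumes the volume names have already been fetched from TSM, one per line. The helper name `volume_to_key` and the sample paths are hypothetical and not part of parabola's script. The helper keeps the last path component, drops any extension, and replaces characters that may not be safe in an item key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a short item key from a full volume path.
volume_to_key() {
    local vol="$1"
    local base="${vol##*/}"      # strip directories: /a/b/disk1.dsm -> disk1.dsm
    base="${base%.*}"            # strip a trailing extension, if any
    # Replace anything outside [A-Za-z0-9_-] with an underscore
    base="$(printf '%s' "$base" | tr -c 'A-Za-z0-9_-' '_')"
    printf 'tsm.pools.%s\n' "$base"
}

# Made-up volume names, one per line, standing in for a TSM query result
volumes='/tsm/storage/diskpool/backup vol01.dsm
/dev/rtsmlvdisk1'

while IFS= read -r vol; do
    volume_to_key "$vol"
done <<< "$volumes"
```

The printed keys could then be used to create the matching items by hand or to generate a template.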



        #4
        Doug, my apologies for my late response!
        Regarding the disk names: yes, you are correct, that will need to be modified. Here's one of mine: "/dev/rtsmlvdisk1", so the -5 grabs just "disk1". The length will need to be changed depending on how much of the name you want it to grab. I'm working on a release of this that will hopefully be more "out of the box" friendly, with less configuration needed; however, I'm running into some obvious issues with that (every environment is radically different).

        If you need help with any of the scripting stuff, send me a message on the forums or on IRC (same nick).
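
To illustrate the "-5" behaviour discussed above, here is a small bash sketch. The first path is the one quoted in this post; the longer path is made up to show why a fixed length stops working, and the final two lines show a length-independent alternative that is only a suggestion.

```shell
#!/usr/bin/env bash
disk="/dev/rtsmlvdisk1"

# Last 5 characters; the parentheses (or a space) keep bash from
# reading ":-" as the "use default value" operator.
echo "${disk:(-5)}"      # disk1
echo "${disk: -5}"       # same result

# A longer, made-up path: the last 5 characters are no longer a useful name
long="/tsm/storage/diskpool/volume01.dsm"
echo "${long:(-5)}"      # 1.dsm

# Alternative that does not depend on length: drop directories and extension
name="${long##*/}"
echo "${name%.*}"        # volume01
```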



          #5
          No problem, in fact your lack of response had me go out on my own!

          I found that you can use a "-dataonly=yes" parameter for dsmadmc, which simplifies the scripting quite a bit, and should leave it more version-independent...
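
A minimal sketch of a query wrapper built around that parameter follows. The variable names `TSM_ID`, `TSM_PASS` and `DSMADMC` and their default values are assumptions for illustration, and the real `tsm_cmd` in the script may look different. With `-dataonly=yes` the client prints only the result rows, so no header trimming is needed; `-commadelimited` is optional and just makes the columns easier to split.

```shell
#!/usr/bin/env bash
# Assumed settings; adjust to your environment.
TSM_ID="${TSM_ID:-zabbix}"
TSM_PASS="${TSM_PASS:-changeme}"
DSMADMC="${DSMADMC:-dsmadmc}"     # path to the TSM administrative client

# Run one query and print only the data rows.
tsm_cmd() {
    "$DSMADMC" -id="$TSM_ID" -password="$TSM_PASS" \
        -dataonly=yes -commadelimited "$1"
}

# Usage (needs a reachable TSM server):
#   tsm_cmd "select count(*) from nodes"
```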



            #6
            Nice find! I will play with that and see what I can do with it. If you make any good changes, please feel free to attach the updated version or send it to me, and I'll update the original post with it (with credits!).



              #7
              Help

              Can you guys please help with a brief description of how to implement this?

              I have a Zabbix server, with a Zabbix proxy in the client environment, and a TSM server that I want to monitor.

              Please, please help
              4 Zabbix Frontend Servers (Load balanced)
              2 Zabbix App Servers (HA)
              2 Zabbix Database Servers (HA)
              18 Zabbix Proxy Servers (HA)
              3897 Deployed Zabbix Agents
              6161 Values per second
              X-Layer Integration
              Jaspersoft report Servers (HA)



                #8
                vlam,
                The script can be placed anywhere and needs to be scheduled with cron. It uses the TSM client to run SQL queries against TSM; the returned data is then sent via the zabbix_sender binary.

                Due to the complex nature of a backup environment, this is not a simple drop-in solution. A few things will need to be modified (disk pool names, login, etc.), but everything is commented, and if you're unsure of any part of the script feel free to ask (you'll get quicker responses on IRC).
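
As a rough illustration of the sending side, the sketch below wraps zabbix_sender in a `send_value` helper. The variable names and default values are assumptions, and the actual helper in the script may differ. The items in the template need to be trapper items for this to work, and if the monitored host sits behind a Zabbix proxy, the address given with `-z` would normally be that proxy.

```shell
#!/usr/bin/env bash
# Assumed settings; adjust to your environment.
ZBX_SERVER="${ZBX_SERVER:-127.0.0.1}"      # Zabbix server or proxy address
ZBX_HOST="${ZBX_HOST:-tsm-server}"         # host name as configured in Zabbix
ZABBIX_SENDER="${ZABBIX_SENDER:-zabbix_sender}"

# Push one value for one item key.
send_value() {
    "$ZABBIX_SENDER" -z "$ZBX_SERVER" -s "$ZBX_HOST" -k "$1" -o "$2"
}

# Example crontab entry (path is a placeholder), every 10 minutes:
#   */10 * * * * /path/to/tsm.sh >/dev/null 2>&1
```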



                  #9
                  Different implementation of tsm_summary_24hrs

                  I'm implementing this solution and so far find it extremely awesome.

                  I am running TSM server 6.2, and noticed that in my case the statistics gathered by tsm_summary_24hrs function are not correct, because results output by the TSM SELECT query are not in the order in which the function expects them to be and wrong values are getting assigned to wrong items. So I re-wrote the function in a way that is hopefully more independent from the version of TSM. Note that this also changes the item names (they are in uppercase as returned by TSM), so item names in the template need to be adjusted accordingly. Some 'interesting' statistics are probably missing because they were not present in my TSM server at the moment, but they are easy enough to add.

                  Code:
                  function tsm_summary_24hrs {
                          # Total GB moved per activity type over the last 24 hours
                          summary=$(tsm_cmd "SELECT activity,cast(float(sum(bytes))/1024/1024/1024 as dec(8,2)) as GB FROM summary where end_time>current_timestamp-24 hours GROUP BY activity")

                          for jobtype in ARCHIVE BACKUP INCR_DBBACKUP FULL_DBBACKUP MIGRATION RECLAMATION
                          do
                                  # Match the activity column exactly, so BACKUP does not also pick up the *_DBBACKUP rows
                                  send_value "tsm.summary.daily.$jobtype" "$(echo "$summary" | awk -v t="$jobtype" '$1 == t {print $2}')"
                          done
                  }
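
The idea of picking values by activity name rather than by position can be tried without a TSM server. In the sketch below the sample rows are made up, and `summary_value` is a hypothetical helper; printing 0 for an activity that is absent from the result is a design choice for this example, not something the function above does.

```shell
#!/usr/bin/env bash
# Made-up sample of what the SELECT might return (activity, GB)
summary='BACKUP            120.50
FULL_DBBACKUP       3.25
MIGRATION          80.00'

# Print the GB value for exactly one activity, or 0 if it is absent.
summary_value() {
    printf '%s\n' "$summary" |
        awk -v t="$1" '$1 == t {v = $2} END {print (v == "" ? 0 : v)}'
}

summary_value BACKUP          # 120.50
summary_value FULL_DBBACKUP   # 3.25
summary_value ARCHIVE         # 0
```

Comparing the whole first column matters here: a plain substring match on BACKUP would also hit the FULL_DBBACKUP row.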



                    #10
                    A thought about tsm_missedjobs and tsm_failedjobs

                    I noticed that the functions tsm_missedjobs and tsm_failedjobs send no data if there are no missed/failed jobs. This means that if there is one missed job one day, the 'missed jobs' value in Zabbix is set to 1 and stays at 1, even if there are no missed jobs the next day. Maybe some people prefer this behaviour, but I added a simple "else" branch to the if/fi construct in tsm_missedjobs:

                    Code:
                            if [[ $(echo $missedLog) == *Missed* ]]; then
                                    echo "$missedLog" > tsmPipeMissed &
                                    while read line; do
                                            missedInt=$(($missedInt+1))
                                            tsm_failed_subject="Monitoring - TSM - Missed backup for  host $(echo $line | awk '{print $3}')"
                                            open_ticket "$line"
                                    done < tsmPipeMissed
                                    send_value tsm.jobs.missed "$missedInt"
                                    rm -f tsmPipeMissed
                    +        else
                    +               send_value tsm.jobs.missed 0
                            fi
                    Similarly in tsm_failedjobs, only with another key name.

                    This way, if there are no missed/failed jobs the item gets set to 0 and the trigger is cleared.
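
The same effect can be sketched in isolation by always computing a count and sending it, zero included. The helper `count_missed` and the sample log lines are hypothetical; only the key name comes from the script.

```shell
#!/usr/bin/env bash
# Count lines mentioning "Missed"; grep -c prints 0 when nothing matches.
# "|| true" hides grep's non-zero exit status in that case.
count_missed() {
    printf '%s\n' "$1" | grep -c 'Missed' || true
}

# Made-up sample input in place of the real query result
missedLog='06/21/2012 02:00 NODE_A Missed
06/21/2012 03:00 NODE_B Missed'

count_missed "$missedLog"     # 2
count_missed ""               # 0

# In the script this would become something like:
#   send_value tsm.jobs.missed "$(count_missed "$missedLog")"
```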



                      #11
                      Thank you very much, I totally overlooked this. I will update the original zip with your changes.

                      Thanks again



                        #12
                        Hi Parabola

                        Is it possible to have the "failed jobs" and "not started jobs" also create a text entry in Zabbix? The reason I ask is that we have a call logging solution integrated into Zabbix, so if one of those errors occurs I would like Zabbix to manage the outcome.

                        Thanks


                          #13
                          TODO done, now on github

                          Originally posted by parabola
                          TODO
                          • Clean up script (there's some nasty 'sed' and 'awk' going on in there)
                          • Remove dependency on internal TSM scripts
                          Thanks for your awesome work on the TSM Zabbix template, I hope you
                          don’t mind but I've sorted at least these two and put it on Github.

                          https://github.com/rollercow/tsm_zabbix/

                          --
                          Chris



                            #14
                            Hi, I have extended the tsm.sh file to retrieve information about the storage pools and the size of the DB.


                            Code inserted in the script (TSM 5.x and TSM 6.x) for Storage Pool
                            Code:
                            #########################
                            #     CUSTOMIZATION     #
                            #########################

                            #-------------------------------------------------------------------
                            # Variables (replace the placeholders with your storage pool names)
                            STGP1="<name.storage.pool.1>"
                            STGP2="<name.storage.pool.2>"
                            STGP3="<name.storage.pool.3>"
                            STGP4="<name.storage.pool.4>"

                            #-------------------------------------------------------------------
                            # StoragePool 1

                            # Number of volumes used by StoragePool 1
                            function tsm_stgp_volumes_1 {
                                
                            stgp_volumes_1=$(tsm_cmd "select count(*) as count from volumes where stgpool_name = '$STGP1'")
                                
                            send_value tsm.stgp.volumes.1 "$stgp_volumes_1"
                            }

                            # Space (in TB) occupied by backup data on StoragePool 1
                            function tsm_stgp_util_bckp_1 {
                                
                            stgp_util_bckp_1=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Bkup' AND stgpool_name = '$STGP1'")
                                
                            send_value tsm.stgp.util.bckp.1 "$stgp_util_bckp_1"
                            }

                            # Space (in TB) occupied by archive data on StoragePool 1
                            function tsm_stgp_util_arch_1 {
                                
                            stgp_util_arch_1=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Arch' AND stgpool_name = '$STGP1'")
                                
                            send_value tsm.stgp.util.arch.1 "$stgp_util_arch_1"
                            }

                            #-------------------------------------------------------------------
                            # StoragePool 2

                            # Number of volumes used by StoragePool 2
                            function tsm_stgp_volumes_2 {
                                
                            stgp_volumes_2=$(tsm_cmd "select count(*) as count from volumes where stgpool_name = '$STGP2'")
                                
                            send_value tsm.stgp.volumes.2 "$stgp_volumes_2"
                            }

                            # Space (in TB) occupied by backup data on StoragePool 2
                            function tsm_stgp_util_bckp_2 {
                                
                            stgp_util_bckp_2=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Bkup' AND stgpool_name = '$STGP2'")
                                
                            send_value tsm.stgp.util.bckp.2 "$stgp_util_bckp_2"
                            }

                            # Space (in TB) occupied by archive data on StoragePool 2
                            function tsm_stgp_util_arch_2 {
                                
                            stgp_util_arch_2=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Arch' AND stgpool_name = '$STGP2'")
                                
                            send_value tsm.stgp.util.arch.2 "$stgp_util_arch_2"
                            }

                            #-------------------------------------------------------------------
                            # StoragePool 3

                            # Number of volumes used by StoragePool 3
                            function tsm_stgp_volumes_3 {
                                
                            stgp_volumes_3=$(tsm_cmd "select count(*) as count from volumes where stgpool_name = '$STGP3'")
                                
                            send_value tsm.stgp.volumes.3 "$stgp_volumes_3"
                            }

                            # Space (in TB) occupied by backup data on StoragePool 3
                            function tsm_stgp_util_bckp_3 {
                                
                            stgp_util_bckp_3=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Bkup' AND stgpool_name = '$STGP3'")
                                
                            send_value tsm.stgp.util.bckp.3 "$stgp_util_bckp_3"
                            }


                            # Space (in TB) occupied by archive data on StoragePool 3
                            function tsm_stgp_util_arch_3 {
                                
                            stgp_util_arch_3=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Arch' AND stgpool_name = '$STGP3'")
                                
                            send_value tsm.stgp.util.arch.3 "$stgp_util_arch_3"
                            }

                            #-------------------------------------------------------------------
                            # StoragePool 4

                            # Number of volumes used by StoragePool 4
                            function tsm_stgp_volumes_4 {
                                
                            stgp_volumes_4=$(tsm_cmd "select count(*) as count from volumes where stgpool_name = '$STGP4'")
                                
                            send_value tsm.stgp.volumes.4 "$stgp_volumes_4"
                                
                            }

                            # Space (in TB) occupied by backup data on StoragePool 4
                            function tsm_stgp_util_bckp_4 {
                                
                            stgp_util_bckp_4=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Bkup' AND stgpool_name = '$STGP4'")
                                
                            send_value tsm.stgp.util.bckp.4 "$stgp_util_bckp_4"
                            }

                            # Space (in TB) occupied by archive data on StoragePool 4
                            function tsm_stgp_util_arch_4 {
                                
                            stgp_util_arch_4=$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Arch' AND stgpool_name = '$STGP4'")
                                
                            send_value tsm.stgp.util.arch.4 "$stgp_util_arch_4"
                            }

                            #------------------------------------------------------------------- 
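
Since the twelve functions above differ only in the pool number, the same queries could also be driven from a list. The following is a sketch under assumptions: the pool names are placeholders, and `tsm_cmd` and `send_value` are replaced by stand-ins so the example runs on its own. The queries and item keys are the ones used above.

```shell
#!/usr/bin/env bash
# Stand-ins so the sketch runs on its own; the real script provides both.
tsm_cmd()    { echo "0"; }           # would run the query via dsmadmc
send_value() { echo "$1=$2"; }       # would call zabbix_sender

# Placeholder pool names; index 0 maps to item suffix .1, and so on.
STGPOOLS=("POOL_A" "POOL_B" "POOL_C" "POOL_D")

tsm_stgp_all() {
    local i n pool
    for i in "${!STGPOOLS[@]}"; do
        pool="${STGPOOLS[$i]}"
        n=$((i + 1))
        send_value "tsm.stgp.volumes.$n" \
            "$(tsm_cmd "select count(*) as count from volumes where stgpool_name = '$pool'")"
        send_value "tsm.stgp.util.bckp.$n" \
            "$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Bkup' AND stgpool_name = '$pool'")"
        send_value "tsm.stgp.util.arch.$n" \
            "$(tsm_cmd "select CAST(SUM(logical_mb)/1024/1024 AS DEC(8,2)) as TB FROM occupancy WHERE type = 'Arch' AND stgpool_name = '$pool'")"
    done
}

tsm_stgp_all
```

Adding a fifth pool then only means adding a name to the list and creating the matching items in the template.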
                            Code inserted in the script for Database Size (TSM 5.x)
                            Code:
                            # TSM database

                            # Total space occupied by the DB (note: MB * 1024 yields KB, not bytes)
                            function tsm_db_totalsize {
                                
                            db_totalsize=$(tsm_cmd "select (AVAIL_SPACE_MB * 1024) from DB")
                                
                            send_value tsm.db.totalsize.old "$db_totalsize"
                            }

                            # Total space used by the DB
                            function tsm_db_usedsize {
                                
                            db_usedsize=$(tsm_cmd "select (CAPACITY_MB * 1024) from DB")
                                
                            send_value tsm.db.usedsize.old "$db_usedsize"
                            }

                            # Space available in the DB
                            function tsm_db_freesize {
                                
                            db_freesize=$(tsm_cmd "select (MAX_EXTENSION_MB * 1024) from DB")
                                
                            send_value tsm.db.freesize.old "$db_freesize"
                            }

                            # DB utilization percentage
                            function tsm_db_pct_used {
                                
                            db_pct_used=$(tsm_cmd "select PCT_UTILIZED from DB")
                                
                            send_value tsm.db.pct.utilized "$db_pct_used"
                            }

                            #-------------------------------------------------------------------
                            # TSM database log

                            # Total space occupied by the log
                            function tsm_log_totalsize {
                                
                            log_totalsize=$(tsm_cmd "select ((AVAIL_SPACE_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.totalsize "$log_totalsize"
                            }

                            # Total space used by the log
                            function tsm_log_usedsize {
                                
                            log_usedsize=$(tsm_cmd "select ((CAPACITY_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.usedsize "$log_usedsize"
                            }

                            # Space available in the log
                            function tsm_log_freesize {
                                
                            log_freesize=$(tsm_cmd "select ((MAX_EXTENSION_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.freesize "$log_freesize"
                            }

                            # Log utilization percentage
                            function tsm_log_pct_used {
                                
                            log_pct_used=$(tsm_cmd "select PCT_UTILIZED from LOG")
                                
                            send_value tsm.log.pct.used "$log_pct_used"
                            }

                            Code inserted in the script for Database Size (TSM 6.x)
                            Code:
                            #-------------------------------------------------------------------
                            # TSM database

                            # Total space occupied by the DB
                            function tsm_db_totalsize {
                                
                            db_totalsize=$(tsm_cmd "select ((TOT_FILE_SYSTEM_MB * 1024) * 1024)  from DB")
                                
                            send_value tsm.db.totalsize "$db_totalsize"
                            }

                            # Total space used by the DB
                            function tsm_db_usedsize {
                                
                            db_usedsize=$(tsm_cmd "select ((USED_DB_SPACE_MB * 1024) * 1024) from DB")
                                
                            send_value tsm.db.usedsize "$db_usedsize"
                            }

                            # Space available in the DB
                            function tsm_db_freesize {
                                
                            db_freesize=$(tsm_cmd "select ((FREE_SPACE_MB * 1024) * 1024) from DB")
                                
                            send_value tsm.db.freesize "$db_freesize"
                            }

                            # DB utilization percentage
                            function tsm_db_pct_used {
                                
                            db_pct_used=$(tsm_cmd "SELECT CAST(SUM(100-(free_space_mb*100) / tot_file_system_mb) AS DECIMAL(3,1)) AS PCT_UTILIZED FROM db")
                                
                            send_value tsm.db.pct.utilized "$db_pct_used"
                            }

                            #-------------------------------------------------------------------
                            # TSM database log

                            # Total space occupied by the log
                            function tsm_log_totalsize {
                                
                            log_totalsize=$(tsm_cmd "select ((TOTAL_SPACE_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.totalsize "$log_totalsize"
                            }

                            # Total space used by the log
                            function tsm_log_usedsize {
                                
                            log_usedsize=$(tsm_cmd "select ((USED_SPACE_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.usedsize "$log_usedsize"
                            }

                            # Space available in the log
                            function tsm_log_freesize {
                                
                            log_freesize=$(tsm_cmd "select ((FREE_SPACE_MB * 1024) * 1024) from LOG")
                                
                            send_value tsm.log.freesize "$log_freesize"
                            }

                            # Log utilization percentage (from the LOG table columns used by the functions above)
                            function tsm_log_pct_used {
                                
                            log_pct_used=$(tsm_cmd "SELECT CAST(SUM(100-(free_space_mb*100) / total_space_mb) AS DECIMAL(3,1)) AS PCT_UTILIZED FROM log")
                                
                            send_value tsm.log.pct.used "$log_pct_used"
                            }


                            #------------------------------------------------------------------- 
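
The percentage used in the 6.x queries is 100 - (free * 100) / total. A small sketch of the same arithmetic outside of TSM is shown below; `pct_used` is a hypothetical helper and the numbers are made up. Note that a result of exactly 100.0 needs four digits, so a DECIMAL(3,1) cast may not be able to hold it; a wider type such as DECIMAL(4,1) is probably safer.

```shell
#!/usr/bin/env bash
# Percent used from free and total space (both in MB), one decimal place.
pct_used() {
    awk -v free="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "0.0"; exit }    # avoid dividing by zero
        printf "%.1f\n", 100 - (free * 100) / total
    }'
}

pct_used 250 1000     # 75.0
pct_used 0 1000       # 100.0
```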
                            Attached: the Zabbix template with the items configured.
                            Attached Files
                            Last edited by Simone; 21-05-2015, 12:24.
