Host processes monitoring-graphs LLD

  • mmarkwitzz
    Senior Member
    • Jan 2011
    • 106

    #1

    Host processes monitoring-graphs LLD

    This solution is based on the Zabbix 2.0 API and low-level discovery (LLD), which is only available as of Zabbix 2.0.

    The included template uses active checks. Your Zabbix agent must be configured for active checks; otherwise the server won't receive any data.

    Description

    A long-standing wish of many admins has been a sort of task manager in Zabbix that monitors CPU load, memory, and even bytes read/written per second, all per process, so that resource hogs and memory leaks can be identified without opening a remote desktop session to the server. All this is now possible with Zabbix 2.0 and this solution.

    In a real world scenario you end up with these types of graphs:





    You also get monitoring for eventlog entries generated by application hangs.

    Usage

    Download the archive, extract the XML template, import it into the frontend and link the hosts to it.

    Copy the VBS scripts to your Zabbix agent folder and add the following lines to your zabbix_agentd.conf file, replacing the path in each UserParameter with your Zabbix agent installation path. Then restart the agent.

    Code:
    #system discovery
    UserParameter = system.discovery[*],cscript "C:\Program Files\Zabbix agent\zabbix_win_system_discovery.vbs" //Nologo "$1"
    #process
    UserParameter = process[*],cscript "C:\Program Files\Zabbix agent\zabbix_win_process.vbs" //Nologo "$1" "$2"
    #eventlog query
    UserParameter = eventlog.query[*],cscript "C:\Program Files\Zabbix agent\zabbix_win_eventlog.vbs" //Nologo "$1" "$2"
    # allow weird chars in userparameters arguments
    UnsafeUserParameters=1
    Download the Perl scripts to your Zabbix server, edit them (specifically the parameters below), and schedule them with cron (I personally use cron.hourly).

    Code:
    	$user = "Admin"; ### username
    	$password = "zabbix"; ### password
    	$url = "http://127.0.0.1/api_jsonrpc.php"; ### internal zabbix url
    The scripts connect with $user and $password via the JSON API (so the user needs API access; check that in the frontend). For a detailed explanation of how the scripts work, see the following post: http://www.zabbix.com/forum/showthread.php?t=26678
    They create the following graphs for all hosts linked to the "WIN Processes" template:
    • WIN Process "ALL" bytes/sec stack
    • WIN Process "ALL" handles stack
    • WIN Process "ALL" memory bytes committed stack
    • WIN Process "ALL" memory bytes kernel-nonpaged stack
    • WIN Process "ALL" memory bytes kernel-paged stack
    • WIN Process "ALL" pages/sec fault stack
    • WIN Process "ALL" processor time stack
    • WIN Process "ALL" threads stack
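As a rough illustration of what "connecting via the JSON API" involves, here is a minimal Python sketch that only builds the JSON-RPC login request. It is not the author's Perl code; the helper names `build_request` and `login_request` are hypothetical, and actually sending the payload to $url is left out.

```python
import json


def build_request(method, params, request_id, auth=None):
    """Return a JSON-RPC 2.0 envelope for one Zabbix API call."""
    request = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    # The auth token returned by the login call is attached to later calls.
    if auth is not None:
        request["auth"] = auth
    return request


def login_request(user, password):
    """Build the user.login call that exchanges credentials for a token."""
    return build_request("user.login", {"user": user, "password": password}, 1)


# Same default credentials as in the Perl parameters above.
payload = json.dumps(login_request("Admin", "zabbix"))
```

The token in the login response would then be passed as `auth` to every later request, such as the calls that create the graphs.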


    Gotchas

    The discovery rule's filter is configured so that only processes larger than 25 MB are discovered. This is because the Perl scripts support a maximum of 26 processes; if this limit is exceeded, the graphs will not be created. You can edit the filter to match your own needs using any of the following values, OR-ed together with "|":
    • 10$ : processes with memory size 0-10 MB
    • 25$ : processes with memory size 10-25 MB
    • 50$ : processes with memory size 25-50 MB
    • 100$ : processes with memory size 50-100 MB
    • 250$ : processes with memory size 100-250 MB
    • 500$ : processes with memory size 250-500 MB
    • 1000$ : processes with memory size 500 MB-1 GB
    • Higher : processes with memory size > 1 GB
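To see how an OR-ed filter behaves, here is a small Python sketch. It assumes the filter is applied to the {#PRSIZE} label as an unanchored regular expression, the way Python's `re.search` works; the function names are hypothetical.

```python
import re


def build_filter(labels):
    """OR the chosen size labels together with '|'."""
    return "|".join(labels)


def is_discovered(filter_regex, prsize):
    """True when a process's size label passes the filter."""
    return re.search(filter_regex, prsize) is not None


# Example: keep only the 25-50 MB and 50-100 MB buckets.
size_filter = build_filter(["50$", "100$"])
```

Under that assumption, `50$` also matches the label `250`, because `250` ends in `50`. If that matters for your filter, anchoring the values (for example `^50$`) should avoid the overlap.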


    Processes with the same executable name are always merged into one, and all values such as memory, CPU, and reads/writes per second are summed. For example, you only get one entry for all your svchost.exe processes, and together they count as one towards the 26-process limit.
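The merging described above can be pictured with this minimal Python sketch. It is an illustration only, not the VBS script's actual logic, and `merge_by_name` is a hypothetical name.

```python
from collections import defaultdict


def merge_by_name(samples):
    """Sum a metric over all processes that share an executable name.

    samples: iterable of (executable_name, value) pairs, one per process.
    """
    totals = defaultdict(float)
    for name, value in samples:
        totals[name] += value
    return dict(totals)


merged = merge_by_name([
    ("svchost.exe", 10.0),
    ("svchost.exe", 5.5),
    ("explorer.exe", 2.0),
])
```

Three running processes collapse into two entries here, which is why many svchost.exe instances use only one of the 26 slots.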

    All items except WIN Process memory bytes committed may return ZBX_NOTSUPPORTED on first poll.
    This is because the script is designed to query values for all items of a process and save them to a file when WIN Process memory bytes committed is polled. Then when other items are polled, the script simply reads the last value from the file.

    This is why WIN Process memory bytes committed is polled every 600 seconds and all other items are polled every 610 seconds: when the other items are polled 10 seconds later, the data is already in the file. Of course, this does not work right after restarting the agent, because Zabbix then polls _all_ items regardless of their intervals.
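The save-then-read pattern can be sketched in Python as follows. This is a simplified model under stated assumptions: an in-memory dict stands in for the zabbix_win_process.saved file, `collect` is a hypothetical callable that gathers all metrics for a process, and the metric key "threads" is illustrative.

```python
NOT_SUPPORTED = "ZBX_NOTSUPPORTED"


class ProcessCache:
    """Refresh all metrics on the 'commit' poll, serve the rest from the cache."""

    def __init__(self, collect):
        self._collect = collect  # callable: process name -> {metric: value}
        self._saved = {}         # stands in for the .saved file

    def poll(self, metric, process):
        if metric == "commit":
            # Only this poll queries the system; it stores every metric.
            self._saved[process] = self._collect(process)
        values = self._saved.get(process)
        if values is None or metric not in values:
            # Nothing saved yet, e.g. another item was polled first.
            return NOT_SUPPORTED
        return values[metric]


cache = ProcessCache(lambda process: {"commit": 100, "threads": 7})
```

Polling "threads" before "commit" returns ZBX_NOTSUPPORTED, which mirrors the first-poll behaviour described above.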

    If you are using any other of my templates, make sure your hosts only have one active instance of the WIN Eventlog OOP item. If there is more than one, disable the extra items.
    The eventlog triggers send an alert containing information on the most recent events logged. For these to show up in the alert make sure you include the following macros in the alert message:

    Eventlog info: {ITEM.LASTVALUE}

    If you have any issues importing this template, please upgrade to zabbix 2.0.2 (not released at the date of this post) and php 5.3 or higher.

    Update 1

    Locales with a floating point decimal separator other than dot (.) are now properly supported.
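A minimal Python sketch of this kind of normalization, for values such as 175198208,0000 reported later in this thread. It assumes the value contains no thousands separators; `normalize_decimal` is a hypothetical name and not taken from the VBS script.

```python
def normalize_decimal(raw):
    """Replace a comma decimal separator with a dot, e.g. '1,5' -> '1.5'."""
    return raw.strip().replace(",", ".")


value = float(normalize_decimal("175198208,0000"))
```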

    Update 2

    I can confirm that importing templates with Discovery Rules is broken in zabbix 2.0.1 and will be fixed in zabbix 2.0.2.
    Until Zabbix 2.0.2 is released, a partial fix is to SSH into your Zabbix server and make the following change:
    • Edit /var/www/html/include/classes/import/formatters/C20ImportFormatter.php
    • Search for getDiscoveryRules function
    • Replace the line:
    • if (!empty($host['discovery_rules'])) {
    • with:
    • if (!empty($template['discovery_rules'])) {


    This will fix importing discovery rules, item prototypes, and trigger prototypes. Importing graph prototypes will still be broken: they get imported directly into the template, not into the discovery rules.

    Update 3

    Added eventlog monitoring for application hangs in the template. Download and import the template again, copy the 2 new vbs scripts to the zabbix agent installation path and add the following to the conf file:

    Code:
    #eventlog query
    UserParameter = eventlog.query[*],cscript "C:\Program Files\Zabbix agent\zabbix_win_eventlog.vbs" //Nologo "$1" "$2"
    Update 4

    Decreased eventlog query time to 300s.
    Attached Files
    Last edited by mmarkwitzz; 11-07-2012, 09:08.
  • extress
    Member
    • Jul 2012
    • 32

    #2
    Hello,

    Thanks!! That's a very nice feature. However, I wasn't able to get it to run on my Zabbix 2 server (Windows 2003 Standard Edition):

    Active checks are enabled on my Windows agent, and the .pl scripts seem to return good values:

    Code:
    HOSTGROUP: Linux servers (2)
    HOSTGROUP: MY-Infra (6)
        HOST: test-host (10087)
                Graph found: WIN Process "ALL" threads stack (683)
                Graph deleted.
            ITEM: WIN Process "AcroRd32.exe" threads (23884)
            ITEM: WIN Process "csrss.exe" threads (23887)
            ITEM: WIN Process "EXCEL.exe" threads (23882)
            ITEM: WIN Process "explorer.exe" threads (23876)
            ITEM: WIN Process "fppdis2a.exe" threads (23883)
            ITEM: WIN Process "mmc.exe" threads (23885)
            ITEM: WIN Process "OUTLOOK.exe" threads (23877)
            ITEM: WIN Process "rdpclip.exe" threads (23888)
            ITEM: WIN Process "spoolsv.exe" threads (23886)
            ITEM: WIN Process "svchost.exe" threads (23880)
            ITEM: WIN Process "winlogon.exe" threads (23881)
            ITEM: WIN Process "WINWORD.exe" threads (23879)
            ITEM: WIN Process "X3.exe" threads (23878)
    From the frontend (attached file), I can select one of the discovered graphs, but there is no data except the list of running processes?!
    Attached Files
    Last edited by extress; 02-07-2012, 16:38.


    • mmarkwitzz
      Senior Member
      • Jan 2011
      • 106

      #3
      @extress

      Give it 10 minutes or restart the agent. The script calculates averages between the previous values (if any) and the current values.

      On the first poll it only saves the current values and does not return anything. From the second poll onwards it calculates the averages and returns them. Either that or there is a bug in my scripts.
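One plausible way to compute such an average between two polls, sketched in Python. The exact formula used by the script is not shown in this thread, so this is an assumption, and `per_second_rate` is a hypothetical name.

```python
def per_second_rate(prev_counter, prev_time, cur_counter, cur_time):
    """Average rate between two samples, or None when it cannot be computed.

    Counters are cumulative totals; times are in seconds.
    """
    if prev_counter is None or cur_time <= prev_time:
        # First poll (nothing saved yet) or no time elapsed: no value.
        return None
    return (cur_counter - prev_counter) / (cur_time - prev_time)


first = per_second_rate(None, None, 1000, 0)      # first poll
second = per_second_rate(1000, 0, 4000, 600)      # 600 seconds later
```

The first call has no previous sample and yields nothing, which matches the empty first poll described above.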

      Go to latest data and check if you have any values for "WIN Process "<process>" memory bytes committed" items. If you do, then there's no problem. If you don't then there is probably a problem with the script and I will need more info to debug it.

      Please keep me posted
      Last edited by mmarkwitzz; 02-07-2012, 17:08.


      • extress
        Member
        • Jul 2012
        • 32

        #4
        Thanks for your answer !

        I've waited more than 1 hour and still nothing, I've also tried to restart the agent, same thing.

        edit : from latest data, all I have is "WIN Process "AcroRd32.exe" memory bytes committed Never"

        I have "never" for each item in the table ^^"

        edit 2: I also tried this on my Windows 7 x64 machine, but the .pl script doesn't return any processes.
        Last edited by extress; 02-07-2012, 17:16.


        • mmarkwitzz
          Senior Member
          • Jan 2011
          • 106

          #5
          Originally posted by extress
          Thanks for your answer !

          I've waited more than 1 hour and still nothing, I've also tried to restart the agent, same thing.
          Do you have any values at all in latest data (for this template, obviously) ?
          Look in the agent installation folder for a file named "zabbix_win_process.saved". Does it have any values?
          Try running "cscript zabbix_win_process.vbs commit svchost.exe". Does it return any values? Does it throw an error? Does the "zabbix_win_process.saved" file get populated?
          Try running "cscript zabbix_win_process.vbs read svchost.exe". Does it return any value?


          • extress
            Member
            • Jul 2012
            • 32

            #6
            lol you missed my last edit.

            So the file "zabbix_win_process.saved" is well populated :

            Code:
            AcroRd32.exe|COMMIT|02/07/2012 17:17:32|71688192,0000
            AcroRd32.exe|HANDLES|02/07/2012 17:17:32|1118,0000
            AcroRd32.exe|NONPAGEDPOOL|02/07/2012 17:17:32|30176,0000
            AcroRd32.exe|PAGEDPOOL|02/07/2012 17:17:32|579000,0000
            AcroRd32.exe|THREADS|02/07/2012 17:17:32|48,0000
            AcroRd32.exe|READ|02/07/2012 17:17:32|7580,8771|38318753|4852759742015759|
            AcroRd32.exe|WRITE|02/07/2012 17:17:32|16,4287|14235199|4852759742015759|
            AcroRd32.exe|PAGEFAULTS|02/07/2012 17:17:32|401,6334|873306|4852759742015759|
            AcroRd32.exe|CPU|02/07/2012 17:17:32|16,3317|3425312500|18446740667937075616|
            As for the vbs command line:

            Code:
            C:\Program Files\Zabbix Agent>cscript zabbix_win_process.vbs commit svchost.exe
            Microsoft (R) Windows Script Host Version 5.6
            Copyright (C) Microsoft Corporation 1996-2001. All rights reserved.
            
            175198208,0000
            When I go to one of my hosts, I can see that almost all items are disabled with this kind of error (something related to the dot, maybe?):
            Attached Files
            Last edited by extress; 03-07-2012, 08:42.


            • mmarkwitzz
              Senior Member
              • Jan 2011
              • 106

              #7
              Change your locale so that the decimal separator is a dot (.).
              If that is not an option for you, then have some patience until tomorrow, when I'll modify the script to automatically replace commas with dots.


              • extress
                Member
                • Jul 2012
                • 32

                #8
                Okay, I can wait. As for my x64 system, the vbs doesn't work at all, and I can't get any process from the .pl script. Any idea? (maybe this fucking damn UAC)


                • mmarkwitzz
                  Senior Member
                  • Jan 2011
                  • 106

                  #9
                  Originally posted by extress
                  Okay, I can wait. As for my x64 system, the vbs doesn't work at all, and I can't get any process from the .pl script. Any idea? (maybe this fucking damn UAC)
                  It's not that; I tested this on 2008, 2003, x86 and x64. Discovery is not working for some reason. Is the Zabbix agent service running as Local System or as an admin account?

                  Start a cmd with "Run as administrator", cd to the Zabbix agent install path, and run "cscript zabbix_win_system_discovery.vbs processes" (check the script name, I don't remember it exactly). Does it return a JSON of all discovered processes?

                  If yes, check whether active checks are configured correctly.
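For reference, a low-level discovery rule expects a JSON object with a "data" array of macro/value rows. The Python sketch below builds such a document using the {#PRNAME} and {#PRSIZE} macros from this template; it is an illustration of the format, not the VBS script's code, and `discovery_json` is a hypothetical name.

```python
import json


def discovery_json(processes):
    """Build an LLD document from (executable_name, size_label) pairs."""
    rows = [
        {"{#PRNAME}": name, "{#PRSIZE}": size}
        for name, size in processes
    ]
    return json.dumps({"data": rows})


document = discovery_json([("svchost.exe", "50"), ("explorer.exe", "100")])
```

Comparing the script's real output against this shape is a quick way to tell a complete discovery result from a truncated one.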


                  • extress
                    Member
                    • Jul 2012
                    • 32

                    #10
                    The zabbix service is run as SYSTEM. When I run cmd with my admin rights, after some seconds, this is what I get :

                    Code:
                    {
                            "{#PRNAME}":"ActualMultipleMonitorsCenter.exe",
                            "{#PRSIZE}":"10"
                    },


                    • mmarkwitzz
                      Senior Member
                      • Jan 2011
                      • 106

                      #11
                      Is this _exactly_ what you get or is it an extract from a multiple page string?

                      Can you count exactly how many seconds? If it's more than 10, try increasing the timeout value in zabbix_agentd.conf with "Timeout=30".

                      Edit: Uploaded a new version of the script that should return dot separated decimals. Would you test it please and let me know if it works?

                      Edit: Re-uploaded the script. Previous one had a datetime glitch.
                      Last edited by mmarkwitzz; 03-07-2012, 07:44.


                      • extress
                        Member
                        • Jul 2012
                        • 32

                        #12
                        It seems that your upload failed because I get the same file as yesterday (same date and same size) so I can't test it... ^__^"

                        OK, so I've removed everything to start all over (unlinked and cleared the template from my host, removed the template, removed all the .pl files, etc.), but now when I execute my crontab, all I get (for each script) is:

                        Code:
                        HOSTGROUP: Linux servers (2)
                        HOSTGROUP: MY-Infra (6)
                            HOST: test-host (10087)
                                    Graph created: WIN Process "ALL" threads stack
                        HOSTGROUP: Zabbix servers (4)
                        No more process list.

                        and the file on the Windows agent side is no longer created. (When I run your previous command "cscript zabbix_win_process.vbs commit svchost.exe", the zabbix_win_process.saved file is created and populated, but I still can't get the list of processes.) I tried reverting to the previous vbs; same problem.

                        edit2: For an unknown reason, when I change the debug level to 4, the scripts are able to retrieve the process list, and zabbix_win_process.saved is created and populated oO
                        Last edited by extress; 03-07-2012, 11:01.


                        • mmarkwitzz
                          Senior Member
                          • Jan 2011
                          • 106

                          #13
                          Check in your host items for discovered items. If there are none, try restarting the agent and waiting a couple of minutes.

                          Also try running "cscript zabbix_win_system_discovery.vbs processes" and see if it returns the JSON of processes. If it does, then discovery works but the Zabbix server hasn't received the items yet.


                          • extress
                            Member
                            • Jul 2012
                            • 32

                            #14
                            Okay, as of now:

                            I do have a lot of discovered items in my host.
                            The Latest data tab shows "never" for all of them.
                            Your command is working as intended.

                            But I still don't understand why the .pl script started to work when I changed the debug level and restarted the agent service?! I don't really know if it is still trying to gather the data.

                            Edit 2: zabbix_win_process.saved has been deleted by the agent?! So I changed the debug level back and the file has been recreated.

                            Edit 3: I can see some graphs (only one value, since zabbix_win_process.saved is not updated as it should be), except for bytes/sec stack (all values are 0 bytes/s), pages/sec fault, and processor time.
                            Last edited by extress; 03-07-2012, 11:42.


                            • mmarkwitzz
                              Senior Member
                              • Jan 2011
                              • 106

                              #15
                              The .pl simply takes the discovered items and adds them to a graph. If it does not work, it's only because there are no discovered items (yet).

                              Setting the debug level had nothing to do with it; restarting the agent triggered a new discovery and new items were created.

                              Look in the host items screen again and see if they are still disabled, and if so, with what error. Still the comma/dot thing?
