Troubleshooting and Optimizing a Custom Monitoring PowerShell Script

  • jamess
    Junior Member
    • Aug 2023
    • 1

    #1

    I'm looking for guidance on troubleshooting and optimizing a custom PowerShell script that runs on our Windows AWS instances (where the Zabbix agents are installed). When several servers restart at the same time, the script causes problems and some Zabbix items end up with missing data.

    After the instances come back online following a Windows reboot, the custom monitoring script runs for long periods and consumes excessive CPU.
    It appears that ALL items (the custom.web.client keys listed below) are started at once, so many PowerShell processes run at the same time. We only see this behavior after rebooting the monitored server.

    Under normal circumstances, the custom.web.client checks come in one at a time.
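To confirm the burst right after a reboot, one option is to sample how many PowerShell processes are alive at one-second intervals. This is a hypothetical diagnostic helper, not part of the original setup; it assumes the agent launches the script through Windows PowerShell, whose process name is `powershell`.

```powershell
# Hypothetical diagnostic helper (not from the original post): sample how many
# processes with a given name are alive, once per interval.
function Get-ProcessCountSample {
    param(
        [string]$Name = 'powershell',   # process name without ".exe" (assumption: Windows PowerShell)
        [int]$Samples = 5,              # number of samples to take
        [int]$IntervalSeconds = 1       # pause between samples
    )
    for ($i = 0; $i -lt $Samples; $i++) {
        # Get-Process reports an error when nothing matches; treat that as a count of 0.
        $count = @(Get-Process -Name $Name -ErrorAction SilentlyContinue).Count
        [pscustomobject]@{
            Time  = Get-Date
            Name  = $Name
            Count = $count
        }
        # No sleep after the last sample.
        if ($i -lt $Samples - 1) { Start-Sleep -Seconds $IntervalSeconds }
    }
}
```

Running `Get-ProcessCountSample -Samples 120 | Format-Table` from a scheduled task at startup would show whether the process count spikes during the first minutes after boot and then settles.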

    Are there any tuning options for the Zabbix agent, such as forking or running checks on multiple threads, or any Zabbix server settings that could help diagnose or resolve this issue?


    The following is the custom script causing the issue:
    Code:
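    Param(
      [string]$uri,          # Full URI to be requested
      [string]$stringToFind, # String to look for in the response body (strtofind mode)
      [string]$mode          # Data type to return: rspcode, datalen, rsptime, strtofind, sslexp
    )
    
    # Windows PowerShell may default to TLS 1.0; force TLS 1.2 before the request is created.
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
    $web = [System.Net.WebRequest]::Create($uri)
    $web.AllowAutoRedirect = $false
    $web.Timeout = 5000   # milliseconds
    $web.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko"
    
    # Values captured while the response is still open, initialized on every run.
    $statusCode = 0
    $contentLength = -1
    $body = ""
    
    # Measure the response time of a successful or failed request.
    # Measure-Command runs the block in the current scope, so the variables above are updated.
    $responseTime = Measure-Command {
        try
        {
            $response = $web.GetResponse()
            try
            {
                # Read everything needed before Close(); the response is disposed afterwards.
                $statusCode = [int]$response.StatusCode
                $contentLength = $response.ContentLength
                if ($mode -eq "strtofind")
                {
                    # HttpWebResponse has no Content property; read the body from the stream.
                    $reader = New-Object System.IO.StreamReader($response.GetResponseStream())
                    $body = $reader.ReadToEnd()
                    $reader.Close()
                }
            }
            finally
            {
                $response.Close()
            }
        }
        catch
        {
            # 4xx/5xx responses arrive as a WebException, possibly wrapped by PowerShell.
            # Unwrap it to reach the failed response; timeouts and DNS failures have no
            # response and leave $statusCode at 0.
            $ex = $_.Exception
            while ($ex -and $ex -isnot [System.Net.WebException]) { $ex = $ex.InnerException }
            if ($ex -and $ex.Response)
            {
                $statusCode = [int]$ex.Response.StatusCode
                $ex.Response.Close()
            }
        }
    }
    
    switch ($mode)
    {
        "rspcode"  { $statusCode }
        "datalen"  { $contentLength }
        "rsptime"  { [math]::Round($responseTime.TotalSeconds, 3) }
        "sslexp"
        {
            # ServicePoint.Certificate is only set once a TLS connection was made (https URIs).
            # NotAfter avoids parsing the culture-dependent GetExpirationDateString() output.
            $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($web.ServicePoint.Certificate)
            [int](New-TimeSpan -Start (Get-Date -Date "01/01/1970") -End $cert.NotAfter.ToUniversalTime()).TotalSeconds
        }
        "strtofind"
        {
            if ($body -like "*$stringToFind*") { 1 } else { 0 }
        }
    }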
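The strtofind and sslexp branches can be exercised without a network request if their logic is pulled into functions. This is a minimal sketch: the function names `Test-StreamContains` and `ConvertTo-UnixSeconds` are hypothetical and not part of the original script, and `DateTimeOffset.ToUnixTimeSeconds` assumes .NET Framework 4.6 or later.

```powershell
# Hypothetical helper: strtofind logic. HttpWebResponse has no Content property,
# so the body is read from a stream; any readable System.IO.Stream works here.
function Test-StreamContains {
    param(
        [System.IO.Stream]$Stream,
        [string]$StringToFind
    )
    $reader = [System.IO.StreamReader]::new($Stream)
    try {
        $body = $reader.ReadToEnd()
    }
    finally {
        $reader.Close()   # also closes the underlying stream
    }
    # -like is case-insensitive, matching the comparison in the original script.
    if ($body -like "*$StringToFind*") { 1 } else { 0 }
}

# Hypothetical helper: sslexp logic. Convert a DateTime (for example a
# certificate's NotAfter) to Unix epoch seconds without parsing a date string.
function ConvertTo-UnixSeconds {
    param([datetime]$Date)
    # Local or Unspecified kinds are treated as local time by the DateTimeOffset conversion.
    ([System.DateTimeOffset]$Date).ToUnixTimeSeconds()
}
```

In the script these would be called as `Test-StreamContains -Stream $resp.GetResponseStream() -StringToFind $stringToFind`, where `$resp` is the object returned by `$web.GetResponse()`, and as `ConvertTo-UnixSeconds (New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 $web.ServicePoint.Certificate).NotAfter`.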
    The script is normally called by the Zabbix Server using the following Zabbix item keys:
    • custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rsptime,Internal] (update interval: 45s)
    • custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rspcode,Internal] (update interval: 1m)
    • custom.web.client["https://[fqdn]/",,rsptime,Public] (update interval: 45s)
    • custom.web.client["https://[fqdn]/",,rspcode,Public] (update interval: 1m)
    • custom.web.client["https://[fqdn]/webservice/",,rsptime,WebService] (update interval: 45s)
    • custom.web.client["https://[fqdn]/webservice/",,rspcode,WebService] (update interval: 1m)
    • custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,sslexp,Internal] (update interval: 1d)
    • custom.web.client["https://[fqdn]/",,sslexp,Public] (update interval: 1d)
    • custom.web.client["http://127.0.0.1:7070/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rsptime,Internal] (update interval: 45s)
    • custom.web.client["http://127.0.0.1:7070/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rspcode,Internal] (update interval: 1m)
    • custom.web.client["http://127.0.0.1:6060/",,rsptime,Public] (update interval: 45s)
    • custom.web.client["http://127.0.0.1:6060/",,rspcode,Public] (update interval: 1m)
    • custom.web.client["http://127.0.0.1:7090/webservice/",,rsptime,WebService] (update interval: 45s)
    • custom.web.client["http://127.0.0.1:7090/webservice/",,rspcode,WebService] (update interval: 1m)
    • custom.web.client["https://[fqdn]/webservice/",,sslexp,WebService] (update interval: 1d)
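A rough load estimate for the item list above helps put the reboot burst in perspective. This sketch assumes each check starts its own PowerShell process and that every item runs exactly on its update interval; the item counts (6 rsptime, 6 rspcode, 3 sslexp) are taken from the list.

```powershell
# Rough load estimate (not from the original post) for the items listed above.
# Assumption: one PowerShell process per check, each item runs on its interval.
$items = @(
    [pscustomobject]@{ Mode = 'rsptime'; ItemCount = 6; IntervalSeconds = 45 }
    [pscustomobject]@{ Mode = 'rspcode'; ItemCount = 6; IntervalSeconds = 60 }
    [pscustomobject]@{ Mode = 'sslexp';  ItemCount = 3; IntervalSeconds = 86400 }
)

$perHour = 0   # script launches per hour across all items
$burst = 0     # processes started if every item is due at the same moment
foreach ($item in $items) {
    $perHour += $item.ItemCount * 3600 / $item.IntervalSeconds
    $burst += $item.ItemCount
}

"Invocations per hour: $perHour"
"Processes started in the first burst: $burst"
```

With these intervals the script prints `Invocations per hour: 840.125` and `Processes started in the first burst: 15`, so steady state is roughly one launch every 4 seconds, while a post-reboot burst (as observed in the post) starts 15 PowerShell processes at once on a host that is still booting.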
    ---
    Zabbix Server OS Information (AWS EC2):
    Code:
    Distributor ID: Ubuntu
    Description:    Ubuntu 20.04.6 LTS
    Release:        20.04
    Codename:       focal
    AWS Instance Type: c5.xlarge
    ---
    Kernel: 5.15.0-1038-aws
    ---
    Zabbix Agent (C:\Zabbix\conf\zabbix_agent2.conf):
    Code:
    LogType=file
    LogFile=C:\Zabbix\zabbix_agent2.log
    LogFileSize=100
    DebugLevel=3
    Server=[fqdn_to_zabbix_server]
    ListenPort=10160
    StatusPort=9999
    ServerActive=[fqdn_to_zabbix_server]
    Hostname=[hostname_of_the_instance]
    RefreshActiveChecks=120
    BufferSend=5
    BufferSize=100
    PersistentBufferPeriod=1h
    Timeout=30
    Include=C:\Zabbix\zabbix_agent.d\
    UnsafeUserParameters=1
    TLSConnect=psk
    TLSAccept=psk
    TLSPSKIdentity=[sensitive]
    TLSPSKFile=[path_to_psk_file]
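Because the same agent configuration is deployed on every instance, it can help to compare the effective values (Timeout, BufferSend, the Include directory) across the hosts that restarted. This is a hypothetical helper, not part of the Zabbix tooling, that reads the Key=Value lines of a file like the one above into a hashtable.

```powershell
# Hypothetical helper (not from the original post): read a Zabbix-style
# "Key=Value" configuration into a hashtable. Blank lines and "#" comments are
# skipped; repeated keys (e.g. several Include lines) keep every value.
function Read-ZabbixConfig {
    param([string[]]$Lines)
    $config = @{}   # keys are case-insensitive in a PowerShell hashtable
    foreach ($line in $Lines) {
        $trimmed = $line.Trim()
        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }
        $index = $trimmed.IndexOf('=')
        if ($index -lt 1) { continue }   # not a Key=Value line
        $key = $trimmed.Substring(0, $index).Trim()
        $value = $trimmed.Substring($index + 1).Trim()
        if ($config.ContainsKey($key)) {
            # Turn a repeated key into an array of all its values.
            $config[$key] = @($config[$key]) + $value
        } else {
            $config[$key] = $value
        }
    }
    $config
}
```

For example, `$agent = Read-ZabbixConfig -Lines (Get-Content 'C:\Zabbix\conf\zabbix_agent2.conf')` followed by `$agent['Timeout']` would return `30` for the configuration shown above.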

    Zabbix Server (/etc/zabbix/zabbix_server.conf):
    Code:
    ListenPort=10051
    LogType=file
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=10
    DebugLevel=3
    SocketDir=/var/run/zabbix
    PidFile=/var/run/zabbix/zabbix_server.pid
    DBHost=localhost
    DBName=zabbix
    DBUser=[sensitive]
    DBPassword=[sensitive]
    DBPort=3306
    AllowUnsupportedDBVersions=0
    HistoryStorageTypes=uint,dbl,str,log,text
    HistoryStorageDateIndex=0
    ExportFileSize=1G
    StartPollers=500
    StartIPMIPollers=5
    StartLLDProcessors=2
    StartPreprocessors=50
    StartPollersUnreachable=250
    StartHistoryPollers=50
    StartTrappers=50
    StartPingers=1
    StartDiscoverers=1
    StartHTTPPollers=50
    StartTimers=1
    StartEscalators=1
    JavaGateway=127.0.0.1
    JavaGatewayPort=10052
    StartJavaPollers=5
    StartVMwareCollectors=5
    VMwareFrequency=60
    VMwarePerfFrequency=60
    VMwareCacheSize=8M
    VMwareTimeout=10
    SNMPTrapperFile=/tmp/zabbix_traps.tmp
    StartSNMPTrapper=1
    HousekeepingFrequency=1
    MaxHousekeeperDelete=250000
    CacheSize=512M
    CacheUpdateFrequency=60
    StartDBSyncers=4
    HistoryCacheSize=64M
    HistoryIndexCacheSize=64M
    TrendCacheSize=256M
    TrendFunctionCacheSize=64M
    ValueCacheSize=64M
    Timeout=30
    TrapperTimeout=300
    UnreachablePeriod=45
    UnavailableDelay=60
    UnreachableDelay=15
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    FpingLocation=/usr/sbin/fping
    Fping6Location=/usr/sbin/fping6
    LogSlowQueries=0
    TmpDir=/tmp
    StartProxyPollers=1
    ProxyConfigFrequency=3600
    ProxyDataFrequency=1
    AllowRoot=0
    User=zabbix
    Include=/etc/zabbix/zabbix_server.conf.d
    SSLCertLocation=${datadir}/zabbix/ssl/certs
    SSLKeyLocation=${datadir}/zabbix/ssl/keys
    LoadModulePath=${libdir}/modules
    VaultURL=https://127.0.0.1:8200
    StartReportWriters=0
    ServiceManagerSyncFrequency=60
    ProblemHousekeepingFrequency=60
    StartODBCPollers=1