I'm seeking guidance on troubleshooting and optimizing a custom PowerShell script that runs on our Windows AWS instances (which have Zabbix agents installed). The script causes problems when several servers restart simultaneously, leading to missing data in Zabbix items.
After the instances come back online post Windows reboot, the Zabbix custom monitoring PowerShell script runs for prolonged periods and consumes excessive CPU resources.
It appears that ALL items (see below re: custom.web.client) are spawned at once, leading to multiple PowerShell scripts running at the same time. This behavior is only observed after we reboot the monitored server.
Under normal circumstances, we only see the custom.web.client call come in one at a time.
I'm curious if there are any tuning options available for the Zabbix Agent, such as forking or running on multiple threads, or any settings in the Zabbix Server that can aid in diagnosing and/or resolving this issue.
The following is the custom script causing the issue:
Code:
Param(
    [string]$uri,          # Full URI to be requested
    [string]$stringToFind, # String to look for in the response
    [string]$mode          # Data type to return: rspcode, datalen, rsptime, strtofind, sslexp
)

# Reset the $request var, as it lingers.
$request = @{}

# Override PowerShell's default of using TLS 1.0, so we are using TLS 1.2
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

$web = [System.Net.WebRequest]::Create($uri)
$web.AllowAutoRedirect = $false
$web.Timeout = 5000
$web.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko"

# Measure the response time of a successful/failed request.
$responseTime = Measure-Command {
    # Catch a failed request
    try
    {
        $request = $web.GetResponse()
        $request.Close()
    }
    catch [System.Net.WebException]
    {
        # Set the request status code to the true failure value
        $request | Add-Member StatusCode ([int]$_.Exception.Response.StatusCode)
    }
}

# Return the value selected by $mode
if ($mode -eq "rspcode")
{
    [int]$request.StatusCode
}
ElseIf ($mode -eq "datalen")
{
    $request.ContentLength
}
ElseIf ($mode -eq "rsptime")
{
    [math]::Round($responseTime.TotalSeconds,3)
}
ElseIf ($mode -eq "sslexp")
{
    [int](New-TimeSpan -Start (Get-Date -Date "01/01/1970") -End ([datetime]$web.ServicePoint.Certificate.GetExpirationDateString()).ToUniversalTime()).TotalSeconds
}
ElseIf ($mode -eq "strtofind")
{
    if ($request.Content -Like "*$stringToFind*")
    {
        1
    }
    else
    {
        0
    }
}
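As a minimal sketch of one way the overlap could be limited from inside the script itself, the request logic could be wrapped in a named mutex so that only one copy performs its web request at a time. This is an illustration under stated assumptions, not a Zabbix feature: the helper name `Invoke-Serialized`, the mutex name `ZabbixCustomWebClient`, and the 20-second wait budget are all hypothetical. The agent would still start one PowerShell process per check, so this would serialize only the requests, not the process launches.

```powershell
# Hypothetical helper: runs $Work while holding a named mutex, so that
# concurrent copies of the script take turns instead of running together.
function Invoke-Serialized {
    param(
        [string]$MutexName,       # assumed name shared by all copies of the script
        [TimeSpan]$WaitBudget,    # how long to wait for the other copies
        [scriptblock]$Work        # the logic to run while holding the mutex
    )

    $mutex    = New-Object System.Threading.Mutex($false, $MutexName)
    $acquired = $false
    try {
        try {
            $acquired = $mutex.WaitOne($WaitBudget)
        }
        catch [System.Threading.AbandonedMutexException] {
            # A previous holder exited without releasing; ownership still passes to us.
            $acquired = $true
        }

        if ($acquired) {
            # Runs in a child scope: variables assigned inside $Work are not
            # visible to the caller, but its output is returned as usual.
            & $Work
        }
        else {
            Write-Error "Timed out waiting for mutex '$MutexName'"
        }
    }
    finally {
        if ($acquired) { $mutex.ReleaseMutex() }
        $mutex.Dispose()
    }
}

# Usage sketch: the existing request and mode handling would go in the block.
Invoke-Serialized -MutexName 'ZabbixCustomWebClient' `
                  -WaitBudget ([TimeSpan]::FromSeconds(20)) `
                  -Work { 'request logic goes here' }
```

The 20-second budget is picked only so that the wait plus the 5-second request timeout stays below the agent's Timeout=30. With many copies queued at once, some would still time out and return no value.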
The script is normally called by the Zabbix Server using the following Zabbix item keys:
- custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rsptime,Internal] (update interval: 45s)
- custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rspcode,Internal] (update interval: 1m)
- custom.web.client["https://[fqdn]/",,rsptime,Public] (update interval: 45s)
- custom.web.client["https://[fqdn]/",,rspcode,Public] (update interval: 1m)
- custom.web.client["https://[fqdn]/webservice/",,rsptime,WebService] (update interval: 45s)
- custom.web.client["https://[fqdn]/webservice/",,rspcode,WebService] (update interval: 1m)
- custom.web.client["https://[fqdn]/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,sslexp,Internal] (update interval: 1d)
- custom.web.client["https://[fqdn]/",,sslexp,Public] (update interval: 1d)
- custom.web.client["http://127.0.0.1:7070/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rsptime,Internal] (update interval: 45s)
- custom.web.client["http://127.0.0.1:7070/srs/logon.do?method=load&firstTime=yes&inactive=false" ,,rspcode,Internal] (update interval: 1m)
- custom.web.client["http://127.0.0.1:6060/",,rsptime,Public] (update interval: 45s)
- custom.web.client["http://127.0.0.1:6060/",,rspcode,Public] (update interval: 1m)
- custom.web.client["http://127.0.0.1:7090/webservice/",,rsptime,WebService] (update interval: 45s)
- custom.web.client["http://127.0.0.1:7090/webservice/",,rspcode,WebService] (update interval: 1m)
- custom.web.client["https://[fqdn]/webservice/",,sslexp,WebService] (update interval: 1d)
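To put numbers on the list above, the following sketch tallies the items by update interval and compares the steady-state launch rate with the post-reboot burst. The counts and intervals come straight from the item keys. The assumptions are that each check starts one PowerShell process and that checks are spread evenly in steady state; the variable names are illustrative only.

```powershell
# Item counts and update intervals taken from the item keys listed above.
$groups = @(
    [pscustomobject]@{ Mode = 'rsptime'; IntervalSeconds = 45;    Items = 6 },
    [pscustomobject]@{ Mode = 'rspcode'; IntervalSeconds = 60;    Items = 6 },
    [pscustomobject]@{ Mode = 'sslexp';  IntervalSeconds = 86400; Items = 3 }
)

# Worst case after a reboot: every item is due at the same moment.
$totalItems = ($groups | Measure-Object -Property Items -Sum).Sum

# Steady state: each item launches the script once per update interval.
$launchesPerHour = 0
foreach ($g in $groups) {
    $launchesPerHour += $g.Items * (3600 / $g.IntervalSeconds)
}

# Average spacing between launches if they were spread evenly (an assumption).
$averageGapSeconds = 3600 / $launchesPerHour

"Items that can be due at once:       $totalItems"
"Launches per hour (steady state):    $([math]::Round($launchesPerHour, 1))"
"Average gap between launches (sec):  $([math]::Round($averageGapSeconds, 1))"
```

Under those assumptions the 15 items work out to roughly 840 launches per hour, or about one every 4.3 seconds on average, whereas after a reboot up to 15 copies can start together.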
Zabbix Server OS Information (AWS EC2):
Code:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
AWS Instance Type: c5.xlarge
---
5.15.0-1038-aws
Zabbix Agent (C:\Zabbix\conf\zabbix_agent2.conf):
Code:
LogType=file
LogFile=C:\Zabbix\zabbix_agent2.log
LogFileSize=100
DebugLevel=3
Server=[fqdn_to_zabbix_server]
ListenPort=10160
StatusPort=9999
ServerActive=[fqdn_to_zabbix_server]
Hostname=[hostname_of_the_instance]
RefreshActiveChecks=120
BufferSend=5
BufferSize=100
PersistentBufferPeriod=1h
Timeout=30
Include=C:\Zabbix\zabbix_agent.d\
UnsafeUserParameters=1
TLSConnect=psk
TLSAccept=psk
TLSPSKIdentity=[sensitive]
TLSPSKFile=[path_to_psk_file]
Zabbix Server (/etc/zabbix/zabbix_server.conf):
Code:
ListenPort=10051
LogType=file
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=10
DebugLevel=3
SocketDir=/var/run/zabbix
PidFile=/var/run/zabbix/zabbix_server.pid
DBHost=localhost
DBName=zabbix
DBUser=[sensitive]
DBPassword=[sensitive]
DBPort=3306
AllowUnsupportedDBVersions=0
HistoryStorageTypes=uint,dbl,str,log,text
HistoryStorageDateIndex=0
ExportFileSize=1G
StartPollers=500
StartIPMIPollers=5
StartLLDProcessors=2
StartPreprocessors=50
StartPollersUnreachable=250
StartHistoryPollers=50
StartTrappers=50
StartPingers=1
StartDiscoverers=1
StartHTTPPollers=50
StartTimers=1
StartEscalators=1
JavaGateway=127.0.0.1
JavaGatewayPort=10052
StartJavaPollers=5
StartVMwareCollectors=5
VMwareFrequency=60
VMwarePerfFrequency=60
VMwareCacheSize=8M
VMwareTimeout=10
SNMPTrapperFile=/tmp/zabbix_traps.tmp
StartSNMPTrapper=1
HousekeepingFrequency=1
MaxHousekeeperDelete=250000
CacheSize=512M
CacheUpdateFrequency=60
StartDBSyncers=4
HistoryCacheSize=64M
HistoryIndexCacheSize=64M
TrendCacheSize=256M
TrendFunctionCacheSize=64M
ValueCacheSize=64M
Timeout=30
TrapperTimeout=300
UnreachablePeriod=45
UnavailableDelay=60
UnreachableDelay=15
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
Fping6Location=/usr/sbin/fping6
LogSlowQueries=0
TmpDir=/tmp
StartProxyPollers=1
ProxyConfigFrequency=3600
ProxyDataFrequency=1
AllowRoot=0
User=zabbix
Include=/etc/zabbix/zabbix_server.conf.d
SSLCertLocation=${datadir}/zabbix/ssl/certs
SSLKeyLocation=${datadir}/zabbix/ssl/keys
LoadModulePath=${libdir}/modules
VaultURL=https://127.0.0.1:8200
StartReportWriters=0
ServiceManagerSyncFrequency=60
ProblemHousekeepingFrequency=60
StartODBCPollers=1