I'll admit I'm somewhat new to Zabbix and I inherited this setup, but SOMETHING weird is going on that I hope someone can spot.
Zabbix server 5.4.12, proxy also 5.4.12, agents also 5.4.12. Nothing has changed on the monitored hosts, the Zabbix server, or the proxy, but all of a sudden 20-30 hosts are unavailable. Nothing changed in the AWS SG rules either, and they allow all of these hosts/ports: port 10050 is open to the monitored hosts, and port 10051 is open back to the proxy. This is verifiable with netcat, and zabbix_get works just fine from the proxy to all of these hosts. The agent conf files are very straightforward: Hostname=aws_instance_ID, Server/ServerActive=yyy-ProductionUSA-zabbix-proxy.xxxxxx.com, port defaults to 10051.
I set the agent to DebugLevel=5, and here are some significant differences. I also set the proxy to DebugLevel=5 and see similar differences.
I'm completely stumped here ... help?
PROBLEM HOST (one of several)
Code:
# grep -i upload zabbix_agent2.log | tail -5
#
# grep -i success zabbix_agent2.log | tail -5
2022/12/14 23:32:20.534996 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:34:21.536900 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:36:22.536643 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:38:23.534645 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:40:24.535956 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
#
-- and this example of one interaction
Code:
2022/12/14 23:42:25.530024 connecting to [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:42:25.533346 sending [{"request":"active checks","host":"i-f35b8f20","version":"5.4"}] to [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:42:25.533870 receiving data from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:42:25.535830 received [{"response":"success","data":[]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:42:25.535904 [101] End of refreshActiveChecks() from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
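To check what the proxy returns for a given Hostname without going through the agent, the same "active checks" request can be sent by hand. This is a minimal Python sketch, not the agent's actual code. It assumes the plain (unencrypted, uncompressed) Zabbix framing of `ZBXD`, a flags byte of `0x01`, a 4-byte little-endian length, and 4 reserved bytes, followed by the JSON seen in the log. The function names are hypothetical, and it will not work as written if TLS/PSK is enabled between agent and proxy.

```python
import json
import socket
import struct

# Assumed framing: "ZBXD" + flags byte 0x01, then <uint32 length><uint32 reserved>.
ZBX_HEADER = b"ZBXD\x01"


def build_active_checks_request(host, version="5.4"):
    """Frame the same JSON request the agent log shows being sent."""
    payload = json.dumps(
        {"request": "active checks", "host": host, "version": version},
        separators=(",", ":"),
    ).encode("utf-8")
    return ZBX_HEADER + struct.pack("<II", len(payload), 0) + payload


def parse_packet(packet):
    """Strip the 13-byte header and decode the JSON body."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("unexpected header (compressed or non-Zabbix reply?)")
    length, _reserved = struct.unpack("<II", packet[5:13])
    return json.loads(packet[13:13 + length].decode("utf-8"))


def query_proxy(proxy, host, port=10051, timeout=5.0):
    """Send the request and read until the proxy closes the connection."""
    with socket.create_connection((proxy, port), timeout=timeout) as sock:
        sock.sendall(build_active_checks_request(host))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_packet(b"".join(chunks))
```

Calling `query_proxy("yyy-ProductionUSA-zabbix-proxy.xxxxxx.com", "i-f35b8f20")` from one of the problem hosts and again with a working host's instance ID should show whether the empty `data` list follows the Hostname or the machine.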
WORKING HOST (lots and lots - many hundreds)
Code:
# grep -i upload zabbix_agent2.log | tail -5
2022/12/14 23:41:04.039992 [101] upload history data, 3/100 value(s)
2022/12/14 23:41:10.033989 [101] upload history data, 2/100 value(s)
2022/12/14 23:41:32.034316 [101] upload history data, 5/100 value(s)
2022/12/14 23:41:38.033503 [101] upload history data, 6/100 value(s)
2022/12/14 23:41:43.033983 [101] upload history data, 6/100 value(s)
# grep -i success zabbix_agent2.log | tail -5
2022/12/14 23:41:10.038187 received [{"response":"success","info":"processed: 2; failed: 0; total: 2; seconds spent: 0.000085"}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:41:32.037079 received [{"response":"success","info":"processed: 5; failed: 0; total: 5; seconds spent: 0.000104"}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:41:38.043196 received [{"response":"success","info":"processed: 6; failed: 0; total: 6; seconds spent: 0.000118"}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:41:43.038436 received [{"response":"success","info":"processed: 6; failed: 0; total: 6; seconds spent: 0.000109"}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:41:49.039136 received [{"response":"success","info":"processed: 5; failed: 0; total: 5; seconds spent: 0.000086"}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
#
-- and this example of one interaction
Code:
2022/12/14 23:47:01.036430 sending [{"request":"active checks","host":"i-078a76b4a75a9ba2c","version":"5.4"}] to [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:47:01.036977 receiving data from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:47:01.040822 received [{"response":"success","data":[{"key":"kernel.maxfiles","itemid":350966,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"kernel.maxproc","itemid":350967,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"net.if.discovery","itemid":350968,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"net.if.in[\"ens5\",dropped]","itemid":351985,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"net.if.in[\"ens5\",errors]","itemid":351986,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"net.if.in[\"ens5\"]","itemid":351984,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"net.if.out[\"ens5\",dropped]","itemid":351988,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"net.if.out[\"ens5\",errors]","itemid":351989,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"net.if.out[\"ens5\"]","itemid":351987,"delay":"3m","lastlogsize":0,"mtime":0},{"key":"proc.num","itemid":350975,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"proc.num[,,run]","itemid":350976,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.boottime","itemid":350977,"delay":"15m","lastlogsize":0,"mtime":0},{"key":"system.cpu.intr","itemid":350978,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.load[all,avg15]","itemid":350980,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.load[all,avg1]","itemid":350979,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.load[all,avg5]","itemid":350981,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.num","itemid":350982,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.switches","itemid":350983,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,guest]","itemid":350984,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,guest_nice]","itemid":350985,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,idle]","itemid":350986,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,interrupt]","itemid":350988,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,iowait]","itemid":350989,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,nice]","itemid":350990,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,softirq]","itemid":350991,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,steal]","itemid":350992,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,system]","itemid":350993,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.cpu.util[,user]","itemid":350994,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.hostname","itemid":350995,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"system.run[\"awk '/^ status/ {print $2}' /var/lib/puppet/state/last_run_report.yaml\"]","itemid":670023,"delay":"5m","lastlogsize":0,"mtime":0},{"key":"system.run[\"awk '/last_run/ {print $2}' /var/lib/puppet/state/last_run_summary.yaml\"]","itemid":350999,"delay":"5m","lastlogsize":0,"mtime":0},{"key":"system.sw.arch","itemid":351003,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"system.sw.os","itemid":351004,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"system.sw.packages","itemid":351005,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"system.swap.size[,free]","itemid":351000,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.swap.size[,pfree]","itemid":351001,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.swap.size[,total]","itemid":351002,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"system.uname","itemid":351006,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"system.uptime","itemid":351007,"delay":"30s","lastlogsize":0,"mtime":0},{"key":"system.users.num","itemid":351008,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.dev.discovery","itemid":351009,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"vfs.file.cksum[/etc/passwd]","itemid":351012,"delay":"15m","lastlogsize":0,"mtime":0},{"key":"vfs.file.contents[\"/sys/class/net/ens5/operstate\"]","itemid":351990,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.file.contents[\"/sys/class/net/ens5/type\"]","itemid":351991,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"vfs.file.contents[/sys/block/nvme0n1/stat]","itemid":351996,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.file.contents[/sys/block/nvme1n1/stat]","itemid":352003,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.discovery","itemid":351022,"delay":"1h","lastlogsize":0,"mtime":0},{"key":"vfs.fs.inode[/,pfree]","itemid":352010,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.inode[/opt,pfree]","itemid":352011,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/,pused]","itemid":352012,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/,total]","itemid":352014,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/,used]","itemid":352016,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/opt,pused]","itemid":352013,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/opt,total]","itemid":352015,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vfs.fs.size[/opt,used]","itemid":352017,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vm.memory.size[available]","itemid":351027,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vm.memory.size[pavailable]","itemid":351028,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"vm.memory.size[total]","itemid":351030,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"web.page.get[http://localhost,_cluster/health?timeout=5s,9200]","itemid":351031,"delay":"1m","lastlogsize":0,"mtime":0},{"key":"web.page.get[http://localhost,_cluster/stats,9200]","itemid":351034,"delay":"1m","lastlogsize":0,"mtime":0}]}] from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
2022/12/14 23:47:01.041099 [101] End of refreshActiveChecks() from [yyy-ProductionUSA-zabbix-proxy.xxxxxx.com:10051]
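The visible difference between the two hosts is the `data` array in the active-checks response: empty on the problem host, populated on the working one. To compare many hosts quickly, here is a minimal Python sketch that pulls this out of the agent log. It assumes the DebugLevel=5 line format shown in the excerpts above (`<date> <time> received [<json>] from [<peer>]`); the function name and the returned field names are hypothetical.

```python
import json
import re

# Matches lines like:
#   2022/12/14 23:32:20.534996 received [{...json...}] from [proxy:10051]
# The greedy payload group backtracks to the last "] from [" on the line.
RECEIVED_RE = re.compile(r"^(\S+ \S+) received \[(.*)\] from \[([^\]]+)\]$")


def summarize_received(line):
    """Return a small summary dict for a 'received' log line, else None."""
    match = RECEIVED_RE.match(line.strip())
    if match is None:
        return None
    timestamp, payload, peer = match.groups()
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        return None  # payload was truncated or mangled in the log
    data = body.get("data")
    if isinstance(data, list):
        # Reply to an "active checks" request: count the items handed out.
        kind, items = "active checks", len(data)
    elif "info" in body:
        # Reply to a history upload ("processed: N; failed: M; ...").
        kind, items = "history upload", None
    else:
        kind, items = "other", None
    return {"time": timestamp, "peer": peer, "kind": kind, "items": items}


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < zabbix_agent2.log
    for log_line in sys.stdin:
        summary = summarize_received(log_line)
        if summary is not None:
            print(summary)
```

A host whose "active checks" replies always report `items: 0` and that never shows a "history upload" reply matches the problem-host pattern above.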