Here is the scenario: I have a Zabbix server in the cloud and many application servers that I plan to monitor with it.
My hosts can always reach the Zabbix server, but the server can't always reach them, which makes active checks a perfect fit for my environment.
Environment Details:
AWS (multiple locations)
Zabbix server: 50.50.50.50:10051 (public IP and Port)
www1 server 110.10.10.10 (hostname = www1)
www1 can telnet to 50.50.50.50:10051 without a problem
Zabbix server cannot reach 110.10.10.10
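The reachability claims above come from manual telnet tests. A minimal sketch of the same check in Python, assuming a plain TCP connect is an adequate stand-in for telnet; the function name `can_connect` and the timeout are my own choices, not anything Zabbix provides:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example usage with the placeholder addresses from the list above:
#   can_connect("50.50.50.50", 10051)   # run on www1: agent -> server
#   can_connect("110.10.10.10", 10050)  # run on the server: server -> agent
```

Port 10050 in the second call is the usual default agent listen port and is an assumption here, since the post does not state it.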
Zabbix agent config on www1:
ServerActive=50.50.50.50:10051
Hostname=www1 (hard-coded for now; previously I left Hostname unset and used the HostnameItem=system.hostname option, since the box's hostname is www1).
RefreshActiveChecks=60
I also have the debug level set to the highest value.
Logs on the agent host simply say:
8121:20140118:223810.416 active checks #1 [getting list of active checks]
8121:20140118:223810.416 In refresh_active_checks() host:'50.50.50.50' port:10051
8121:20140118:223810.418 sending [{
"request":"active checks",
"host":"www1"}]
8121:20140118:223810.418 before read
8121:20140118:223810.421 got [{
"response":"success",
"data":[]}]
8121:20140118:223810.421 In parse_list_of_checks()
8121:20140118:223810.421 In disable_all_metrics()
8121:20140118:223810.421 End of refresh_active_checks():SUCCEED
8121:20140118:223810.421 active checks #1 [processing active checks]
8121:20140118:223810.421 In process_active_checks('54.193.85.139',10051)
8121:20140118:223810.421 End of process_active_checks()
8121:20140118:223810.421 In get_min_nextcheck()
8121:20140118:223810.422 active checks #1 [idle 1 sec]
8120:20140118:223810.721 collector [processing data]
8120:20140118:223810.725 In update_cpustats()
8120:20140118:223810.725 End of update_cpustats()
8120:20140118:223810.725 collector [idle 1 sec]
8121:20140118:223811.422 In send_buffer() host:'50.50.50.50' port:10051 values:0/100
This goes on for pages once the service starts.
So now my host is ready to send data to the Zabbix server. On the server, I go to Configuration and add a new host.
I use the same string, 'www1', for the host name. As per some documentation, I set the agent's IP to 0.0.0.0 and set the host to monitored. This ends up with the Zabbix server duplicating its own local monitoring agent's data as that host.
If I set the IP to that of the agent host, or to the address it appears to come from, nothing happens. The logs on the Zabbix server say this:
1366:20140118:223849.470 Zabbix agent item "net.if.in[eth0]" on host "www1" failed: first network error, wait for 15 seconds
1368:20140118:223904.252 Zabbix agent item "net.if.in[eth0]" on host "www1" failed: another network error, wait for 15 seconds
1368:20140118:223919.259 Zabbix agent item "vfs.fs.size[/,pfree]" on host "www1" failed: another network error, wait for 15 seconds
1368:20140118:223934.267 temporarily disabling Zabbix agent checks on host "www1": host unavailable
My assumption about the behavior is:
You tell the Zabbix server some information about the hosts to expect:
"I have hosts named www1, www2, and www3; listen for them and match exactly on name. When you find them, apply the Linux OS template to them and monitor the RAM, swap, CPU, I/O wait, etc." (All of this works fine with the local Zabbix agent on the Zabbix server.) You then tell the Zabbix agent on one of the www# boxes to start sending active updates to the server. The agent contacts the server and says, "yo, I am here, gimme the payload of what you want me to monitor." The next time it phones home to the Zabbix server (every 60 seconds, based on my config), it sends the values the server wants to monitor.
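The exchange I am assuming can be sketched in Python. This is only an illustration of my understanding of the agent protocol (a `ZBXD` signature, one flags byte, an 8-byte little-endian length, then JSON), not the agent's actual code; the function names are hypothetical. The request body matches what the agent log above shows being sent.

```python
import json
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (as I understand it)


def frame(payload: dict) -> bytes:
    """Prefix a JSON payload with the header and an 8-byte little-endian length."""
    body = json.dumps(payload).encode("utf-8")
    return ZBX_HEADER + struct.pack("<Q", len(body)) + body


def unframe(packet: bytes) -> dict:
    """Reverse of frame(): validate the header and decode the JSON body."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("not a Zabbix packet")
    (length,) = struct.unpack("<Q", packet[5:13])
    return json.loads(packet[13:13 + length].decode("utf-8"))


def active_checks_request(hostname: str) -> bytes:
    """The 'what should I monitor?' request the agent sends for its host name."""
    return frame({"request": "active checks", "host": hostname})


def has_active_items(response: dict) -> bool:
    """True only if the server answered success with a non-empty item list."""
    return response.get("response") == "success" and bool(response.get("data"))
```

Applied to the reply in my agent log, `has_active_items({"response": "success", "data": []})` is false: the server accepts the request for "www1" but hands back no items to check.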
I would like to get this working manually first, so that I can then use the Zabbix API to do it automatically when I provision or destroy a node. I already have a script that talks to EC2 and my deployment automation system to create a VM and provision the box to working order, so if this works, it should be an easy addition.
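For the automation step, this is roughly the JSON-RPC body I expect to send to the API's `host.create` method. It is a sketch only: the group ID, template ID, and auth token are hypothetical placeholders, the function name is mine, and the exact fields (including passing `auth` in the body) may differ between Zabbix versions.

```python
import json


def host_create_request(hostname, ip, group_id, template_id, auth_token, request_id=1):
    """Build a JSON-RPC 2.0 body for host.create; all IDs are caller-supplied placeholders."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": hostname,  # must match Hostname= in the agent config
            "interfaces": [{
                "type": 1,      # 1 = agent interface
                "main": 1,      # default interface of this type
                "useip": 1,     # connect by IP rather than DNS
                "ip": ip,
                "dns": "",
                "port": "10050",
            }],
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],
        },
        "auth": auth_token,
        "id": request_id,
    }


# Example: serialize the body my provisioning script would POST to api_jsonrpc.php.
body = json.dumps(host_create_request("www1", "110.10.10.10", "2", "10001", "<token>"))
```

The provisioning script would build this body right after the VM comes up, and a matching `host.delete` call would go in the teardown path.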