At the end of the discussion about this problem there is this comment:
"I got past this by deleting "Zabbix server" and adding it again. My monitoring started to behave just like described after I changed IP address. So I guess for some reason even though IP address is changed for server agent.ping tries to ping old address :/"
Looks like it is true.
I investigated this problem a little deeper because I see the same issue on our systems.
After setting DebugLevel=4 in the agent config and restarting the agent, after about 3-4 minutes I was able to find log lines like:
461:20131005:113959.233 Listener error: Connection from [172.16.18.254] rejected. Allowed server is [zabbix-proxy.<domain>]
Of course 172.16.18.254 is not listed in zabbix_agent.conf::Server.
Right after the agent logs this rejection, I see in the proxy database that the status of the agent (monitored through the proxy on zabbix-proxy.<domain>) changes almost instantly to:
error: Got empty string from [<agent.address>]. Assuming that agent dropped connection because of access permissions
And here is the surprise: in my case 172.16.18.254 is the default gateway address!?
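For anyone who wants to check their own agent log for the same rejections, here is a minimal Python sketch that counts the rejected source addresses. The pattern is modelled only on the single log line quoted above, so other agent versions may word the message differently, and the function name is my own.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this post (an assumption:
# other agent versions may phrase the message differently).
REJECTED = re.compile(
    r"Listener error: Connection from \[(?P<peer>[^\]]+)\] rejected\. "
    r"Allowed server is \[(?P<allowed>[^\]]+)\]"
)

def rejected_peers(lines):
    """Count how often each source address was rejected by the agent."""
    counts = Counter()
    for line in lines:
        match = REJECTED.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts

sample = [
    "461:20131005:113959.233 Listener error: Connection from [172.16.18.254] "
    "rejected. Allowed server is [zabbix-proxy.<domain>]",
]
print(rejected_peers(sample))
```

To run it on a real log, pass the open log file instead of the sample list; the location of the agent log depends on the LogFile setting of your installation.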
I started looking at what is going on with the traffic between the Zabbix agent and the gateway at 172.16.18.254. Using "tcpdump host 172.16.18.254" I was able to catch this sequence:
12:05:58.418814 arp who-has 172.16.18.254 tell 172.16.18.254
12:05:58.619562 arp who-has 172.16.18.254 tell 172.16.18.254
12:05:58.978877 IP 172.16.18.254.54944 > <agent.address>.10050: S 3120464593:3120464593(0) win 14600 <mss 1460,sackOK,timestamp 1048637435 0,nop,wscale 7>
12:05:58.978900 IP <agent.address>.10050 > 172.16.18.254.54944: S 3589998947:3589998947(0) ack 3120464594 win 5792 <mss 1460,sackOK,timestamp 1396699441 1048637435,nop,wscale 7>
12:05:58.979097 IP 172.16.18.254.54944 > <agent.address>.10050: . ack 1 win 115 <nop,nop,timestamp 1048637435 1396699441>
12:05:58.979195 IP 172.16.18.254.54944 > <agent.address>.10050: P 1:22(21) ack 1 win 115 <nop,nop,timestamp 1048637435 1396699441>
12:05:58.979208 IP <agent.address>.10050 > 172.16.18.254.54944: . ack 22 win 46 <nop,nop,timestamp 1396699441 1048637435>
12:05:58.979261 IP <agent.address>.10050 > 172.16.18.254.54944: F 1:1(0) ack 22 win 46 <nop,nop,timestamp 1396699441 1048637435>
12:05:58.979283 IP <agent.address>.10050 > 172.16.18.254.54944: R 2:2(0) ack 22 win 46 <nop,nop,timestamp 1396699441 1048637435>
12:05:59.634247 arp who-has 172.16.18.254 tell 172.16.18.254
12:06:00.619605 arp who-has 172.16.18.254 tell 172.16.18.254
12:06:00.819529 arp who-has 172.16.18.254 tell 172.16.18.254
I have no idea why this conversation between the Zabbix agent on <agent.address> and the default gateway happens. The proxy is outside the subnet that contains the gateway and the agent, and all communication goes through this gateway. At the moment I am not able to diagnose this further or check what is going on from the network device's point of view.
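To see from a longer capture which hosts actually open connections to the agent port, something like the following Python sketch could help. It assumes the older tcpdump text format shown in the capture above (flags such as "S" directly after the destination) and treats an "S" line without "ack" as the initial SYN; the function name and the sample lines are only for illustration.

```python
import re

# Matches the old-style tcpdump text output quoted in this post, e.g.
# "12:05:58.978877 IP a.b.c.d.54944 > host.10050: S 1:1(0) win 14600 ..."
PACKET = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): (?P<flags>\S+)(?P<rest>.*)$"
)

def connection_initiators(lines, port=10050):
    """Return (time, source address) for every initial SYN sent to `port`."""
    found = []
    for line in lines:
        m = PACKET.match(line.strip())
        if not m:
            continue  # ARP and any other non-matching lines are skipped
        # A SYN without "ack" opens a connection; SYN+ACK is only the reply.
        initial_syn = m.group("flags") == "S" and " ack " not in m.group("rest")
        if initial_syn and int(m.group("dport")) == port:
            found.append((m.group("time"), m.group("src")))
    return found

capture = [
    "12:05:58.619562 arp who-has 172.16.18.254 tell 172.16.18.254",
    "12:05:58.978877 IP 172.16.18.254.54944 > <agent.address>.10050: "
    "S 3120464593:3120464593(0) win 14600 <mss 1460,sackOK>",
    "12:05:58.978900 IP <agent.address>.10050 > 172.16.18.254.54944: "
    "S 3589998947:3589998947(0) ack 3120464594 win 5792 <mss 1460,sackOK>",
]
print(connection_initiators(capture))
```

On the capture above this only lists the gateway address as the initiator, which matches what the agent log reports.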
Comments?
"I got past this by deleting "Zabbix server" and adding it again. My monitoring started to behave just like described after I changed IP address. So I guess for some reason even though IP address is changed for server agent.ping tries to ping old address :/"