Hello everyone, I am having a very weird issue. I am using Zabbix to monitor our Cisco routers and switches over SNMP. I have ping checks set up in Zabbix and they normally work fine. Sometimes, though, when the WAN Frame Relay links go down and come back up, the router comes back and answers pings, but the switch "stays down". In fact, I can't even ping the switch from the Zabbix server's OS (Debian 7) until I reboot the switch. The weird thing is that I can ping the switch from my Windows machine just fine, and my machine is on the same subnet as the Zabbix Linux server.
These are Cisco 2950 switches, and my understanding is that they only do Layer 2 switching. I have also checked that there are no ACLs interfering with ping or anything else.
So how can the switch be selectively refusing ping (and SNMP, and everything else) from just the Zabbix server? Could it be detecting too many SNMP requests coming from Zabbix and blocking all of them? And why does a reboot fix it? This doesn't happen on just one switch; it happens on practically every switch we have, but at random.
Does anyone have an idea? I cannot keep rebooting the switches, especially since that is difficult to do during office hours.
Traceroute to the router:
Windows machine:
1 1 ms <1 ms <1 ms 192.168.3.1
2 <1 ms <1 ms <1 ms 192.168.10.3
3 3 ms <1 ms <1 ms 192.168.10.5
4 12 ms 11 ms 12 ms 192.168.121.10
Linux server:
1 192.168.3.1 (192.168.3.1) 0.426 ms 0.886 ms 1.103 ms
2 192.168.10.3 (192.168.10.3) 1.543 ms 1.876 ms 2.190 ms
3 192.168.10.5 (192.168.10.5) 2.479 ms 1.159 ms 1.150 ms
4 10.1.1.30 (10.1.1.30) 12.314 ms * *
10.1.1.30 is the router's WAN IP.
Traceroute to the switch:
Windows machine:
1 <1 ms <1 ms <1 ms 192.168.3.1
2 1 ms <1 ms 1 ms 192.168.10.3
3 2 ms <1 ms <1 ms 192.168.10.5
4 12 ms 11 ms 11 ms 10.1.1.30
5 20 ms 12 ms 13 ms 192.168.121.3
Linux server:
1 192.168.3.1 (192.168.3.1) 0.404 ms 0.858 ms 1.075 ms
2 192.168.10.3 (192.168.10.3) 1.533 ms 2.046 ms 2.784 ms
3 192.168.10.5 (192.168.10.5) 3.236 ms 1.113 ms 1.106 ms
4 10.1.1.30 (10.1.1.30) 11.950 ms 12.354 ms 12.698 ms
5 * * *
[...]
30 * * *
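To make the difference easier to spot, here is a small Python sketch that lines up the hop addresses from the two "Traceroute to the switch" runs above and reports the first hop where they differ. The helper names (windows_hops, linux_hops, first_divergence) are just hypothetical names for illustration, and the parsing assumes every line looks like the ones pasted here (it would not handle a Windows "Request timed out." line, for example).

```python
from itertools import zip_longest

# Hop lines copied from the "Traceroute to the switch" output above.
WINDOWS = """\
1 <1 ms <1 ms <1 ms 192.168.3.1
2 1 ms <1 ms 1 ms 192.168.10.3
3 2 ms <1 ms <1 ms 192.168.10.5
4 12 ms 11 ms 11 ms 10.1.1.30
5 20 ms 12 ms 13 ms 192.168.121.3"""

LINUX = """\
1 192.168.3.1 (192.168.3.1) 0.404 ms 0.858 ms 1.075 ms
2 192.168.10.3 (192.168.10.3) 1.533 ms 2.046 ms 2.784 ms
3 192.168.10.5 (192.168.10.5) 3.236 ms 1.113 ms 1.106 ms
4 10.1.1.30 (10.1.1.30) 11.950 ms 12.354 ms 12.698 ms
5 * * *"""


def windows_hops(text):
    # Assumption: tracert prints the responding address as the last field.
    return [line.split()[-1] for line in text.splitlines()]


def linux_hops(text):
    # Assumption: traceroute prints the address as the second field,
    # or "*" when none of the probes for that hop got a reply.
    hops = []
    for line in text.splitlines():
        addr = line.split()[1]
        hops.append(None if addr == "*" else addr)
    return hops


def first_divergence(a, b):
    # Walk both hop lists side by side; a missing or silent hop is None.
    for hop, (x, y) in enumerate(zip_longest(a, b), start=1):
        if x != y:
            return hop, x, y
    return None


if __name__ == "__main__":
    print(first_divergence(windows_hops(WINDOWS), linux_hops(LINUX)))
    # -> (5, '192.168.121.3', None)
```

Both paths are identical up to the router's WAN IP (10.1.1.30); only the Linux server gets no reply from the switch at hop 5.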