Folks, I have a weird problem I don't know how to resolve. It has happened to me twice so far: first on 4/30, and again just now. The only action I've taken is enabling debug logging at level 4 on the server, which I just did.
I have my actions configured to send me emails when things fail. All of a sudden I get swamped with emails from every one of my agents at the same time. It looks like all my agent.ping items and SSH items fail all at once and then go back to normal a few minutes later.
Couple of other symptoms...
* 30+ hosts are reported as having issues, but I only see a few errors in the zabbix_server.log file.
* I see a bunch of these:
1534:20150507:132208.184 item "rdmdxinfra03.mdx.med:ssh.run[uptime]" became not supported: Cannot request a shell
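One way to check whether these "not supported" errors line up with the burst of alert emails is to bucket them by minute. The following is a minimal sketch. It assumes every log line follows the layout of the excerpt above (`<pid>:<YYYYMMDD>:<HHMMSS>.<ms> <message>`, possibly with the PID left-padded by spaces) and that the message text matches `item "..." became not supported: ...`. The function names are hypothetical helpers, not part of Zabbix.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, based on the log excerpt: "<pid>:<YYYYMMDD>:<HHMMSS>.<ms> <message>"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")
# Assumed wording of the error, copied from the excerpt
NOT_SUPPORTED = re.compile(r'item "([^"]+)" became not supported: (.*)$')


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    pid, day, hms, ms, message = m.groups()
    ts = datetime.strptime(day + hms, "%Y%m%d%H%M%S").replace(microsecond=int(ms) * 1000)
    return int(pid), ts, message


def not_supported_per_minute(lines):
    """Count 'became not supported' events per minute (seconds truncated)."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip continuation lines and anything else that doesn't fit
        _, ts, message = parsed
        if NOT_SUPPORTED.search(message):
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts
```

Usage, with the path adjusted to wherever `LogFile` points on your server: `with open("zabbix_server.log", errors="replace") as fh:` then print `sorted(not_supported_per_minute(fh).items())`. If the counts spike in the same minute as the emails, the failures are at least being recorded by the server, which narrows down where to look.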
Maybe it's a limits problem? Is there a ulimit recommendation for Zabbix?
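Before tuning ulimit, it may help to confirm which limits the running server process actually has, since a daemon started by an init script can end up with different limits than your login shell's `ulimit -n`. The following is a sketch assuming Linux, where the kernel exposes each process's effective limits in `/proc/<pid>/limits`. It does not say what values Zabbix needs; it only reports what is currently in effect. The helper names are hypothetical.

```python
import re


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {limit name: (soft, hard)}.

    Columns are separated by runs of two or more spaces; the header row is skipped.
    Values are kept as strings because some are "unlimited".
    """
    limits = {}
    for line in text.splitlines()[1:]:
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def read_limits(pid):
    """Read the effective resource limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())
```

For example, `read_limits(1534)["Max open files"]` would show the soft and hard open-file limits of the process that wrote the log line above (its PID is the first field of the line), assuming that process is still running.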
I'm ruling out the hypervisor this is running on and the network it connects to. If we were having issues there, all of our other applications would be complaining as well.
Any other thoughts on what it could be?