We have configured a Zabbix 4.0.3 server to monitor about 200 JBoss EAP 7.0.4 server JVMs. The problem is that not only do the JMX connections work only intermittently, but the Zabbix server's OS also stops responding to commands after a few hours. Every time I try even a simple "ls", I get a "bash: fork: retry: No child processes" error, which forces me to reboot the server. While investigating, I found the following entries in the zabbix_java_gateway.log file:
2019-02-12 14:58:11.435 [pool-1-thread-17] DEBUG com.zabbix.gateway.ItemChecker - caught exception for item 'jmx["java.lang:type=ClassLoading",TotalLoadedClassCount]'
org.jboss.remoting3.NotOpenException: Writes closed
at org.jboss.remoting3.remote.RemoteConnectionChannel.openOutboundMessage(RemoteConnectionChannel.java:115) ~[jboss-client.jar:7.0.4.GA-redhat-2]
at org.jboss.remoting3.remote.RemoteConnectionChannel.writeMessage(RemoteConnectionChannel.java:307) ~[jboss-client.jar:7.0.4.GA-redhat-2]
at org.jboss.remotingjmx.protocol.v2.Common.write(Common.java:180) ~[jboss-client.jar:7.0.4.GA-redhat-2]
at org.jboss.remotingjmx.protocol.v2.ClientConnection$TheConnection.getAttribute(ClientConnection.java:823) ~[jboss-client.jar:7.0.4.GA-redhat-2]
at com.zabbix.gateway.JMXItemChecker.getStringValue(JMXItemChecker.java:181) [zabbix-java-gateway-4.0.3.jar:na]
at com.zabbix.gateway.ItemChecker.getJSONValue(ItemChecker.java:87) ~[zabbix-java-gateway-4.0.3.jar:na]
at com.zabbix.gateway.JMXItemChecker.getValues(JMXItemChecker.java:103) [zabbix-java-gateway-4.0.3.jar:na]
at com.zabbix.gateway.SocketProcessor.run(SocketProcessor.java:63) [zabbix-java-gateway-4.0.3.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
I also noticed that the number of open files keeps increasing after I start the zabbix-java-gateway service, reaching over 50000 within three hours. Running "lsof -u zabbix" shows the following entries, repeated thousands of times:
java 15846 zabbix *911r FIFO 0,10 0t0 334707 pipe
java 15846 zabbix *912w FIFO 0,10 0t0 334707 pipe
java 15846 zabbix *913u a_inode 0,11 0 7269 [eventpoll]
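To track the growth without running the full lsof every time, a small census of the gateway's descriptors by type can help. This is a minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable (run it as the zabbix user or as root); the class name FdCensus and its categories are hypothetical. The pipe-pipe-eventpoll triplets in the lsof output look like what a java.nio Selector typically holds on Linux with JDK 8 (a wakeup pipe plus an epoll descriptor), so a steadily rising pipe/eventpoll count would be consistent with selectors being opened and never closed. That is an interpretation, not something the log above confirms.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical diagnostic helper (Linux only): counts a process's open file
 * descriptors by type by reading the symlinks under /proc/<pid>/fd.
 */
public class FdCensus {

    // Map a /proc/<pid>/fd link target (e.g. "pipe:[334707]") to a coarse category.
    static String classify(String target) {
        if (target.startsWith("pipe:")) return "pipe";
        if (target.equals("anon_inode:[eventpoll]")) return "eventpoll";
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("anon_inode:")) return "anon_inode(other)";
        return "file";
    }

    // Count descriptors per category for the given PID ("self" means this JVM).
    static Map<String, Integer> census(String pid) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        Path fdDir = Paths.get("/proc", pid, "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while we were iterating
                }
                counts.merge(classify(target), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the Java gateway's PID (15846 in the lsof output above); defaults to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println(census(pid));
    }
}
```

Running `java FdCensus 15846` every few minutes and comparing the pipe and eventpoll counts should show whether those two categories account for the growth toward 50000.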