(As posted in "call to discussion for 1.6")
Today our NFS server was shut down without first being unmounted on all NFS clients. We have a Zabbix UserParameter, "files.open", that counts all open files on the client:
Code:
#UserParameter=files.open,/usr/sbin/lsof|grep -v lsof|grep -v grep|wc -l|sed s/" "//g
This command hung on the mounted (and now unreachable) NFS directory, which caused our agents to hang too. After killing the lsof processes and restarting the agents, everything was fine again (with the UserParameter line deactivated in the config).
I'd really like to see Zabbix kill all UserParameters that time out, or at least be more resilient to hanging UserParameters.
(Issue seen on Server 1.4.4, agent 1.4.2)
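
Until the agent handles this itself, one possible stopgap is to put a time limit on the command inside the UserParameter. The following is only a minimal sketch, assuming the GNU coreutils `timeout` utility is installed on the client; the helper names `run_limited` and `count_open_files` and the 5-second limit are made up for illustration and are not part of Zabbix.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after LIMIT seconds.
# Relies on GNU coreutils `timeout` (an assumption, not shipped with Zabbix).
# timeout sends SIGTERM at LIMIT seconds and SIGKILL 2 seconds later,
# and exits with status 124 when the command timed out on SIGTERM.
run_limited() {
    limit=$1
    shift
    timeout -k 2 "$limit" "$@"
}

# Hypothetical replacement for the files.open command: same count as the
# original pipeline, but lsof is abandoned after 5 seconds.
count_open_files() {
    run_limited 5 /usr/sbin/lsof 2>/dev/null | grep -v lsof | wc -l | tr -d ' '
}
```

Saved as a script that ends by calling `count_open_files`, this could replace the pipeline in the UserParameter line. Two caveats: a process blocked in uninterruptible I/O on a hard NFS mount may not react even to SIGKILL, so this is not a guaranteed fix, and if the time limit hits, the count returned is only partial. lsof's own `-b` option, which avoids kernel calls that might block, may be worth trying as well.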


