v1.4.2 agent hangs on timed out UserParameter (caused by unreachable NFS mount)

  • just2blue4u
    Senior Member
    • Apr 2006
    • 347

    #1

    (As posted in "call to discussion for 1.6")

    Today our NFS server was shut down without the share first being unmounted on all NFS clients. We have a Zabbix UserParameter "files.open" that contains
    Code:
    #UserParameter=files.open,/usr/sbin/lsof|grep -v lsof|grep -v grep|wc -l|sed s/" "//g
    to count all open files on the client.
    This command hung on the mounted (but unreachable) NFS directory, which caused our agents to hang as well. After killing the lsof processes and restarting the agents, everything was fine again (I have deactivated the UserParameter line in the config).

    I'd really like to see Zabbix kill all UserParameters that time out, or at least be more immune/resistant to hanging UserParameters.
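As an illustration of the requested behaviour (not how the Zabbix agent itself works), here is a minimal Python sketch of a wrapper that runs a check command with a deadline and kills the whole pipeline if it overruns. The function name `run_with_timeout` and the timeout values are hypothetical. One caveat: a process blocked on a hard NFS mount can sit in uninterruptible sleep and ignore even SIGKILL until the mount comes back, so the final wait in this sketch could still block in that case.

```python
import os
import signal
import subprocess


def run_with_timeout(command, timeout_s):
    """Run a shell command and return its stripped stdout.

    Returns None if the command does not finish within timeout_s seconds.
    The command runs in its own session, so every process in the pipeline
    (for example lsof, grep, wc) can be killed together.
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole process group, not just the shell.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the child; may block if it cannot be killed
        return None
    return out.strip()


if __name__ == "__main__":
    # A fast command returns its output; a stalled one yields None.
    print(run_with_timeout("echo 42", 5))
    print(run_with_timeout("sleep 30 | cat", 0.5))
```

A wrapper script like this could be called from the UserParameter line in place of the raw pipeline, so that a stalled check reports nothing instead of blocking its caller.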

    (Issue seen on Server 1.4.4, agent 1.4.2)
    Last edited by just2blue4u; 15-07-2008, 14:18.
    Big ZABBIX is watching you!
    (... and my 48 hosts, 4513 items, 1280 triggers via zabbix v1.6 on CentOS 5.0)
  • just2blue4u
    Senior Member
    • Apr 2006
    • 347

    #2
    Originally posted by richlv
    first, no, you are not the only one
    second, that's a hard nfs mount. that's just how they behave.
    you could use softmounts, but things can break badly if you do.
    i'm not sure how much timeouts would help in the case of the zabbix agent - there are very few, if any, applications that behave really well with stalled nfs mounts. on a positive note, all apps resume nicely when the nfs connection is restored
    Thanks for your answer!
    I know NFS mounts are really tricky. But my problem isn't with NFS itself; it's the fact that the entire agent hangs if a single UserParameter does...

    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      ok, that's a reasonable request, i think. though it would still require timing out such checks; otherwise, pooling them in the background could result in a lot of 'hung' checks.
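The point about background checks can be sketched in Python as follows. This is a hypothetical illustration, not Zabbix code: checks run in a small worker pool, and each one has its own deadline so that a stalled check frees its worker instead of occupying it forever. The names `run_check` and `run_checks`, the pool size, and the timeout are all assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_check(argv, timeout_s):
    """Run one check (an argument list); return stdout, or None on timeout."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the worker is freed.
        return None
    return done.stdout.strip()


def run_checks(checks, timeout_s, workers=4):
    """Run named checks concurrently; map each name to its output or None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(run_check, argv, timeout_s)
            for name, argv in checks.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    results = run_checks({"ok": ["echo", "1"], "hung": ["sleep", "30"]}, 0.5)
    print(results)
```

Without the per-check timeout, every stalled command would hold a worker indefinitely, and the pool would eventually fill with hung checks.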
      Zabbix 3.0 Network Monitoring book
