UserParameter doesn't get timeout, agent doesn't provide data

  • just2blue4u
    Senior Member
    • Apr 2006
    • 347

    #1

    UserParameter doesn't get timeout, agent doesn't provide data

    One of our machines ("A") had a kernel panic and crashed. Another host ("B") had a filesystem from the crashed machine mounted, which made "lsof" freeze on "B".

    We have a UserParameter "files.open" in our agents that uses lsof:
    Code:
    UserParameter=files.open,/usr/sbin/lsof|grep -v lsof|grep -v grep|wc -l|sed s/" "//g
    Normally, this works quite well. But now that lsof freezes, many processes owned by user zabbix and related to "files.open" pile up on the host.
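
To illustrate the kind of bound I would expect around such a command, here is a minimal sketch, not something the agent does itself. It assumes GNU coreutils `timeout` is available on the host; the function name `count_lines_with_limit` and the fallback value -1 are made up for this example, and it simply counts non-empty output lines rather than reproducing the exact grep/sed pipeline above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): run a command under a hard time
# limit and print the number of non-empty output lines, or -1 on failure.
# Assumes GNU coreutils timeout(1) is installed.
count_lines_with_limit() {
    limit="$1"
    shift
    # -s KILL sends SIGKILL once the limit expires; any non-zero exit status
    # (timed out, killed, or command failed) yields the fallback value -1.
    out=$(timeout -s KILL "$limit" "$@" 2>/dev/null) || { echo "-1"; return 0; }
    # Count non-empty lines of the captured output.
    printf '%s\n' "$out" | grep -c .
}

# Example call (the 2-second limit is an arbitrary choice):
#   count_lines_with_limit 2 /usr/sbin/lsof
```

A UserParameter could then point at a small script wrapping this function instead of calling lsof directly. One caveat: a process blocked in uninterruptible sleep on a dead network mount may not react even to SIGKILL until the blocking call returns, so a wrapper like this may not help in exactly the situation described here.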

    The agent's log says:
    Timeout while answering request
    Zabbix server logs said:
    027424:20061204:111220 Timeout while receiving data from [backup-pbs.bfk]
    027424:20061204:111220 Getting value of [proc.num[]] from host [backup-pbs.bfk] failed
    027424:20061204:111220 The value is not stored in database.
    027424:20061204:111225 Timeout while receiving data from [backup-pbs.bfk]
    027424:20061204:111225 Getting value of [net.if.in[eth0]] from host [backup-pbs.bfk] failed
    027424:20061204:111225 The value is not stored in database.
    ...
    No data was fetched from that client for any item as long as lsof was not working.

    Can someone please explain to me,
    - why the server got no data from the whole host, and
    - why the processes that got the timeout aren't killed?

    The Zabbix agent version is 1.1.1; the server is 1.1.


    Thanks,
    J2B4U
    Big ZABBIX is watching you!
    (... and my 48 hosts, 4513 items, 1280 triggers via zabbix v1.6 on CentOS 5.0)