I have several render servers that regularily hit peak load averages of 15-30 and become insensible to all stimuli other that what they're working on. When Zabbix tries to contact these systems' (passive) agents, it gets no response for "host name (or data) has changed" and fires a trigger even though no usable data was returned.
Host name:
2011.Oct.11 12:49:27 18.bb
2011.Oct.11 12:19:32 timeout while executing a shell script
2011.Oct.11 11:49:27 18.bbl
I've added a regex to reject "timeout while executing a shell script" for Host name / Host info triggers. Only these 2 triggers are affected. agent.ping and agent.version are both unaffected and even processor load avg[x] reports correctly during timeouts. Can agent be improved to prevent these false positives?
cheers
Host name:
2011.Oct.11 12:49:27 18.bb
2011.Oct.11 12:19:32 timeout while executing a shell script
2011.Oct.11 11:49:27 18.bbl
I've added a regex to reject "timeout while executing a shell script" for Host name / Host info triggers. Only these 2 triggers are affected. agent.ping and agent.version are both unaffected and even processor load avg[x] reports correctly during timeouts. Can agent be improved to prevent these false positives?
cheers