Hello,
We're running Zabbix 1.6 (SVN rev. 6204). The underlying system is CentOS 5.2 with a PostgreSQL 8.3 database backend, on a machine with 16 GB of RAM and two quad-core Opteron processors. We're monitoring around 450 machines with ~40,000 items and ~4,000 triggers (new values per second: 230). Most agents are version 1.4.5.
A few times a day (about 5 on average), the Zabbix server log reports the following messages:
Got SIGPIPE. Where it came from???
Error while sending list of active checks
but nothing bad happens.
However, on two occasions the number of these messages kept increasing until the Zabbix server finally stopped receiving any results from the agents. On the agent side we see the following messages:
Timeout while answering request
Getting list of active checks failed. Will retry after 60 seconds
Since we're using a nodata trigger to raise an alert when the Zabbix agent is down, this issue is causing a lot of false positives. Could anyone give us a clue where these SIGPIPEs are coming from and how to avoid them?
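For what it's worth, the pairing of "Got SIGPIPE" with "Error while sending list of active checks" is consistent with generic POSIX socket behaviour: a process that writes to a stream socket whose peer has already closed gets EPIPE and, unless it ignores the signal, also receives SIGPIPE. Below is a minimal Python sketch of that mechanism, not Zabbix code. `send_active_checks` and the payload are hypothetical, and a local `socket.socketpair()` on Linux stands in for the real server-agent TCP connection. With real TCP the failure may only appear on a later write, after the peer's RST arrives. Whether agents closing the connection early (e.g. after their own timeout) is what happens here is an assumption that the logs above do not confirm.

```python
import socket


def send_active_checks(conn: socket.socket, payload: bytes) -> bool:
    """Send payload over conn; return False if the peer has already disconnected.

    Hypothetical helper for illustration only; this is not Zabbix code.
    """
    try:
        conn.sendall(payload)
        return True
    except BrokenPipeError:
        # The write failed with EPIPE because the other end had already
        # closed the connection. CPython ignores SIGPIPE by default, so this
        # surfaces as an exception; a C program that has not ignored SIGPIPE
        # (and did not pass MSG_NOSIGNAL) would also receive the SIGPIPE signal.
        return False


# Assumption: a local socket pair stands in for the server<->agent
# connection so the example is self-contained.
server_end, agent_end = socket.socketpair()

# Healthy case: the agent is still connected and reads the reply.
sent_ok = send_active_checks(server_end, b"hypothetical list of active checks\n")
received = agent_end.recv(1024)

# Failure case: the agent gave up (e.g. after its own timeout) and closed
# the connection before the server wrote its reply.
agent_end.close()
sent_after_close = send_active_checks(server_end, b"hypothetical list of active checks\n")
server_end.close()

print(sent_ok, received, sent_after_close)
```

In C, the usual ways to turn the signal into a plain write error are `signal(SIGPIPE, SIG_IGN)` or passing `MSG_NOSIGNAL` to `send()` on Linux; the sketch only shows why the signal shows up, not how Zabbix itself handles it.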
Thanks in advance,
emir