Zabbix trapper - Connection reset by peer

  • nax
    Junior Member
    • Jan 2012
    • 6

    #1

    We are running about 150 custom, home-grown agents (written in Perl) that send data to a Zabbix server (v1.8.3). Each agent sends data about hundreds of apps running on its host; in total, 46,511 items are monitored.

    On the agent side I can see the following error:
    "agent encountered server error 104 during read: Connection reset by peer"

    The other message is either
    "agent could not connect to Zabbix server zabbix01:10051"

    or

    "agent encountered server error 0 during read:"

    You can see on the graph that some connections are constantly refused and some resets are sent.

    I tried increasing the number of trappers (to 100) and the number of allowed DB connections (to 300). We are using PostgreSQL 9.0.1, running locally on the same machine.

    As a result of these problems, not all of the data makes it into the database. There are gaps for all the apps, lasting randomly from a few minutes to hours.

    Any idea what might cause such problems? What can I try to improve the performance of our Zabbix setup?
    Last edited by nax; 10-11-2012, 12:05.
  • nax

    #2
    solution

    It seems that I found the problem. All the agents were synchronized in sending data, and the server could not cope with so many simultaneous incoming TCP connections. In that case, the kernel keeps a queue (buffer) for connections whose SYN has been received but which have not yet been served (the ACK has not been sent back yet). By default, up to 128 such connections can wait in this queue.
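    On Linux this queue is bounded per listening socket. The listen(2) man page documents that the backlog an application passes to listen() is silently capped at net.core.somaxconn (128 by default on kernels of that era; Linux 5.4 raised the default to 4096), and that the queue for incomplete sockets is limited by net.ipv4.tcp_max_syn_backlog. A minimal, read-only sketch, assuming a Linux host with /proc mounted, that prints both limits; the function name show_backlog_limits is hypothetical and chosen for illustration.

```shell
# show_backlog_limits (hypothetical helper): print the Linux kernel limits
# that cap the per-socket queues of pending connections. Read-only.
show_backlog_limits() {
  for f in /proc/sys/net/core/somaxconn /proc/sys/net/ipv4/tcp_max_syn_backlog; do
    # Turn the /proc path into the sysctl-style name, e.g. net.core.somaxconn
    name=$(printf '%s' "${f#/proc/sys/}" | tr / .)
    if [ -r "$f" ]; then
      printf '%s = %s\n' "$name" "$(cat "$f")"
    else
      printf '%s = unavailable\n' "$name"
    fi
  done
}
```

    For the trapper port itself, ss -lnt shows, for a socket in the LISTEN state, the current accept-queue length in the Recv-Q column and the configured backlog in the Send-Q column.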


    You can clearly see it on the graph below: a significant number of connection resets are sent because this buffer overflows.

    I set up an external check for this with a 5-second interval.

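    Code:
    # TcpExt counter: acknowledgments not containing data received
    netstat -s | grep -A 30 -i "tcpExt:" | grep -i "acknowledgments not containing data received" | awk '{print $1}'
    # Number of sockets currently in the SYN_RECV state
    netstat -n | grep -F SYN_RECV | wc -l
    # TcpExt counter: connections reset due to unexpected data
    netstat -s | grep -A 30 -i "tcpExt:" | grep -i "connections reset due to unexpected data" | awk '{print $1}'
    # TcpExt counter: times the listen queue of a socket overflowed
    netstat -s | grep -A 30 -i "tcpExt:" | grep -i "times the listen queue of a socket overflowed" | awk '{print $1}'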
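    Three of the commands above only look at the 30 lines that follow "TcpExt:", and on kernels that report many TcpExt counters a given line may fall outside that window. A minimal sketch of a more robust variant, assuming a Linux host and the netstat -s wording used above: one awk pass scans the whole TcpExt: section, and SYN_RECV sockets are counted directly from /proc/net/tcp and /proc/net/tcp6 (state code 03), the tables netstat itself reads. The function names tcpext_counter and syn_recv_count are hypothetical.

```shell
#!/bin/sh
# tcpext_counter PATTERN (hypothetical helper)
#   Reads `netstat -s` output on stdin and prints the leading number of the
#   first line in the TcpExt: section whose text contains PATTERN
#   (case-insensitive). Prints nothing if no line matches.
tcpext_counter() {
  awk -v pat="$1" '
    # Section headers ("Ip:", "Tcp:", "TcpExt:", ...) start in column 0;
    # counter lines are indented.
    /^[^ \t]/ { in_ext = ($1 == "TcpExt:"); next }
    in_ext && index(tolower($0), tolower(pat)) > 0 { print $1; exit }
  '
}

# syn_recv_count [FILE...] (hypothetical helper)
#   Counts sockets in the SYN_RECV state (hex state code 03 in the "st"
#   column) in /proc/net/tcp-style tables. With no arguments it reads the
#   IPv4 and IPv6 tables that exist on this host.
syn_recv_count() {
  if [ "$#" -eq 0 ]; then
    for f in /proc/net/tcp /proc/net/tcp6; do
      if [ -r "$f" ]; then
        set -- "$@" "$f"
      fi
    done
  fi
  if [ "$#" -eq 0 ]; then
    echo 0
    return
  fi
  # Skip each file's header line; field 4 is the socket state.
  awk 'FNR > 1 && $4 == "03" { n++ } END { print n + 0 }' "$@"
}
```

    After sourcing the file that defines these functions, an external check could be a one-liner such as netstat -s | tcpext_counter "times the listen queue of a socket overflowed", or simply syn_recv_count.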