Error: ZBX_TCP_READ() failed

  • lorty
    Junior Member
    • Dec 2007
    • 1

    #1

    Error: ZBX_TCP_READ() failed

    Hi all,
    I just started using Zabbix 1.4.2 and I am monitoring 10 hosts; I compiled everything from source.
    The Zabbix server runs on Ubuntu 6.10.
    The problem is that every other day one of the hosts becomes unreachable. On the server the error message is:
    "Error: ZBX_TCP_READ() failed interrupted system call". It appears in the logs and on the host configuration web page.
    The agentd on the unreachable machine seems to be running fine: ps shows the processes running and there are no errors in the logs, but if I restart the agent the host becomes reachable again.
    Please help me with this problem. What can cause it?
    It makes my Zabbix installation nearly useless :-(
  • Niels
    Senior Member
    • May 2007
    • 239

    #2
    This is a well known bug in Zabbix:




    You could try using active items, but it may not help.

    Comment

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      I believe the problem is already fixed in 1.4.3.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter

      Comment

      • radamand
        Member
        • Aug 2008
        • 89

        #4
        I am running 1.4.5 and it is not fixed, having the same problem on 7 or 8 of my ~60 hosts...

        Comment

        • Enric
          Junior Member
          • Sep 2008
          • 2

          #5
          Me too

          I have the same problem.

          I'm using Zabbix server 1.4.6 and Zabbix agents from 1.1.4 to 1.4.6 (it depends on the machine). Worse, the problem repeats every single day, particularly a few moments after I shut down the Zabbix monitoring screen at night.

          I'm running Zabbix on a VMware virtual machine with Ubuntu 8.04 (3 GHz Xeon processor, 256 MB RAM, 8 GB hard disk, and an external MySQL server on a high-capacity machine).

          I'm currently monitoring 40 machines (both physical and virtual) with 8200 items (Zabbix and SNMP), and I'm afraid of adding more until I can stabilize the whole thing.

          It starts with a few lost connections, and at the same time the zabbix-server starts to create more processes. In the end all monitored items stop sending data, and the zabbix-server occasionally dies too.

          Restarting the agent on the client machines solves the problem; restarting the server also helps.

          Could anyone please give me a clue about the problem? I will try to devote time to fully understanding it so I can create a solution or a hack that solves it. I still have 3 work days on my monitoring agenda that I can use to try to solve this.

          Thanks for any answer.

          Enric

          Comment

          • radamand
            Member
            • Aug 2008
            • 89

            #6
            Can we get an update on this?

            Comment

            • Enric
              Junior Member
              • Sep 2008
              • 2

              #7
              I've almost solved my problem.

              I cleaned out all items used in tests, all triggers, and almost anything not template based. Only specifically defined actions and triggers were left in place.

              Now my server almost never dies. OK, it occasionally does, but it is running on top of a crowded MySQL engine (> 2000 qps) and I don't know if it's really Zabbix's fault. Anyway, it dies less than once a week, so I've put a restart wrapper around the Zabbix server and I only notice that it died from the logs.

              The agents are another issue. They work, but from time to time (usually during periods of high processor load (> 5)) they stop collecting data and refuse to resume automatically (when asked by the server) until restarted. When queried by hand, i.e. "telnet host 10050", they answer correctly.

              Maybe there are better ways to solve these issues, but my Zabbix knowledge is quite limited and I'm learning every day. Maybe it's a program problem or maybe it's down to my install; anyway, now it's "almost" running. BTW, there are other complaints here in the forums from people with similar problems, and no one seems to fully understand what's wrong.
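
The manual check described above ("telnet host 10050") can also be scripted, which makes it easier to repeat while an agent is misbehaving. What follows is a minimal sketch, not the server's actual implementation. It assumes that a passive agent accepts an item key followed by a newline on port 10050, and that the reply may be prefixed with a "ZBXD\x01" header and an 8-byte little-endian length. The function names `query_agent` and `parse_agent_reply` are hypothetical.

```python
import socket
import struct

# Assumed reply header: 4-byte signature plus a 1-byte protocol version.
ZBX_HEADER = b"ZBXD\x01"


def parse_agent_reply(data: bytes) -> str:
    """Return the value from a raw agent reply, with or without the assumed header."""
    if data.startswith(ZBX_HEADER) and len(data) >= 13:
        # Assumed layout: header, then an 8-byte little-endian payload length.
        (length,) = struct.unpack("<Q", data[5:13])
        data = data[13:13 + length]
    return data.decode("utf-8", errors="replace").strip()


def query_agent(host: str, key: str = "agent.ping", port: int = 10050,
                timeout: float = 5.0) -> str:
    """Send one item key to a passive agent and return its reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode("ascii") + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closed the connection, so the reply is complete
                break
            chunks.append(chunk)
    return parse_agent_reply(b"".join(chunks))
```

Calling `query_agent("host")` from the server machine takes the same network path as the server's own checks. A timeout raised here while the agent processes are still running would be consistent with the behaviour described in this post.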

              Comment

              • radamand
                Member
                • Aug 2008
                • 89

                #8
                All I see in the client logs is;

                14203:20080915:090106 zabbix_agentd started. ZABBIX 1.4.4.
                14205:20080915:090106 zabbix_agentd collector started
                14206:20080915:090106 zabbix_agentd listener started
                14207:20080915:090106 zabbix_agentd listener started
                14208:20080915:090106 zabbix_agentd listener started
                14209:20080915:090106 zabbix_agentd active check started [68.87.80.43:10051]
                14209:20080915:210757 Timeout while answering request
                14209:20080915:210757 Getting list of active checks failed. Will retry after 60 seconds
                14209:20080915:212929 Timeout while answering request
                14209:20080915:212929 Getting list of active checks failed. Will retry after 60 seconds
                14209:20080915:214904 Timeout while answering request
                14209:20080915:214904 Getting list of active checks failed. Will retry after 60 seconds

                This host is currently only set up with a (Simple) network ping and an (agent) agent ping, with no other items or triggers.

                The relevant server log entries are;

                15493:20080915:210321 Host [pltncavsm07]: first network error, wait for 15 seconds
                15493:20080915:210321 Parameter [agent.ping] will be checked after 120 seconds on host [pltncavsm07]
                .
                (message repeats several times)
                .
                15538:20080916:023656 Timeout while answering request
                15538:20080916:023656 Get value from agent failed. Error: ZBX_TCP_READ() failed [Interrupted system call]
                15538:20080916:023656 Host [pltncavsm07] will be checked after 60 seconds
                .
                .
                (message repeats many times, I would assume every 60 seconds... timestamped log entries would be nice)
                Last edited by radamand; 16-09-2008, 14:50.
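
The log lines quoted above appear to follow a `PID:YYYYMMDD:HHMMSS message` layout, so the interval between repeated failures can be measured rather than guessed. What follows is a minimal sketch under that assumption. The regular expression and the names `parse_line` and `gaps_between` are illustrative and not part of Zabbix. An optional `.mmm` millisecond suffix is accepted and ignored.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout: PID:YYYYMMDD:HHMMSS[.mmm] message
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})(?:\.\d{3})?\s+(.*)$")


class LogEntry(NamedTuple):
    pid: int
    when: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the assumed layout."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    pid, date, time, message = match.groups()
    return LogEntry(int(pid), datetime.strptime(date + time, "%Y%m%d%H%M%S"), message)


def gaps_between(lines: Iterable[str], needle: str) -> List[float]:
    """Seconds between consecutive entries whose message contains needle."""
    entries = (parse_line(line) for line in lines)
    times = [e.when for e in entries if e is not None and needle in e.message]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


client_log = [
    "14209:20080915:210757 Timeout while answering request",
    "14209:20080915:212929 Timeout while answering request",
    "14209:20080915:214904 Timeout while answering request",
]
print(gaps_between(client_log, "Timeout while answering request"))  # [1292.0, 1175.0]
```

Run against a full log file, the same function can show whether the server-side messages really repeat every 60 seconds.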

                Comment

                • Antras
                  Junior Member
                  • Oct 2007
                  • 12

                  #9
                  Did you change the StartAgents parameter in zabbix_agent.conf?
                  I had the same problem when StartAgents was 2. When I set it to 5 (the default), everything began to work.
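
To check which value an agent is configured with, the config file can be read directly. What follows is a minimal sketch, assuming the file uses plain `Key=Value` lines with `#` comments. The helper name `read_setting` and the sample text are hypothetical.

```python
from typing import Optional


def read_setting(conf_text: str, name: str) -> Optional[str]:
    """Return the last uncommented value of name in Key=Value config text."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not Key=Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()
    return value


# Hypothetical config text; a real check would read the agent's config file.
sample_conf = """
# StartAgents=5
StartAgents=2
"""
print(read_setting(sample_conf, "StartAgents"))  # 2
```

A result of `None` means the parameter is commented out or absent, in which case the agent would fall back to its built-in default.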

                  Comment

                  • radamand
                    Member
                    • Aug 2008
                    • 89

                    #10
                    Still having the same problem..............

                    an update/response from the devs would really be nice.........

                    Comment

                    • radamand
                      Member
                      • Aug 2008
                      • 89

                      #11
                      Would really, really, REALLY be nice.................

                      Comment

                      • nelsonab
                        Senior Member
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Sep 2006
                        • 1233

                        #12
                        Originally posted by radamand
                        All I see in the client logs is;

                        14203:20080915:090106 zabbix_agentd started. ZABBIX 1.4.4.
                        14205:20080915:090106 zabbix_agentd collector started
                        14206:20080915:090106 zabbix_agentd listener started
                        14207:20080915:090106 zabbix_agentd listener started
                        14208:20080915:090106 zabbix_agentd listener started
                        14209:20080915:090106 zabbix_agentd active check started [68.87.80.43:10051]
                        14209:20080915:210757 Timeout while answering request
                        14209:20080915:210757 Getting list of active checks failed. Will retry after 60 seconds
                        14209:20080915:212929 Timeout while answering request
                        14209:20080915:212929 Getting list of active checks failed. Will retry after 60 seconds
                        14209:20080915:214904 Timeout while answering request
                        14209:20080915:214904 Getting list of active checks failed. Will retry after 60 seconds

                        This host is currently only set up with a (Simple) network ping and an (agent) agent ping, with no other items or triggers.

                        The relevant server log entries are;

                        15493:20080915:210321 Host [pltncavsm07]: first network error, wait for 15 seconds
                        15493:20080915:210321 Parameter [agent.ping] will be checked after 120 seconds on host [pltncavsm07]
                        .
                        (message repeats several times)
                        .
                        15538:20080916:023656 Timeout while answering request
                        15538:20080916:023656 Get value from agent failed. Error: ZBX_TCP_READ() failed [Interrupted system call]
                        15538:20080916:023656 Host [pltncavsm07] will be checked after 60 seconds
                        .
                        .
                        (message repeats many times, I would assume every 60 seconds... timestamped log entries would be nice)
                        Check this thread:


                        Try the steps outlined halfway through that thread to test the problem. It looks like you may have a network configuration problem.
                        RHCE, author of zbxapi
                        Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
                        Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

                        Comment

                        • bashman
                          Senior Member
                          • Dec 2009
                          • 432

                          #13
                          I still have the same problem on version 1.8.1:

                          Code:
                          26418:20100408:082454.949 Item [<hostname>:xxx] error: Get value from agent failed: ZBX_TCP_READ() failed [Connection reset by peer]
                          978 Hosts / 16.901 Items / 8.703 Triggers / 44 usr / 90,59 nvps / v1.8.15

                          Comment

                          • engineer
                            Junior Member
                            • Mar 2010
                            • 25

                            #14
                            Unfortunately, I'm moving over from Zabbix to System Center Operations Manager. Zabbix just doesn't cut it as a high-end monitoring solution.

                            Zabbix is down 20% of the time, and for a high-end monitoring company, this just isn't good enough.

                            I've posted several threads on the forums, but never got any response.

                            Unfortunately, until the 350 servers have the SCOM agent installed, I still have to bang my head against the wall with Zabbix.

                            Good luck though; it would be nice if it was resolved.

                            I am considering upgrading to 1.8.2, but I'm hesitant, as other users indicate it's even worse.

                            Comment

                            • bashman
                              Senior Member
                              • Dec 2009
                              • 432

                              #15
                              I also get these zabbix_server log entries for SNMP checks:

                              Code:
                               22874:20100409:104940.958 Item [<hostname>:xxx] error: Timeout while connecting to [<IP>:<Port>]
                               22874:20100409:104940.978 SNMP Host [<hostname>]: another network error, wait for 15 seconds
                              And this zabbix_server log entry for zabbix server monitoring:

                              Code:
                               22899:20100409:104950.181 Sending list of active checks to [<IP>] failed: host [localhost] not found
                              Thanks for your response.
                              978 Hosts / 16.901 Items / 8.703 Triggers / 44 usr / 90,59 nvps / v1.8.15

                              Comment
