Active agents stop sending data (1.1.6)

  • glut0r
    Member
    • Mar 2007
    • 38

    #16
    Ah, nice one, I got this here:

    Code:
    026507:20070505:121137 Error gethostbyname, can not resolve [zabbix...pl]
    026507:20070505:121139 Connection from [87...] rejected. Allowed server is [zabbix...pl,zabbix-2...pl,localhost]
    026508:20070505:153330 gethostbyname() failed [Host name lookup failure]
    026508:20070505:153330 Getting list of active checks failed. Will retry after 60 seconds
    026505:20070507:163913 Error gethostbyname, can not resolve [zabbix...pl]
    026504:20070507:163921 Error gethostbyname, can not resolve [zabbix...pl]
    026505:20070507:163921 Connection from [87...] rejected. Allowed server is [zabbix...pl,zabbix-2...pl,localhost]
    026504:20070507:163921 Connection from [87...] rejected. Allowed server is [zabbix...pl,zabbix-2...pl,localhost]
    026507:20070507:185246 Error gethostbyname, can not resolve [zabbix...pl]
    026507:20070507:185301 Connection from [87...] rejected. Allowed server is [zabbix...pl,zabbix-2...pl,localhost]
    Then only grey appears in the monitoring overview.

    So my suspicion is that when DNS dies, the Zabbix agent cannot resolve the server, and somehow it dies.

    It seems repeatable: no DNS, or high latency, and the agent hangs. Can anyone confirm?

    HELP! Alexiej, where are you? Hear us calling from the depths!
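The suspicion above is that the agent never recovers once a lookup has failed. Purely as an illustration (Python, not the agent's actual C code), a sender can avoid that by re-resolving the server name on every attempt, so a DNS outage delays delivery instead of stopping it. The function `resolve_with_retry` and its parameters are hypothetical; the 60-second default only mirrors the "Will retry after 60 seconds" log line.

```python
import socket
import time
from typing import Any, Callable, List

# Hypothetical helper (illustration only, not Zabbix agent code):
# look the server name up again on every attempt instead of giving up
# after the first failure, so a restored DNS server is picked up.
def resolve_with_retry(
    host: str,
    port: int,
    attempts: int,
    delay: float = 60.0,
    resolver: Callable[..., List[Any]] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # Fresh lookup each time; nothing from a failed lookup is cached.
            return resolver(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)  # cf. "Will retry after 60 seconds" in the log
```

The `resolver` and `sleep` parameters exist only so the retry logic can be exercised without a real DNS failure.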

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #17
      It is so obvious that ZABBIX is unable to determine the IP address from a DNS name if the DNS server is down. Does it require further explanation?
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter

      • glut0r
        Member
        • Mar 2007
        • 38

        #18
        Originally posted by Alexei
        It is so obvious that ZABBIX is unable to determine the IP address from a DNS name if the DNS server is down. Does it require further explanation?
        Of course it does: why didn't it come back after DNS was restored?
        Routers flap from time to time. Agents have to recover without being restarted.

        • glut0r
          Member
          • Mar 2007
          • 38

          #19
          Captain's log, star date 1234

          I added a static entry to /etc/hosts; now the only entry that appears in the agent's log before it stops sending data to the server is this:

          Code:
          019917:20070511:152436 Error in connect() [zabbix....:10051] [Connection timed out]
          I found out that this is caused by routing loops and packet loss above, say, 20%. The agent then hangs. There's something wrong with the error handling, IMHO.
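A minimal sketch of the error handling this post asks for, assuming the sender may drop a batch and retry later: every blocking step gets a timeout, and any network error (refused, reset, timed out) is returned to the caller rather than left pending. This is illustrative Python, not Zabbix code; `send_with_timeout` is a hypothetical name.

```python
import socket
from typing import Optional

# Hypothetical helper (illustration only, not Zabbix agent code):
# every blocking step is bounded by a timeout, and any network error is
# reported to the caller instead of leaving the sender stuck.
def send_with_timeout(host: str, port: int, payload: bytes,
                      timeout: float = 3.0) -> Optional[bytes]:
    try:
        # create_connection sets `timeout` on the socket before connect(),
        # so it also bounds the sendall() and recv() calls below.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            return sock.recv(4096)
    except OSError:
        # Covers refused connections, resets and socket.timeout
        # (a subclass of OSError): the caller should simply retry later.
        return None
```

A caller would treat `None` as "try again on the next cycle", which is what lets the sender survive heavy packet loss instead of hanging.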

          • cpicton
            Member
            • Nov 2006
            • 35

            #20
            I am experiencing the same problem with active agents on 1.4.

            The agent sometimes hangs when trying to send data during a network outage.

            When the network connectivity is restored, the agent stays in the hung state. Only restarting the agent fixes this. The agent *should* resume sending once the network is restored. I am sending to an IP address, not a hostname, so DNS is not the problem.

            I am currently running with debuglevel=5 to try to get some logs of when this happens.

            Once I have the logs, I will post them here.
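The behaviour expected here (resume on its own once the network is back) amounts to never treating a failed send as fatal. A hedged sketch in Python with hypothetical names, using capped exponential backoff; the `send` callable stands in for whatever actually delivers the data and reports success as `True`:

```python
import time
from typing import Callable

# Hypothetical sketch (illustration only, not Zabbix agent code): a failed
# send is never fatal. The loop keeps retrying with capped exponential
# backoff, so delivery resumes by itself once connectivity returns.
def send_until_delivered(send: Callable[[], bool],
                         base: float = 1.0,
                         cap: float = 60.0,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns True; return the number of failed tries."""
    failures = 0
    delay = base
    while not send():
        failures += 1
        sleep(delay)
        delay = min(delay * 2, cap)  # back off, but never wait longer than cap
    return failures
```

The cap matters: without it, a long outage would push the next attempt so far out that the agent would look dead long after the network recovered.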

            • bbrendon
              Senior Member
              • Sep 2005
              • 870

              #21
              Whatever happened with this?

              I just had a 1.1.6 agent on Windows require a restart.

              In the agent log it says:

              [09-Aug-2007 11:27:07] Active checks [Error in connect()]

              The last data in the Zabbix database for the host was timestamped 11:30.

              Once the agent was restarted, everything was fine and dandy.
              Unofficial Zabbix Expert
              Blog, Corporate Site

              • [cc]smart
                Junior Member
                • May 2005
                • 25

                #22
                I can see the effect with ZABBIX Agent (daemon) v1.4.2 (20 August 2007) and ZABBIX server 1.4.2. In my case it's annoying because I have to use active checks (NATed, one-way connection). Once clients on a dynamic IP change that IP, they stop sending data. (That IP is an external IP the clients don't know about at all, but the change involves a network interruption.)

                • Niels
                  Senior Member
                  • May 2007
                  • 239

                  #23
                  I have this problem too. I have two remote machines: A (Win 2003 Small Business) and B (Win 2003 Enterprise). They are in separate locations and have different loads and configurations. However, the agent on A has never stopped sending data, while the one on B has done so several times. Both agents are 1.4.2, and all items are active agent checks. There are no hints in the log on B as to what's going on.

                  • steria
                    Junior Member
                    • Jun 2007
                    • 17

                    #24
                    Same problem here!
                    Yesterday I had to reboot my Zabbix server (1.4.2), and since then all client data collection has stopped (configured as active checks).

                    Is there a solution for this problem? Don't tell me that I have to restart the agent on all my clients (about 150)...


                    Thank you.

                    • bgerhardt
                      Junior Member
                      • Jun 2007
                      • 2

                      #25
                      Never-ending story

                      I started using Zabbix over the summer and have had the same problem in 1.1, 1.4, 1.4.1, ... 95% of my 100+ Linux hosts are NATed in remote locations where I may or may not have physical and/or SSH access. In an average week, several of the hosts will hang and never recover without a restart.

                      For a long time my feeling has been that Zabbix doesn't handle a half-closed socket properly.
                      Last edited by bgerhardt; 17-12-2007, 17:35.
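For reference, this is how a half-closed connection shows up at the socket level: `recv()` returns zero bytes once the peer has sent FIN. A sketch in Python (hypothetical helper `recv_exact`, not the agent's code) that turns this into an error the caller can handle by reconnecting, instead of waiting for data that will never arrive:

```python
import socket

# Hypothetical helper (illustration only, not Zabbix agent code).
def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # b"" means the peer shut down its sending side (FIN): the
            # connection is half-closed and the rest will never arrive.
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

Paired with a socket timeout, this covers both cases that can leave a sender stuck: a peer that closed and a peer that went silent.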

                      • richlv
                        Senior Member
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Oct 2005
                        • 3112

                        #26
                        See revisions 5150, 5152 and 5153 for the stable branch and rev 5155 for trunk.

                        r5153 | sasha | 2007-12-10 17:26:32 +0200 (Mon, 10 Dec 2007) | 2 lines
                        - [ZBX-192] Data collection stopped after connection loss

                        Zabbix 3.0 Network Monitoring book

                        • Niels
                          Senior Member
                          • May 2007
                          • 239

                          #27
                          OK, my bad: The bug is marked as resolved, but there's no explanation or revision reference in the bugtracker.

                          Ah! In the bug tracker, the comments area can also show "all" and "change history".
                          Last edited by Niels; 11-12-2007, 11:01.

                          • richlv
                            Senior Member
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Oct 2005
                            • 3112

                            #28
                            Well, yes, adding a message with the revision would be helpful (other projects have automated this: a BUG keyword in the SVN commit message automatically adds a comment to the bug and closes it).

                            • richlv
                              Senior Member
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Oct 2005
                              • 3112

                              #29
                              BTW, I've seen some reports get nice SVN commit information with the log message and changed files, so it should be easier to track issues, though it probably depends on developers using the feature in all cases.

                              • pasqu
                                Junior Member
                                • Jun 2007
                                • 29

                                #30
                                I also have this issue with version 1.8.4.
                                I see that a clock change (with ntpdate, for example) locks up the agent's active checks.
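One plausible mechanism (an assumption, not confirmed in this thread) is scheduling checks against wall-clock time: if the next run is stored as a wall-clock timestamp and ntpdate steps the clock back, the check is postponed by the size of the step. A sketch in Python with a hypothetical `CheckScheduler` that uses `time.monotonic()` instead, which such steps do not affect:

```python
import time
from typing import Callable

# Hypothetical sketch (illustration only, not Zabbix agent code): schedule
# periodic checks on a monotonic clock, which never jumps when the system
# time is stepped, e.g. by ntpdate.
class CheckScheduler:
    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self.next_run = clock()  # first check is due immediately

    def due(self) -> bool:
        """Return True (and schedule the next run) if a check is due now."""
        now = self.clock()
        if now >= self.next_run:
            self.next_run = now + self.interval
            return True
        return False
```

With `clock=time.time` and the clock stepped back by an hour right after a run, `due()` would return False for roughly an hour; with the default monotonic clock it keeps firing every `interval` seconds.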
