alpha9: active checks: Connection refused

  • Andre
    Junior Member
    • May 2005
    • 5

    #1

    alpha9: active checks: Connection refused

    Hi,
    Working on alpha9, I'm stumbling over nasty messages in the agentd.log file:

    009813:20050525:172134 Cannot connect to [10.208.230.75] [Connection refused]
    009813:20050525:172134 Getting list of active checks failed. Will retry after 60 seconds

    The server (linux) is listening on the zabbix ports, the client (Linux) agentd works fine for all functions but for these "active checks".
    Name resolution is fine, in both directions.
    B.T.W.: the server's local zabbix_agentd doesn't "active-check" either.

    I know, "active checks" is a new feature in alpha9. Unfortunately I haven't the time to go further through the source code (active.c line 215, zabbix_agentd.c lines 485++).

    Reconfiguring /etc/zabbix/agend.conf doesn't help, and doesn't affect "active checks" behaviour.

    I will continue with alpha8 for now. Hey, Zabbix is really an excellent masterpiece of a solution. We like it. Thank you. Congratulations!!

    Regards,
    Andre
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    The message means that the agent is unable to connect to the ZABBIX server. Evidently it is trying to connect to port 10051.

    Do telnet server 10051 from the agent's host to see what's wrong.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
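
The telnet test suggested above can also be scripted. The following is a minimal sketch, not part of Zabbix: `check_port` is a hypothetical helper name, and port 10051 is taken from the reply above. It only distinguishes "connection refused" from other failures, which is the distinction the agentd.log message hinges on.

```python
import socket

def check_port(host, port=10051, timeout=5.0):
    """Try one TCP connection and return a short status string.

    Hypothetical helper for illustration; it mimics `telnet host port`
    but does not speak any Zabbix protocol.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Nothing is listening, or a firewall actively rejects the port
        return "refused"
    except socket.timeout:
        # No answer at all, often a firewall silently dropping packets
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on
        return f"error: {exc}"

if __name__ == "__main__":
    # "argus" is the server name used later in this thread
    print(check_port("argus"))
```

A result of "refused" from the agent's host while the server process is running points at the listening port or a firewall rather than at the agent configuration.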

    • Andre
      Junior Member
      • May 2005
      • 5

      #3
      Hello Alexei;
      thank you for your hint!
      I tried it, and .. the server connects. Of course, the server times out after a few seconds.
      See the following dialog:

      kiepea@fatcow:~> telnet argus 10051
      Trying 10.208.230.75...
      Connected to argus.
      Escape character is '^]'.
      Connection closed by foreign host.
      kiepea@fatcow:~>

      fatcow is the machine with the agentd, argus is the zabbix server.
      Should I try to dig on the server side?
      Regards,
      Andre


      • habbers
        Junior Member
        • May 2005
        • 6

        #4
        I was having the same problem because port 10051 was not open on the server's firewall. However, when I opened the firewall port to allow the agent on the client to connect, the agent kept shutting down. This is what I got in the agentd.log file with debug level 4:

        012678:20050613:124601 After read() 2 [15]
        012678:20050613:124601 Got line:diskfree[/usr]
        012678:20050613:124601 Sending back:228116.000000
        012681:20050613:124601 Sending [ZBX_GET_ACTIVE_CHECKS
        ngogeeks.com
        ]
        012681:20050613:124601 Before read
        012681:20050613:124601 Read [NOT OK
        ]
        012681:20050613:124601 In delete_all_metrics()
        012681:20050613:124601 Parsed [NOT OK]

        I am using the 1.1alpha10 version of the agent program connecting to a server running the 1.1alpha7 version. Would that be the most likely reason that the agentd is having problems and shutting down?

        The agentd running on the server itself appears to be working fine.


        • mconigliaro
          Senior Member
          • Jun 2005
          • 116

          #5
          I'm having this same problem in 1.1beta1, but port 10051 is not open on my server. How do I open it? I was under the impression that zabbix_server was responsible for accepting connections on this port. Do I need another server process (e.g. zabbix_trapper) to enable active checks? If so, is there documentation on this somewhere? Thanks in advance.


          • James Wells
            Senior Member
            • Jun 2005
            • 664

            #6
            Greetings,

            In your zabbix_server.conf file, you should have an entry like this:
            Code:
            ListenPort=10051
            This specifies the port on which the Zabbix server listens for agent (active) requests.

            Originally posted by Andre
            Code:
            kiepea@fatcow:~> telnet argus 10051
            Trying 10.208.230.75...
            Connected to argus.
            Escape character is '^]'.
            Connection closed by foreign host.
            Looks good. That means your server is listening correctly. However, your agents will not work if they are configured with the server name instead of the server IP address. Based on the IP address you are showing here, your zabbix_agentd.conf file should contain the following entry:
            Code:
            Server=10.208.230.75
            You can put other servers after this one on the same config line, separated by commas; however, this one must be first on the line.

            Additionally, once a connection is made, your server will wait for a number of seconds equal to the value of timeout, as set in the zabbix_server.conf file, before it closes the connection.
            Unofficial Zabbix Developer
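
The advice in this post (first entry on the `Server=` line should be the server's IP address) can be checked mechanically. The sketch below assumes a simple `key=value` file with `#` comments; `first_server_entry` and `is_ip_address` are hypothetical helper names, and the rule itself is the poster's recommendation for the versions discussed here, not something verified against the agent source.

```python
import ipaddress

def first_server_entry(conf_text):
    """Return the first value on the Server= line, or None if there is none.

    Assumes plain key=value lines and '#' comments (an assumption about
    the file format, for illustration only).
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "Server":
            # Several servers may be listed, separated by commas
            return value.split(",")[0].strip()
    return None

def is_ip_address(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    conf = "# agent config\nServer=10.208.230.75,argus\n"
    first = first_server_entry(conf)
    print(first, is_ip_address(first))
```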


            • omenix
              Junior Member
              • Dec 2005
              • 14

              #7
              In my case I got this error log, and I'm using beta2:

              029651:20051208:162827 Sending [ZBX_GET_ACTIVE_CHECKS
              localhost
              ]
              029651:20051208:162827 Before read
              029651:20051208:162827 Connection reset by peer.
              029651:20051208:162827 Getting list of active checks failed. Will retry after 60 seconds


              • mconigliaro
                Senior Member
                • Jun 2005
                • 116

                #8
                I'm getting the same error as omenix. My server is definitely listening now, and my agents seem to be connecting to the correct server (according to the logs), but I've never been able to get active checks to work.

                I'm currently using 1.1beta7.


                • mconigliaro
                  Senior Member
                  • Jun 2005
                  • 116

                  #9
                  Code:
                  telnet 10.120.120.201 10051
                  Trying 10.120.120.201...
                  Connected to 10.120.120.201.
                  Escape character is '^]'.
                  
                  Connection closed by foreign host.
                  It seems that no matter what I type when I'm connected, the server disconnects me. Is this normal behavior? I'm also curious how I can send the ZBX_GET_ACTIVE_CHECKS command manually through the telnet session. It seems like it needs to be on one line, because as soon as I hit Enter, I get disconnected. I tried the following strings (because I couldn't find any documentation on the proper syntax), but nothing worked.

                  This first one caused an error on the server: "ZBX_GET_ACTIVE_CHECKS: host is null. Ignoring."

                  Code:
                  ZBX_GET_ACTIVE_CHECKS hostname
                  Next I tried putting brackets around the whole thing, because that's how it's logged in the zabbix_agentd.log file. This didn't seem to do anything, though.

                  Code:
                  [ ZBX_GET_ACTIVE_CHECKS hostname ]
                  I'm pretty much baffled at this point.
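
Judging only from the agent logs quoted earlier in this thread, the request appears to be two lines sent together: `ZBX_GET_ACTIVE_CHECKS`, then the host name, each followed by a newline. The brackets in the log look like log decoration rather than part of the message. That would explain the "host is null" error when the command is typed on a single line in telnet. The sketch below sends the request in one write; `build_request` and `fetch_active_checks` are hypothetical names, and the wire format is an assumption inferred from these logs for the 1.1 alpha/beta builds discussed here.

```python
import socket

def build_request(hostname):
    """Build the two-line request as it appears in the agent log (assumed format)."""
    return f"ZBX_GET_ACTIVE_CHECKS\n{hostname}\n".encode("ascii")

def fetch_active_checks(server, hostname, port=10051, timeout=5.0):
    """Send the request in a single write and read lines until ZBX_EOF.

    Returns the reply as a list of text lines. Stops early if the
    server closes the connection.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_request(hostname))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # peer closed the connection
            chunks.append(data)
            if b"ZBX_EOF" in b"".join(chunks):
                break  # end-of-list marker seen
    return b"".join(chunks).decode("ascii", errors="replace").splitlines()

if __name__ == "__main__":
    # Address taken from the telnet session above; "hostname" is a placeholder
    for line in fetch_active_checks("10.120.120.201", "hostname"):
        print(line)
```

Sending both lines in one `sendall` avoids the situation in an interactive telnet session where the server may act on the first line before the second one arrives.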


                  • mconigliaro
                    Senior Member
                    • Jun 2005
                    • 116

                    #10
                    OK, so the server is clearly doing the select and at least thinks it's sending active checks to the agent. The problem is that the agent never receives the list for some reason. Here's a relevant part of the log from my server.

                    Code:
                    025164:20060307:161415 Got line:ZBX_GET_ACTIVE_CHECKS
                    hostname
                    025164:20060307:161415 Trapper got [ZBX_GET_ACTIVE_CHECKS
                    hostname]
                    025164:20060307:161415 In autoregister(hostname)
                    025164:20060307:161415 Executing query:select hostid from hosts where host='hostname'
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [1]
                    025164:20060307:161415 Host [hostname] already exists. Do nothing.
                    025164:20060307:161415 Host already exists [hostname]
                    025164:20060307:161415 In send_list_of_active_checks()
                    025164:20060307:161415 Executing query:select i.key_,i.delay,i.lastlogsize from items i,hosts h where i.hostid=h.hostid and h.status=0 and i.status=0 and i.type=7 and h.host='hostname'
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [agent.ping:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [agent.version:3600:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [net.if.in[eth0]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [net.if.in[eth1]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [net.if.out[eth0]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [net.if.out[eth1]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [proc.num[cron]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [proc.num[]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [system.cpu.load[all,avg5]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [system.swap.size[all,free]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [system.uname:3600:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [system.uptime:300:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [system.users.num:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vfs.fs.size[/,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vfs.fs.size[/home,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025163:20060307:161415 Sending [swap[free]
                    ]
                    025164:20060307:161415 Sending [vfs.fs.size[/opt,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vfs.fs.size[/tmp,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vfs.fs.size[/usr,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vfs.fs.size[/var,free]:60:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [vm.memory.size[free]:30:0
                    ]
                    025164:20060307:161415 In DBnum_rows
                    025164:20060307:161415 Result of DBnum_rows [20]
                    025164:20060307:161415 Sending [ZBX_EOF
                    ]
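
Each "Sending [...]" entry in the log above seems to carry one item as `key:delay:lastlogsize`, matching the three columns in the logged SQL query, with `ZBX_EOF` closing the list. A small parser makes it easy to compare what the server sent with what the agent received. This is a sketch under that assumption; `parse_check_line` and `parse_reply` are hypothetical names.

```python
def parse_check_line(line):
    """Split 'key:delay:lastlogsize' into (key, delay, lastlogsize).

    Splitting from the right keeps any ':' inside the item key intact.
    Raises ValueError for lines that do not fit, such as 'NOT OK'.
    """
    key, delay, lastlogsize = line.rsplit(":", 2)
    return key, int(delay), int(lastlogsize)

def parse_reply(lines):
    """Parse reply lines; return (checks, saw_eof).

    saw_eof is False when the list ended without the ZBX_EOF marker,
    which would suggest a truncated or interrupted reply.
    """
    checks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == "ZBX_EOF":
            return checks, True
        checks.append(parse_check_line(line))
    return checks, False

if __name__ == "__main__":
    reply = ["agent.ping:30:0", "vfs.fs.size[/,free]:60:0", "ZBX_EOF"]
    print(parse_reply(reply))
```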
