alpha9: active checks: Connection refused


Andre
26-05-2005, 11:25
Hi,
Working on alpha9, I'm stumbling over nasty messages in the agentd.log file:

009813:20050525:172134 Cannot connect to [10.208.230.75] [Connection refused]
009813:20050525:172134 Getting list of active checks failed. Will retry after 60 seconds

The server (Linux) is listening on the Zabbix ports, and the agentd on the client (Linux) works fine for everything except these "active checks".
Name resolution is fine in both directions.
By the way, the server's local zabbix_agentd doesn't do active checks either.

I know "active checks" is a new feature in alpha9. Unfortunately, I don't have the time to dig further into the source code (active.c line 215, zabbix_agentd.c lines 485++).

Reconfiguring /etc/zabbix/agend.conf doesn't help, and doesn't affect "active checks" behaviour.

I will continue with alpha8 for now. Hey, Zabbix is really an excellent masterpiece of a solution. We like it. Thank you. Congratulations!!

Regards,
Andre

Alexei
26-05-2005, 11:28
The messages mean that the agent is unable to connect to the ZABBIX server. Evidently it is trying to connect to port 10051.

Run telnet server 10051 from the agent's host to see what's wrong.
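
If telnet is not installed on the agent's host, the same test can be sketched in Python. This is only an illustration: the function name check_port and the returned strings are made up here, and port 10051 is simply the port mentioned above.

```python
import socket


def check_port(host: str, port: int = 10051, timeout: float = 5.0) -> str:
    """Try a plain TCP connection and describe the outcome in one word or phrase.

    Hypothetical helper for illustration; it does not speak the Zabbix protocol,
    it only tells "refused" apart from "connected" the way telnet would.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Nothing is listening on that port, or a firewall actively rejects it.
        return "refused"
    except socket.timeout:
        # No answer at all, which often points to a firewall dropping packets.
        return "timed out"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on.
        return f"error: {exc}"


# Example call from the agent's host (host name taken from this thread):
# print(check_port("argus"))
```

A result of "refused" matches the [Connection refused] line in agentd.log, while "connected" would suggest the problem lies somewhere after the TCP handshake.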

Andre
26-05-2005, 13:04
Hello Alexei;
thank you for your hint!
I tried it, and ... the connection to the server succeeds. Of course, the server closes it after a few seconds.
See the following dialog:

kiepea@fatcow:~> telnet argus 10051
Trying 10.208.230.75...
Connected to argus.
Escape character is '^]'.
Connection closed by foreign host.
kiepea@fatcow:~>

fatcow is the machine with the agentd, argus is the zabbix server.
Should I try to dig on the server side?
Regards,
Andre

habbers
13-06-2005, 16:16
I was having the same problem because port 10051 was not open on the server's firewall. However, when I opened the firewall port to allow the agent on the client to connect, the agent kept shutting down. This is what I got in the agentd.log file with debug 4:

012678:20050613:124601 After read() 2 [15]
012678:20050613:124601 Got line:diskfree[/usr]
012678:20050613:124601 Sending back:228116.000000
012681:20050613:124601 Sending [ZBX_GET_ACTIVE_CHECKS
ngogeeks.com
]
012681:20050613:124601 Before read
012681:20050613:124601 Read [NOT OK
]
012681:20050613:124601 In delete_all_metrics()
012681:20050613:124601 Parsed [NOT OK]

I am using the 1.1alpha10 version of the agent program connecting to a server running the 1.1alpha7 version. Would that be the most likely reason that the agentd is having problems and shutting down?

The agentd running on the server itself appears to be working fine.

mconigliaro
25-10-2005, 23:18
I'm having this same problem in 1.1beta1, but port 10051 is not open on my server. How do I open it? I was under the impression that zabbix_server was responsible for accepting connections on this port. Do I need another server process (i.e. zabbix_trapper) to enable active checks? If so, is there some documentation on this somewhere? Thanks in advance.

James Wells
25-10-2005, 23:43
Greetings,

In your zabbix_server.conf file, you should have an entry like this:
ListenPort=10051

This specifies the port that the Zabbix server listens on for agent (active) requests.


kiepea@fatcow:~> telnet argus 10051
Trying 10.208.230.75...
Connected to argus.
Escape character is '^]'.
Connection closed by foreign host.
Looks good. That means your server is listening correctly. However, your agents will not work if they are configured with the server name instead of the server IP address. Based on the IP address you are showing here, your zabbix_agentd.conf file should contain the following entry:
Server=10.208.230.75
You can put other servers after this one on the same config line, separated by commas; however, this one must be the first on the line.

Additionally, once a connection is made, your server will wait for the number of seconds set by timeout in the zabbix_server.conf file before it closes the connection.
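
To check an agent config against the advice above, something like the following Python sketch could be used. It assumes the config is plain Name=value lines with # comments, and the helper names first_server_entry and is_ip_literal are invented for this example; it only tests whether the first Server entry is an IP address, nothing more.

```python
import ipaddress
from typing import Optional


def first_server_entry(conf_text: str) -> Optional[str]:
    """Return the first comma-separated value of the Server= line, if any."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == "Server":
            return value.split(",")[0].strip()
    return None


def is_ip_literal(entry: str) -> bool:
    """True if the entry parses as an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(entry)
        return True
    except ValueError:
        return False


# Example with the values from this thread:
# entry = first_server_entry("Server=10.208.230.75,argus\n")
# print(entry, is_ip_literal(entry))
```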

omenix
08-12-2005, 23:29
In my case I got this error log, and I'm using beta2:

029651:20051208:162827 Sending [ZBX_GET_ACTIVE_CHECKS
localhost
]
029651:20051208:162827 Before read
029651:20051208:162827 Connection reset by peer.
029651:20051208:162827 Getting list of active checks failed. Will retry after 60 seconds

mconigliaro
07-03-2006, 20:08
I'm getting the same error as omenix. My server is definitely listening now, and my agents seem to be connecting to the correct server (according to the logs), but I've never been able to get active checks to work.

I'm currently using 1.1beta7.

mconigliaro
07-03-2006, 21:34
telnet 10.120.120.201 10051
Trying 10.120.120.201...
Connected to 10.120.120.201.
Escape character is '^]'.

Connection closed by foreign host.


It seems that no matter what I type when I'm connected, the server disconnects me. Is this normal behavior? I'm also curious as to how I can send the ZBX_GET_ACTIVE_CHECKS command manually through the telnet session. It seems like it needs to be sent all at once, because as soon as I hit Enter, I get disconnected. I tried the following strings (because I couldn't find any documentation on the proper syntax), but nothing worked.

This first one caused an error on the server: "ZBX_GET_ACTIVE_CHECKS: host is null. Ignoring."


ZBX_GET_ACTIVE_CHECKS hostname


Next I tried putting brackets around the whole thing, because that's how it's logged in the zabbix_agentd.log file. This didn't seem to do anything though.


[ ZBX_GET_ACTIVE_CHECKS hostname ]


I'm pretty much baffled at this point.
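
Going by the agent logs posted earlier in this thread, the request appears to be the keyword, a newline, the host name, and another newline, all sent in one write; the brackets look like log decoration only. Telnet sends each line when Enter is pressed, which may be why the server sees a null host. A minimal Python sketch under that assumption follows (the function names are made up for illustration, and the wire format may differ between versions):

```python
import socket


def build_request(hostname: str) -> bytes:
    """Build the request as it appears in the agent logs: keyword, newline, host, newline."""
    return f"ZBX_GET_ACTIVE_CHECKS\n{hostname}\n".encode("ascii")


def request_active_checks(server: str, hostname: str,
                          port: int = 10051, timeout: float = 5.0) -> str:
    """Send the whole request in one write and return whatever the server answers."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_request(hostname))
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # server kept the connection open but sent nothing more
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Example call with a placeholder host name:
# print(request_active_checks("10.120.120.201", "hostname"))
```

The host name passed in has to match the host name configured on the server, since that is what the request carries.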

mconigliaro
07-03-2006, 22:28
OK, so the server is clearly doing the select and at least thinks it's sending active checks to the agent. The problem is that the agent never receives the list for some reason. Here's a relevant part of the log from my server.


025164:20060307:161415 Got line:ZBX_GET_ACTIVE_CHECKS
hostname
025164:20060307:161415 Trapper got [ZBX_GET_ACTIVE_CHECKS
hostname]
025164:20060307:161415 In autoregister(hostname)
025164:20060307:161415 Executing query:select hostid from hosts where host='hostname'
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [1]
025164:20060307:161415 Host [hostname] already exists. Do nothing.
025164:20060307:161415 Host already exists [hostname]
025164:20060307:161415 In send_list_of_active_checks()
025164:20060307:161415 Executing query:select i.key_,i.delay,i.lastlogsize from items i,hosts h where i.hostid=h.hostid and h.status=0 and i.status=0 and i.type=7 and h.host='hostname'
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [agent.ping:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [agent.version:3600:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [net.if.in[eth0]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [net.if.in[eth1]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [net.if.out[eth0]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [net.if.out[eth1]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [proc.num[cron]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [proc.num[]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [system.cpu.load[all,avg5]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [system.swap.size[all,free]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [system.uname:3600:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [system.uptime:300:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [system.users.num:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vfs.fs.size[/,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vfs.fs.size[/home,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025163:20060307:161415 Sending [swap[free]
]
025164:20060307:161415 Sending [vfs.fs.size[/opt,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vfs.fs.size[/tmp,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vfs.fs.size[/usr,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vfs.fs.size[/var,free]:60:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [vm.memory.size[free]:30:0
]
025164:20060307:161415 In DBnum_rows
025164:20060307:161415 Result of DBnum_rows [20]
025164:20060307:161415 Sending [ZBX_EOF
]
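
Judging from the select statement and the Sending lines in this log, each line of the reply seems to be key:delay:lastlogsize, with ZBX_EOF marking the end of the list. A rough Python sketch of parsing such a reply, assuming that layout (the names ActiveCheck and parse_active_checks are invented for this example):

```python
from typing import List, NamedTuple


class ActiveCheck(NamedTuple):
    key: str
    delay: int
    lastlogsize: int


def parse_active_checks(payload: str) -> List[ActiveCheck]:
    """Parse 'key:delay:lastlogsize' lines until the ZBX_EOF marker."""
    checks = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "ZBX_EOF":
            break  # end of the list; anything after it is ignored
        # Split from the right so that colons inside the key do not break parsing.
        key, delay, lastlogsize = line.rsplit(":", 2)
        checks.append(ActiveCheck(key, int(delay), int(lastlogsize)))
    return checks


# Example with two lines taken from the log above:
# parse_active_checks("agent.ping:30:0\nvfs.fs.size[/,free]:60:0\nZBX_EOF\n")
```

A reply such as NOT OK does not fit this layout and raises ValueError here, which at least makes a refused request easy to tell apart from an empty list.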