Many TIME_WAIT connection


bee
06-11-2007, 10:52
Hi all,

I saw many (over 50 in total) TIME_WAIT connections from the Zabbix client to the Zabbix server, as shown below:

TCP zabbixclient:10050 zabbixserver:53944 TIME_WAIT
TCP zabbixclient:10050 zabbixserver:53947 TIME_WAIT
TCP zabbixclient:10050 zabbixserver:53960 TIME_WAIT
....
....
....

Is this normal? Does each connection above represent one monitored item?

Thanks,
BEE

trikke
07-11-2007, 14:19
I'm having the same "problem".
My Zabbix server (zobel) runs on Solaris; here is what I get with "netstat -a":

zobel.zabbix_agent zobel.52877 49152 0 49152 0 TIME_WAIT
zobel.zabbix_trap SRV-VM-WSUS-01.bvch.ch.1520 65476 0 49640 0 TIME_WAIT
zobel.zabbix_agent zobel.52875 49152 0 49178 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2262 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2330 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2329 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2328 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2327 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2326 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2325 64510 0 49680 0 TIME_WAIT
zobel.zabbix_agent zobel.52829 49152 0 49173 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2324 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2323 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2322 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2321 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2320 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2319 64510 0 49680 0 TIME_WAIT
zobel.52885 ora1LD2P.bvch.ch.12051 49680 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2318 64510 0 49680 0 TIME_WAIT
zobel.zabbix_agent zobel.52884 49152 0 49181 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2317 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2316 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2315 64510 0 49680 0 TIME_WAIT
zobel.52882 ora1LD1P.bvch.ch.12051 49680 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2314 64510 0 49680 0 TIME_WAIT
zobel.zabbix_agent zobel.52881 49152 0 49181 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2313 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2312 64510 0 49680 0 TIME_WAIT
zobel.zabbix_trap SRV-HS-REM-07.bvch.ch.2311 64510 0 49680 0 TIME_WAIT

Some connections were made by Zabbix agents (Windows). I guess these are items of type "Zabbix Agent (active)" (monitoring the Windows event log) and definitely items of type "Zabbix Agent" with the key "net.tcp.port".

It looks like the opened socket has not been closed correctly.
Can a developer take a look at it?

greets
Patrick

Palmertree
08-11-2007, 17:51
Had the same problem, but it was an OS issue. Fixed it by modifying the sysctl.conf file as follows:

net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
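
For anyone wanting to check what their own kernel currently uses before editing sysctl.conf, here is a rough sketch in Python. It assumes a Linux host with /proc/sys mounted; `read_sysctl` and `SETTINGS` are names made up for this example. As far as I know, `net.ipv4.tcp_tw_recycle` was removed in Linux 4.12, so on newer kernels it will probably show up as not available.

```python
from pathlib import Path

# The three settings named in the post above.
SETTINGS = [
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_tw_recycle",
]


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a string, or None if it is absent.

    Assumption: the dotted name maps directly to a path under `root`,
    e.g. net.ipv4.tcp_tw_reuse -> /proc/sys/net/ipv4/tcp_tw_reuse.
    """
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file (setting not supported) or no permission to read it.
        return None


if __name__ == "__main__":
    for name in SETTINGS:
        value = read_sysctl(name)
        shown = value if value is not None else "(not available)"
        print(f"{name} = {shown}")
```

The script only reads values and changes nothing on the system.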

trikke
09-11-2007, 09:11
Hi P.,

Thanks for the suggestion. I can't seem to find the file on Solaris (on Solaris one should use the ndd command).
Anyway, as far as I know it is an application problem. If the application opens a TCP socket, the application should close the socket as well and not wait for the OS to close the port! (See http://rfc.sunsite.dk/rfc/rfc793.html)

Until this issue is fixed, I think changing the TIME-WAIT interval is a good workaround.

Thanks again,
Patrick

Alexei
09-11-2007, 16:55
"Anyway, as far as I know it is an application problem."
You are wrong :) There is no problem, really... Just run netstat on a busy TCP-based server (Apache) to see a bunch of sockets in the TIME_WAIT state.
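
To put a rough number on this: the side that closes a TCP connection first keeps the socket in TIME_WAIT for a fixed period, so the count you see at any moment is roughly the close rate multiplied by that period. The sketch below is only back-of-the-envelope arithmetic. The 60-second default is an assumption that matches Linux as far as I know; other systems use other values (on Solaris it is tunable), and `expected_time_wait` is a made-up helper name.

```python
def expected_time_wait(closes_per_second, time_wait_seconds=60.0):
    """Estimate the steady-state number of sockets in TIME_WAIT.

    closes_per_second: how many connections this host actively closes per second.
    time_wait_seconds: how long each closed socket lingers (assumed 60 s here).
    """
    return closes_per_second * time_wait_seconds


# Hypothetical rates, for illustration only.
print(expected_time_wait(5))     # a busy server closing 5 connections per second
print(expected_time_wait(0.25))  # a quiet host closing one connection every 4 seconds
```

With these hypothetical rates the estimate is a few hundred sockets for the busy host and about fifteen for the quiet one, without any socket being leaked.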

trikke
12-11-2007, 14:24
Which maybe proves that some people are not able to write "decent" applications ;) (reusing/opening ports, binding ports, closing ports ...)

bbrendon
12-11-2007, 18:35
My Zabbix server has between 250 and 500 connections in TIME_WAIT.

Clients have about 0 to 15 connections in TIME_WAIT.

...though I'm not sure this is a bad thing. I have started monitoring this TIME_WAIT issue, though, because I'm trying to solve a problem where agents seem to go to sleep for about 15 minutes, causing false positives and large amounts of panic :)
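
For anyone who wants to track the same thing, here is a minimal sketch in Python that counts connections per TCP state from saved netstat output. It assumes the state is the last column of each line, which holds for the listings quoted in this thread but not for every netstat variant or option set. `count_states` and `TCP_STATES` are names invented for this example, and the state list covers only common spellings.

```python
from collections import Counter

# Common spellings of TCP states across netstat implementations (not exhaustive).
TCP_STATES = {
    "ESTABLISHED", "LISTEN", "LISTENING", "TIME_WAIT", "CLOSE_WAIT",
    "SYN_SENT", "SYN_RECV", "SYN_RCVD", "FIN_WAIT1", "FIN_WAIT2",
    "FIN_WAIT_1", "FIN_WAIT_2", "LAST_ACK", "CLOSING", "CLOSED",
}


def count_states(netstat_output):
    """Count lines per TCP state, taking the last column as the state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything not ending in a known state.
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


# Three lines copied from the listings earlier in this thread.
sample = """\
TCP zabbixclient:10050 zabbixserver:53944 TIME_WAIT
TCP zabbixclient:10050 zabbixserver:53947 TIME_WAIT
zobel.zabbix_agent zobel.52877 49152 0 49152 0 TIME_WAIT
"""

print(count_states(sample)["TIME_WAIT"])
```

Feeding it the output of a periodic netstat run and recording the TIME_WAIT count would give a time series to compare against the periods when the agents go quiet.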