Closing listening socket in zabbix-agentd 1.1

  • golish
    Junior Member
    • Jul 2006
    • 6

    #1

    Hi.

    While using Zabbix on several hosts, I've found a very annoying bug (?) in zabbix-agentd: it doesn't close the socket which it opens to listen for connections from the server. This means that after stopping the zabbix-agentd process I had to wait for the socket to time out and leave the TIME_WAIT state before I could start the agent again. Attached is a patch (which I would classify as a dirty hack, but it actually works) which fixes the problem.

    Regards,
    Marcin Goliszewski.
  • just2blue4u
    Senior Member
    • Apr 2006
    • 347

    #2
    Thanks for your work! I have the same problem (sockets in "time_wait" state).

    Where did you test that patch?
    Big ZABBIX is watching you!
    (... and my 48 hosts, 4513 items, 1280 triggers via zabbix v1.6 on CentOS 5.0)

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Honestly I do not understand how this patch may help as normal exit should close all file/socket descriptors, so the patch actually does nothing. Please correct me if I'm wrong.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga

      • golish
        Junior Member
        • Jul 2006
        • 6

        #4
        Originally posted by Alexei
        Honestly I do not understand how this patch may help as normal exit should close all file/socket descriptors, so the patch actually does nothing. Please correct me if I'm wrong.
        You do close nearly all socket descriptors. All but one, to be exact. In src/zabbix_agent/zabbix_agentd.c, in function main(), you open a listening socket (using tcp_listen()) and assign it to listenfd. I don't see any close(listenfd); or anything similar in your code, and this patch tries to fix this bug.
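
A minimal sketch of the kind of explicit cleanup described above, not the attached patch itself. The helper names open_listener() and close_listener() are hypothetical; open_listener() only stands in for the agent's tcp_listen(), whose real signature is not shown in this thread.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the agent's tcp_listen(): binds to 127.0.0.1
 * on the given port (0 lets the kernel pick one) and starts listening.
 * Returns the descriptor, or -1 on error. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical shutdown helper: closes the listener explicitly and marks
 * the descriptor invalid so a second call is a harmless no-op. */
int close_listener(int *listenfd)
{
    int rc = 0;

    if (*listenfd >= 0) {
        rc = close(*listenfd);
        *listenfd = -1;
    }
    return rc;
}
```

In a daemon, close_listener(&listenfd) would typically be called from the normal shutdown path just before exit; whether that changes the TIME_WAIT behaviour is exactly what the rest of the thread debates.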

        • golish
          Junior Member
          • Jul 2006
          • 6

          #5
          Originally posted by just2blue4u
          Where did you test that patch?
          On 10 machines running Gentoo Linux and Zabbix 1.1.

          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            Originally posted by golish
            You do close nearly all socket descriptors. All but one, to be exact. In src/zabbix_agent/zabbix_agentd.c, in function main(), you open a listening socket (using tcp_listen()) and assign it to listenfd. I don't see any close(listenfd); or anything similar in your code, and this patch tries to fix this bug.
            Yes, I know that the socket is not closed anywhere in the code. That is because of the (possibly wrong) assumption that the OS (Linux) closes all open file descriptors on program shutdown. Am I right?

            • golish
              Junior Member
              • Jul 2006
              • 6

              #7
              Originally posted by Alexei
              Yes, I know that the socket is not closed anywhere in the code. That is because of the (possibly wrong) assumption that the OS (Linux) closes all open file descriptors on program shutdown. Am I right?
              Indeed, Linux closes all fd's on program termination, but I'm not sure whether it closes the sockets themselves in the same manner as calling close() does. Empirical testing showed that after close()ing listenfd, the socket is not kept in the TIME_WAIT state for as long as it was before, which led me to the conclusion that there may be some important difference.
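
If the restart problem from the first post shows up as bind() failing while old connections on the agent port are still in TIME_WAIT (an assumption; the thread does not say which call fails), a common remedy is to set SO_REUSEADDR on the listening socket before bind(). This is a sketch of that general technique, not something proposed in this thread or confirmed to be in zabbix_agentd; open_reusable_listener() and reuseaddr_enabled() are hypothetical names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a loopback listener with SO_REUSEADDR set
 * before bind(), so the port can be re-bound while earlier connections
 * on it are still in TIME_WAIT. Returns the descriptor, or -1 on error. */
int open_reusable_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* The option must be set before bind() to have any effect on it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical check: returns 1 if SO_REUSEADDR is set on fd, 0 if it is
 * not, and -1 if the option cannot be queried. */
int reuseaddr_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Note that this only addresses re-binding the port on restart; it does not reduce the number of sockets sitting in TIME_WAIT.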

              • abi
                Member
                • Jun 2006
                • 81

                #8
                Hi,

                Originally posted by golish
                Indeed, Linux closes all fd's on program termination, but I'm not sure whether it closes the sockets themselves in the same manner as calling close() does. Empirical testing showed that after close()ing listenfd, the socket is not kept in the TIME_WAIT state for as long as it was before, which led me to the conclusion that there may be some important difference.
                Yes, I'm not sure about that either. TIME_WAIT is perfectly okay, as
                TCP guarantees that all transmitted data will be delivered, so the
                socket stays in TIME_WAIT until that has happened.

                I think simply killing the children and leaving the sockets open
                may result in a longer TIME_WAIT state, though. At least it
                seems like that.

                Pointer: http://www.manualy.sk/sock-faq/unix-...q-2.html#close
                Last edited by abi; 04-07-2006, 16:38.

                • golish
                  Junior Member
                  • Jul 2006
                  • 6

                  #9
                  Originally posted by abi
                  Yes, I'm not sure about that either. TIME_WAIT is perfectly okay, as TCP guarantees that all transmitted data will be delivered, so the socket stays in TIME_WAIT until that has happened.
                  Yes, I know that. But what kind of data would you expect to be left to be transmitted on a listening socket?

                  Originally posted by abi
                  Pointer: http://www.manualy.sk/sock-faq/unix-...q-2.html#close
                  Yes, I'm familiar with this FAQ.

                  • abi
                    Member
                    • Jun 2006
                    • 81

                    #10
                    Originally posted by golish
                    Yes, I know that. But what kind of data would you expect to be left to be transmitted on a listening socket?
                    "When you close a socket, the server goes into a TIME_WAIT state, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to each other that they will send no more data."

                    My guess is that killing the children which own open sockets
                    results in those last messages somehow getting lost, and the
                    TIME_WAIT sockets are being closed by the kernel after the timeout
                    (which is 60 seconds according to /proc/sys/net/ipv4/tcp_fin_timeout).
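
To check the value mentioned above on a given Linux host, the file can simply be read. A small sketch follows; read_int_tunable() is a hypothetical helper name. The Linux kernel documentation describes tcp_fin_timeout in terms of the FIN_WAIT_2 state, so treating it as the TIME_WAIT timeout is the poster's guess rather than an established fact.

```c
#include <stdio.h>

/* Hypothetical helper: reads a single integer from a sysctl-style file
 * such as /proc/sys/net/ipv4/tcp_fin_timeout. Returns the value read, or
 * -1 if the file cannot be opened or does not start with an integer
 * (so a tunable that legitimately holds -1 cannot be told apart). */
int read_int_tunable(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A call such as read_int_tunable("/proc/sys/net/ipv4/tcp_fin_timeout") returns the configured number of seconds on Linux systems that expose this file.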

                    • Clansman
                      Junior Member
                      • May 2006
                      • 28

                      #11
                      even if the OS closes sockets and fds upon process termination, it's good practise and improves portability if the programmer does so explicitly.

                      same thing with memory allocation - the OS reclaims it when the process terminates. however, nobody with more than 1h of C experience will malloc() without free() just because the OS does it when the program exits...

                      • paul
                        Junior Member
                        • Sep 2006
                        • 1

                        #12
                        please stop discussing and fix the bug

                        i'm using zabbix for monitoring windows and linux servers (5 x linux, 20 x windows) across an openbsd firewall. the firewall shows about 4000 connections in fin_wait state. using the aggressive mode of the pf firewall for handling such connections reduces the number of waiting connections to 2000. if i stop the zabbix server, the number of connections is about 600 to 900. so i think it's clear that there is a bug in the agent (windows and linux).

                        • Alexei
                          Founder, CEO
                          Zabbix Certified Trainer
                          Zabbix Certified Specialist
                          Zabbix Certified Professional
                          • Sep 2004
                          • 5654

                          #13
                          Originally posted by paul
                          so i think it's clear that there is a bug in the agent (windows and linux).
                          Pardon, what bug?

                          • nullpt
                            Junior Member
                            • Mar 2007
                            • 12

                            #14
                            I have several Solaris boxes running zabbix_agentd, and they all have 300+ sockets in the TIME_WAIT state. Is there any workaround for this that doesn't involve messing with PF or the operating system's TCP timeout tunables?