How to tell if Zabbix has a memory leak?

  • drose12
    Junior Member
    • Apr 2007
    • 27

    #16
    Originally posted by drose12
    I've created mine as well...I'll post my results after a day or so as I've just restarted zabbix to show the trend.
    OK, here are my results, and they don't look promising. As you can see, the memory consumption of my Zabbix server daemons grows at a pretty constant rate and is doomed to consume all my memory.
    Attached Files

    Comment

    • Palmertree
      Senior Member
      • Sep 2005
      • 746

      #17
      I've created a crontab job to restart it once a day for now. Not the right way, but a quick fix until I can find out what is going on.

      Comment

      • drose12
        Junior Member
        • Apr 2007
        • 27

        #18
        Originally posted by Palmertree
        I've created a crontab job to restart it once a day for now. Not the right way, but a quick fix until I can find out what is going on.
        Yeah, not what I want to do. I'm hoping to convince the powers that be (developers?) that there is a problem.

        Comment

        • Palmertree
          Senior Member
          • Sep 2005
          • 746

          #19
          I've been messing around with "mtrace" to try to determine whether there is a malloc() without a matching free() in the code. Still digging around to nail down my memory leak. I wonder if it's the curl or Jabber library. I might try compiling without them to see if memory still leaks.

          Comment

          • drose12
            Junior Member
            • Apr 2007
            • 27

            #20
            Originally posted by Palmertree
            I've been messing around with "mtrace" to try to determine whether there is a malloc() without a matching free() in the code. Still digging around to nail down my memory leak. I wonder if it's the curl or Jabber library. I might try compiling without them to see if memory still leaks.

            Hmm, interesting...I am using curl with 1.4.1 and some simple web checks that I didn't use before with 1.1.

            Comment

            • Palmertree
              Senior Member
              • Sep 2005
              • 746

              #21
              I am just curious whether it is something in my setup, but I am having to restart zabbix_server every 6 hours due to a major memory leak. After 6 hours, "top" shows each process using 32M. Is anyone else seeing this? I've created a crontab to restart the service every 6 hours until I can figure out what's going on with my system. It might be my setup; I'm not sure yet. I am currently running Pre 1.4.3.
              Last edited by Palmertree; 05-09-2007, 23:16.

              Comment

              • Palmertree
                Senior Member
                • Sep 2005
                • 746

                #22
                Is anyone else seeing memory leaks with Pre 1.4.3, or is it just me?

                Comment

                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #23
                  I am not aware of any memory leaks in the latest code of ZABBIX server.

                  ZABBIX 1.4.2 had a memory leak when the ZABBIX server received values for nonexistent items sent by zabbix_sender. Perhaps this is what you have?
                  Alexei Vladishev
                  Creator of Zabbix, Product manager
                  New York | Tokyo | Riga

                  Comment

                  • Palmertree
                    Senior Member
                    • Sep 2005
                    • 746

                    #24
                    Thanks Alexei for the quick reply.

                    I'm running the newest code, Pre 1.4.3, from the developers link. Memory seems to increase faster the more web checks I add. I will experiment by turning off the zabbix_senders and watching for a while, then run the same test with the web checks, to see whether the memory of the zabbix_server processes grows. Thanks for the help and keep up the good work, Zabbix is a great tool. :-)

                    Comment

                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #25
                      When the server starts, it dumps the PIDs of all individual processes, along with their descriptions, into the log file. Write them down and check which process consumes the memory.
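
One way to follow this advice on Linux is to sample each PID's resident memory directly instead of watching "top" by hand. The following is only a sketch under stated assumptions: it reads the `VmRSS` field from `/proc/<pid>/status`, which exists on Linux only, and the helper names `parse_vmrss_kb` and `resident_kb` are hypothetical, not part of Zabbix.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status.

    Returns None when the field is missing (for example, kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def resident_kb(pid):
    """Read the current resident set size of a live process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())
```

Calling `resident_kb(pid)` for each PID noted from the log at regular intervals, and recording the results, would show which process type keeps growing.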
                      Alexei Vladishev
                      Creator of Zabbix, Product manager
                      New York | Tokyo | Riga

                      Comment

                      • Palmertree
                        Senior Member
                        • Sep 2005
                        • 746

                        #26
                        Ok. Will do. I will let you know what I find out.

                        Comment

                        • Palmertree
                          Senior Member
                          • Sep 2005
                          • 746

                          #27
                          Thanks for that info. It appears to be in the SNMP pollers.

                          30936:20070906:160525 Starting zabbix_server. ZABBIX 1.4.3.
                          30936:20070906:160525 **** Enabled features ****
                          30936:20070906:160525 SNMP monitoring: YES
                          30936:20070906:160525 WEB monitoring: YES
                          30936:20070906:160525 Jabber notifications: YES
                          30936:20070906:160525 **************************
                          30947:20070906:160525 server #11 started [Trapper]
                          30948:20070906:160525 server #12 started [Trapper]
                          30949:20070906:160525 server #13 started [Trapper]
                          30950:20070906:160525 server #14 started [Trapper]
                          30951:20070906:160525 server #15 started [Trapper]
                          30952:20070906:160525 server #16 started [ICMP pinger]
                          30953:20070906:160525 server #17 started [ICMP pinger]
                          30954:20070906:160525 server #18 started [ICMP pinger]
                          30955:20070906:160525 server #19 started [Alerter]
                          30956:20070906:160525 server #20 started [Housekeeper]
                          30956:20070906:160525 Executing housekeeper
                          30959:20070906:160525 server #21 started [Timer]
                          30965:20070906:160525 server #27 started [Node watcher. Node ID:0]
                          30966:20070906:160525 server #28 started [HTTP Poller]
                          30967:20070906:160525 server #29 started [HTTP Poller]
                          30936:20070906:160525 server #0 started [Watchdog]
                          30942:20070906:160526 server #6 started [Poller. SNMP:ON]
                          30937:20070906:160526 server #1 started [Poller. SNMP:ON]
                          30943:20070906:160526 server #7 started [Poller. SNMP:ON]
                          30938:20070906:160526 server #2 started [Poller. SNMP:ON]
                          30939:20070906:160526 server #3 started [Poller. SNMP:ON]
                          30960:20070906:160526 server #22 started [Poller for unreachable hosts. SNMP:ON]
                          30945:20070906:160526 server #9 started [Poller. SNMP:ON]
                          30940:20070906:160526 server #4 started [Poller. SNMP:ON]
                          30968:20070906:160526 server #30 started [Discoverer. SNMP:ON]
                          30962:20070906:160526 server #24 started [Poller for unreachable hosts. SNMP:ON]
                          30944:20070906:160526 server #8 started [Poller. SNMP:ON]
                          30946:20070906:160527 server #10 started [Poller. SNMP:ON]
                          30941:20070906:160527 server #5 started [Poller. SNMP:ON]
                          30964:20070906:160527 server #26 started [Poller for unreachable hosts. SNMP:ON]
                          30961:20070906:160527 server #23 started [Poller for unreachable hosts. SNMP:ON]
                          30963:20070906:160527 server #25 started [Poller for unreachable hosts. SNMP:ON]
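
The startup lines above can be turned into a PID-to-role lookup rather than written down by hand. This is a minimal sketch that assumes the log format is exactly the one shown in this post (`PID:YYYYMMDD:HHMMSS server #N started [Role]`); other Zabbix versions may log differently, and the function name `pid_roles` is hypothetical.

```python
import re

# Matches lines such as:
#   30942:20070906:160526 server #6 started [Poller. SNMP:ON]
STARTED = re.compile(r"^\s*(\d+):\d{8}:\d{6} server #(\d+) started \[(.+)\]\s*$")


def pid_roles(log_text):
    """Map each PID to the role named in its 'server #N started [...]' line."""
    roles = {}
    for line in log_text.splitlines():
        match = STARTED.match(line)
        if match:  # other lines, e.g. 'Executing housekeeper', are skipped
            roles[int(match.group(1))] = match.group(3)
    return roles
```

With the log above, `pid_roles(text)[30942]` gives `"Poller. SNMP:ON"`, which makes it easy to label the rows of a "top" listing.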




                          PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                          30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.06 zabbix_server
                          30937 zabbix 25 5 65012 8576 1600 S 2 0.1 1:14.36 zabbix_server
                          30938 zabbix 25 5 64960 8604 1588 S 2 0.1 1:19.18 zabbix_server
                          30939 zabbix 25 5 65152 8772 1592 S 3 0.1 1:21.55 zabbix_server
                          30940 zabbix 25 5 65076 8656 1592 S 4 0.1 1:19.13 zabbix_server
                          30941 zabbix 25 5 65484 9004 1588 S 3 0.1 1:20.76 zabbix_server
                          30942 zabbix 25 5 65256 8796 1584 S 2 0.1 1:15.16 zabbix_server
                          30943 zabbix 25 5 65196 8804 1596 S 2 0.1 1:21.16 zabbix_server
                          30944 zabbix 25 5 65120 8756 1604 S 3 0.1 1:20.96 zabbix_server
                          30945 zabbix 25 5 65184 8844 1592 S 4 0.1 1:20.90 zabbix_server
                          30946 zabbix 25 5 65096 8736 1596 S 2 0.1 1:18.29 zabbix_server

                          30947 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
                          30948 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
                          30949 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
                          30950 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.15 zabbix_server
                          30951 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.15 zabbix_server
                          30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:01.36 zabbix_server
                          30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.36 zabbix_server
                          30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.35 zabbix_server
                          30955 zabbix 25 5 58768 1076 536 S 0 0.0 0:00.02 zabbix_server
                          30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:12.61 zabbix_server
                          30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:00.67 zabbix_server
                          30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.14 zabbix_server
                          30961 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
                          30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
                          30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
                          30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
                          30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.10 zabbix_server
                          30966 zabbix 25 5 62756 4264 2256 S 0 0.1 0:19.03 zabbix_server
                          30967 zabbix 25 5 62140 3564 2240 S 0 0.0 0:07.52 zabbix_server
                          30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.08 zabbix_server
                          Last edited by Palmertree; 06-09-2007, 23:29.

                          Comment

                          • Palmertree
                            Senior Member
                            • Sep 2005
                            • 746

                            #28
                            Here is another snapshot of the memory 30 minutes later. You can see the SNMP pollers grew by at least 1000K within just 30 minutes.

                            It's the Resident Memory Size that is increasing.

                            RES -- Resident size (kb)
                            The non-swapped physical memory a task has used.

                            PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                            30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.06 zabbix_server
                            30937 zabbix 25 5 66464 9.8m 1600 S 3 0.1 1:42.34 zabbix_server
                            30938 zabbix 25 5 66836 10m 1588 S 0 0.1 1:49.01 zabbix_server
                            30939 zabbix 25 5 67172 10m 1592 S 3 0.1 1:52.63 zabbix_server
                            30940 zabbix 25 5 66964 10m 1592 S 3 0.1 1:48.89 zabbix_server
                            30941 zabbix 25 5 67212 10m 1588 S 3 0.1 1:51.98 zabbix_server
                            30942 zabbix 25 5 66716 10m 1584 S 2 0.1 1:43.73 zabbix_server
                            30943 zabbix 25 5 67160 10m 1596 S 2 0.1 1:52.40 zabbix_server
                            30944 zabbix 25 5 67112 10m 1604 S 1 0.1 1:51.72 zabbix_server
                            30945 zabbix 25 5 67260 10m 1592 S 3 0.1 1:51.96 zabbix_server
                            30946 zabbix 25 5 66992 10m 1596 S 2 0.1 1:49.05 zabbix_server

                            30947 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.21 zabbix_server
                            30948 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.21 zabbix_server
                            30949 zabbix 25 5 58768 1420 776 S 0 0.0 0:00.21 zabbix_server
                            30950 zabbix 25 5 58768 1412 776 S 0 0.0 0:00.22 zabbix_server
                            30951 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.20 zabbix_server
                            30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:01.84 zabbix_server
                            30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.85 zabbix_server
                            30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.86 zabbix_server
                            30955 zabbix 25 5 60836 1592 1028 S 0 0.0 0:00.07 zabbix_server
                            30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:25.07 zabbix_server
                            30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:00.92 zabbix_server
                            30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.17 zabbix_server
                            30961 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
                            30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.17 zabbix_server
                            30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
                            30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
                            30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.14 zabbix_server
                            30966 zabbix 25 5 62144 3644 2256 S 4 0.0 0:26.37 zabbix_server
                            30967 zabbix 25 5 62140 3564 2240 S 1 0.0 0:10.43 zabbix_server
                            30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.09 zabbix_server
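
Comparing two "top" snapshots by eye is error-prone because RES switches from plain kilobytes to values like `9.8m`. The sketch below normalizes the RES column and reports growth per PID. It assumes the 12-column layout shown in these posts, with RES as the sixth column, and treats the `m` and `g` suffixes as multiples of 1024 kB; the function names are hypothetical. Because "top" rounds the suffixed values, the result is only approximate.

```python
def res_to_kb(field):
    """Convert a top RES field such as '8576', '9.8m' or '1.2g' to kB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return round(float(field[:-1]) * units[suffix])
    return int(field)


def res_by_pid(top_text):
    """Return {pid: RES in kB} for every process row in a top listing."""
    result = {}
    for line in top_text.splitlines():
        cols = line.split()
        # Process rows start with a numeric PID; the header row does not.
        if len(cols) >= 12 and cols[0].isdigit():
            result[int(cols[0])] = res_to_kb(cols[5])
    return result


def growth_kb(before, after):
    """RES growth in kB for each PID present in both snapshots."""
    return {pid: after[pid] - before[pid] for pid in before if pid in after}
```

Fed the two snapshots above, `growth_kb(res_by_pid(first), res_by_pid(second))` would show growth of well over 1000 kB for the poller PIDs (30937 to 30946) and little or none for the trappers.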
                            Last edited by Palmertree; 06-09-2007, 23:35.

                            Comment

                            • Palmertree
                              Senior Member
                              • Sep 2005
                              • 746

                              #29
                              After 30 more minutes, the SNMP pollers have increased their RES memory from 10m to 14m.

                              PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                              30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.07 zabbix_server
                              30937 zabbix 25 5 70204 13m 1600 S 2 0.2 2:38.12 zabbix_server
                              30938 zabbix 25 5 70700 14m 1588 S 0 0.2 2:47.34 zabbix_server
                              30939 zabbix 25 5 71208 14m 1592 S 4 0.2 2:53.12 zabbix_server
                              30940 zabbix 25 5 70864 14m 1604 S 2 0.2 2:48.05 zabbix_server
                              30941 zabbix 25 5 71004 14m 1588 S 2 0.2 2:51.85 zabbix_server
                              30942 zabbix 25 5 70616 13m 1584 S 4 0.2 2:41.11 zabbix_server
                              30943 zabbix 25 5 70884 14m 1596 S 2 0.2 2:52.81 zabbix_server
                              30944 zabbix 25 5 71008 14m 1604 S 2 0.2 2:51.66 zabbix_server
                              30945 zabbix 25 5 71084 14m 1604 S 4 0.2 2:51.22 zabbix_server
                              30946 zabbix 25 5 70744 13m 1596 S 4 0.2 2:48.66 zabbix_server

                              30947 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.32 zabbix_server
                              30948 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.33 zabbix_server
                              30949 zabbix 25 5 58768 1420 776 S 0 0.0 0:00.34 zabbix_server
                              30950 zabbix 25 5 58768 1412 776 S 0 0.0 0:00.33 zabbix_server
                              30951 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.32 zabbix_server
                              30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:02.85 zabbix_server
                              30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:02.85 zabbix_server
                              30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:02.85 zabbix_server
                              30955 zabbix 25 5 60836 1592 1028 S 0 0.0 0:00.12 zabbix_server
                              30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:25.07 zabbix_server
                              30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:01.40 zabbix_server
                              30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.22 zabbix_server
                              30961 zabbix 25 5 59384 2752 1400 S 0 0.0 0:00.23 zabbix_server
                              30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.22 zabbix_server
                              30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.23 zabbix_server
                              30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.23 zabbix_server
                              30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.22 zabbix_server
                              30966 zabbix 25 5 62144 3648 2256 S 0 0.0 0:40.14 zabbix_server
                              30967 zabbix 25 5 62140 3564 2240 S 0 0.0 0:16.05 zabbix_server
                              30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.09 zabbix_server

                              Comment

                              • Palmertree
                                Senior Member
                                • Sep 2005
                                • 746

                                #30
                                Solution Found

                                I have found out why my SNMP pollers were leaking memory. I wanted to share this with everyone because it might be a problem for a lot of people, and it took me a few days to figure out.

                                The problem is not in the Zabbix code but in the NET-SNMP libraries. There is a major memory leak in the libraries for NET-SNMP version 5.4. I determined this by running Valgrind and saw the "varbinds" leaking like crazy. NET-SNMP version 5.4 is used in most yum installs; version 5.4.1 fixes these memory leaks.

                                I had to download the newest version and compile it. After installing NET-SNMP 5.4.1 you must recompile Zabbix Pre 1.4.3 (do not use 1.4.2, which has a memory leak in the trappers).

                                NET-SNMP Release Notes
                                Release Name: 5.4.1
                                Notes:
                                *5.4.1*

                                snmplib:
                                - [BUG 1619827]: link libraries against needed external libraries
                                - [PATCH 1616912]: fix memory leak in UDP transport code
                                - [PATCH 1592706]: fix memory leak when cloning varbinds
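
To check whether a given machine is likely affected, the installed NET-SNMP version string can be compared against 5.4.1. The sketch below only parses and compares a version string you supply, such as the output of `net-snmp-config --version` if that tool is installed; the helper names are hypothetical. Treat the result as a hint only, since a distribution package may carry backported patches under an older version number.

```python
import re


def version_tuple(text):
    """Extract the first 'X.Y' or 'X.Y.Z' version in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    # A missing third component (as in '5.4') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(version_text, minimum=(5, 4, 1)):
    """True if the version is at or above the release with the varbind fix."""
    return version_tuple(version_text) >= minimum
```

For example, `is_at_least("5.4")` is false and `is_at_least("5.4.1")` is true, matching the release notes quoted above.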
                                Last edited by Palmertree; 10-09-2007, 07:53.

                                Comment
