How to tell if Zabbix has memory leak?
-
I've created a crontab job to restart it once a day for now. Not the right way, but a quick fix until I can find out what is going on.
-
Originally posted by Palmertree:
I've created a crontab job to restart it once a day for now. Not the right way, but a quick fix until I can find out what is going on.

Yeah, not what I want to do... I'm hoping to convince the powers that be (developers?) that there is a problem.
-
I've been messing around with "mtrace" to try to determine whether there is a malloc() without a matching free() in the code. Still digging around to nail down my memory leak. I wonder if it's the curl or Jabber library. I might try compiling without them to see if memory still leaks.
-
Originally posted by Palmertree:
I've been messing around with "mtrace" to try to determine whether there is a malloc() without a matching free() in the code. Still digging around to nail down my memory leak. I wonder if it's the curl or Jabber library. I might try compiling without them to see if memory still leaks.

Hmm, interesting... I am using curl with 1.4.1 and some simple web checks that I didn't use before with 1.1.
-
I am curious whether it is something in my setup, but I have to restart zabbix_server every 6 hours because of a major memory leak. After 6 hours, "top" shows each process using 32M. Is anyone else seeing this? I've created a crontab entry to restart the service every 6 hours until I can figure out what's going on with my system. It might be my setup; I'm not sure yet.
I am currently running Pre 1.4.3.
Last edited by Palmertree; 05-09-2007, 23:16.
-
I am not aware of any memory leaks in the latest code of ZABBIX server.
ZABBIX 1.4.2 had a memory leak that occurred when ZABBIX server received values for nonexistent items sent by zabbix_sender. Perhaps this is what you have?
-
Thanks Alexei for the quick reply.
I'm running the newest code, Pre 1.4.3, from the developers link. The memory seems to increase faster the more Web checks I add. I will experiment by turning off the Zabbix_Senders and watching it for a while, and will run the same test with the web checks, to see whether the memory of the zabbix_server process still grows. Thanks for the help and keep up the good work, Zabbix is a great tool. :-)
-
When the server starts, it writes the PIDs of all individual processes, along with their descriptions, to the log file. Note them down and check which process is consuming the memory.
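For anyone who wants to automate that lookup, here is a minimal Python sketch. It assumes the startup lines have the form `PID:YYYYMMDD:HHMMSS server #N started [description]`, as Zabbix 1.4.x logs do; the function name `pid_roles` is made up for illustration and is not part of Zabbix.

```python
import re

# Matches startup lines such as:
#   30942:20070906:160526 server #6 started [Poller. SNMP:ON]
# (format assumed from Zabbix 1.4.x server logs)
STARTED = re.compile(r"^\s*(\d+):\d{8}:\d{6}\s+server #\d+ started \[(.+)\]\s*$")


def pid_roles(log_lines):
    """Return {pid: description} for every 'server #N started [...]' line."""
    roles = {}
    for line in log_lines:
        match = STARTED.match(line)
        if match:
            roles[int(match.group(1))] = match.group(2)
    return roles


# Small demo with hypothetical sample lines in the assumed format.
sample = [
    "30936:20070906:160525 SNMP monitoring: YES",
    "30947:20070906:160525 server #11 started [Trapper]",
    "30942:20070906:160526 server #6 started [Poller. SNMP:ON]",
]
for pid, role in sorted(pid_roles(sample).items()):
    print(pid, role)
```

With that mapping in hand, the PID column of "top" can be translated back into a process type (poller, trapper, HTTP poller and so on).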
-
Thanks for that info. It appears to be in the SNMP pollers.
30936:20070906:160525 Starting zabbix_server. ZABBIX 1.4.3.
30936:20070906:160525 **** Enabled features ****
30936:20070906:160525 SNMP monitoring: YES
30936:20070906:160525 WEB monitoring: YES
30936:20070906:160525 Jabber notifications: YES
30936:20070906:160525 **************************
30947:20070906:160525 server #11 started [Trapper]
30948:20070906:160525 server #12 started [Trapper]
30949:20070906:160525 server #13 started [Trapper]
30950:20070906:160525 server #14 started [Trapper]
30951:20070906:160525 server #15 started [Trapper]
30952:20070906:160525 server #16 started [ICMP pinger]
30953:20070906:160525 server #17 started [ICMP pinger]
30954:20070906:160525 server #18 started [ICMP pinger]
30955:20070906:160525 server #19 started [Alerter]
30956:20070906:160525 server #20 started [Housekeeper]
30956:20070906:160525 Executing housekeeper
30959:20070906:160525 server #21 started [Timer]
30965:20070906:160525 server #27 started [Node watcher. Node ID:0]
30966:20070906:160525 server #28 started [HTTP Poller]
30967:20070906:160525 server #29 started [HTTP Poller]
30936:20070906:160525 server #0 started [Watchdog]
30942:20070906:160526 server #6 started [Poller. SNMP:ON]
30937:20070906:160526 server #1 started [Poller. SNMP:ON]
30943:20070906:160526 server #7 started [Poller. SNMP:ON]
30938:20070906:160526 server #2 started [Poller. SNMP:ON]
30939:20070906:160526 server #3 started [Poller. SNMP:ON]
30960:20070906:160526 server #22 started [Poller for unreachable hosts. SNMP:ON]
30945:20070906:160526 server #9 started [Poller. SNMP:ON]
30940:20070906:160526 server #4 started [Poller. SNMP:ON]
30968:20070906:160526 server #30 started [Discoverer. SNMP:ON]
30962:20070906:160526 server #24 started [Poller for unreachable hosts. SNMP:ON]
30944:20070906:160526 server #8 started [Poller. SNMP:ON]
30946:20070906:160527 server #10 started [Poller. SNMP:ON]
30941:20070906:160527 server #5 started [Poller. SNMP:ON]
30964:20070906:160527 server #26 started [Poller for unreachable hosts. SNMP:ON]
30961:20070906:160527 server #23 started [Poller for unreachable hosts. SNMP:ON]
30963:20070906:160527 server #25 started [Poller for unreachable hosts. SNMP:ON]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.06 zabbix_server
30937 zabbix 25 5 65012 8576 1600 S 2 0.1 1:14.36 zabbix_server
30938 zabbix 25 5 64960 8604 1588 S 2 0.1 1:19.18 zabbix_server
30939 zabbix 25 5 65152 8772 1592 S 3 0.1 1:21.55 zabbix_server
30940 zabbix 25 5 65076 8656 1592 S 4 0.1 1:19.13 zabbix_server
30941 zabbix 25 5 65484 9004 1588 S 3 0.1 1:20.76 zabbix_server
30942 zabbix 25 5 65256 8796 1584 S 2 0.1 1:15.16 zabbix_server
30943 zabbix 25 5 65196 8804 1596 S 2 0.1 1:21.16 zabbix_server
30944 zabbix 25 5 65120 8756 1604 S 3 0.1 1:20.96 zabbix_server
30945 zabbix 25 5 65184 8844 1592 S 4 0.1 1:20.90 zabbix_server
30946 zabbix 25 5 65096 8736 1596 S 2 0.1 1:18.29 zabbix_server
30947 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
30948 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
30949 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server
30950 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.15 zabbix_server
30951 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.15 zabbix_server
30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:01.36 zabbix_server
30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.36 zabbix_server
30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.35 zabbix_server
30955 zabbix 25 5 58768 1076 536 S 0 0.0 0:00.02 zabbix_server
30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:12.61 zabbix_server
30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:00.67 zabbix_server
30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.14 zabbix_server
30961 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.15 zabbix_server
30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.10 zabbix_server
30966 zabbix 25 5 62756 4264 2256 S 0 0.1 0:19.03 zabbix_server
30967 zabbix 25 5 62140 3564 2240 S 0 0.0 0:07.52 zabbix_server
30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.08 zabbix_server
Last edited by Palmertree; 06-09-2007, 23:29.
-
Here is another snapshot of the memory 30 minutes later. You can see the SNMP pollers grew by at least 1000K within just 30 minutes.
It's the Resident Memory Size that is increasing.
RES -- Resident size (kb)
The non-swapped physical memory a task has used.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.06 zabbix_server
30937 zabbix 25 5 66464 9.8m 1600 S 3 0.1 1:42.34 zabbix_server
30938 zabbix 25 5 66836 10m 1588 S 0 0.1 1:49.01 zabbix_server
30939 zabbix 25 5 67172 10m 1592 S 3 0.1 1:52.63 zabbix_server
30940 zabbix 25 5 66964 10m 1592 S 3 0.1 1:48.89 zabbix_server
30941 zabbix 25 5 67212 10m 1588 S 3 0.1 1:51.98 zabbix_server
30942 zabbix 25 5 66716 10m 1584 S 2 0.1 1:43.73 zabbix_server
30943 zabbix 25 5 67160 10m 1596 S 2 0.1 1:52.40 zabbix_server
30944 zabbix 25 5 67112 10m 1604 S 1 0.1 1:51.72 zabbix_server
30945 zabbix 25 5 67260 10m 1592 S 3 0.1 1:51.96 zabbix_server
30946 zabbix 25 5 66992 10m 1596 S 2 0.1 1:49.05 zabbix_server
30947 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.21 zabbix_server
30948 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.21 zabbix_server
30949 zabbix 25 5 58768 1420 776 S 0 0.0 0:00.21 zabbix_server
30950 zabbix 25 5 58768 1412 776 S 0 0.0 0:00.22 zabbix_server
30951 zabbix 25 5 58768 1344 752 S 0 0.0 0:00.20 zabbix_server
30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:01.84 zabbix_server
30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.85 zabbix_server
30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:01.86 zabbix_server
30955 zabbix 25 5 60836 1592 1028 S 0 0.0 0:00.07 zabbix_server
30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:25.07 zabbix_server
30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:00.92 zabbix_server
30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.17 zabbix_server
30961 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.17 zabbix_server
30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.18 zabbix_server
30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.14 zabbix_server
30966 zabbix 25 5 62144 3644 2256 S 4 0.0 0:26.37 zabbix_server
30967 zabbix 25 5 62140 3564 2240 S 1 0.0 0:10.43 zabbix_server
30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.09 zabbix_server
Last edited by Palmertree; 06-09-2007, 23:35.
-
After 30 more minutes, the SNMP pollers' RES memory has grown from about 10m to 13m-14m.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30936 zabbix 25 5 58768 1640 1100 S 0 0.0 0:00.07 zabbix_server
30937 zabbix 25 5 70204 13m 1600 S 2 0.2 2:38.12 zabbix_server
30938 zabbix 25 5 70700 14m 1588 S 0 0.2 2:47.34 zabbix_server
30939 zabbix 25 5 71208 14m 1592 S 4 0.2 2:53.12 zabbix_server
30940 zabbix 25 5 70864 14m 1604 S 2 0.2 2:48.05 zabbix_server
30941 zabbix 25 5 71004 14m 1588 S 2 0.2 2:51.85 zabbix_server
30942 zabbix 25 5 70616 13m 1584 S 4 0.2 2:41.11 zabbix_server
30943 zabbix 25 5 70884 14m 1596 S 2 0.2 2:52.81 zabbix_server
30944 zabbix 25 5 71008 14m 1604 S 2 0.2 2:51.66 zabbix_server
30945 zabbix 25 5 71084 14m 1604 S 4 0.2 2:51.22 zabbix_server
30946 zabbix 25 5 70744 13m 1596 S 4 0.2 2:48.66 zabbix_server
30947 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.32 zabbix_server
30948 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.33 zabbix_server
30949 zabbix 25 5 58768 1420 776 S 0 0.0 0:00.34 zabbix_server
30950 zabbix 25 5 58768 1412 776 S 0 0.0 0:00.33 zabbix_server
30951 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.32 zabbix_server
30952 zabbix 25 5 58768 1320 760 S 0 0.0 0:02.85 zabbix_server
30953 zabbix 25 5 58768 1328 768 S 0 0.0 0:02.85 zabbix_server
30954 zabbix 25 5 58768 1328 768 S 0 0.0 0:02.85 zabbix_server
30955 zabbix 25 5 60836 1592 1028 S 0 0.0 0:00.12 zabbix_server
30956 zabbix 25 5 58768 1200 604 S 0 0.0 0:25.07 zabbix_server
30959 zabbix 25 5 58768 1220 672 S 0 0.0 0:01.40 zabbix_server
30960 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.22 zabbix_server
30961 zabbix 25 5 59384 2752 1400 S 0 0.0 0:00.23 zabbix_server
30962 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.22 zabbix_server
30963 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.23 zabbix_server
30964 zabbix 25 5 59288 2596 1280 S 0 0.0 0:00.23 zabbix_server
30965 zabbix 25 5 58768 1088 548 S 0 0.0 0:00.22 zabbix_server
30966 zabbix 25 5 62144 3648 2256 S 0 0.0 0:40.14 zabbix_server
30967 zabbix 25 5 62140 3564 2240 S 0 0.0 0:16.05 zabbix_server
30968 zabbix 25 5 59288 2584 1268 S 0 0.0 0:00.09 zabbix_server
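Comparing snapshots like these by eye gets tedious, so here is a rough Python sketch that diffs the RES column between two "top" captures. It assumes the default column order shown above (PID first, RES sixth) and that a trailing "m" or "g" means MiB or GiB; the helper names are made up for illustration. Because "top" rounds values such as "14m", the result is only approximate.

```python
def res_to_kib(value):
    """Convert a top RES field ('8576', '9.8m', '1.2g') to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = value[-1].lower()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value)  # no suffix: top already reports KiB


def parse_top(text):
    """Return {pid: RES in KiB}; the header and other non-process lines are skipped."""
    res = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[0].isdigit():
            res[int(fields[0])] = res_to_kib(fields[5])
    return res


def growth(before, after):
    """Return {pid: KiB gained} for PIDs present in both snapshots."""
    return {pid: after[pid] - before[pid] for pid in before if pid in after}


# Two rows taken from the first and last snapshots in this thread.
first = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30937 zabbix 25 5 65012 8576 1600 S 2 0.1 1:14.36 zabbix_server
30947 zabbix 25 5 58768 1336 744 S 0 0.0 0:00.15 zabbix_server"""
last = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30937 zabbix 25 5 70204 13m 1600 S 2 0.2 2:38.12 zabbix_server
30947 zabbix 25 5 58768 1360 760 S 0 0.0 0:00.32 zabbix_server"""

deltas = growth(parse_top(first), parse_top(last))
for pid, delta in sorted(deltas.items(), key=lambda item: -item[1]):
    print(pid, round(delta), "KiB")
```

Sorting by growth puts the leaking processes at the top; a steady climb across several snapshots is a much stronger hint of a leak than a single large value.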
-
Solution Found
I have found why my SNMP pollers were leaking memory. I wanted to share this with everyone because it might be a problem for a lot of people, and it took me a few days to figure out.
The problem is not in the Zabbix code but in the NET-SNMP libraries. There is a major memory leak in the libraries for "NET-SNMP Version 5.4". I was able to determine this by running "Valgrind", which showed the "varbinds" leaking like crazy. NET-SNMP version 5.4 is used in most yum installs. NET-SNMP Version 5.4.1 fixes these memory leaks. I had to download the newest version and compile it. After installing version 5.4.1 of NET-SNMP, you must recompile Zabbix Pre 1.4.3 (do not use 1.4.2, which has a memory leak in the trappers).
NET-SNMP Release Notes
Release Name: 5.4.1
Notes:
*5.4.1*
snmplib:
- [BUG 1619827]: link libraries against needed external libraries
- [PATCH 1616912]: fix memory leak in UDP transport code
- [PATCH 1592706]: fix memory leak when cloning varbinds
Last edited by Palmertree; 10-09-2007, 07:53.
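To check quickly whether a machine still has the affected library, something like the Python sketch below may help. It assumes the `net-snmp-config` tool from the NET-SNMP development package is on the PATH and that its `--version` flag prints a dotted version string; the function names are made up for illustration. A distribution may backport fixes without changing the version number, and zabbix_server may be linked against a different copy of the library, so treat the result as a hint only.

```python
import re
import subprocess

# First NET-SNMP release whose notes list the varbind and UDP transport leak fixes.
FIXED = (5, 4, 1)


def parse_version(text):
    """Turn '5.4' or '5.4.1' into a tuple such as (5, 4, 0) or (5, 4, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least_fixed_release(text):
    """True if the version string is 5.4.1 or newer."""
    return parse_version(text) >= FIXED


def installed_version():
    """Ask net-snmp-config for the installed version (assumes it is on PATH)."""
    result = subprocess.run(
        ["net-snmp-config", "--version"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `is_at_least_fixed_release(installed_version())` would be expected to return False on a stock 5.4 install and True once 5.4.1 or later is in place.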