Zabbix server memory fills up

  • hughmcl
    Junior Member
    • Oct 2008
    • 20

    #16
    Here's a copy of the top output:

    top - 10:26:09 up 2 days, 12:08, 3 users, load average: 0.78, 0.75, 0.67
    Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
    Cpu(s): 8.0%us, 3.6%sy, 4.6%ni, 81.1%id, 2.1%wa, 0.1%hi, 0.5%si, 0.0%st
    Mem: 3369668k total, 3240296k used, 129372k free, 128168k buffers
    Swap: 6225896k total, 506380k used, 5719516k free, 1357788k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3836 mysql 15 0 799m 350m 3332 S 42 10.6 1416:06 mysqld
    17566 zabbix 20 5 9668 3648 2188 S 17 0.1 6:00.46 zabbix_server
    17544 zabbix 20 5 351m 247m 1588 S 2 7.5 0:55.94 zabbix_server
    17545 zabbix 20 5 336m 237m 1596 S 2 7.2 1:04.32 zabbix_server
    17543 zabbix 20 5 337m 237m 1592 S 1 7.2 0:55.46 zabbix_server
    17547 zabbix 20 5 341m 240m 1600 S 1 7.3 1:02.82 zabbix_server
    17550 zabbix 28 5 8836 1524 876 S 1 0.0 0:10.57 zabbix_server

    Four of the zabbix_server processes now have 237 MB or more resident (RES) each.
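To track how individual zabbix_server processes grow over time instead of eyeballing top, one option on Linux is to sample the VmRSS field of /proc/<pid>/status, which corresponds to the RES column in top. The function below is a minimal sketch under that assumption; read_vmrss_kb is a hypothetical helper name, not part of Zabbix.

```c
#include <stdio.h>
#include <string.h>

/* Return the resident set size in kB from the "VmRSS:" line of a Linux
 * /proc/<pid>/status file, or -1 if the file cannot be opened or has no
 * VmRSS line (kernel threads, for example, have none). */
long read_vmrss_kb(const char *status_path)
{
    char line[256];
    long kb = -1;
    FILE *fp = fopen(status_path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Expected form: "VmRSS:      1234 kB" */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }

    fclose(fp);
    return kb;
}
```

For example, calling read_vmrss_kb("/proc/17544/status") once a minute and logging the result would show whether a process keeps climbing steadily (pointing to a leak) or levels off after startup.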

    Is there anything else we can get you copies of to help?


    • snivek
      Junior Member
      • Jan 2009
      • 6

      #17
      I too am experiencing what seems to be a memory leak. I am running Ubuntu in a VM with 1 GB of memory. This is a 1.6.2 install compiled from source.

      Here is a snapshot of top. I have also attached a graph of the server's free memory (the increase you see is from when I reduced the number of Apache servers and restarted Apache).

      top - 13:47:21 up 2:56, 2 users, load average: 0.35, 0.29, 0.42
      Tasks: 92 total, 4 running, 88 sleeping, 0 stopped, 0 zombie
      Cpu(s): 18.3%us, 8.3%sy, 0.3%ni, 72.4%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
      Mem: 1035232k total, 893296k used, 141936k free, 115740k buffers
      Swap: 409616k total, 0k used, 409616k free, 255744k cached

      PID USER PR NI VIRT RES SHR S %MEM %CPU TIME+ COMMAND
      4600 zabbix 25 5 116m 78m 1720 S 7.8 0.3 0:18.40 zabbix_server
      4597 zabbix 25 5 116m 78m 1748 S 7.8 0.3 0:19.51 zabbix_server
      4601 zabbix 25 5 116m 78m 1716 S 7.8 0.3 0:16.81 zabbix_server
      4598 zabbix 25 5 108m 72m 1744 S 7.2 0.0 0:14.79 zabbix_server
      4599 zabbix 25 5 100m 66m 1720 S 6.6 0.3 0:19.27 zabbix_server
      4240 mysql 20 0 134m 46m 5572 S 4.6 21.3 21:55.29 mysqld
      21424 www-data 20 0 30144 13m 3636 S 1.3 0.7 0:11.95 apache2
      21314 www-data 20 0 30112 13m 3636 S 1.3 0.0 0:13.69 apache2
      21423 www-data 20 0 29868 13m 3704 S 1.3 0.0 0:13.65 apache2
      21540 www-data 20 0 29580 12m 3636 S 1.3 1.0 0:12.11 apache2
      21417 www-data 20 0 29324 12m 3704 S 1.2 0.0 0:11.43 apache2
      21359 www-data 20 0 29152 12m 3704 S 1.2 1.0 0:15.32 apache2
      21549 www-data 20 0 28400 11m 3616 S 1.2 0.0 0:12.45 apache2
      21311 root 20 0 23496 6944 3932 S 0.7 0.0 0:00.09 apache2
      4406 snmp 20 0 8492 3728 2092 S 0.4 0.0 0:00.09 snmpd
      4637 zabbix 25 5 10432 2956 1828 S 0.3 0.3 0:10.20 zabbix_server
      4538 root 20 0 5008 2716 1448 S 0.3 0.0 0:00.36 bash
      4622 zabbix 25 5 10372 2536 1460 S 0.2 0.0 0:00.72 zabbix_server


      • hughmcl
        Junior Member
        • Oct 2008
        • 20

        #18
        Here's some info that may help.

        I ran zabbix_server under valgrind with memcheck, watched which processes grew, and grepped the log for those processes. Here is the output for them:

        ==16130==
        ==16130== ERROR SUMMARY: 117 errors from 1 contexts (suppressed: 51 from 1)
        ==16130== malloc/free: in use at exit: 21,007,250 bytes in 15,270 blocks.
        ==16130== malloc/free: 246,998 allocs, 231,728 frees, 596,280,150 bytes allocated.
        ==16130== For counts of detected errors, rerun with: -v
        ==16130== searching for pointers to 15,270 not-freed blocks.
        ==16130== checked 1,453,624 bytes.
        ==16130==
        ==16130==
        ==16130== 18,840 bytes in 942 blocks are definitely lost in loss record 78 of 86
        ==16130== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16130== by 0x258AA1: netsnmp_udp_transport (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16130== by 0x259130: netsnmp_udp_create_tstring (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16130== by 0x250929: netsnmp_tdomain_transport (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16130== by 0x22156C: snmp_sess_open (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16130== by 0x22181C: snmp_open (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16130== by 0x80519EA: get_snmp (checks_snmp.c:476)
        ==16130== by 0x805229E: get_value_snmp (checks_snmp.c:743)
        ==16130== by 0x8055052: get_value (poller.c:69)
        ==16130== by 0x8055446: main_poller_loop (poller.c:448)
        ==16130== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16130== by 0x8068BB3: daemon_start (daemon.c:190)
        ==16130==
        ==16130==
        ==16130== 18,091,936 (123,336 direct, 17,968,600 indirect) bytes in 1,250 blocks are definitely lost in loss record 84 of 86
        ==16130== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16130== by 0x41CE82C: my_malloc (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x41EAAB0: mysql_store_result (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x8075545: zbx_db_vselect (db.c:718)
        ==16130== by 0x8072120: __zbx_DBselect (db.c:246)
        ==16130== by 0x8072240: zbx_host_key_string (db.c:2198)
        ==16130== by 0x808212C: update_triggers (functions.c:161)
        ==16130== by 0x8055645: main_poller_loop (poller.c:480)
        ==16130== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16130== by 0x8068BB3: daemon_start (daemon.c:190)
        ==16130== by 0x804DD40: main (server.c:974)
        ==16130==
        ==16130==
        ==16130== 2,498,332 bytes in 308 blocks are possibly lost in loss record 86 of 86
        ==16130== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16130== by 0x41CE82C: my_malloc (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x41D1989: alloc_root (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x41EB8DE: cli_read_rows (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x41EAAED: mysql_store_result (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16130== by 0x8075545: zbx_db_vselect (db.c:718)
        ==16130== by 0x8072120: __zbx_DBselect (db.c:246)
        ==16130== by 0x8072240: zbx_host_key_string (db.c:2198)
        ==16130== by 0x808216C: update_triggers (functions.c:166)
        ==16130== by 0x8055645: main_poller_loop (poller.c:480)
        ==16130== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16130== by 0x8068BB3: daemon_start (daemon.c:190)
        ==16130==
        ==16130== LEAK SUMMARY:
        ==16130== definitely lost: 142,176 bytes in 2,192 blocks.
        ==16130== indirectly lost: 17,968,600 bytes in 3,442 blocks.
        ==16130== possibly lost: 2,498,332 bytes in 308 blocks.
        ==16130== still reachable: 398,142 bytes in 9,328 blocks.
        ==16130== suppressed: 0 bytes in 0 blocks.
        ==16130== Reachable blocks (those to which a pointer was found) are not shown.
        ==16130== To see them, rerun with: --show-reachable=yes



        ==16131==
        ==16131== ERROR SUMMARY: 117 errors from 1 contexts (suppressed: 51 from 1)
        ==16131== malloc/free: in use at exit: 20,851,225 bytes in 15,221 blocks.
        ==16131== malloc/free: 245,132 allocs, 229,911 frees, 591,987,224 bytes allocated.
        ==16131== For counts of detected errors, rerun with: -v
        ==16131== searching for pointers to 15,221 not-freed blocks.
        ==16131== checked 1,527,540 bytes.
        ==16131==
        ==16131==
        ==16131== 18,560 bytes in 928 blocks are definitely lost in loss record 91 of 99
        ==16131== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16131== by 0x258AA1: netsnmp_udp_transport (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16131== by 0x259130: netsnmp_udp_create_tstring (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16131== by 0x250929: netsnmp_tdomain_transport (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16131== by 0x22156C: snmp_sess_open (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16131== by 0x22181C: snmp_open (in /usr/lib/libnetsnmp.so.10.0.1)
        ==16131== by 0x80519EA: get_snmp (checks_snmp.c:476)
        ==16131== by 0x805229E: get_value_snmp (checks_snmp.c:743)
        ==16131== by 0x8055052: get_value (poller.c:69)
        ==16131== by 0x8055446: main_poller_loop (poller.c:448)
        ==16131== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16131== by 0x8068BB3: daemon_start (daemon.c:190)
        ==16131==
        ==16131==
        ==16131== 17,844,852 (121,952 direct, 17,722,900 indirect) bytes in 1,236 blocks are definitely lost in loss record 95 of 99
        ==16131== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16131== by 0x41CE82C: my_malloc (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16131== by 0x41EAAB0: mysql_store_result (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16131== by 0x8075545: zbx_db_vselect (db.c:718)
        ==16131== by 0x8072120: __zbx_DBselect (db.c:246)
        ==16131== by 0x8072240: zbx_host_key_string (db.c:2198)
        ==16131== by 0x808216C: update_triggers (functions.c:166)
        ==16131== by 0x8055645: main_poller_loop (poller.c:480)
        ==16131== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16131== by 0x8068BB3: daemon_start (daemon.c:190)
        ==16131== by 0x804DD40: main (server.c:974)
        ==16131==
        ==16131==
        ==16131== 2,506,452 bytes in 309 blocks are possibly lost in loss record 98 of 99
        ==16131== at 0x40053C0: malloc (vg_replace_malloc.c:149)
        ==16131== by 0x41CE82C: my_malloc (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16131== by 0x41EB854: cli_read_rows (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16131== by 0x41EAAED: mysql_store_result (in /usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0)
        ==16131== by 0x8075545: zbx_db_vselect (db.c:718)
        ==16131== by 0x8072120: __zbx_DBselect (db.c:246)
        ==16131== by 0x8072190: zbx_host_key_function_string (db.c:2222)
        ==16131== by 0x807523E: DBget_function_result (db.c:300)
        ==16131== by 0x8089529: evaluate_expression (expression.c:1688)
        ==16131== by 0x8082118: update_triggers (functions.c:159)
        ==16131== by 0x8055645: main_poller_loop (poller.c:480)
        ==16131== by 0x804E146: MAIN_ZABBIX_ENTRY (server.c:1142)
        ==16131==
        ==16131== LEAK SUMMARY:
        ==16131== definitely lost: 140,512 bytes in 2,164 blocks.
        ==16131== indirectly lost: 17,722,900 bytes in 3,397 blocks.
        ==16131== possibly lost: 2,506,452 bytes in 309 blocks.
        ==16131== still reachable: 481,361 bytes in 9,351 blocks.
        ==16131== suppressed: 0 bytes in 0 blocks.
        ==16131== Reachable blocks (those to which a pointer was found) are not shown.
        ==16131== To see them, rerun with: --show-reachable=yes



        So should I take it that this is an issue with libmysqlclient, or with the way zabbix_server uses the client library?

        I'm using mysql-5.0.67.
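On the libmysqlclient question: memcheck's backtrace shows where a block was allocated, not where it should have been freed. A result set returned by mysql_store_result() belongs to the caller, who is expected to release it with mysql_free_result(), so a trace through mysql_store_result() is consistent with a caller that skips that call on some path. The "direct"/"indirect" split fits the same picture: once the only pointer to the small result header is dropped, the row data it owned becomes unreachable too. The sketch below models this with a hypothetical struct result and result_store()/result_free() standing in for MYSQL_RES, mysql_store_result() and mysql_free_result(); it is an illustration, not the real library or Zabbix code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a client-library result set (think MYSQL_RES):
 * a small header that owns a separately allocated array of row strings. */
struct result {
    size_t nrows;
    char **rows;
};

/* Build a result holding nrows copies of text. The "library" allocates,
 * but the caller owns the result and must pass it to result_free().
 * Returns NULL only if the header itself cannot be allocated. */
struct result *result_store(size_t nrows, const char *text)
{
    struct result *res = malloc(sizeof(*res));

    if (res == NULL)
        return NULL;
    res->nrows = 0;
    res->rows = calloc(nrows, sizeof(char *));
    if (res->rows == NULL)
        return res;                      /* treat as an empty result */

    for (size_t i = 0; i < nrows; i++) {
        res->rows[i] = malloc(strlen(text) + 1);
        if (res->rows[i] == NULL)
            break;                       /* keep the rows built so far */
        strcpy(res->rows[i], text);
        res->nrows++;
    }
    return res;
}

/* Release the row strings, the row array and the header. */
void result_free(struct result *res)
{
    if (res == NULL)
        return;
    for (size_t i = 0; i < res->nrows; i++)
        free(res->rows[i]);
    free(res->rows);
    free(res);
}

/* Leaky caller: 1 if the first row equals want, 0 if not, -1 on allocation
 * failure. BUG: the "no match" path returns without result_free(), dropping
 * the only pointer to res. Valgrind would report the header as "definitely
 * lost" and the row array and strings as "indirectly lost", with a backtrace
 * going through result_store() even though the fault is in this caller. */
int first_row_equals_leaky(const char *stored, const char *want)
{
    struct result *res = result_store(3, stored);

    if (res == NULL)
        return -1;
    if (res->nrows == 0 || strcmp(res->rows[0], want) != 0)
        return 0;
    result_free(res);
    return 1;
}

/* Fixed caller: the result is freed on every path once it exists. */
int first_row_equals(const char *stored, const char *want)
{
    struct result *res = result_store(3, stored);
    int rc;

    if (res == NULL)
        return -1;
    rc = (res->nrows > 0 && strcmp(res->rows[0], want) == 0);
    result_free(res);
    return rc;
}
```

Running first_row_equals_leaky("up", "down") under valgrind would be expected to produce a report with the same shape as the mysql_store_result() entries in the dumps, even though the allocator itself is correct.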


        Hugh


        • hughmcl
          Junior Member
          • Oct 2008
          • 20

          #19
          Any updates?

          I'm having to restart my zabbix_server process every 2-3 hours.

          Did the Zabbix engineers/developers find anything useful in the information I posted?


          • pace
            Junior Member
            • Oct 2008
            • 7

            #20
            1. DBStartSyncers: Default
            2. Output of ps for zabbix_server after running for about 1 hour:

            zabbix 4885 0.0 0.1 10676 2040 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4887 0.0 8.1 127992 85440 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4888 0.0 7.9 125132 83312 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4889 0.0 7.8 123992 82704 ? SN 00:51 0:01 /usr/local/sbin/zabbix_server
            zabbix 4890 0.0 8.0 127084 84872 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4891 0.0 8.2 128780 86000 ? SN 00:51 0:01 /usr/local/sbin/zabbix_server
            zabbix 4892 0.0 0.1 10696 1612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4893 0.0 0.1 10696 1612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4894 0.0 0.1 10696 1612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4895 0.0 0.1 10696 1612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4896 0.0 0.1 10696 1612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4897 0.0 0.1 10680 1784 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4898 0.0 0.1 10676 1512 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4899 0.0 0.1 10676 1584 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4900 0.0 0.1 10676 1524 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4901 0.0 0.2 12980 2620 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4902 0.0 0.1 10676 1524 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4903 0.0 0.2 10760 2456 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4904 0.0 0.2 10760 2444 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4905 0.0 0.2 10760 2508 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4906 0.0 0.3 11244 3452 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4907 0.0 0.3 11252 3408 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4908 0.0 0.2 12980 2612 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server
            zabbix 4909 0.0 0.1 10676 1528 ? SN 00:51 0:00 /usr/local/sbin/zabbix_server

            3. Types of monitors: snmp, agent, agent-active, and simple checks

            Leaking like a sieve. Just for giggles, I updated all of my development packages to the latest versions and recompiled. That solved nothing.


            pace


            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
              • Sep 2004
              • 5654

              #21
              The valgrind output was extremely useful. Thanks for this!

              We have identified the reason for the memory leaks: in a couple of places we do not free small chunks of memory. The leaks happen ONLY when a poller cannot get information from a Zabbix/SNMP agent, one leak per failure. That is why it took so long to figure out.
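This description is consistent with the netsnmp entries in the valgrind reports: 18,840 bytes in 942 blocks and 18,560 bytes in 928 blocks both work out to exactly 20 bytes per block, i.e. one small chunk per event. The sketch below reproduces the general pattern with a hypothetical poll_item() that allocates a small buffer and leaks it only on the failure path, plus a demo-only allocation table so leftover blocks can be counted. It is not the Zabbix code or the actual patch; the fixed version simply routes every return through one cleanup label, a common C idiom for this class of bug.

```c
#include <stdio.h>
#include <stdlib.h>

/* Demo-only allocation table: every block from tracked_malloc() is recorded
 * here so a test can count blocks the poll code lost track of, and reclaim
 * them afterwards with tracked_release_all(). */
#define MAX_TRACKED 1024
static void *tracked[MAX_TRACKED];

static void *tracked_malloc(size_t n)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            tracked[i] = malloc(n);
            return tracked[i];
        }
    }
    return NULL;                         /* table full: treat as out of memory */
}

static void tracked_free(void *p)
{
    for (size_t i = 0; p != NULL && i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            free(p);
            return;
        }
    }
}

/* Number of tracked blocks still allocated. */
size_t tracked_live(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (tracked[i] != NULL)
            n++;
    return n;
}

/* Free everything still recorded (demo cleanup only). */
void tracked_release_all(void)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        free(tracked[i]);
        tracked[i] = NULL;
    }
}

/* Hypothetical poll: agent_ok simulates whether the agent answered. */
int poll_item_leaky(int agent_ok, char *out, size_t outlen)
{
    char *errbuf = tracked_malloc(64);

    if (errbuf == NULL)
        return -1;
    if (!agent_ok) {
        snprintf(errbuf, 64, "agent unreachable");
        snprintf(out, outlen, "%s", errbuf);
        return -1;                       /* BUG: errbuf is never freed here */
    }
    snprintf(out, outlen, "ok");
    tracked_free(errbuf);
    return 0;
}

/* Same logic with a single exit path, so errbuf is freed on every return. */
int poll_item(int agent_ok, char *out, size_t outlen)
{
    int ret = 0;
    char *errbuf = tracked_malloc(64);

    if (errbuf == NULL)
        return -1;
    if (!agent_ok) {
        snprintf(errbuf, 64, "agent unreachable");
        snprintf(out, outlen, "%s", errbuf);
        ret = -1;
        goto cleanup;
    }
    snprintf(out, outlen, "ok");
cleanup:
    tracked_free(errbuf);
    return ret;
}
```

Calling poll_item_leaky(0, ...) five times leaves five blocks behind, one per failed poll, while poll_item() leaves none regardless of how often the agent is unreachable.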

              Please expect a patch shortly.

              Thanks again to all participating in the bug hunting!
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga
              My Twitter


              • richlv
                Senior Member
                Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                • Oct 2005
                • 3112

                #22
                Nice to see this one resolved. Any information on which version introduced this problem?
                Zabbix 3.0 Network Monitoring book


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #23
                  Originally posted by richlv
                  Nice to see this one resolved. Any information on which version introduced this problem?
                  It was introduced by some improvements we've made in 1.6.2.

                  • hughmcl
                    Junior Member
                    • Oct 2008
                    • 20

                    #24
                    Woohoo! Glad my valgrind dumps helped. Looking forward to the patches.


                    Hugh


                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #25
                      Here is the patch for 1.6.2. Thanks to all again!

                      • hughmcl
                        Junior Member
                        • Oct 2008
                        • 20

                        #26
                        Been using the patch now for 20+ minutes and don't see any increased memory usage! Looks like this fixed it. Thanks.


                        Hugh


                        • elvar
                          Senior Member
                          • Feb 2008
                          • 226

                          #27
                          Originally posted by hughmcl
                          Been using the patch now for 20+ minutes and don't see any increased memory usage! Looks like this fixed it. Thanks.


                          Hugh

                          Yes, I can confirm the same thing. It's so nice having this working.


                          • ggiesen
                            Junior Member
                            • Jan 2009
                            • 7

                            #28
                            This has fixed the problem for me as well. I can now kill my cron job that restarts zabbix every two hours to avoid running out of memory.


                            • kmitbo
                              Junior Member
                              • Jan 2009
                              • 3

                              #29
                              Patch from post #25 fixed problem.
                              Thank you!


                              • elvar
                                Senior Member
                                • Feb 2008
                                • 226

                                #30
                                Shouldn't a notice be attached to the "1.6.2 stable" link in the download section to this patch? Can 1.6.2 really be considered stable with a known memory leak?
