Possible memory leak in zabbix 1.8.1 on CentOS 5.4 Final

  • Murilex
    Senior Member
    • Nov 2009
    • 124

    #1

    Possible memory leak in zabbix 1.8.1 on CentOS 5.4 Final

    I updated zabbix from version 1.6.7 to 1.8.1 one month ago. Since then, I have noticed steadily increasing memory consumption by the zabbix server process (see screenshot 1). As a result, I have to restart the zabbix server process two or three times a day. I've looked for similar problems on the Internet and found only one thread that made sense: http://www.zabbix.com/forum/showthread.php?t=7126.
    Let me now describe some aspects of my zabbix configuration:
    Status of Zabbix
    Parameter | Value | Details
    Zabbix server is running | Yes | -
    Number of hosts (monitored/not monitored/templates) | 258 | 185 / 8 / 65
    Number of items (monitored/disabled/not supported) | 9956 | 1817 / 8092 / 47
    Number of triggers (enabled/disabled)[true/unknown/false] | 1003 | 978 / 25 [73 / 0 / 905]
    Number of users (online) | 23 | 1
    Required server performance, new values per second | 20 | -

    I've compiled zabbix with the following options:
    Code:
    ./configure --prefix=/usr/local/zabbix --with-pgsql --with-ldap --enable-server --enable-agent --with-jabber --with-net-snmp --with-libcurl
    Average CPU utilization of zabbix server machine is less than 5%.
    Average CPU utilization of database machine is less than 4%.
    No web monitoring or zabbix agent active checks are enabled.
    Several snmp, calculated and passive check based items are enabled.
    Right after I upgraded zabbix to version 1.8.1 I was still using a mysql database, and the same memory problem was observed.

    Here are some of the software packages installed on my CentOS box (where the zabbix server is installed). See screenshot 2 for some other details:
    postgresql-8.1.18-2.el5_4.1 | postgresql-devel-8.1.18-2.el5_4.1 | postgresql-libs-8.1.18-2.el5_4.1 | postgresql-devel-8.1.18-2.el5_4.1
    curl-7.15.5-2.1.el5_3.5 | curl-devel-7.15.5-2.1.el5_3.5
    php-5.3.1-1.el5.remi | php-pgsql-5.3.1-1.el5.remi | php-gd-5.3.1-1.el5.remi | php-common-5.3.1-1.el5.remi | php-pdo-5.3.1-1.el5.remi | php-ldap-5.3.1-1.el5.remi | php-bcmath-5.3.1-1.el5.remi | php-xml-5.3.1-1.el5.remi | php-cli-5.3.1-1.el5.remi | php-mbstring-5.3.1-1.el5.remi
    iksemel-1.2-13.el5 | iksemel-devel-1.2-13.el5
    libxml2-2.6.26-2.1.2.8 | libxml2-devel-2.6.26-2.1.2.8 | libxml2-python-2.6.26-2.1.2.8
    openldap-2.3.43-3.el5 | openldap-devel-2.3.43-3.el5
    net-snmp.x86_64 1:5.3.2.2-7.el5_4.2 | net-snmp-devel.i386 1:5.3.2.2-7.el5_4.2 | net-snmp-libs.i386 1:5.3.2.2-7.el5_4.2 | net-snmp-utils.x86_64 1:5.3.2.2-7.el5_4.2

    Following the thread mentioned above, I started monitoring my snmp pollers, and they are probably responsible for the excessive memory consumption. See screenshots 3, 4 and 5 for some measurements that support my suspicion. As suggested by the thread, I updated my snmp software by compiling net-snmp 5.5 from source (the latest binary version available for CentOS is 5.3.2). Then I recompiled zabbix and installed the newly generated binaries (server and agent). Unfortunately, the memory leak problem persists. Is this a known problem on CentOS? Has anyone faced the same problem?
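
Per-process memory measurements like the ones described in this post can be collected in several ways. As a minimal sketch, assuming a Linux `/proc` filesystem and Python 3 (neither is mentioned in the thread, and the helper names `parse_vmrss_kb` and `rss_by_pid` are hypothetical), the resident memory of every `zabbix_server` process could be sampled like this:

```python
import os
import re

# Matches the resident-set line of /proc/<pid>/status, e.g. "VmRSS:   123456 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def rss_by_pid(process_name="zabbix_server", proc_root="/proc"):
    """Map pid -> resident memory in kB for every process with the given name."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = re.search(r"^Name:\s+(\S+)", text, re.MULTILINE)
        if name and name.group(1) == process_name:
            rss = parse_vmrss_kb(text)
            if rss is not None:
                result[int(entry)] = rss
    return result


if __name__ == "__main__":
    for pid, rss_kb in sorted(rss_by_pid().items()):
        print(pid, rss_kb, "kB")
```

Running this at a fixed interval and logging the output shows which process IDs keep growing; mapping a process ID back to a specific poller has to be done separately.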
    Attached Files
  • danrog
    Senior Member
    • Sep 2009
    • 164

    #2
    We run RedHat EL5.4 for our server and have ~8x the number of hosts (and growing) without issues. It looks like we are running the same version of most of the packages (including Remi's php); however, we are running mysql, not postgres (could be something there). The zabbix server processes on our system have been running since 1.8.1 was released. How fast are you polling items (default 30 secs or did you bump them up)? Do you have a lot of passive or active items? Also, what is the load avg for the zabbix server (just curious)?


    • Murilex
      Senior Member
      • Nov 2009
      • 124

      #3
      How fast are you polling items (default 30 secs or did you bump them up)?
      Most of them have default polling frequency (30s). But I've increased the frequency of the most cpu intensive ones.

      Do you have a lot of passive or active items? Also, what is the load avg for the zabbix server (just curious)?
      I have a lot of passive and snmp items. The average server load is less than 0.5.

      My zabbix server processes have been running since 1.8.1 was released too. As I said, I was still using mysql right after the zabbix upgrade and the memory leak was already an issue. To tell you the truth, I changed my database to postgresql thinking that it could resolve the memory leak problem.


      • blfgomes
        Junior Member
        Zabbix Certified Specialist
        • Nov 2009
        • 10

        #4
        Originally posted by Murilex
        Following the thread mentioned above, I started monitoring my snmp pollers, and they are probably responsible for the excessive memory consumption. See screenshots 3, 4 and 5 for some measurements that support my suspicion. As suggested by the thread, I updated my snmp software by compiling net-snmp 5.5 from source (the latest binary version available for CentOS is 5.3.2). Then I recompiled zabbix and installed the newly generated binaries (server and agent). Unfortunately, the memory leak problem persists. Is this a known problem on CentOS? Has anyone faced the same problem?
        We are running Zabbix 1.8.1 on CentOS 5.3 with the exact same problem. I traced it down to the snmp pollers as well and was thinking of trying a newer version of net-snmp, but since you already did that to no avail, I'm guessing it's probably a bug in the Zabbix code.


        • blfgomes
          Junior Member
          Zabbix Certified Specialist
          • Nov 2009
          • 10

          #5
          Memory leak in calculated items

          Ok, here's the valgrind output:

          Code:
          ==31086==
          ==31086==
          ==31086== 134,144 bytes in 8 blocks are possibly lost in loss record 78 of 80
          ==31086==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
          ==31086==    by 0x4A05883: realloc (vg_replace_malloc.c:306)
          ==31086==    by 0x433CF4: zbx_realloc2 (comms.c:161)
          ==31086==    by 0x410CAA: get_value_calculated (checks_calculated.c:77)
          ==31086==    by 0x40D75F: main_poller_loop (poller.c:182)
          ==31086==    by 0x40773F: MAIN_ZABBIX_ENTRY (server.c:550)
          ==31086==    by 0x430026: daemon_start (daemon.c:192)
          ==31086==    by 0x33C3A1D973: (below main) (in /lib64/libc-2.5.so)
          ==31086==
          ==31086==
          ==31086== 989,312 bytes in 59 blocks are definitely lost in loss record 80 of 80
          ==31086==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
          ==31086==    by 0x4A05883: realloc (vg_replace_malloc.c:306)
          ==31086==    by 0x433CF4: zbx_realloc2 (comms.c:161)
          ==31086==    by 0x410CAA: get_value_calculated (checks_calculated.c:77)
          ==31086==    by 0x40D75F: main_poller_loop (poller.c:182)
          ==31086==    by 0x40773F: MAIN_ZABBIX_ENTRY (server.c:550)
          ==31086==    by 0x430026: daemon_start (daemon.c:192)
          ==31086==    by 0x33C3A1D973: (below main) (in /lib64/libc-2.5.so)
          ==31086==
          ==31086== LEAK SUMMARY:
          ==31086==    definitely lost: 989,312 bytes in 59 blocks.
          ==31086==      possibly lost: 134,144 bytes in 8 blocks.
          ==31086==    still reachable: 717,216 bytes in 10,634 blocks.
          ==31086==         suppressed: 0 bytes in 0 blocks.
          ==31086== Reachable blocks (those to which a pointer was found) are not shown.
          ==31086== To see them, rerun with: --show-reachable=yes
          Murilex, are you using calculated items?


          • wax66
            Junior Member
            • Apr 2009
            • 27

            #6
            We have the same issue here, running CentOS 5.2, mysql, and Zabbix (whatever version we've run).

            Even though this isn't good behavior for *something*, I wonder, is it dipping into your swap and causing performance issues? Ours basically just chews up our 16 GB and then sits there. It's done that in all the 1.6es that I noticed and now the 1.8s. We really don't have any performance degradation, though.

            If it's not really causing any problems, and you don't use the server for anything else, you may not need to worry about it (though it's good you brought it up, as I thought I was the only one with the problem).

            Just my 2 cents.
            -Ron

            Here's a couple of screens of my free memory graphs:

            [Image: monitormemory.jpg]

            [Image: monitordalmemory.jpg]

            It's been a few weeks since I've rebooted, so the data is sorta old, but as you can see, our master server gets its memory chewed up quickly, whereas one of our new slaves, which has a ton more memory, gets its memory chewed up pretty slowly.
            Last edited by wax66; 25-02-2010, 17:26. Reason: Added graphs.


            • Murilex
              Senior Member
              • Nov 2009
              • 124

              #7
              Murilex, are you using calculated items?
              Yes blfgomes, I'm using calculated items. Almost all my hosts have at least one calculated item.

              Even though this isn't good behavior for *something*, I wonder, is it dipping into your swap and causing performance issues?
              Yes wax66, after consuming all the physical memory, the zabbix server processes start to consume swap and performance becomes an issue. What's more, after the server machine runs out of swap, the zabbix server processes die! This is why I'm automatically restarting the zabbix server processes once in a while.


              • blfgomes
                Junior Member
                Zabbix Certified Specialist
                • Nov 2009
                • 10

                #8
                Originally posted by Murilex
                Yes blfgomes, I'm using calculated items. Almost all my hosts have at least one calculated item.

                Yes wax66, after consuming all the physical memory, the zabbix server processes start to consume swap and performance becomes an issue. What's more, after the server machine runs out of swap, the zabbix server processes die! This is why I'm automatically restarting the zabbix server processes once in a while.
                Here is a patch that I think solves the issue. You will have to apply it and then recompile the server binary. I tried it in our production system and valgrind didn't report the memory leak anymore. We were experiencing out of memory problems as well because of this bug.

                Code:
                $ patch zabbix-1.8.1/src/zabbix_server/poller/checks_calculated.c < checks_calculated.patch
                Could you please give it a try and post the results?

                Zabbix developers, would you be kind enough to review the patch?
                Attached Files


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #9
                  Thank you for the patch and for all information in the thread. It really helps!

                  The issue is on the 1.8.2 roadmap and will be fixed shortly. At the moment I cannot confirm that the patch is correct; however, it looks very much so.

                  Check progress here:

                  Alexei Vladishev
                  Creator of Zabbix, Product manager
                  New York | Tokyo | Riga
                  My Twitter


                  • Murilex
                    Senior Member
                    • Nov 2009
                    • 124

                    #10
                    Thanks blfgomes! For sure I will try your patch as soon as possible and post the results.


                    • Murilex
                      Senior Member
                      • Nov 2009
                      • 124

                      #11
                      Thanks again blfgomes. Memory leak problem solved!


                      • blfgomes
                        Junior Member
                        Zabbix Certified Specialist
                        • Nov 2009
                        • 10

                        #12
                        Originally posted by Murilex
                        Thanks again blfgomes. Memory leak problem solved!
                        Thank you for the detailed report in the first place, Murilex! If I had not found your post, maybe I would not have taken the time to investigate the issue myself.


                        • fjrial
                          Senior Member
                          • Feb 2010
                          • 140

                          #13
                          In the same situation (CentOS 5.4 Final + 1.8.1) with calculated items

                          [Edited]
                          It works fine now!...

                          zabbix_server has been running for ~30 hours and everything is OK
                          [/Edited]

                          First, thanks for this patch, I'll try it in a few moments.

                          I was experiencing the same problems: zabbix_server began to eat all the RAM and swap, and after that the OOM killer killed the zabbix_server process. And yes, we are using calculated items; before we started using them, we didn't have any problem.

                          After I've tried the patch, I'll post my conclusions
                          Thanks again.
                          Last edited by fjrial; 03-03-2010, 15:52.


                          • blfgomes
                            Junior Member
                            Zabbix Certified Specialist
                            • Nov 2009
                            • 10

                            #14
                            Originally posted by fjrial
                            It works fine now!...

                            zabbix_server has been running for ~30 hours and everything is OK
                            Thank you for the feedback!
