zabbix_server process quit automatically every 40 days

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • jing
    Junior Member
    • Oct 2010
    • 9

    #1

    zabbix_server process quit automatically every 40 days

    Zabbix is an excellent distributed monitoring solution. We deployed a distributed monitoring environment with two Zabbix proxies running at two remote sites. It worked well at the very beginning, but after the first successful month the zabbix_server process has quit almost every 40 days, very regularly. We also deployed other Zabbix installations without a proxy. They run perfectly, with no zabbix_server process quitting. I suspect it's a bug in Zabbix 1.8.3, isn't it? I have been monitoring the Zabbix server and found that its memory usage is weird. It looks like a memory leak. Any suggestions?

    Zabbix_server version
    Zabbix Server v1.8.3 (revision 13093) (29 March 2010)
    Compilation time: Jun 29 2010 15:41:27

    Zabbix_proxy version
    Zabbix Proxy v1.8.3 (revision 13093) (29 March 2010)
    Compilation time: Aug 26 2010 11:52:11
  • jing
    Junior Member
    • Oct 2010
    • 9

    #2
    Can anyone help me?

    • nelsonab
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2006
      • 1233

      #3
      Have you tried 1.8.4?

      From the information given, yes, it's hard to disagree that something is consuming your memory, but it's not possible to say which process it is.

      When this happens, what are some of the messages in your log files? Does Zabbix show any errors in its log file? Also, what are the top memory consumers on your system at that time?
      ps auxf | sort -nr -k 4 | head -10

      You have also not given any information about how you installed Zabbix and which distribution you are using.

      Give us some of those details and maybe someone can help. Otherwise there is paid support available, and they can help you get to the bottom of your issue more quickly.
      RHCE, author of zbxapi
      Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
      Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

      • jing
        Junior Member
        • Oct 2010
        • 9

        #4
        Nelsonab, thank you very much for your reply. I'm sure the process consuming most of the memory is zabbix_server. I have checked the memory status every 2-3 days over the last several months. The memory consumed by the zabbix_server process grows continuously until the machine runs out of memory.
        [root@Zabbix ~]# ps auxf | sort -nr -k 4 | head -10
        zabbix 6560 0.2 52.8 2119224 1097008 ? SN Jan13 81:43 \_ /usr/local/sbin/zabbix_server
        mysql 11082 2.9 27.5 781848 571040 ? Sl 2010 1915:23 \_ /usr/local/libexec/mysqld --basedir=/usr/local --datadir=/zabbixdata/db/ --user=mysql --pid-file=/zabbixdata/mysql.pid --skip-external-locking --port=3306 --socket=/zabbixdata/mysql.sock
        nobody 25618 0.0 0.5 29708 12068 ? S Jan28 0:01 \_ /usr/local/apache2/bin/httpd -k start
        nobody 25499 0.0 0.5 27868 11032 ? S Jan28 0:01 \_ /usr/local/apache2/bin/httpd -k start
        nobody 24843 0.0 0.5 29772 11432 ? S Jan28 0:02 \_ /usr/local/apache2/bin/httpd -k start
        nobody 23979 0.0 0.5 29752 10904 ? S Jan28 0:03 \_ /usr/local/apache2/bin/httpd -k start
        nobody 25552 0.0 0.4 29732 9408 ? S Jan28 0:01 \_ /usr/local/apache2/bin/httpd -k start
        nobody 29481 0.0 0.3 29424 6984 ? S Jan17 0:08 \_ /usr/local/apache2/bin/httpd -k start
        nobody 28406 0.0 0.3 23380 7076 ? S Jan29 0:00 \_ /usr/local/apache2/bin/httpd -k start
        nobody 25233 0.0 0.3 23664 6968 ? S Jan28 0:00 \_ /usr/local/apache2/bin/httpd -k start
        The Zabbix versions are as follows:
        Zabbix_server version
        Zabbix Server v1.8.3 (revision 13093) (29 March 2010)
        Compilation time: Jun 29 2010 15:41:27

        Zabbix_proxy version
        Zabbix Proxy v1.8.3 (revision 13093) (29 March 2010)
        Compilation time: Aug 26 2010 11:52:11
        The OS is CentOS 5.5, 32-bit.
        I also didn't find any error messages in zabbix_server.log or the Linux messages log. It's really weird. Here's part of zabbix_server.log:
        6560:20110201:215631.630 Sending configuration data to proxy 'Proxy_NI'. Datalen 63170
        6560:20110201:215633.047 Sending configuration data to proxy 'Proxy_21Vianet'. Datalen 42977
        6546:20110201:215643.181 NODE 10: Sending history_sync of node 10 to node 1 datalen 2457
        6546:20110201:215643.236 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1192
        6546:20110201:215653.285 NODE 10: Sending history_sync of node 10 to node 1 datalen 2394
        6546:20110201:215653.354 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1646
        6546:20110201:215703.412 NODE 10: Sending history_sync of node 10 to node 1 datalen 2230
        6546:20110201:215703.453 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1659
        6546:20110201:215713.554 NODE 10: Sending history_sync of node 10 to node 1 datalen 2335
        6546:20110201:215713.604 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1988
        6546:20110201:215723.765 NODE 10: Sending history_sync of node 10 to node 1 datalen 2459
        6546:20110201:215723.827 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1975
        6546:20110201:215733.932 NODE 10: Sending history_sync of node 10 to node 1 datalen 2421
        6546:20110201:215733.984 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1903
        6546:20110201:215743.107 NODE 10: Sending history_sync of node 10 to node 1 datalen 2675
        6546:20110201:215743.169 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1563
        6546:20110201:215753.259 NODE 10: Sending history_sync of node 10 to node 1 datalen 2719
        6546:20110201:215753.316 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1738
        6546:20110201:215803.378 NODE 10: Sending history_sync of node 10 to node 1 datalen 2190
        6546:20110201:215803.515 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1107
        6546:20110201:215813.677 NODE 10: Sending history_sync of node 10 to node 1 datalen 3274
        6546:20110201:215813.745 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1280
        6546:20110201:215823.828 NODE 10: Sending history_sync of node 10 to node 1 datalen 2483
        6546:20110201:215823.879 NODE 10: Sending history_uint_sync of node 10 to node 1 datalen 1169
        6546:20110201:215836.759 NODE 10: Sending configuration changes to master node 1 for node 10 datalen 108
        6546:20110201:215840.970 NODE 10: Received configuration changes from master node 1 for node 10 datalen 9

        Node 10 is the local Zabbix node. Node 1 is the central Zabbix node. Proxy_NI and Proxy_21Vianet are the Zabbix proxies at the two remote sites.
        BTW, I deployed the second Zabbix server in the same way; in fact I cloned the first VM to create the second. It is configured without a Zabbix proxy and it runs well. So I don't think it has to do with the installation method, do you?
        I'm not sure whether upgrading to 1.8.4 will fix it. If there's no resolution in the end, paying for official Zabbix support will be the only option. But before that, I would really appreciate your suggestions.
        Thank you.
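
The manual check every 2-3 days described in this post can be automated. Below is a minimal sketch, assuming a procps-style `ps`. The function names and the log path are made up for illustration. Summing RSS over all zabbix_server forks counts shared pages more than once, so the figure is a trend indicator rather than an exact total.

```shell
#!/bin/sh
# Sum the resident set size (RSS, in KiB) of every process whose command
# name equals $1. Expects "RSS COMMAND" lines on stdin.
sum_rss_kb() {
    awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}

# Append one timestamped sample to a log file.
# The default path is a placeholder, not something from this thread.
log_sample() {
    logfile="${1:-/tmp/zabbix_server_rss.log}"
    # "-o rss= -o comm=" prints RSS and command name without a header line.
    rss=$(ps -e -o rss= -o comm= | sum_rss_kb zabbix_server)
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rss" >> "$logfile"
}
```

Calling `log_sample` from cron at a fixed interval gives a series that shows whether the growth is steady or happens in jumps.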

        • anrstone
          Member
          • Oct 2009
          • 61

          #5
          We're also running Zabbix 1.8.3 on Ubuntu with a Postgres backend (separate DB, server and I/F boxes). We have been having issues with the DB appearing to fail, but having read this, that may have been a red herring, as we also appear to have a memory leak. Watching top over a period, we see the memory per Zabbix process double over about 24 hours, and the trend is only up. Sadly, this means we now need to restart the box every 2-3 days.

          We believe this may be due to the number of measurement points, as we recently increased the amount we're monitoring by about 30%. Previously we ran for around 2 months before any issue, and when we did have problems there were DB fatal error warnings at the same time, so we assumed that was the problem.

          Can anyone shed light on what might be going on?

          • bturnbough
            Member
            • Mar 2011
            • 70

            #6
            Zabbix becoming unresponsive...

            I have also noticed that zabbix_server stops responding after some time. I kind of worked around it by restarting zabbix_server daily in the shell script that I wrote to back up the MySQL backend database:

            1) stop apache
            2) stop zabbix_server
            3) mysqldump
            4) start zabbix_server
            5) start apache
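
The five steps above can be sketched as a script along these lines. This is only an illustration, not the poster's actual script: the commands `apachectl`, `pkill` and `zabbix_server` being on the PATH, the database name `zabbix`, and MySQL credentials coming from `~/.my.cnf` are all assumptions. It defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = dump file to write. Every command below is a placeholder for
# whatever the local installation uses to stop and start its services.
backup_with_restart() {
    dumpfile="$1"
    run apachectl stop                            # 1) stop apache
    run pkill -x zabbix_server                    # 2) stop zabbix_server (SIGTERM)
    run sh -c "mysqldump zabbix > '$dumpfile'"    # 3) mysqldump
    run zabbix_server                             # 4) start zabbix_server
    run apachectl start                           # 5) start apache
}
```

Run with `DRY_RUN=0` only after checking the printed commands. A real script would probably also wait until zabbix_server has fully exited before dumping.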

            • anrstone
              Member
              • Oct 2009
              • 61

              #7
              So the way we worked around this was to add a shedload of extra RAM to the server. This probably won't solve the issue forever, but it will stop the daily restarts we currently have. I can't be certain, but it looks like the issue may be due to paging, which causes the Zabbix server to slow horribly (as you'd expect), though I can't really understand why it is paging, as it should have enough RAM to handle the number of instances of zabbix_server running on the box...

              To be fair it seems to only happen when we ramp up the number of items we capture using zabbix_trapper so it may be that rather than the basic system that's at fault.

              • jing
                Junior Member
                • Oct 2010
                • 9

                #8
                Eventually I changed the proxy mode from passive to active, and then the problem was gone.
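
For anyone checking their own proxies, here is a small sketch that reports which mode a proxy config file selects. It assumes the mode is set with the `ProxyMode` parameter in zabbix_proxy.conf (0 = active, 1 = passive, active when the parameter is absent). The function name is made up, and the post does not say exactly how the change was made. The proxy's definition in the frontend has to match as well.

```shell
#!/bin/sh
# Print "active" or "passive" for the zabbix_proxy.conf file given as $1.
# Commented-out lines are ignored; the last uncommented ProxyMode line wins.
proxy_mode() {
    mode=$(sed -n 's/^[[:space:]]*ProxyMode[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1" | tail -n 1)
    case "${mode:-0}" in
        1) echo passive ;;
        *) echo active ;;
    esac
}
```

For example, `proxy_mode /path/to/zabbix_proxy.conf` prints the mode for that file; the path depends on how the proxy was installed.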
