Zabbix Server Crashing due to VMware collector?

  • Isuress
    Junior Member
    • Nov 2018
    • 5

    #1

    Zabbix Server Crashing due to VMware collector?

    Hello, I've been using Zabbix in our production environment for 3-4 years now; fantastic free software, love it.
    I've never really had too many issues with it unless I tried updating to the newer versions of Ubuntu.

    That said, I recently updated our Zabbix instance to v4.0.1 from v3.5 and have run into some problems.
    From what I'm reading, it's being caused by the VMware collector running out of cache space.
    The default cache amount in the Zabbix config is 8M; I've since upgraded that to 24M and it's still crashing.
    I've never had an issue with caching in Zabbix up until I upgraded to v4.0.

    I don't know if there's a certain scheme for posting crash logs here, but I've put the crash log on Pastebin.

    I've sanitized the logs with XXXXXX to hide small things.

    While Googling the issue, I've found other threads suggesting to check the Graphs tab to see how much cache is being used.
    What makes this crash very weird is that, according to the cache graphs, there's SO much cache free.
    Even when the VMware Collector cache was at 8M, there was still cache leftover; so I don't understand what's causing the crash.

    An image of our cache use graphs can be found in the image section.
  • Isuress
    Junior Member
    • Nov 2018
    • 5

    #2
    A fellow IT friend and I spent 2 hours looking at this issue and we were still unable to resolve it.
    I would really prefer not to have to revert from v4 to v3. Any assistance on this issue would be highly appreciated.


    • vso
      Zabbix developer
      • Aug 2016
      • 190

      #3
      Have you tried this action? «please increase VMwareCacheSize configuration parameter»


      • Isuress
        Junior Member
        • Nov 2018
        • 5

        #4
        Originally posted by vso
        Have you tried this action? «please increase VMwareCacheSize configuration parameter»
        Yes, that was the first thing I attempted; I mention this in my initial post:

        Originally posted by Isuress
        From what I'm reading, it's being caused by the VMware collector running out of cache space.
        The default cache amount in the Zabbix config is 8M; I've since upgraded that to 24M and it's still crashing.
        I was using 8M prior to the upgrade without issue.
        Only after updating to v4.0 did this issue arise.
        That said, later in the post I mention that even at 8M the cache is still mostly unused, at least according to the graphs.
        That makes it strange that it's asking for more cache.

        Unless you're suggesting that I should give the cache something well above 24M? Was v4.0 that much of a substantial rewrite that its VMware requirements for cache increased from 8M to well above 50M?
        I'm not trying to be sardonic; I am genuinely curious.


        • vso
          Zabbix developer
          • Aug 2016
          • 190

          #5
          Regarding the graph, how big is the update interval, and how long does it take for Zabbix server to exit with an out of memory error? The cache can be increased up to 2 GB, and you can check whether it stabilizes somewhere in between.

          I am very curious as to what exactly consumes so much memory, unfortunately it might require patching to add additional debug.

          It might have been caused by https://support.zabbix.com/browse/ZBX-14548
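A minimal sketch for sanity-checking the setting before restarting the server. It assumes the VMwareCacheSize limits of 256K to 2G with a default of 8M (as listed in the Zabbix 4.0 server configuration documentation; verify against your version), and it assumes the last uncommented occurrence of the parameter is the one that counts. The helper names are hypothetical.

```python
import re

# Assumed limits for VMwareCacheSize (256K to 2G, default 8M); check the
# documentation for the Zabbix version you actually run.
MIN_BYTES = 256 * 1024
MAX_BYTES = 2 * 1024 ** 3
DEFAULT = "8M"
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '24M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def vmware_cache_size(conf_text):
    """Return the effective VMwareCacheSize in bytes from config file text."""
    value = DEFAULT
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, _, raw = line.partition("=")
        if key.strip() == "VMwareCacheSize":
            value = raw.strip()  # assumption: the last occurrence wins
    size = parse_size(value)
    if not MIN_BYTES <= size <= MAX_BYTES:
        raise ValueError(f"VMwareCacheSize={value} is outside 256K-2G")
    return size


print(vmware_cache_size("VMwareCacheSize=24M"))
```

Passing the contents of zabbix_server.conf to `vmware_cache_size` would show whether a value such as 64M or 512M is within the assumed range before the service is restarted.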


          • Isuress
            Junior Member
            • Nov 2018
            • 5

            #6
            Originally posted by vso
            Regarding the graph, how big is the update interval, and how long does it take for Zabbix server to exit with an out of memory error? The cache can be increased up to 2 GB, and you can check whether it stabilizes somewhere in between.
            I am using the default template for the Zabbix statistics. Looking at the item values it says it checks every 1m for "Zabbix vmware cache, % free" and any other Zabbix cache related items.
            According to the pastebin I've linked, the server was restarted at 133353 (1:33pm) and then the error started to occur at 143129 (2:31pm). I can attest that it takes 45 minutes to 1 hour for the server to "run out of memory".
            That said, according to htop, the CPU, RAM and swap space used are very minimal throughout Zabbix's usage. Below I've posted an image of Zabbix's typical usage; it's typically somewhat lower than this as well.
            [Image: 1nKnisp.png]

            It's near the end of the day (10 minutes left) at the office; I'll increase the cache size before I leave and check the logs tomorrow. I hope it won't require more than 64M.
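The time to failure quoted above can be worked out from the two log timestamps. This is a small sketch, assuming both values are HHMMSS times from the same day; the function name is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start_hhmmss, end_hhmmss):
    """Seconds between two HHMMSS log timestamps on the same day."""
    start = datetime.strptime(start_hhmmss, "%H%M%S")
    end = datetime.strptime(end_hhmmss, "%H%M%S")
    return int((end - start).total_seconds())


# Restart at 133353, first error at 143129 (values from the post above).
seconds = elapsed_seconds("133353", "143129")
minutes, remainder = divmod(seconds, 60)
print(minutes, remainder)  # 57 36
```

That is just under an hour, which matches the "45 minutes to 1 hour" estimate.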

            Originally posted by vso
            I am very curious as to what exactly consumes so much memory, unfortunately it might require patching to add additional debug.
            I would like to help but this is our production Zabbix. I'm curious as to why this is happening also.
            My IT friend and I tried raising the log verbosity from 3 to 5, and then to 4, but there was WAY too much data being written to comb through for it to be useful.
            The log was growing to 14 MB within 45 seconds.
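One way to make a debug log of that size manageable is to keep only the lines that mention the VMware collector. The sketch below assumes the relevant lines contain the substring "vmware"; the sample lines are made up for illustration and are not taken from a real Zabbix log.

```python
def iter_matching(lines, keyword="vmware"):
    """Yield only the lines that contain the keyword, ignoring case."""
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


# Made-up sample lines, for illustration only.
sample = [
    "poller: processing item A\n",
    "VMware collector: updating service data\n",
    "housekeeper: nothing to delete\n",
    "vmware cache: allocation request\n",
]

for hit in iter_matching(sample):
    print(hit)
```

Because `iter_matching` is a generator, an open file object can be passed in place of `sample`, so the log is read line by line rather than loaded into memory at once.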

            Originally posted by vso
            It might have been caused by https://support.zabbix.com/browse/ZBX-14548
            I initially got prompted to log in when clicking your link, but I Googled the bug number and was able to view it through Google without logging in (also strange?).
            This would suggest there should already be a fix out for this issue, no? I'm currently running Zabbix 4.0.1, though this issue was present in v4.0 as well.
