Adding VCenter server to Zabbix causes performance problems

  • TrevorD
    Junior Member
    • Jun 2018
    • 15

    #1

    Hi all,

    I have found that when I add a VMware VCenter server to our Zabbix 3.4.11 Appliance, Zabbix server stops at about 18:30 each night. We only have a relatively small virtual environment to monitor and only want to discover hosts and datastores, NOT VMs. We have 11 ESXi hosts, 2 clusters and 28 datastores.

    I have investigated the issue and have found the following:
    • Some VMware stats, i.e. CPU usage and memory usage, do not update.
    • When I go to Administration -> Queue, there are a number of items in the queue.
    • I have a Zabbix dashboard in Grafana, and it says that "Zabbix busy preprocessing manager process" is at 100% at various times.
    • I have checked the Zabbix server log and see the following:

    838:20180626:122749.395 syncing history data...
    838:20180626:122749.408 syncing history data done
    838:20180626:122749.408 syncing trend data...
    838:20180626:122751.309 syncing trend data done
    838:20180626:122751.391 Zabbix Server stopped. Zabbix 3.4.7 (revision 77720).
    16093:20180626:122752.443 Starting Zabbix Server. Zabbix 3.4.7 (revision 77720).
    • I have updated zabbix_server.conf as below:


    ############ ADVANCED PARAMETERS ################

    ### Option: StartPollers
    # Number of pre-forked instances of pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    StartPollers=50

    ### Option: StartIPMIPollers
    # Number of pre-forked instances of IPMI pollers.
    # The IPMI manager process is automatically started when at least one IPMI poller is started.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartIPMIPollers=0

    ### Option: StartPreprocessors
    # Number of pre-forked instances of preprocessing workers.
    # The preprocessing manager process is automatically started when preprocessor worker is started.
    #
    # Mandatory: no
    # Range: 1-1000
    # Default:
    StartPreprocessors=50

    ### Option: StartPollersUnreachable
    # Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
    # At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
    # are started.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartPollersUnreachable=1

    ### Option: StartTrappers
    # Number of pre-forked instances of trappers.
    # Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
    # At least one trapper process must be running to display server availability and view queue
    # in the frontend.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartTrappers=5

    ### Option: StartPingers
    # Number of pre-forked instances of ICMP pingers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartPingers=1

    ### Option: StartDiscoverers
    # Number of pre-forked instances of discoverers.
    #
    # Mandatory: no
    # Range: 0-250
    # Default:
    # StartDiscoverers=1

    ### Option: StartHTTPPollers
    # Number of pre-forked instances of HTTP pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartHTTPPollers=1

    ### Option: StartTimers
    # Number of pre-forked instances of timers.
    # Timers process time-based trigger functions and maintenance periods.
    # Only the first timer process handles the maintenance periods.
    #
    # Mandatory: no
    # Range: 1-1000
    # Default:
    # StartTimers=1

    ### Option: StartEscalators
    # Number of pre-forked instances of escalators.
    #
    # Mandatory: no
    # Range: 0-100
    # Default:
    # StartEscalators=1

    ### Option: StartAlerters
    # Number of pre-forked instances of alerters.
    # Alerters send the notifications created by action operations.
    #
    # Mandatory: no
    # Range: 0-100
    # Default:
    # StartAlerters=3

    ### Option: JavaGateway
    # IP address (or hostname) of Zabbix Java gateway.
    # Only required if Java pollers are started.
    #
    # Mandatory: no
    # Default:
    # JavaGateway=
    JavaGateway=127.0.0.1

    ### Option: JavaGatewayPort
    # Port that Zabbix Java gateway listens on.
    #
    # Mandatory: no
    # Range: 1024-32767
    # Default:
    # JavaGatewayPort=10052

    ### Option: StartJavaPollers
    # Number of pre-forked instances of Java pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # StartJavaPollers=0
    StartJavaPollers=5

    ### Option: StartVMwareCollectors
    # Number of pre-forked vmware collector instances.
    #
    # Mandatory: no
    # Range: 0-250
    # Default:
    StartVMwareCollectors=100

    ### Option: VMwareFrequency
    # How often Zabbix will connect to VMware service to obtain a new data.
    #
    # Mandatory: no
    # Range: 10-86400
    # Default:
    VMwareFrequency=60

    ### Option: VMwarePerfFrequency
    # How often Zabbix will connect to VMware service to obtain performance data.
    #
    # Mandatory: no
    # Range: 10-86400
    # Default:
    VMwarePerfFrequency=60

    ### Option: VMwareCacheSize
    # Size of VMware cache, in bytes.
    # Shared memory size for storing VMware data.
    # Only used if VMware collectors are started.
    #
    # Mandatory: no
    # Range: 256K-2G
    # Default:
    VMwareCacheSize=2048M

    ### Option: VMwareTimeout
    # Specifies how many seconds vmware collector waits for response from VMware service.
    #
    # Mandatory: no
    # Range: 1-300
    # Default:
    # VMwareTimeout=10
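    As a quick sanity check on edits like these, the active (uncommented) values can be compared against the "# Range:" lines in their comment blocks. Below is a minimal Python sketch of that check; it assumes the file keeps the "### Option:" / "# Range:" layout shown above and that K/M/G suffixes mean powers of 1024. The helper names are hypothetical, and the check only catches out-of-range values, not values that are merely unsuitable for a given environment.

```python
import re

# Assumption: K/M/G suffixes in values and ranges are powers of 1024.
SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_number(text):
    """Convert '2048M', '256K' or '60' to an integer."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a numeric value: {text!r}")
    return int(match.group(1)) * SUFFIX[match.group(2).upper()]


def check_ranges(conf_text):
    """Return one message per active option whose value falls outside
    the '# Range:' line documented in its comment block."""
    ranges = {}      # option name -> (low, high)
    current = None   # option whose comment block we are inside
    problems = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        option = re.match(r"### Option:\s*(\w+)", line)
        if option:
            current = option.group(1)
            continue
        bounds = re.match(r"# Range:\s*(\w+)-(\w+)", line)
        if bounds and current:
            ranges[current] = (to_number(bounds.group(1)),
                               to_number(bounds.group(2)))
            continue
        active = re.match(r"(\w+)=(\S+)$", line)  # uncommented Name=value
        if active and active.group(1) in ranges:
            name, value = active.groups()
            low, high = ranges[name]
            if not low <= to_number(value) <= high:
                problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


if __name__ == "__main__":
    posted = """
    ### Option: StartVMwareCollectors
    # Range: 0-250
    StartVMwareCollectors=100
    ### Option: VMwareCacheSize
    # Range: 256K-2G
    VMwareCacheSize=2048M
    """
    print(check_ranges(posted))  # prints [] (2048M equals the 2G upper bound)
```

    Run over the full posted config, every active value is within its stated range; note that VMwareCacheSize=2048M is already at the documented 2G maximum, so that setting cannot be raised any further.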


    One of the issues is that I cannot find any documentation detailing which advanced parameters are optimal for the number of hosts/nodes being monitored. Is it just a matter of trial and error to find what works best?

    Has anyone else had an issue where adding a VCenter server to Zabbix causes Zabbix server to stop at a particular time?

    We have since upgraded the appliance from 3.4.7 to 3.4.11, and the issue still occurs. I have removed the VCenter server from Zabbix, and Zabbix is now stable and has been up for more than 24 hours. I am sure that it is a case of optimising zabbix_server.conf, but I'm not sure of the optimal values.

    Any info would be great, as we would like to monitor VCenter and its hosts with Zabbix.

    Thanks

    Trev
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Zabbix processes did not restart themselves.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates

    • TrevorD
      Junior Member
      • Jun 2018
      • 15

      #3
      What are you referring to, kloczek? Are you referring to the events in the log file? Below is another example of Zabbix server stopping; I had to restart Zabbix server the following morning.



      21528:20180627:001739.285 [file:vmware.c,line:86] zbx_mem_malloc(): out of memory (requested 156 bytes)
      21528:20180627:001739.285 [file:vmware.c,line:86] zbx_mem_malloc(): please increase VMwareCacheSize configuration parameter
      21488:20180627:001739.887 One child process died (PID:21528,exitcode/signal:1). Exiting ...
      21488:20180627:001741.952 syncing history data...
      21488:20180627:001741.959 item "4c4c4544-0035-4810-8036-c6c04f543932:vmware.hv.status[{$URL},{HOST.HOST}]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.959 item "4c4c4544-0043-5010-8037-b6c04f523432:vmware.hv.datastore.size[{$URL},{HOST.HOST},T2_VMDatastore7_R]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.960 item "4c4c4544-0052-4b10-8044-b8c04f523432:vmware.hv.datastore.size[{$URL},{HOST.HOST},T2_VMDatastore14_R,pfree]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.960 item "4c4c4544-0035-4b10-8034-c6c04f543932:vmware.hv.datastore.read[{$URL},{HOST.HOST},T1_VMDatastore9_NR,latency]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.960 item "4c4c4544-0035-4410-8035-c6c04f543932:vmware.hv.datastore.read[{$URL},{HOST.HOST},T1_VMDatastore1_R,latency]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.960 item "4c4c4544-0035-4410-8035-c6c04f543932:vmware.hv.datastore.write[{$URL},{HOST.HOST},T1_VMDatastore11_NR,latency]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.960 item "4c4c4544-0035-4810-8036-c6c04f543932:vmware.hv.datastore.size[{$URL},{HOST.HOST},T2_VMDatastore13_NR]" became not supported: Unknown hypervisor uuid.
      21488:20180627:001741.969 syncing history data done
      21488:20180627:001741.969 syncing trend data...
      21488:20180627:001743.341 syncing trend data done
      21488:20180627:001743.342 Zabbix Server stopped. Zabbix 3.4.7 (revision 77720).
      835:20180627:082300.482 Starting Zabbix Server. Zabbix 3.4.7 (revision 77720).
      835:20180627:082300.488 ****** Enabled features ******
      835:20180627:082300.488 SNMP monitoring: YES
      835:20180627:082300.488 IPMI monitoring: YES
      835:20180627:082300.488 Web monitoring: YES
      835:20180627:082300.488 VMware monitoring: YES
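      The decisive clue in this log is the VMwareCacheSize message, which is easy to miss among the "not supported" lines. A small scanner can pull out just the relevant lines. This is a minimal Python sketch, assuming the "PID:YYYYMMDD:HHMMSS.mmm message" layout seen in the logs above; the pattern list is a hypothetical selection that only covers messages appearing in this thread. Feed scan() the lines of your server log (its location is set by the LogFile parameter).

```python
import re
from datetime import datetime

# Zabbix server log lines in this thread look like
# "PID:YYYYMMDD:HHMMSS.mmm message"; leading whitespace is tolerated.
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")

# Messages from the posted logs that relate to the VMware cache crash.
PATTERNS = {
    "oom": re.compile(r"zbx_mem_malloc\(\): out of memory"),
    "cache_hint": re.compile(r"please increase VMwareCacheSize"),
    "child_died": re.compile(r"One child process died"),
    "stopped": re.compile(r"Zabbix Server stopped"),
}


def scan(lines):
    """Yield (timestamp, pid, kind, message) for each line matching PATTERNS."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        pid, day, hms, millis, message = match.groups()
        ts = datetime.strptime(day + hms, "%Y%m%d%H%M%S")
        ts = ts.replace(microsecond=int(millis) * 1000)
        for kind, pattern in PATTERNS.items():
            if pattern.search(message):
                yield ts, int(pid), kind, message
                break  # report each line once, under its first matching kind


if __name__ == "__main__":
    sample = [
        "21528:20180627:001739.285 [file:vmware.c,line:86] zbx_mem_malloc(): out of memory (requested 156 bytes)",
        "21488:20180627:001739.887 One child process died (PID:21528,exitcode/signal:1). Exiting ...",
        "21488:20180627:001741.952 syncing history data...",
        "21488:20180627:001743.342 Zabbix Server stopped. Zabbix 3.4.7 (revision 77720).",
    ]
    for ts, pid, kind, message in scan(sample):
        print(ts.isoformat(timespec="milliseconds"), pid, kind)
```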

      Below are the Zabbix stats before adding VCenter:

      [Image: zabbix1.JPG]

      Stats after adding VCenter:

      [Image: zabbix2.JPG]

      Are there settings I should optimize in the config file for the above NVPS and number of items?

      Thanks

      Trev

      • Atsushi
        Senior Member
        • Aug 2013
        • 2028

        #4
        The log shows the following. Has the VMwareCacheSize setting been adjusted?

        Code:
        21528:20180627:001739.285 [file:vmware.c,line:86] zbx_mem_malloc(): out of memory (requested 156 bytes)
        21528:20180627:001739.285 [file:vmware.c,line:86] zbx_mem_malloc(): please increase VMwareCacheSize configuration parameter

        • TrevorD
          Junior Member
          • Jun 2018
          • 15

          #5
          Hey Atsushi,

          Yes, this setting has been updated as below:

          ### Option: VMwareCacheSize
          # Size of VMware cache, in bytes.
          # Shared memory size for storing VMware data.
          # Only used if VMware collectors are started.
          #
          # Mandatory: no
          # Range: 256K-2G
          # Default:
          VMwareCacheSize=2048M

          • Atsushi
            Is there still an out of memory error even after that change?
            If the same error continues to occur, doesn't the value need to be readjusted?
            Please check the log file again.
        • TrevorD
          Junior Member
          • Jun 2018
          • 15

          #6
          I have just added the VCenter server back into Zabbix this morning, so I will keep an eye on it to see if the issue occurs again. Zabbix server seems to stop at around 17:00-18:00, and I have to restart it the following morning.

          CPU and memory stats are still not updating. I had some problems with the changes I made to the config file this morning, so the server was offline for a while, as you can see. Zabbix then appeared to update stats for about 10 minutes, and nothing after that:

          [Image: zabbix3.JPG]

          • TrevorD
            Junior Member
            • Jun 2018
            • 15

            #7
            So I monitored the performance of Zabbix last night and this morning, and it looks like any performance-related issues coincide with our VMware backups. I ended up disabling the VMware event log check in the standard VMware template. It looks like Zabbix server has held up and did not stop overnight.


            It looks like CPU and memory stats are now updating as required. That might have been the simple fact of re-adding the VCenter server to Zabbix after the upgrade to 3.4.11, or the disabling of the VMware event log check.

            [Image: zabbix4.JPG]

            The preprocessing manager process was still maxing out for a period of time, and I'm not really sure how to resolve this:

            [Image: Zabbix5.JPG]


            At the end of the day, we are not using Zabbix as our primary monitoring solution; we are only using the appliance to monitor a subset of our infrastructure. When we look at migrating completely to Zabbix, we will do a full Zabbix build on CentOS or similar and try to tune it appropriately.

            Trev

            • Pada
              Senior Member
              • Apr 2012
              • 236

              #8
              I just had the same issue with Zabbix 4.2 the moment I started monitoring vCenter (via a Zabbix 4.2 Proxy running in Docker).

              I tried to resolve the crashing & performance issues by:
              1. Gradually increasing VMWARECACHESIZE on the Proxy (restarting the Docker container each time) until it stopped crashing with the memory error at 384M; however, I then started getting Proxy history cache issues.
              2. Then gradually increasing HISTORYCACHESIZE on the Proxy to about 256M, but then my Proxy MySQL server got stuck (high CPU usage) on a commit statement, and my Proxy became completely unresponsive in terms of gathering/forwarding metrics.
              3. Increasing my Proxy MySQL resources (RAM, CPU); however, this just ended up overloading my Zabbix Server, which then started having history cache issues, and my Server MySQL suddenly had very high CPU utilization too.


              The culprit ended up being the "vmware.eventlog[{$URL}]" item inside the "Template Virt VMware" template!
              After I disabled it, my Server MySQL (Aurora MySQL db.t2.medium) CPU usage dropped from 40% to 10%, and my Zabbix nvps (new values per second) dropped from a peak of 2.2k to 80 nvps.
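              The posters in this thread disabled the item through the frontend. Where several installations need the same change, it could also be scripted through the Zabbix JSON-RPC API (api_jsonrpc.php). This is a minimal Python sketch, assuming the pre-5.4 request style in which the session token goes in the "auth" field of the body; the frontend URL, template ID, item ID and token are hypothetical, and the token would come from a prior user.login call. The helpers only build request bodies; send() is the one function that touches the network.

```python
import json
import urllib.request

# Hypothetical frontend address; replace with your own.
API_URL = "http://zabbix.example.com/api_jsonrpc.php"
EVENTLOG_KEY = "vmware.eventlog[{$URL}]"


def rpc(method, params, auth, request_id=1):
    """Build a JSON-RPC 2.0 request body in the Zabbix 3.4/4.x style."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": request_id}


def find_template(name, auth):
    """template.get request that looks a template up by its technical name."""
    return rpc("template.get", {"output": ["templateid"],
                                "filter": {"host": [name]}}, auth)


def find_eventlog_item(template_id, auth):
    """item.get request for the event log item on one template."""
    return rpc("item.get", {"output": ["itemid", "name", "status"],
                            "templateids": [template_id],
                            "filter": {"key_": [EVENTLOG_KEY]}}, auth)


def disable_item(item_id, auth):
    """item.update request; status 1 means the item is disabled."""
    return rpc("item.update", {"itemid": item_id, "status": 1}, auth)


def send(body, url=API_URL):
    """POST one request body and return the decoded JSON response."""
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

              In use, you would send(find_template("Template Virt VMware", token)), take the returned templateid, send the item lookup, and finally send(disable_item(itemid, token)).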

              When I manually ran a count query for the number of log entries that vmware.eventlog had generated, it came to 750,000 entries in about 8 hours.
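              A count query of this kind can be sketched as below, together with the arithmetic that puts the figure in perspective: 750,000 entries in 8 hours is 750,000 / 28,800 s, or roughly 26 new values per second on average from that single item. The sketch uses an in-memory SQLite table as a simplified stand-in for Zabbix's history_log table (the real table has more columns, and the itemid and clock values here are made up). The same SELECT COUNT(*) shape applies on MySQL, although parameter placeholders differ between database drivers.

```python
import sqlite3

# Simplified stand-in for Zabbix's history_log table; the real table has
# more columns. The itemid and clock values used below are made up.
COUNT_SQL = ("SELECT COUNT(*) FROM history_log "
             "WHERE itemid = ? AND clock >= ? AND clock < ?")


def count_entries(conn, itemid, start, end):
    """Count log values stored for one item in the window [start, end)."""
    return conn.execute(COUNT_SQL, (itemid, start, end)).fetchone()[0]


def per_second(count, hours):
    """Average number of values per second over the given number of hours."""
    return count / (hours * 3600)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history_log (itemid INTEGER, clock INTEGER, value TEXT)")
    conn.executemany("INSERT INTO history_log VALUES (?, ?, ?)",
                     [(42, 1000 + i, "event") for i in range(10)])
    print(count_entries(conn, 42, 1000, 1005))  # 5
    print(round(per_second(750_000, 8), 1))     # 26.0
```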

              • dougbee
                Member
                • Apr 2011
                • 68

                #9
                Thanks for this! I was having the same issue: even when testing on a small VMware cluster, my history cache usage would rocket up. Trying to stop zabbix-server would result in a slow history sync (normally it shuts down within a few seconds) that eventually failed.

                Disabling the eventlog item fixed it for me as well!
