Improved VMWare Monitoring with VmBix

  • tatapoum
    Senior Member
    • Jan 2014
    • 185

    #1

    Improved VMWare Monitoring with VmBix

    Hi all,
    VmBix is a multi-threaded TCP proxy for the VMware vSphere API, written in Java. It accepts connections from a Zabbix server, proxy, or agent, or from the zabbix_get binary, and translates them into VMware API calls.
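
To make the "accepts connections from zabbix_get" part concrete, here is a minimal sketch of the header-framed Zabbix wire format (a "ZBXD" signature, a flags byte, and an 8-byte little-endian length). It is an illustration, not VmBix's own code. The `query` helper, its `host` and `port` arguments, and the assumption that the server closes the connection after replying are hypothetical; older Zabbix clients may send the bare key followed by a newline instead of a framed request.

```python
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus flags byte


def pack_request(key: str) -> bytes:
    """Frame an item key with the ZBXD header and an 8-byte length."""
    payload = key.encode("utf-8")
    return ZBX_HEADER + struct.pack("<Q", len(payload)) + payload


def unpack_response(packet: bytes) -> str:
    """Extract the value from a header-framed reply."""
    if not packet.startswith(ZBX_HEADER):
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])
    return packet[13:13 + length].decode("utf-8")


def query(host: str, port: int, key: str, timeout: float = 5.0) -> str:
    """Hypothetical helper: send one key and read the reply until EOF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(pack_request(key))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # assumption: the peer closes after answering
                break
            chunks.append(chunk)
    return unpack_response(b"".join(chunks))
```

In practice zabbix_get or the loadable module does this for you; the sketch only shows what travels over the socket.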

    Starting with version 2.2, Zabbix can natively monitor a VMware environment, but there are a few drawbacks:
    - Not all of the monitored items are relevant
    - It is not easily extensible
    - The created ESX and VM hosts are mostly read-only. You cannot attach different templates to them, put them into different groups, or use a Zabbix agent to monitor their OS or apps

    VmBix helps you overcome these limitations, with very good performance. It is multi-threaded, implements object caching, and can be queried using a Zabbix loadable module.

    VmBix comes with a set of templates that add several monitored items, triggers and graphs to Zabbix. A sample import script is also provided. It automatically creates regular Zabbix hosts for hypervisors, VMs and datastores, so they can also be monitored with a Zabbix agent in parallel and can use different templates or groups per host. Here are a few screenshots of what you can expect in Zabbix:





    You can use VmBix methods to query interesting VMware metrics, for example:
    Code:
    esx.counter[esx01.domain.local,cpu.ready.summation]
    1135
    Code:
    vm.counter.discovery[VM01,virtualDisk.totalReadLatency.average]
    {
        "data": [
            {
                "{#METRICINSTANCE}": "scsi2:2"
            },
            {
                "{#METRICINSTANCE}": "scsi2:1"
            },
            {
                "{#METRICINSTANCE}": "scsi2:0"
            },
            {
                "{#METRICINSTANCE}": "scsi2:6"
            },
            {
                "{#METRICINSTANCE}": "scsi2:5"
            },
            {
                "{#METRICINSTANCE}": "scsi2:4"
            },
            {
                "{#METRICINSTANCE}": "scsi2:3"
            }
        ]
    }
    Code:
    vm.counter[VM01,virtualDisk.totalReadLatency.average,scsi2:4,300]
    2
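
As a rough illustration of how the discovery output above relates to the per-instance query, this sketch expands a discovery reply into one item key per {#METRICINSTANCE}. The function name `expand_instances` is hypothetical, and the trailing 300 is simply copied from the example above (treating it as an interval is an assumption). In Zabbix itself this expansion is done by a low-level discovery rule with item prototypes, not by a script.

```python
import json


def expand_instances(discovery_json: str, vm: str, counter: str,
                     interval: int = 300) -> list[str]:
    """Build one vm.counter[...] key per discovered metric instance."""
    rows = json.loads(discovery_json)["data"]
    instances = sorted(row["{#METRICINSTANCE}"] for row in rows)
    return [f"vm.counter[{vm},{counter},{inst},{interval}]"
            for inst in instances]


# Shortened sample of the discovery reply shown above
SAMPLE = '{"data": [{"{#METRICINSTANCE}": "scsi2:1"}, {"{#METRICINSTANCE}": "scsi2:0"}]}'

for key in expand_instances(SAMPLE, "VM01", "virtualDisk.totalReadLatency.average"):
    print(key)
```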
    Version 2.5 has been released. Here is the changelog:
    - ***BREAKING CHANGE***: all parameters in the configuration file vmbix.conf and on the command line must now be in lowercase. The available command-line arguments have also changed. See the "usage" output.
    - Improved the script vmbix-object-sync
    - Hacked the vm.guest.disk.* methods with a workaround for ZBX-10590. If a disk name ends with \, a space will be added at the end of the disk name. This is controlled by the parameter escapechars in the configuration file. It is set to false by default.
    - Fixed the ESX usage item in the template.
    - Code cleanup and better error handling
    - Added the vmbix.stats methods:
    vmbix.stats[threads] indicates the number of working threads
    vmbix.stats[queue] indicates the size of the connection queue
    vmbix.stats[requests] indicates the number of requests received by VmBix
    vmbix.stats[cachesize,(vm|esxi|ds|perf|counter|hri|cluster)] indicates the size of each cache
    vmbix.stats[hitrate,(vm|esxi|ds|perf|counter|hri|cluster)] indicates the hit rate of each cache (1.0 = 100% hits)
    - Exposed the following parameters in the configuration:
    connecttimeout: the VMware API connect timeout
    readtimeout: the VMware API read timeout
    maxconnections: the maximum number of concurrent connections accepted by VmBix

    Here are the download links for the VmBix packages and its Zabbix loadable module.

    There is now a Wiki reference page with all the supported methods.

    All the details here :
    Last edited by tatapoum; 12-09-2016, 12:57.
  • bbrendon
    Senior Member
    • Sep 2005
    • 870

    #2
    Needs snapshot monitoring. For me that's the #1 ESXi thing to monitor.
    Unofficial Zabbix Expert
    Blog, Corporate Site


    • tatapoum
      Senior Member
      • Jan 2014
      • 185

      #3
      What kind of information do you need? To check whether a VM has a snapshot?


      • bbrendon
        Senior Member
        • Sep 2005
        • 870

        #4
        Yeah. At a minimum, count how many snapshots there are per VM.

        I'm not sure what the capabilities of your software/API are, but the biggest issue I've had with ESX, the one that causes massive downtime, is old snapshots being forgotten on production servers. Not the size, not really the quantity, but the age.

        If there is a snapshot that is a day old, fine. If there is a snapshot that is a week old on a production system, that's a problem. So at a minimum, I'd like to trigger on snapshot qty > 1 for production machines (I don't care about lab) and send a warning alert.

        • tatapoum
          Senior Member
          • Jan 2014
          • 185

          #5
          I released version 2.4, which adds a vm.snapshot method that returns 1 if there is a snapshot.
          You should be able to use a trigger function such as min(1d) to check whether a snapshot has existed for more than one day (change() only compares the last two values).
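
One way to reason about "a snapshot has existed for more than one day" is to require that every sample collected during the last 24 hours equals 1, which is what a minimum over the window expresses. The sketch below simulates that check outside Zabbix. The function name and the `(timestamp, value)` sample format are hypothetical; it is not part of VmBix or Zabbix.

```python
DAY_SECONDS = 86400


def snapshot_lingering(samples, now, window_s=DAY_SECONDS):
    """Return True if every sample in the window reports a snapshot.

    samples: iterable of (unix_timestamp, value) pairs, where value is 1
    when the VM has a snapshot and 0 otherwise.
    """
    recent = [value for ts, value in samples if now - window_s <= ts <= now]
    # With no data in the window nothing can be concluded, so do not alert.
    return bool(recent) and min(recent) == 1
```

A single 0 inside the window resets the condition, so a snapshot that was deleted and recreated during the day does not count as a day old.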


          • tatapoum
            Senior Member
            • Jan 2014
            • 185

            #6
            There is now a Wiki reference page with all the supported methods (https://github.com/dav3860/vmbix/wiki).


            • jackie
              Member
              • Jan 2016
              • 37

              #7
              Alerting on snapshots will be helpful! Kudos!

              We did have a nasty situation happen earlier this week, in which VmBix was involved, and I'm hoping you can clarify what may have happened.

              The PSCs were rebooted, then the vCenters, then the web client. Unfortunately this made the account used to access the SDK on vCenter unusable. VmBix couldn't connect, and we saw lots of errors like these in the VmBix log:
              Code:
              2016-07-19 22:09:15,129 ERROR [Thread-1826805] [VmBix.java:446] Connection update error: com.vmware.vim25.InvalidLogin
              2016-07-19 22:09:15,133 WARN  [Thread-1826805] [VmBix.java:2109] No vm named 'CSUWindows_EX10CAHT2' found
              We are monitoring well over 500 VMs on VMware, and this issue interfered with data collection from the other devices monitored by this proxy. Zabbix was reporting lots of non-VMware devices monitored by the proxy as down. Actually, it is more accurate to say that the status of the other devices was flapping, as Zabbix would report them as up and then down repeatedly.

              The zabbix proxy did not seem to log any issues. However, some type of timeout must have been happening, as it wasn't completing the queries for the data items from the other non-VMware devices.

              I don't understand much about the internals of zabbix data collection. But I was wondering if you could perhaps clarify how the VMBix connection issues were interfering with the proxy data collection?

              Thanks for any info and a great piece of software,
              Jackie


              • tatapoum
                Senior Member
                • Jan 2014
                • 185

                #8
                I think that because VmBix wasn't able to communicate with the vCenter, the requests were timing out in Zabbix (depending on the Timeout parameter in the configuration file). This probably caused all the Zabbix poller processes to wait for an answer from VmBix, which impacted the other checks.

                You could try to decrease the Timeout parameter in Zabbix (as long as it's enough for all the VmBix checks to succeed, including the discoveries). Or increase the StartPollers parameter.

                I've already seen this behavior when the vCenter doesn't respond for a long time and the Zabbix queue fills up until it cannot handle the load. I may try to implement some kind of "dead timeout" in VmBix if the vCenter isn't responding anymore.
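
The effect of Timeout and StartPollers can be estimated with simple arithmetic: if every VmBix check blocks a poller for the full timeout, the pollers can complete at most pollers / timeout checks per second. The sketch below is a back-of-the-envelope model only. The function names are hypothetical and the figures (500 pending checks, 5 pollers, 30 s and 3 s timeouts) are illustrative, not measurements from this thread.

```python
def worst_case_checks_per_second(pollers: int, timeout_s: float) -> float:
    """Throughput when every check runs until it times out."""
    return pollers / timeout_s


def seconds_to_drain(pending_checks: int, pollers: int, timeout_s: float) -> float:
    """Time for the pollers to work through a backlog of failing checks."""
    return pending_checks * timeout_s / pollers


# Illustrative comparison: same backlog, long versus short timeout
for timeout in (30, 3):
    print(timeout, seconds_to_drain(500, pollers=5, timeout_s=timeout))
```

Under this model, lowering the timeout or adding pollers shortens the time during which unrelated checks are starved, which matches the two suggestions above.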


                • jackie
                  Member
                  • Jan 2016
                  • 37

                  #9
                  I did have the timeout configured to a high value, so that very well could have contributed to the issues. A "dead timeout" sounds like a great idea. Anything we can do to make monitoring more stable.
                  Thanks for the info.


                  • stav13
                    Member
                    • Oct 2013
                    • 66

                    #10
                    Hi,

                    I would like to update from version 2.0 to 2.4 for the snapshot functionality. However, I'm a bit confused about how to do this, as I had to install from source because I wanted to monitor two vCenters.

                    Could you please detail how I would run the upgrade?

                    Thanks


                    • jackie
                      Member
                      • Jan 2016
                      • 37

                      #11
                      I'm also interested in the upgrade process. We are in the same position: installed from source and two vcenters. Thanks.


                      • michael.weber
                        Senior Member
                        • Nov 2015
                        • 121

                        #12
                        Hello,
                        Is there any information regarding performance?
                        1 proxy is monitoring ~100 VMs on 1 vCenter.
                        We are using ~10 proxies.

                        Best Regards
                        Michael


                        • tatapoum
                          Senior Member
                          • Jan 2014
                          • 185

                          #13
                          For the moment, if you want to use multiple vcenters with VmBix, the recommendation is to have one proxy per vcenter. Have a look at the changelog because there were breaking changes between 2.0 and 2.4.

                          Regarding the performance, we monitor 350 VMs with one Zabbix proxy (4vCPU 2.70GHz, 8GB RAM) and the loadable module for v3.0. The CPU usage doesn't exceed 20-25% and VmBix has excellent response times (most of the time < 10ms).


                          • michael.weber
                            Senior Member
                            • Nov 2015
                            • 121

                            #14
                            Thanks for the information. I will run VmBix in parallel with vPoller and check whether it meets my requirements!


                            • tatapoum
                              Senior Member
                              • Jan 2014
                              • 185

                              #15
                              I'd be interested in the results of your benchmark of VmBix and vPoller.
