Largest Zabbix Deployment?

  • syndeysider
    Senior Member
    • Oct 2013
    • 115

    #1

    Hello ladies and gentlemen,

    I'm wondering: who here has the largest deployment?

    I've just started in a new role that involves converging our currently siloed monitoring systems into a single platform, to provide a platform-agnostic view of various systems.

    Scale is going to be the problem, given that our Network, Systems, Active Directory and Database volume is one of the largest in the Southern Hemisphere.

    I've done some very basic numbers, and at an absolute minimum we are looking at:

    Number of hosts: 75k-130k+
    Number of proxies: 10-30+
    Number of triggers: 1.8 million+
    Number of items: 2-3 million+
    Number of users: 300-1000
    Zabbix DB: 2-6TB
    NVPS: 25k-150k (depending on agreed check intervals)

    Anyone running something larger than this?
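The NVPS range in the list above follows roughly from the item count divided by the average check interval. A minimal Python sketch of that arithmetic, assuming every item is polled at a single uniform average interval (real deployments mix intervals, so treat the result as a ballpark only; the intervals below are illustrative, not agreed values):

```python
def estimate_nvps(items: int, avg_interval_s: float) -> float:
    """New values per second if each item is polled once per avg_interval_s seconds."""
    if avg_interval_s <= 0:
        raise ValueError("interval must be positive")
    return items / avg_interval_s


# Item counts from the list above; the intervals are assumptions for illustration.
for items, interval in [(2_000_000, 80), (3_000_000, 60), (3_000_000, 20)]:
    nvps = estimate_nvps(items, interval)
    print(f"{items:>9,} items @ {interval:>3}s -> {nvps:>9,.0f} NVPS")
```

Under these assumptions, 2 million items at an 80 s average interval gives the 25k end of the range, while 3 million items at 20 s gives 150k.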
  • SBO
    Zabbix Certified Specialist
    • Sep 2015
    • 226

    #2
    Hi,

    For such a big infrastructure, I honestly think you should contact Zabbix directly. It's really... HUGE!


    • syndeysider
      Senior Member
      • Oct 2013
      • 115

      #3
      I plan to. At this stage we are conceptualizing the design framework and comparing Zabbix with other products such as Icinga 2.


      • Alex.S
        Senior Member
        • Feb 2012
        • 258

        #4
        Hi syndeysider,

        There are larger installations out there in terms of hosts, items, triggers and proxies, but this is still huge.

        The only thing I cannot vouch for is the number of concurrent users. I'm not saying it's impossible to handle 1000 users; I just haven't heard of anyone having that many at a time.

        Cheers,

        Alex.


        • syndeysider
          Senior Member
          • Oct 2013
          • 115

          #5
          Cheers. This is something that we will probably ship off to Grafana.

          I'm hoping that the actual front-end use cases remain with the SMEs in each internal IT division that uses the platform.

          I'm progressing nicely with the design concept and have a working prototype in place, along with an Icinga 2 instance. It's been a very interesting comparison, in my opinion. I've been a very big supporter of Zabbix, right back to 2.0.1, but I do see some really cool features in Icinga 2, like out-of-the-box support for writing to a big time series backend. That, I think, is possible in the current version of Zabbix but would have to be developed.

          Anyway, I digress. It's good to know there are bigger installs out there. I'm hoping that once an exec decision is made, I can bring Zabbix SIA on board and nail down some real-world design principles.


          • onallion
            Senior Member
            • Mar 2016
            • 131

            #6
            Should be interesting. What are your current plans for DB? Percona XtraDB?


            • jan.garaj
              Senior Member
              Zabbix Certified Specialist
              • Jan 2010
              • 506

              #7
              Originally posted by syndeysider
              I've been a very big supporter of Zabbix, right back to 2.0.1, but do see some really cool features in Icinga 2, like out-of-the-box support for writing to a big time series backend etc.

              https://www.zabbix.com/documentation...port_callbacks

              Implementation depends on the "DB" used (OpenTSDB, InfluxDB, Mongo, Graphite, Elasticsearch, DalmatinerDB, ScyllaDB, AWS S3, Bigtable, ...). It's usually 10-20 lines of code; piece of cake. Sending metric values to the external DB is not the problem; the problem is how to process/integrate metric metadata with that external DB.
              Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
              My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant
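To illustrate the point above that shipping values is easy but metadata is the hard part: a history export callback receives little more than an item ID, a timestamp and a value, so host and item key have to be looked up separately before the point is useful in, for example, InfluxDB. The following is a hypothetical Python sketch, not the Zabbix module API (which is C); the `HistoryValue` shape, the `ITEM_METADATA` cache, the `zabbix` measurement and the `item_key` tag name are all assumptions for illustration. It renders a record as an InfluxDB line-protocol string.

```python
from dataclasses import dataclass


@dataclass
class HistoryValue:
    """Roughly what a float history export callback hands over (assumed shape)."""
    itemid: int
    clock: int   # seconds since the epoch
    ns: int      # nanosecond part of the timestamp
    value: float


# Hypothetical metadata cache; in practice it would have to be loaded from the
# Zabbix configuration (hosts/items) and kept in sync as items change.
ITEM_METADATA = {
    10101: {"host": "db01", "item_key": "system.cpu.load[all,avg1]"},
}


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, as line protocol requires in tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(rec: HistoryValue) -> str:
    tags = f"itemid={rec.itemid}"
    meta = ITEM_METADATA.get(rec.itemid)
    if meta is not None:
        tags += f",host={escape_tag(meta['host'])},item_key={escape_tag(meta['item_key'])}"
    # Without metadata the point is only an opaque item ID: that is the hard part.
    timestamp_ns = rec.clock * 1_000_000_000 + rec.ns
    return f"zabbix,{tags} value={rec.value!r} {timestamp_ns}"


print(to_line_protocol(HistoryValue(10101, 1_490_000_000, 123, 0.42)))
print(to_line_protocol(HistoryValue(99999, 1_490_000_000, 0, 1.5)))
```

The second call shows the failure mode: an item missing from the cache still produces a valid point, but one that can only be queried by item ID.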


              • syndeysider
                Senior Member
                • Oct 2013
                • 115

                #8
                Originally posted by jan.garaj
                https://www.zabbix.com/documentation...port_callbacks

                Implementation depends on the "DB" used (OpenTSDB, InfluxDB, Mongo, Graphite, Elasticsearch, DalmatinerDB, ScyllaDB, AWS S3, Bigtable, ...). It's usually 10-20 lines of code; piece of cake. Sending metric values to the external DB is not the problem; the problem is how to process/integrate metric metadata with that external DB.
                Thanks! This is exactly what I plan on testing!! Really awesome to see this available now.


                • syndeysider
                  Senior Member
                  • Oct 2013
                  • 115

                  #9
                  Originally posted by onallion
                  Should be interesting. What are your current plans for DB? Percona XtraDB?
                  Yes, storing history data for only (x) days. I'm going to partition the history_xxxx tables onto some write-optimized PCIe SSDs. In this space, I'm keeping a close eye on the new Intel Optane SSDs. This should bring my DB down to 1-2TB, which I think is a bit more manageable in terms of both restore times and cost.
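A back-of-the-envelope sketch of how retention drives history size, assuming a constant NVPS and roughly 90 bytes per stored history value (a rough per-row figure of the kind used in Zabbix database sizing guidance; actual size depends on the engine, indexes and value types, so measure your own tables):

```python
SECONDS_PER_DAY = 86_400


def history_bytes(nvps: float, retention_days: float, bytes_per_value: int = 90) -> float:
    """Approximate on-disk size of raw history for a constant ingest rate.

    bytes_per_value=90 is an assumed average per stored value, not a measured one.
    """
    return nvps * SECONDS_PER_DAY * retention_days * bytes_per_value


def to_tb(n_bytes: float) -> float:
    return n_bytes / 1e12


for nvps in (25_000, 150_000):
    for days in (1, 7):
        size = to_tb(history_bytes(nvps, days))
        print(f"{nvps:>7,} NVPS x {days} day(s): ~{size:.2f} TB")
```

Under these assumptions, 25k NVPS with a week of history comes to about 1.4 TB, in line with the 1-2 TB target, whereas 150k NVPS accumulates about 1.2 TB per day.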

                  I'm looking at writing some very basic code to either pull data (reading off a slave, with modifications to https://github.com/zensqlmonitor/influxdb-zabbix) or push it (via a module) directly to Kafka or InfluxDB for trend/historical retention, because of the sheer volume of data.
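The pull option amounts to incrementally reading new rows from a replica. A minimal sketch of that pattern, using an in-memory SQLite table as a stand-in for a MySQL/Percona slave and a simplified `history(itemid, clock, ns, value)` table; the watermark logic is a hypothetical illustration, not how influxdb-zabbix actually works:

```python
import sqlite3


def fetch_new_rows(conn, watermark):
    """Return rows strictly newer than watermark=(clock, ns), plus the new watermark.

    Caveat: rows committed late with an older timestamp are skipped, one reason
    a single polling reader is fragile at this scale.
    """
    clock, ns = watermark
    rows = conn.execute(
        "SELECT itemid, clock, ns, value FROM history "
        "WHERE clock > ? OR (clock = ? AND ns > ?) "
        "ORDER BY clock, ns",
        (clock, clock, ns),
    ).fetchall()
    if rows:
        last = rows[-1]
        watermark = (last[1], last[2])
    return rows, watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER, ns INTEGER, value REAL)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)",
                 [(1, 100, 0, 0.5), (2, 100, 5, 1.5), (1, 160, 0, 0.7)])

rows, wm = fetch_new_rows(conn, (0, 0))    # first poll: all three rows
conn.execute("INSERT INTO history VALUES (2, 170, 0, 2.5)")
rows2, wm2 = fetch_new_rows(conn, wm)      # second poll: only the new row
print(len(rows), wm, rows2, wm2)
```

The reader keeps only a small piece of state (the last timestamp seen), which is simple but also means one process owns the whole export path.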

                  So Zabbix would process the basic triggers, alerts etc. and store minimal historical data, with trend-based analysis and service-level triggers sitting on top of InfluxDB, Kapacitor, TICKscript etc. Stitch both together with Grafana and we might have something useful.
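For the trend side, Zabbix's own trend tables keep an hourly value count, minimum, average and maximum per item. A Python sketch of that hourly roll-up, the kind of aggregation that would move to the time-series layer in this design; the input format, a list of `(itemid, clock, value)` tuples, is a hypothetical simplification:

```python
from collections import defaultdict


def hourly_trends(values):
    """Roll raw (itemid, clock, value) samples up into hourly trend rows.

    Returns {(itemid, hour_start): (num, min, avg, max)}, mirroring the
    num/value_min/value_avg/value_max columns of Zabbix trend tables.
    """
    buckets = defaultdict(list)
    for itemid, clock, value in values:
        hour_start = clock - clock % 3600  # align to the start of the hour
        buckets[(itemid, hour_start)].append(value)
    return {
        key: (len(vals), min(vals), sum(vals) / len(vals), max(vals))
        for key, vals in buckets.items()
    }


samples = [(1, 3600, 2.0), (1, 3660, 4.0), (1, 7199, 6.0), (1, 7200, 10.0)]
print(hourly_trends(samples))
```

A sample at exactly the hour boundary (clock 7200 here) starts a new bucket, which matches how hour-aligned trend rows are keyed.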

                  I've found these two VERY valuable for stress testing:

                  Zabbix Agent Simulator (vulogov/zas_agent on GitHub)
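Agent simulators like zas_agent have to speak the Zabbix wire protocol. As a rough sketch of what that involves, here is a Python encoder/decoder for the framing used by Zabbix components of this era: the `ZBXD` signature, a 0x01 flags byte, an 8-byte little-endian payload length, then the payload. It is written from the protocol description rather than taken from zas_agent, and it ignores compression and later protocol extensions.

```python
import struct

HEADER = b"ZBXD\x01"  # signature + flags byte (0x01 = Zabbix communication protocol)


def pack(payload: str) -> bytes:
    """Frame a payload: header, 8-byte little-endian length, UTF-8 data."""
    data = payload.encode("utf-8")
    return HEADER + struct.pack("<Q", len(data)) + data


def unpack(frame: bytes) -> str:
    """Inverse of pack(); raises ValueError on a malformed frame."""
    if not frame.startswith(HEADER):
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", frame[5:13])
    data = frame[13:13 + length]
    if len(data) != length:
        raise ValueError("truncated frame")
    return data.decode("utf-8")


frame = pack("agent.ping")
print(frame)          # 5-byte header, length 10, then the payload
print(unpack(frame))
```

A load generator built on this would open many such connections in parallel; the framing itself is cheap, so the bottleneck in a stress test is usually the server side.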


                  • jan.garaj
                    Senior Member
                    Zabbix Certified Specialist
                    • Jan 2010
                    • 506

                    #10
                    https://github.com/zensqlmonitor/influxdb-zabbix is not a good idea: you can't scale it, and it's a single point of failure. The module option is a better solution.

                    InfluxDB: you have 150k NVPS plus additional NVPS from other service checks, and a single node can handle 480k NVPS (https://blog.outlyer.com/time-series...ase-benchmarks). But to be safe, you need a cluster. FYI, InfluxDB had problems reading large amounts of data in 2015 (https://cds.cern.ch/record/2011172/f...K-2015-060.pdf); maybe it's better now.
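A rough capacity check along those lines, assuming the per-node write capacity comes from a benchmark (the 480k figure quoted above), a headroom fraction kept free for reads and spikes, and a replication factor so each point is written to several nodes. The 20k NVPS of extra service checks, the 50% headroom and the replication factor of 2 are illustrative assumptions, not recommendations:

```python
import math


def nodes_needed(nvps: float, per_node_nvps: float, replication: int = 2,
                 headroom: float = 0.5) -> int:
    """Minimum node count so replicated writes stay within (1 - headroom) of capacity."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_node_nvps * (1 - headroom)
    # Each point is written `replication` times, and we need at least that many nodes.
    return max(replication, math.ceil(nvps * replication / usable))


zabbix_nvps = 150_000
extra_checks_nvps = 20_000  # hypothetical load from other service checks
print(nodes_needed(zabbix_nvps + extra_checks_nvps, per_node_nvps=480_000))
```

Benchmark write rates rarely hold under mixed read/write load, so the headroom parameter is where most of the real-world uncertainty lives.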

                    • jan.garaj
                      Senior Member
                      Zabbix Certified Specialist
                      • Jan 2010
                      • 506

                      #11
                      I think Kafka would be a better design decision. You can stream data from Kafka (even multiple times) to the selected DB(s) (InfluxDB, HBase, OpenTSDB, ...). Search "GrafanaCon 2016: Utkarsh Bhatnagar, Elastic-Monitoring Using Grafana at Sony PlayStation" on YouTube.
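The "stream it multiple times" property comes from Kafka retaining the log and letting each consumer group track its own offsets. A toy in-memory model of that idea in Python, not the Kafka client API; the `Topic` class and the group names are hypothetical, and `key % partitions` stands in for Kafka's hash partitioner. Two independent sinks, say an InfluxDB writer and an HBase writer, each read the full stream at their own pace:

```python
from collections import defaultdict


class Topic:
    """Toy append-only log split into partitions, with per-group offsets."""

    def __init__(self, partitions: int):
        self.logs = [[] for _ in range(partitions)]
        # offsets[group][partition] -> next index that group will read
        self.offsets = defaultdict(lambda: [0] * len(self.logs))

    def produce(self, key: int, value) -> None:
        # Same key -> same partition, so values for one item stay in order.
        self.logs[key % len(self.logs)].append((key, value))

    def poll(self, group: str, max_records: int = 100):
        """Return up to max_records unread records for this group and advance its offsets."""
        out = []
        offs = self.offsets[group]
        for p, log in enumerate(self.logs):
            take = log[offs[p]:offs[p] + max_records - len(out)]
            out.extend(take)
            offs[p] += len(take)
            if len(out) >= max_records:
                break
        return out


topic = Topic(partitions=3)
for itemid, value in [(1, 0.5), (2, 1.0), (1, 0.6), (4, 9.9)]:
    topic.produce(itemid, value)

influx_batch = topic.poll("influxdb-writer")
hbase_batch = topic.poll("hbase-writer")  # independent offsets: full stream again
print(len(influx_batch), len(hbase_batch), topic.poll("influxdb-writer"))
```

Adding a new time-series backend later is then just a new consumer group replaying the retained log, which is the flexibility argument for the extra layer.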

                      More valuable resources for Zabbix stress testing are https://monitoringartist.github.io/z...archer/#stress

                      • syndeysider
                        Senior Member
                        • Oct 2013
                        • 115

                        #12
                        Originally posted by jan.garaj
                        I think Kafka would be a better design decision. You can stream data from Kafka (even multiple times) to the selected DB(s) (InfluxDB, HBase, OpenTSDB, ...). Search "GrafanaCon 2016: Utkarsh Bhatnagar, Elastic-Monitoring Using Grafana at Sony PlayStation" on YouTube.

                        More valuable resources for Zabbix stress testing are https://monitoringartist.github.io/z...archer/#stress
                        Cheers! Very interesting talk by Utkarsh Bhatnagar.

                        I'm busy looking at https://github.com/edenhill/librdkafka and building a prototype. It's an extra layer, but you are correct that it would allow some flexibility in the underlying time series choices.


                        • jan.garaj
                          Senior Member
                          Zabbix Certified Specialist
                          • Jan 2010
                          • 506

                          #13
                          Could you publish your prototype under an open source license on GitHub/GitLab, please?

                          • syndeysider
                            Senior Member
                            • Oct 2013
                            • 115

                            #14
                            Originally posted by jan.garaj
                            Could you publish your prototype under an open source license on GitHub/GitLab, please?
                            Quite possibly, depending on our policy on publishing code externally. I would like to, but I'm new here and will find out!

                            On another note

                            https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2


                            Looks very interesting!


                            • bbrendon
                              Senior Member
                              • Sep 2005
                              • 870

                              #15
                              As for scaling, I'd say Zabbix suffers from its age. When Zabbix was born, all these new technologies were still many years away, and Zabbix hasn't improved since its inception in terms of architecture and back-end.

                              I get the feeling there should be something better than Zabbix but I'm not sure what that is. Maybe there isn't.

                              On the plus side, Zabbix has matured continuously over the years and is still going strong.
                              Unofficial Zabbix Expert
                              Blog, Corporate Site
