Any plans to adopt Cassandra as a backend?

  • elvar
    Senior Member
    • Feb 2008
    • 226

    #1

    Any plans to adopt Cassandra as a backend?

    I've read about a number of large sites moving to Cassandra for their backends lately and I was wondering if there are any plans to implement Cassandra as a backend choice for Zabbix.



    Regards,
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Are you referring to storage of historical data only or everything?
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter

    • exkg
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      • Mar 2007
      • 718

      #3
      If we are talking about fault tolerance, I think everything.

      Some numbers about performance are impressive ...

      Cassandra vs MySQL with 50GB of data:
                MySQL      Cassandra
      write     ~300ms     ~0.12ms
      read      ~350ms     ~15ms


      []s,
      Luciano
      --
      Luciano Alves
      www.zabbix.com
      Brazil | México | Argentina | Colômbia | Chile
      Zabbix Performance Tuning

      • harpo
        Junior Member
        • Mar 2010
        • 2

        #4
        It seems like the *configuration* data -- hosts, templates, etc. -- probably is relational and might like to live in a relational database.

        But the collected statistics data seems like an obvious fit for Cassandra.

        • zabbix_zen
          Senior Member
          • Jul 2009
          • 426

          #5
          Those DB implementations are popping up like rabbits.

          RDBMSs will not die per se; hybrids will step forward.

          What about MySQL NDB (a distributed DB as a backend to MySQL)
          or HadoopDB (PostgreSQL behind Hadoop MapReduce)?

          An interesting discussion for Zabbix 2.0...
          Last edited by zabbix_zen; 23-03-2010, 16:31.

          • fpaternot
            Member
            Zabbix Certified Specialist
            • Feb 2013
            • 52

            #6
            It is implemented, but I don't see any use cases or setup instructions. I'll look into it.


            • BDiE8VNy
              Senior Member
              • Apr 2010
              • 680

              #7
              According to comments made in "Optimal DB Engine(s) for Zabbix", Cassandra is expected to be supported through a loadable module for alternative storage of historical data.
              See: ZBXNEXT-1836, or rather ZBXNEXT-714.

              Although it sounds promising, it is not necessarily likely to happen in the next stage.
              Personally, I hope to see it implemented and released in 3.0... but similar rumors have existed for communication encryption for quite a long time as well.

              Let's see.

              • Colttt
                Senior Member
                Zabbix Certified Specialist
                • Mar 2009
                • 878

                #8
                PostgreSQL has an hstore type, which is NoSQL-like.

                Or in PostgreSQL you can use JSON or JSONB.
                Debian-User

                Sorry for my bad english

                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Originally posted by exkg
                  Cassandra vs MySQL with 50GB of data:
                            MySQL      Cassandra
                  write     ~300ms     ~0.12ms
                  read      ~350ms     ~15ms
                  Cassandra uses shards. If you want to produce similar stats on MySQL, you must start using partitioned tables.

                  What kind of workload exactly was tested to produce the stats above?
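
As a rough illustration of the partitioned-table setup mentioned above, the following Python sketch generates a MySQL `ALTER TABLE ... PARTITION BY RANGE` statement with one partition per day. It assumes the table has an integer Unix-timestamp column named `clock`, as the Zabbix history* and trends* tables do. The function name, the partition naming scheme and the choice of UTC day boundaries are illustrative assumptions, not something taken from this thread.

```python
from datetime import date, datetime, timedelta, timezone


def daily_partition_ddl(table: str, first_day: date, days: int) -> str:
    """Build an ALTER TABLE statement that range-partitions `table` by day.

    Assumes `table` has an integer Unix-timestamp column named `clock`.
    """
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        # A partition holds rows with clock < midnight (UTC) of the next day.
        upper = datetime.combine(
            day + timedelta(days=1), datetime.min.time(), tzinfo=timezone.utc
        )
        parts.append(
            f"  PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({int(upper.timestamp())})"
        )
    return (
        f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


if __name__ == "__main__":
    # Print the statement instead of executing it; review before running it.
    print(daily_partition_ddl("history", date(2015, 1, 21), 3))
```

The sketch only prints the statement. Old data can then be dropped per partition instead of being deleted row by row, which is the main reason this layout is popular for history tables.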
                  Last edited by kloczek; 21-01-2015, 17:36.
                  http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                  https://kloczek.wordpress.com/
                  zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                  My zabbix templates https://github.com/kloczek/zabbix-templates

                  • cavaliercoder
                    Junior Member
                    • Apr 2014
                    • 13

                    #10
                    I feel a Time Series Database such as OpenTSDB or InfluxDB would be better suited to storing Historical and Trending data.

                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by cavaliercoder
                      I feel a Time Series Database such as OpenTSDB or InfluxDB would be better suited to storing Historical and Trending data.
                      Just try to test it and bring some numbers showing how much better it may be.

                      Just remember that for access to historical data the main bottleneck is not the exact type of DB engine but the fact that, with high probability, the data is available only on physical storage.

                      Example: after moving the MySQL database from Linux to ZFS on Solaris, I observed that, just for writing current data, the number of IOs dropped from 1.3-1.7 kIO/s to about 130-300 IO/s. How is that possible?
                      ZFS uses COW semantics, and by design it is able to combine random IO updates into sequential write operations, concatenating many VFS operations into a much lower number of physical IOs.
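
To make the write-coalescing argument above concrete, here is a toy model in Python. It is not a description of how ZFS is implemented; it only counts IOs under two simplified policies. The 16 KB page size, the 128 KB physical write size and both function names are assumptions chosen for illustration.

```python
import math
import random
from typing import Sequence


def in_place_ios(page_writes: Sequence[int]) -> int:
    """Update-in-place model: every page write is its own random IO."""
    return len(page_writes)


def cow_batched_ios(page_writes: Sequence[int], page_kb: int = 16, io_kb: int = 128) -> int:
    """Copy-on-write model: the dirty pages of one batch are deduplicated and
    written out contiguously, so several pages share one larger IO."""
    dirty_pages = set(page_writes)  # repeated writes to a page collapse within a batch
    pages_per_io = io_kb // page_kb
    return math.ceil(len(dirty_pages) / pages_per_io)


if __name__ == "__main__":
    rng = random.Random(42)
    # One batch of random page updates spread over a large table.
    writes = [rng.randrange(1_000_000) for _ in range(1500)]
    print("in-place IOs:   ", in_place_ios(writes))
    print("COW batched IOs:", cow_batched_ios(writes))
```

In this model the reduction factor is bounded by the ratio of the physical write size to the page size, plus whatever is gained from repeated updates to the same page. Real behaviour depends on transaction group timing, metadata writes and fragmentation, none of which is modelled here.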
                      Additionally, with the new ZFS ARC (Adaptive Replacement Cache) from the latest SRU, which keeps buffered records compressed in memory, it is possible to significantly decrease the pressure of read IOs. How? The same amount of RAM used by the ARC is now able to hold much more data as compressed content. For example, ATM with lzjb I have a 2.67 compression ratio on the MySQL data used by Zabbix. With a typical warehouse-like DB, as the Zabbix DB is, it was possible to switch to the biggest ZFS record size (1MB) to additionally maximize the compression ratio.

                      On my MySQL DB backend I have a 16GB ARC (the host has 32GB; the MySQL InnoDB pool has only 7GB on this host).
                      My Zabbix writes about 12GB of new data daily (I'm using partitioned history* and trends* tables, so it is easy to count this). Effectively, most of the data used to view all graphs on a 1d scale is served without touching storage.
                      Obtaining any data necessary to draw graphs on a wider scale does not take longer than 2s.
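
The figures above can be combined into a quick back-of-the-envelope check. The sketch below assumes that the 12GB/day figure refers to uncompressed data, that the whole 16GB ARC holds compressed database records at the quoted 2.67 ratio, and that nothing else competes for the cache. These are simplifying assumptions, not statements from the post.

```python
def logical_capacity_gb(cache_gb: float, compression_ratio: float) -> float:
    """Uncompressed data that fits in a cache holding compressed records."""
    return cache_gb * compression_ratio


arc_gb = 16.0             # ARC size quoted in the post
compression_ratio = 2.67  # lzjb ratio quoted in the post
daily_write_gb = 12.0     # new data per day quoted in the post

capacity_gb = logical_capacity_gb(arc_gb, compression_ratio)
days_in_cache = capacity_gb / daily_write_gb

if __name__ == "__main__":
    print(f"logical ARC capacity: {capacity_gb:.2f} GB")
    print(f"days of new data that fit: {days_in_cache:.2f}")
```

Under these assumptions the cache could hold roughly 42.7 GB of logical data, or about three and a half days of new values, which is consistent with 1d graphs being served from memory.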

                      If you are able to show that on top of your DB engine you can produce better figures than on MySQL+ZFS, I will agree with you that switching to a new DB engine makes sense.

                      Really, sometimes it is more important to be aware of what is possible with existing (well tested) technologies instead of saying "we need a new DB engine backend because Zabbix is slow".

                      A proper understanding of what is going on in your application at different layers is the key to building the fastest setup.
                      So when proposing, for example, Cassandra, you must explain how it may be possible. An explanation that some queries may be parallelized and executed on more than one box is not enough if you find that you are able to hold the MFU/MRU data in the memory of a single host. In such a case, splitting queries to perform them on multiple hosts will, with high probability, slow everything down.

                      If someone has a Zabbix database of a few TB, IMO it is much better to consider switching to ZFS: with a few hundred GB of MRU/MFU data, use one SSD of that size as an L2ARC (Level 2 ARC) device and use even the slowest possible spindles. ZFS will automatically hold the MRU/MFU data in the L2ARC without any tuning.
                      The cost of the storage in such a case will be the cost of the spindles, with almost the speed of SSD.
                      On Linux there is still no buffering mechanism as effective as L2ARC, and because of this many people must use SSD-only storage. Applying COW semantics in some Linux FSes changes this a bit. However, none of the Linux FSes uses a free list like ZFS (even btrfs uses allocation structures).

                      In such cases Linux (which is free) is more expensive than Solaris (for which you usually must pay for support), because the 2-3k£/year cost (for a typical 1-CPU-socket 1/2U pizza box) will be below the cost of SSD-only storage.

                      When using ZFS it is good to remember that this technology is not new and has matured over its more than 10 years of history. ZFS constantly improves in a way that has not been observed in any Linux-originated storage technology.

                      • fpaternot
                        Member
                        Zabbix Certified Specialist
                        • Feb 2013
                        • 52

                        #12
                        Originally posted by kloczek
                        Example: after moving the MySQL database from Linux to ZFS on Solaris, I observed that, just for writing current data, the number of IOs dropped from 1.3-1.7 kIO/s to about 130-300 IO/s.
                        That's a very interesting argument. I'll try that on FreeBSD 10 and report back. No Sun servers are available here anymore.

                        Can you provide some info on your environment? NVPS, users connected to the frontend, etc?

                        • kloczek
                          Senior Member
                          • Jun 2006
                          • 1771

                          #13
                          Originally posted by fpaternot
                          That's a very interesting argument. I'll try that on FreeBSD 10 and report back. No Sun servers are available here anymore.

                          Can you provide some info on your environment? NVPS, users connected to the frontend, etc?
                          It is a popular myth that to run Solaris you must use Sun/Oracle hardware. It is not true; most HP, Dell and IBM x86 hardware is on the official Solaris HCL:
                          http://www.oracle.com/webfolder/technetwork/hcl/

                          Environment:
                          2 x frontend hosts dedicated to running the Zabbix server, the main proxy (monitoring almost half of the items) with its own small MySQL DB backend, and the web frontend.
                          OS: Oracle Linux 6 with UEKR3 kernels (with DTrace support).
                          Hardware: 2 x HP blade gen 6 with 2x146GB 10krpm mirrored disks, 32GB RAM

                          Each frontend host has its own node IP. On top of this are IPs for: web frontend, Zabbix main proxy, Zabbix proxy DB backend.
                          Typically the first node runs the Zabbix server and web frontend, and the second node runs the Zabbix proxy.
                          All Zabbix proxies have ProxyOfflineBuffer=6

                          2 x DB backend.
                          Hardware: the same HP blade gen 6 (the lowest HP blade version supported in the Solaris 11.2 HCL)
                          Local storage: mirrored (by built-in controller) 2 x 200GB SSD
                          The first DB backend runs the master DB, bound to its own DB backend IP. The second host is used as a slave.

                          Generally, using a per-service IP allows juggling the locations of individual services. Manual failover of any service takes less than 2-3s, and in such an architecture I can schedule a restart of any host if needed.

                          Users usually connected: 20-40.

                          Number of hosts (monitored / not monitored / templates): 1522 (966 / 55 / 501)
                          Required server performance, new values per second: 1674.94
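
As a sanity check, the NVPS figure can be related to the roughly 12GB of new data per day mentioned earlier in the thread. The sketch below assumes that both numbers describe the same server, that the rate is constant over the day, and that 1GB means 10^9 bytes. The result is an average that includes index and trend overhead, not the size of a single row.

```python
SECONDS_PER_DAY = 86_400


def values_per_day(nvps: float) -> float:
    """New values stored per day at a constant rate of `nvps` values/second."""
    return nvps * SECONDS_PER_DAY


def bytes_per_value(daily_bytes: float, nvps: float) -> float:
    """Average on-disk bytes written per collected value."""
    return daily_bytes / values_per_day(nvps)


if __name__ == "__main__":
    nvps = 1674.94       # "Required server performance" from the post
    daily_bytes = 12e9   # ~12GB/day, assuming 1GB = 10**9 bytes
    print(f"values per day:  {values_per_day(nvps):,.0f}")
    print(f"bytes per value: {bytes_per_value(daily_bytes, nvps):.1f}")
```

This works out to about 145 million values per day and on the order of 80 bytes written per value. Note that "Required server performance" is derived from configured item intervals, so the actual rate may differ.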

                          Other details about this env can be found at https://www.zabbix.com/forum/showthread.php?t=48233

                          PS. ~80% of our templates are host templates which are assembled only to provide some dedicated macro values per group of hosts. It would be possible to decrease the number of templates used if autoregistration actions could inject a macro with an exact value.

                          • fpaternot
                            Member
                            Zabbix Certified Specialist
                            • Feb 2013
                            • 52

                            #14
                            Originally posted by kloczek
                            It is a popular myth that to run Solaris you must use Sun/Oracle hardware. It is not true; most HP, Dell and IBM x86 hardware is on the official Solaris HCL:
                            http://www.oracle.com/webfolder/technetwork/hcl/
                            Thanks for the info! I'll weigh FreeBSD against Solaris and run the tests. I will also try to simulate users accessing the frontend, to make a bigger impact and make it a bit more realistic.

                            Maybe you could create a wiki page on zabbix.org?

                            • kloczek
                              Senior Member
                              • Jun 2006
                              • 1771

                              #15
                              Originally posted by fpaternot
                              Thanks for the info! I'll weigh FreeBSD against Solaris and run the tests. I will also try to simulate users accessing the frontend, to make a bigger impact and make it a bit more realistic.

                              Maybe you could create a wiki page on zabbix.org?
                              Be aware that you may not be able to reach some of the performance figures reachable on Solaris. Why? FreeBSD uses the OpenZFS implementation, which is ATM way behind what is available in Sol 11.2:
                              • The OpenZFS maximum recordsize is 128KB. Almost three years ago Oracle added a modification raising this limit to 1MB. With a bigger recordsize it is possible to gain a higher compression ratio, because everything is compressed in bigger chunks. Max recordsize is IMO perfect for something like a warehouse database.
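
The effect of record size on compression ratio can be measured with a small experiment. The Python sketch below compresses the same synthetic, history-like data in independent records of different sizes. It uses zlib from the standard library as a stand-in, since lzjb is not available there, and the row format and helper names are invented for illustration. Because zlib only looks back 32 KB, the difference between 128 KB and 1 MB records may be small here; other compressors can behave differently.

```python
import random
import zlib


def make_rows(n_rows: int, seed: int = 0) -> bytes:
    """Generate synthetic text rows of the form itemid,clock,value."""
    rng = random.Random(seed)
    clock = 1_420_070_400
    lines = []
    for _ in range(n_rows):
        clock += rng.randrange(0, 2)  # timestamps advance slowly
        lines.append(f"{rng.randrange(20000, 20500)},{clock},{rng.random():.4f}\n")
    return "".join(lines).encode("ascii")


def compression_ratio(data: bytes, record_size: int) -> float:
    """Compress `data` in independent records of `record_size` bytes and
    return total uncompressed size divided by total compressed size."""
    compressed = 0
    for start in range(0, len(data), record_size):
        compressed += len(zlib.compress(data[start:start + record_size]))
    return len(data) / compressed


if __name__ == "__main__":
    data = make_rows(200_000)
    for size_kb in (8, 128, 1024):
        ratio = compression_ratio(data, size_kb * 1024)
        print(f"record size {size_kb:>4} KB: ratio {ratio:.3f}")
```

To check the claim on real data, replace `make_rows` with a dump of an actual history table and compare the ratios for the record sizes you are considering.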


                              If you have one more look at what I wrote about my installation, you can easily see that it is very likely that whatever the web frontend needs will be well cached in the ZFS ARC, the MySQL InnoDB pool or the Zabbix internal caches.
                              From this point of view, a higher number of web frontend clients will only affect Apache activity, which can still be improved by adding the Zend optimizer module (I'm going to test this module in the next week or two). If there is still a performance issue, it is possible to scale the web frontend horizontally by using a DSR load balancer, which in my case can be organized on top of the pair of Solaris hosts on the DB backend nodes using Solaris ILB http://docs.oracle.com/cd/E23824_01/...453/gijjm.html
                              Last edited by kloczek; 11-02-2015, 22:24.