Zabbix and NoSQL
  • Colttt
    Senior Member
    Zabbix Certified Specialist
    • Mar 2009
    • 878

    #1

Zabbix and NoSQL

    Hello,

I came across this post by chance: https://www.zabbix.com/forum/showpos...83&postcount=3 and I was sad to see that this is only being discussed on LinkedIn and not here in the forum. Can someone tell me why?

I think Zabbix SIA should drop support for some databases (for example Oracle) and concentrate on MySQL/MariaDB, PostgreSQL and SQLite (the latter only for the proxy); then more database-specific features could be used.

A few days ago I copied the trends table to a different server, read it into a CSV file and created a new trends table, but I used an array instead of separate columns for min, avg and max, and it was up to 15-20% faster! I guess there are some more options to tune the queries etc.

I also talked to some people on Xing (the German LinkedIn), and one of them said NoSQL is not very performant/efficient here, because those systems are not optimized for Create + Read + Delete (and Zabbix uses those a lot); NoSQL is better for Create, Read, Update.

Any other thoughts?
    Debian-User

    Sorry for my bad english
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Originally posted by Colttt
I think Zabbix SIA should drop support for some databases (for example Oracle) and concentrate on MySQL/MariaDB, PostgreSQL and SQLite (the latter only for the proxy); then more database-specific features could be used.
Could you please list those "special database features" and the reasons why anyone should start thinking about those "features"?

PS. Zabbix's SQL requirements are well defined and (IMO) everything is quite well architected. So far, with full success, Zabbix has been using only SQL-92-compliant query syntax.
The number of places in the Zabbix code that are SQL-engine dependent should still be only around 3 or 4 (which is a perfect base for building DB engine support as a loadable module in the future). In other words: IMO the development overhead of maintaining the DB-engine-dependent parts is really low. Probably most of the development time is spent just on testing the generated binaries against each supported DB engine.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • syepes
      Junior Member
      • Jun 2016
      • 8

      #3
I am also investigating a DB alternative, but why not just test one of the following PostgreSQL derivatives:

      Citus gives you all the greatness of Postgres plus the superpowers of distributed tables. By distributing your data and queries, your application gets high performance—at any scale. The Citus database is available as open source and as a managed service with Azure Cosmos DB for PostgreSQL.



      Has anyone tested these out?


      • Colttt
        Senior Member
        Zabbix Certified Specialist
        • Mar 2009
        • 878

        #4
        Hello,

        Could you please list those "special database features" and reasons why anyone should start thinking about those "features"?
Sorry, "special" is maybe the wrong word; sorry for my bad English.
But here are a few examples:



I think using NoSQL doesn't solve the problem by itself (I could also use a big PostgreSQL or MySQL cluster to "solve" this issue).
The problem is that Zabbix doesn't use SQL's capabilities perfectly (1) and also stores a lot of unneeded data (2).
(1): Zabbix uses 6 columns in trends, and I think min, avg, max could be an array {min,avg,max}; it was much faster in my small test. Also take a look at http://grisha.org/blog/2015/09/23/st...l-efficiently/ and hstore (https://www.postgresql.org/docs/curr...ic/hstore.html); TOAST may also be an option.
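The array idea can be sketched like this: a minimal illustration in Python with SQLite that packs min/avg/max into one JSON-encoded column instead of three separate ones. The table name `trends_packed` and the helper functions are invented for this sketch; the test described above used a real PostgreSQL array column, not JSON.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical packed trends table: a single "stats" column replaces
# the separate value_min / value_avg / value_max columns.
con.execute("""
    CREATE TABLE trends_packed (
        itemid INTEGER NOT NULL,
        clock  INTEGER NOT NULL,
        num    INTEGER NOT NULL,
        stats  TEXT    NOT NULL  -- JSON-encoded [min, avg, max]
    )
""")

def store_trend(itemid, clock, num, vmin, vavg, vmax):
    # Pack the three aggregates into a single value on the way in.
    con.execute("INSERT INTO trends_packed VALUES (?, ?, ?, ?)",
                (itemid, clock, num, json.dumps([vmin, vavg, vmax])))

def load_trend(itemid, clock):
    # Unpack [min, avg, max] on the way out.
    row = con.execute(
        "SELECT stats FROM trends_packed WHERE itemid = ? AND clock = ?",
        (itemid, clock)).fetchone()
    return json.loads(row[0]) if row else None

store_trend(10001, 1700000000, 60, 0.5, 1.2, 3.4)
print(load_trend(10001, 1700000000))  # [0.5, 1.2, 3.4]
```

In PostgreSQL the `stats` column would presumably be a `double precision[]` or hstore value, which is the kind of layout the 15-20% figure above was measured on.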
(2): An example: I ping a server every 2 minutes, and every day it goes down 3 times (the reason doesn't matter); that is 720 values per day and 5040 values per week, and as I said, a lot of it is unneeded data. The solution would be something like this:
if the last value == the previous value, then update the clock, else insert new data. With this we would have only about 10 values per day, storing a value only when it differs. To fill the gaps between these values (e.g. for graphs), take a look at this: http://blog.endpoint.com/2014/04/fil...ve-sum-in.html
        Code:
        timestamp | value
        00:00:00  | 0
        00:04:00  | 0
        00:06:00  | 1
        00:08:00  | 0
        00:16:00  | 0
        00:18:00  | 1
        00:20:00  | 0
        00:36:00  | 0
        00:38:00  | 1
        00:40:00  | 0
So if we reduce unneeded data, the tables are smaller and the queries are much faster.
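The store-only-on-change idea can be sketched as follows: a minimal Python/SQLite illustration, where the table name `history_delta` and the `store()` helper are invented for this example and are not Zabbix code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history_delta (itemid INTEGER, clock INTEGER, value INTEGER)")

def store(itemid, clock, value):
    # If the newest stored value equals the incoming one, just bump its clock;
    # otherwise insert a new row (i.e. store a row only when the value changes).
    last = con.execute(
        "SELECT rowid, value FROM history_delta "
        "WHERE itemid = ? ORDER BY clock DESC LIMIT 1", (itemid,)).fetchone()
    if last is not None and last[1] == value:
        con.execute("UPDATE history_delta SET clock = ? WHERE rowid = ?",
                    (clock, last[0]))
    else:
        con.execute("INSERT INTO history_delta VALUES (?, ?, ?)",
                    (itemid, clock, value))

# A ping result every 2 minutes (120 s): mostly 0, with two short runs of 1.
values = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
for i, v in enumerate(values):
    store(10001, i * 120, v)

rows = con.execute("SELECT clock, value FROM history_delta ORDER BY clock").fetchall()
print(f"{len(values)} samples stored as {len(rows)} rows")  # 20 samples stored as 5 rows
```

Each stored row then represents a run of identical values ending at its clock; the gaps in between would be reconstructed at read time, e.g. with a window-function approach like the one in the linked article.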


@syepes: those are not alternatives, those are cluster solutions; they split the queries across different servers to get optimum performance (like Galera for MySQL).

Edit: PostgreSQL has a new feature for very large tables, BRIN indexes.
        Last edited by Colttt; 10-06-2016, 10:29.
        Debian-User

        Sorry for my bad english


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by Colttt
A few days ago I copied the trends table to a different server, read it into a CSV file and created a new trends table, but I used an array instead of separate columns for min, avg and max, and it was up to 15-20% faster! I guess there are some more options to tune the queries etc.
Have you ever tried to investigate which tables' queries take most of the time?
If not, I can tell you that the time spent on those operations correlates well with the size of the tables.
As the trends tables usually take only a few percent of the whole database, reducing something there even by 15-20% will be really hard to notice.
The biggest problem in Zabbix is the traffic generated by, for example, rendering all the graphs.

Since every time a PNG file is generated all the graph points need to be selected from the history or trends tables, this is where the biggest potential Zabbix improvements lie.
In an ideal scenario/solution it would be possible to add a separate process that caches these data, maintaining the buffer not with MFU/MRU algorithms but by simply shifting data through the buffer.
Another possibility would be to start caching those data on the browser side, querying only for new data and dropping the oldest points from the graph buffer.

Sorry to say this, but really try to investigate the problem a bit more deeply and try to estimate the impact of your improvement ideas much more carefully.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • kloczek
            Senior Member
            • Jun 2006
            • 1771

            #6
            Just one more comment.
The biggest problem in Zabbix is not the read queries (the selects).
A much bigger problem is all the queries generating WRITE IOs (inserts and updates).
Conclusion: any DB backend using a horizontally scaled DB engine (SQL or not) will hurt Zabbix at the biggest scales.

Reviewing the last two months of Zabbix DB activity, I found that for less than a week our Zabbix DB backend was regularly doing 2k+ selects/s, reaching more than 6.5k selects/s at peak. It was due…


If someone has ever-growing scalability problems with the Zabbix DB backend, much bigger improvements can be gained by dropping Linux and switching to Solaris.
As long as Linux lacks a block-device caching infrastructure as effective as the ZFS ARC, it will be nothing more than a toy, useful as an OS for tablets and embedded devices, while the most demanding workloads stay out of its reach.
Even OpenZFS is not an answer at the moment, since transparent DB compression, with the compressed blocks of DB data held in the memory cache, effectively works like increasing the physical memory size by the compression-ratio factor.
            http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
            https://kloczek.wordpress.com/
            zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
            My zabbix templates https://github.com/kloczek/zabbix-templates


            • jan.garaj
              Senior Member
              Zabbix Certified Specialist
              • Jan 2010
              • 506

              #7
              https://support.zabbix.com/browse/ZBXNEXT-714 - please don't spam there.

If you have a problem, then try to optimize your current DB first: more RAM, kloczek's favorite Solaris optimizations, disks with better IOPS performance (SSDs, provisioned IOPS, ...), ...

If you still need better performance, then move the data to storage more suitable for your use case and build your own solution - it should probably be a time-series database, so don't think only about NoSQL. Some options:

ScyllaDB (a better Cassandra)
IMHO the best NoSQL option at the moment, but don't complain about 100% CPU usage - that's the price you have to pay for the performance.

              Elasticsearch
Yes, you can use it as storage for numeric data; you can also build an almost complete monitoring solution using the ELK stack (Beats, ...).

              InfluxDB
They have a nice paper, Benchmarking InfluxDB vs Elasticsearch for Time-Series - https://influxdata.com/technical-papers/ - but I'm still skeptical about its horizontal scalability.

              OpenTSDB (HBase)
The HBase backend is for really big data, but installing a proper cluster will be a pain for standard admins.

Consider MongoDB only in the case of non-concurrent access to the DB data - otherwise MySQL will still be the better option.

My recommendation: don't ask Zabbix SIA to support your favorite database - don't introduce DB lock-in.
Ask instead for a DB middleware/API where anybody can write their own DB integration.
              Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
              My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


              • jan.garaj
                Senior Member
                Zabbix Certified Specialist
                • Jan 2010
                • 506

                #8
BTW, you can also use SQL. You just need a better DB design - http://blog.wix.engineering/2015/12/...-better-nosql/
                Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
                My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


                • syepes
                  Junior Member
                  • Jun 2016
                  • 8

                  #9
                  @Colttt, I think that the Citus Append Distribution feature should help improve the hist* and trend* tables.


                  • Colttt
                    Senior Member
                    Zabbix Certified Specialist
                    • Mar 2009
                    • 878

                    #10
                    Originally posted by jan.garaj
                    If you have a problem, then try to optimize your current DB first: more RAM, kloczek's favorite Solaris optimization, use disk with better IOPs performance (SSD, provisioned IOPs, ...),
I have no problem; if I had a problem, I would have posted in the "Help" forum, but I posted this in the "Suggestions and Feedback" forum.

                    Originally posted by jan.garaj
                    My recommendation: don't ask Zabbix SIA to support your favorite database - don't introduce DB lock.
                    Ask for DB middleware/API, where anybody can write own DB integration.
Again, that's only a suggestion!
                    Debian-User

                    Sorry for my bad english


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by Colttt
(2): An example: I ping a server every 2 minutes, and every day it goes down 3 times (the reason doesn't matter); that is 720 values per day and 5040 values per week, and as I said, a lot of it is unneeded data. The solution would be something like this:
if the last value == the previous value, then update the clock, else insert new data. With this we would have only about 10 values per day, storing a value only when it differs. To fill the gaps between these values (e.g. for graphs), take a look at this: http://blog.endpoint.com/2014/04/fil...ve-sum-in.html
                      Code:
                      timestamp | value
                      00:00:00  | 0
                      00:04:00  | 0
                      00:06:00  | 1
                      00:08:00  | 0
                      00:16:00  | 0
                      00:18:00  | 1
                      00:20:00  | 0
                      00:36:00  | 0
                      00:38:00  | 1
                      00:40:00  | 0
So if we reduce unneeded data, the tables are smaller and the queries are much faster.
Look at the history_uint table definition:
                      Code:
                      CREATE TABLE `history_uint` (
                              `itemid`                 bigint unsigned                           NOT NULL,
                              `clock`                  integer         DEFAULT '0'               NOT NULL,
                              `value`                  bigint unsigned DEFAULT '0'               NOT NULL,
                              `ns`                     integer         DEFAULT '0'               NOT NULL
                      ) ENGINE=InnoDB;
Using DEFAULT '0' saves space in the table without any additional operations.
                      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                      https://kloczek.wordpress.com/
                      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                      My zabbix templates https://github.com/kloczek/zabbix-templates

