Zabbix and NoSQL
  • Colttt
    Senior Member
    Zabbix Certified Specialist
    • Mar 2009
    • 878

    #1

Zabbix and NoSQL

    Hello,

I came across this post by chance: https://www.zabbix.com/forum/showpos...83&postcount=3 and I was sad to see that this is only being discussed on LinkedIn and not here in the forum. Can someone tell me why?

I think Zabbix SIA should drop support for some databases (for example Oracle) and concentrate on MySQL/MariaDB, PostgreSQL and SQLite (the latter only for the proxy); then more database-specific features could be used.

A few days ago I copied the trends table to a different server, read it into a CSV file and created a new trends table, but I used an array instead of separate columns for min, avg and max, and it was up to 15-20% faster! I guess there are some more options to tune the queries etc.

I also talked to some people on Xing (the German LinkedIn), and one of them said NoSQL is not very performant/efficient here, because those systems are not optimized for Create + Read + Delete (and Zabbix uses those a lot); NoSQL is better for Create, Read, Update.

Any other thoughts?
    Debian-User

    Sorry for my bad english
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Originally posted by Colttt
I think Zabbix SIA should drop support for some databases (for example Oracle) and concentrate on MySQL/MariaDB, PostgreSQL and SQLite (the latter only for the proxy); then more database-specific features could be used.
Could you please list those "special database features" and the reasons why anyone should start thinking about those "features"?

PS. Zabbix's SQL requirements are well defined and (IMO) everything is quite well architected. So far, with full success, Zabbix has been using only SQL-92-compliant query syntax.
The number of places in the Zabbix code that are SQL-engine dependent should still be only around 3 or 4 (which is a perfect base for building DB engine support as a loadable module in the future). In other words: IMO the development overhead of maintaining the DB-engine-dependent parts is really low. Probably most of the development time is spent just on testing the generated binaries against each supported DB engine.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • syepes
      Junior Member
      • Jun 2016
      • 8

      #3
I am also investigating a DB alternative, but why not just test one of the following PostgreSQL derivatives:

      Citus gives you all the greatness of Postgres plus the superpowers of distributed tables. By distributing your data and queries, your application gets high performance—at any scale. The Citus database is available as open source and as a managed service with Azure Cosmos DB for PostgreSQL.



      Has anyone tested these out?


      • Colttt
        Senior Member
        Zabbix Certified Specialist
        • Mar 2009
        • 878

        #4
        Hello,

        Could you please list those "special database features" and reasons why anyone should start thinking about those "features"?
Sorry, "special" is maybe the wrong word; sorry for my bad English.
But here are a few examples:



I think using NoSQL doesn't solve the problem by itself (I could also use a big PostgreSQL or MySQL cluster to "solve" this issue).
The problem is that Zabbix doesn't use SQL's capabilities perfectly (1) and also stores a lot of unneeded data (2).
(1): Zabbix uses 6 columns in trends, and I think min, avg, max could be an array {min,avg,max}; it was much faster in my small test. Also take a look at http://grisha.org/blog/2015/09/23/st...l-efficiently/ and hstore (https://www.postgresql.org/docs/curr...ic/hstore.html); TOAST may also be an option.
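The array idea can be sketched like this: a minimal illustration in Python with SQLite that packs min/avg/max into one JSON-encoded column instead of three separate ones. The table name `trends_packed` and the helper functions are invented for this sketch; the test described above used a real PostgreSQL array column, not JSON.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical packed trends table: a single "stats" column replaces
# the separate value_min / value_avg / value_max columns.
con.execute("""
    CREATE TABLE trends_packed (
        itemid INTEGER NOT NULL,
        clock  INTEGER NOT NULL,
        num    INTEGER NOT NULL,
        stats  TEXT    NOT NULL  -- JSON-encoded [min, avg, max]
    )
""")

def store_trend(itemid, clock, num, vmin, vavg, vmax):
    # Pack the three aggregates into a single value on the way in.
    con.execute("INSERT INTO trends_packed VALUES (?, ?, ?, ?)",
                (itemid, clock, num, json.dumps([vmin, vavg, vmax])))

def load_trend(itemid, clock):
    # Unpack [min, avg, max] on the way out.
    row = con.execute(
        "SELECT stats FROM trends_packed WHERE itemid = ? AND clock = ?",
        (itemid, clock)).fetchone()
    return json.loads(row[0]) if row else None

store_trend(10001, 1700000000, 60, 0.5, 1.2, 3.4)
print(load_trend(10001, 1700000000))  # [0.5, 1.2, 3.4]
```

In PostgreSQL the `stats` column would presumably be a `double precision[]` or hstore value, which is the kind of layout the 15-20% figure above was measured on.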
(2): An example: I ping a server every 2 minutes, and every day it goes down 3 times (the reason doesn't matter); that is 720 values per day and 5040 values per week, and as I said, a lot of it is unneeded data. The solution would be something like this:
if the last value == the previous value, then update the clock, else insert new data. With this we would have only about 10 values per day, storing a value only when it differs. To fill the gaps between these values (e.g. for graphs), take a look at this: http://blog.endpoint.com/2014/04/fil...ve-sum-in.html
        Code:
        timestamp | value
        00:00:00  | 0
        00:04:00  | 0
        00:06:00  | 1
        00:08:00  | 0
        00:16:00  | 0
        00:18:00  | 1
        00:20:00  | 0
        00:36:00  | 0
        00:38:00  | 1
        00:40:00  | 0
So if we reduce unneeded data, the tables are smaller and the queries are much faster.
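The store-only-on-change idea can be sketched as follows: a minimal Python/SQLite illustration, where the table name `history_delta` and the `store()` helper are invented for this example and are not Zabbix code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history_delta (itemid INTEGER, clock INTEGER, value INTEGER)")

def store(itemid, clock, value):
    # If the newest stored value equals the incoming one, just bump its clock;
    # otherwise insert a new row (i.e. store a row only when the value changes).
    last = con.execute(
        "SELECT rowid, value FROM history_delta "
        "WHERE itemid = ? ORDER BY clock DESC LIMIT 1", (itemid,)).fetchone()
    if last is not None and last[1] == value:
        con.execute("UPDATE history_delta SET clock = ? WHERE rowid = ?",
                    (clock, last[0]))
    else:
        con.execute("INSERT INTO history_delta VALUES (?, ?, ?)",
                    (itemid, clock, value))

# A ping result every 2 minutes (120 s): mostly 0, with two short runs of 1.
values = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
for i, v in enumerate(values):
    store(10001, i * 120, v)

rows = con.execute("SELECT clock, value FROM history_delta ORDER BY clock").fetchall()
print(f"{len(values)} samples stored as {len(rows)} rows")  # 20 samples stored as 5 rows
```

Each stored row then represents a run of identical values ending at its clock; the gaps in between would be reconstructed at read time, e.g. with a window-function approach like the one in the linked article.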


@syepes: those are not alternatives, those are cluster solutions; they split the queries across different servers to get optimum performance (like Galera for MySQL).

Edit: PostgreSQL has a new feature for very large tables, BRIN indexes.
        Last edited by Colttt; 10-06-2016, 10:29.
        Debian-User

        Sorry for my bad english


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by Colttt
A few days ago I copied the trends table to a different server, read it into a CSV file and created a new trends table, but I used an array instead of separate columns for min, avg and max, and it was up to 15-20% faster! I guess there are some more options to tune the queries etc.
Have you ever tried to investigate which tables' queries take most of the time?
If not, I can tell you that the time spent on those operations correlates well with the size of the tables.
As the trends tables usually take only a few percent of the whole database, reducing something there even by 15-20% will be really hard to notice.
The biggest problem in Zabbix is the traffic generated by, for example, rendering all the graphs.

Since every time a PNG file is generated all the graph points need to be selected from the history or trends tables, this is where the biggest potential Zabbix improvements lie.
In an ideal scenario/solution it would be possible to add a separate process that caches these data, maintaining the buffer not with MFU/MRU algorithms but by simply shifting data through the buffer.
Another possibility would be to start caching those data on the browser side, querying only for new data and dropping the oldest points from the graph buffer.

Sorry to say this, but really try to investigate the problem a bit more deeply and try to estimate the impact of your improvement ideas much more carefully.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • kloczek
            Senior Member
            • Jun 2006
            • 1771

            #6
            Just one more comment.
The biggest problem in Zabbix is not the read queries (the selects).
A much bigger problem is all the queries generating WRITE IOs (inserts and updates).
Conclusion: any DB backend using a horizontally scaled DB engine (SQL or not) will hurt Zabbix at the biggest scales.

Reviewing the last two months of Zabbix DB activity, I found that for less than a week our Zabbix DB backend was regularly doing 2k+ selects/s, reaching more than 6.5k selects/s at peak. It was due…


If someone has ever-growing scalability problems with the Zabbix DB backend, much bigger improvements can be gained by dropping Linux and switching to Solaris.
As long as Linux lacks a block-device caching infrastructure as effective as the ZFS ARC, it will be nothing more than a toy, useful as an OS for tablets and embedded devices, while the most demanding workloads stay out of its reach.
Even OpenZFS is not an answer at the moment, since transparent DB compression, with the compressed blocks of DB data held in the memory cache, effectively works like increasing the physical memory size by the compression-ratio factor.
            http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
            https://kloczek.wordpress.com/
            zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
            My zabbix templates https://github.com/kloczek/zabbix-templates


            • jan.garaj
              Senior Member
              Zabbix Certified Specialist
              • Jan 2010
              • 506

              #7
              https://support.zabbix.com/browse/ZBXNEXT-714 - please don't spam there.

If you have a problem, then try to optimize your current DB first: more RAM, kloczek's favorite Solaris optimizations, disks with better IOPS performance (SSDs, provisioned IOPS, ...), ...

If you still need better performance, then move the data to storage more suitable for your use case and build your own solution - it should probably be a time-series database, so don't think only about NoSQL. Some options:

ScyllaDB (a better Cassandra)
IMHO the best NoSQL option at the moment, but don't complain about 100% CPU usage - that's the price you have to pay for the performance.

              Elasticsearch
Yes, you can use it as storage for numeric data; you can also build an almost complete monitoring solution using the ELK stack (Beats, ...).

              InfluxDB
They have a nice paper, Benchmarking InfluxDB vs Elasticsearch for Time-Series - https://influxdata.com/technical-papers/ - but I'm still skeptical about its horizontal scalability.

              OpenTSDB (HBase)
The HBase backend is for really big data, but installing a proper cluster will be a pain for standard admins.

Consider MongoDB only in the case of non-concurrent access to the DB data - otherwise MySQL will still be the better option.

My recommendation: don't ask Zabbix SIA to support your favorite database - don't introduce DB lock-in.
Ask instead for a DB middleware/API where anybody can write their own DB integration.
              Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
              My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


              • jan.garaj
                Senior Member
                Zabbix Certified Specialist
                • Jan 2010
                • 506

                #8
BTW, you can also use SQL. You just need a better DB design - http://blog.wix.engineering/2015/12/...-better-nosql/
                Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
                My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


                • syepes
                  Junior Member
                  • Jun 2016
                  • 8

                  #9
                  @Colttt, I think that the Citus Append Distribution feature should help improve the hist* and trend* tables.


                  • Colttt
                    Senior Member
                    Zabbix Certified Specialist
                    • Mar 2009
                    • 878

                    #10
                    Originally posted by jan.garaj
                    If you have a problem, then try to optimize your current DB first: more RAM, kloczek's favorite Solaris optimization, use disk with better IOPs performance (SSD, provisioned IOPs, ...),
I have no problem; if I had a problem, I would have posted in the "Help" forum, but I posted this in the "Suggestions and Feedback" forum.

                    Originally posted by jan.garaj
                    My recommendation: don't ask Zabbix SIA to support your favorite database - don't introduce DB lock.
                    Ask for DB middleware/API, where anybody can write own DB integration.
Again, that's only a suggestion!
                    Debian-User

                    Sorry for my bad english


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by Colttt
(2): An example: I ping a server every 2 minutes, and every day it goes down 3 times (the reason doesn't matter); that is 720 values per day and 5040 values per week, and as I said, a lot of it is unneeded data. The solution would be something like this:
if the last value == the previous value, then update the clock, else insert new data. With this we would have only about 10 values per day, storing a value only when it differs. To fill the gaps between these values (e.g. for graphs), take a look at this: http://blog.endpoint.com/2014/04/fil...ve-sum-in.html
                      Code:
                      timestamp | value
                      00:00:00  | 0
                      00:04:00  | 0
                      00:06:00  | 1
                      00:08:00  | 0
                      00:16:00  | 0
                      00:18:00  | 1
                      00:20:00  | 0
                      00:36:00  | 0
                      00:38:00  | 1
                      00:40:00  | 0
So if we reduce unneeded data, the tables are smaller and the queries are much faster.
Look at the history_uint table definition:
                      Code:
                      CREATE TABLE `history_uint` (
                              `itemid`                 bigint unsigned                           NOT NULL,
                              `clock`                  integer         DEFAULT '0'               NOT NULL,
                              `value`                  bigint unsigned DEFAULT '0'               NOT NULL,
                              `ns`                     integer         DEFAULT '0'               NOT NULL
                      ) ENGINE=InnoDB;
Using DEFAULT '0' saves space in the table without any additional operations.
                      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                      https://kloczek.wordpress.com/
                      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                      My zabbix templates https://github.com/kloczek/zabbix-templates

