Ad Widget

**Alexei** · 07-04-2017, 11:10

Originally posted by bbrendon

As for scaling I'd say Zabbix suffers from age. Back when Zabbix was born all these new technologies were still many years away. Zabbix hasn't improved since it's inception in terms of architecture and back-end.

If everything goes as planned there will be a number of serious improvements that would bring much better level of scalability (among other things) into Zabbix 4.0. Some of the improvements will be included into 3.4, API for history data is one of them.

As for newer technologies I always ask myself, will it bring any long term value? I'd like all architectural decisions we make today be well justified.

Originally posted by bbrendon

On the plus side, Zabbix has matured continuously over the years and is still going strong.

True. There are also many interesting concepts and ideas we discuss almost every day in our office. I'm afraid most of the activity remains invisible to Zabbix community. Anyway, nowadays more than ever I'm excited about the future of Zabbix, so much to do!

**SBO** · 12-04-2017, 09:15

There are some pretty interesting case studies available on the Zabbix website, but none that can compare to such a big installation.
Will we someday have a case study of a similar environment described by the OP ?

**Alex.S** · 12-04-2017, 10:48

Someday - for sure

For now though, if anyone is interested in sharing their success story, and I don't mean only if you have 200k+ devices etc, then feel free to drop us a line.

**kloczek** · 13-04-2017, 01:37

Originally posted by Alex.S

Someday - for sure

For now though, if anyone is interested in sharing their success story, and I don't mean only if you have 200k+ devices etc, then feel free to drop us a line.

Number of devices is not relevant.
What is important is rate of writing metrics points values per second written to the database backend.
In you case is used single insert of some metrics data to conditions().
Zabbix is not working that way because it writes to exact history* table data based on type of those data. In other words inserting to monitoring data all data from single device never happens in case of abbix.
Zabbix server even with very high NVPS rate like few tenths of thousands is doing this operations using few tenths of inserts per second.
with even 100-200k/metrics points written to DB backend you may be doing few thousands inserts per seconds and you can even even lower number of those inserts by enlarging max_allowed_packet (in case of MySQL).
Bottleneck is somewhere. It is not obvious that insert queries are creating not only write IOs on but some well predictable reads operations rate as well. You must have on DB backend side enough big memory cache to hold in memory all those informations which will be need to find all places which needs to be updated or changed.
You can simplify you writing process up to the moment when your DB backend will be only streaming the data to file without adding any metadata allowing you later find quickly exact subset of historic data.
Even by streaming all new data to single file of few some number files you OS on VFS layer only ow writing new data will be doing read IOs .. so even here writing only huge amount of data may endup in kind of bottleneck caused by read IOs.

Using zabbix with writing new monitoring data as batches in few inserts/s is really god enough in case majority of monitoring cases.

As long as we are not talking about raw monitoring but on top of this alarming layer simplified process of writing new data as sequentially written data may cause big problems on adding alarming layer as long as you triggers definition will be using some historic data to calculate values of your triggers/alarms.
If it is the case again you will be usually hitting read IOs bottleneck than write IOs limit.

It is a bit counterintuitive that to gain sometimes very high write data rate first you must solve reads issues created by MFU/MRU data.

**Alexei** · 13-04-2017, 09:07

Originally posted by kloczek

Bottleneck is somewhere.

Currently Zabbix is limited to processing of about 50-80K NVPS on average with all optimizations made on a decent Intel based server. This level of performance is sufficient for most applications out there, nevertheless it's challenging to achieve better performance in 3.2 or earlier releases.

There are number of places where Zabbix could do much better job and we are fully aware of it.

I think Zabbix 3.4 will eliminate any performance issues on history storage side. I think that the next logical step is to look at the existing architecture and figure out how we can make it more efficient without sacrificing all the guarantees Zabbix provides currently. It's about making Zabbix scale both vertically (still very important!) and horizontally.

We hope to deliver visible results of our work in 4.0, hopefully by the end of this year.

**syndeysider** · 04-05-2017, 00:18

The plan is to not only share the success story but also, hopefully, contribute to the community in terms of open-sourcing a lot of the integration modules we are looking to develop.

This is for a state government department within Australia and there are plenty of hurdles in making sure this implementation is in line with our policies and processes. Simply put, it will be some time before I have anything useful to show.

Another interesting aspect here is that we're using Zabbix to standardize metric collection throughout Infrastructure Services and than pass this onto more scalable technologies like Kafka and Influx. With the likes of Prometheus, Intel Snap, InfluxData etc. I'm more keen now, than ever, to see where Alexei and his team take Zabbix into the future.

In my opinion (and without sounding too dramatic) we are on the precipice of change in how monitoring is done, from A through to Z, in a hybrid, intelligent resource orchestrated, containerized and automated infrastructure environment.

**scwade** · 28-05-2017, 18:43

Originally posted by Alexei

I think Zabbix 3.4 will eliminate any performance issues on history storage side.

I'm having to re-install zabbix and I was wanting to use 3.4. It is in the documents and shown on the general release as released. But I do not see it in the repository. When do you anticipate it being available for installation (my case 16.04 ubuntu).

Thanks,
S

**syndeysider** · 15-06-2017, 02:07

Originally posted by scwade

I'm having to re-install zabbix and I was wanting to use 3.4. It is in the documents and shown on the general release as released. But I do not see it in the repository. When do you anticipate it being available for installation (my case 16.04 ubuntu).

Thanks,
S

Release schedule shows September 2017. This here https://support.zabbix.com/browse/ZBXNEXT-3877 basically solves my current integration problem and I'm super exited about this.

**Colttt** · 26-07-2017, 21:41

Originally posted by syndeysider

Quite possible depending on our policy on publishing code externally. I would like to, but I am new here and will find out!

On another note

https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2

Looks very interesting!

Does anyone try timescale?

**1berto** · 03-11-2018, 00:30

The project that i'm starting now is really big too, for sure 120.000+ linux hosts, probably cant reach up 200.000+ (if everything went ok we could replace the current solutions for the 80.000 windows hosts). Also the complexity is astonishing: servers, desktops, machines with specific periferals, heterogeneous communication links (from vsat to huge dedicated links), 5000+ physical locations, the network covers almost 8,500,000 km2.
This week we are going to production to monitor about 7000 hosts with about 40 items, 5 proxies, 1 server, pgbounce and a dedicated database.

Ad Widget

Largest Zabbix Deployment?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment