Ad Widget

**kloczek** · 30-04-2017, 18:05

I don't know what is the current state of the proxy sync but seems increasing ZBX_MAX_HRECORDS so far is only solution (I'm using 50k). I remember that zabbix guys planned add configuration parameter around ZBX_MAX_HRECORDS to not force recompile binaries.

Sometimes max sync speed is limited by zbx srv DB backend write speed.
Here is really wide range of factors which may limit max inserts speed. Only few major:
1) If you are using MySQL it is yet another factor related to max_allowed_packet which is by default 4MB
2) MySQL >= 5.6 (https://kloczek.wordpress.com/2016/0...rade-surprise/)
3) increase memory used to holds more data in memory (part of the insert query is reading some data)
4) use partitioned history* and trends* tables

In case enough high write speed I recommend switching DB backend OS from Linux to Solaris on hardware with RAM size used by ZFS ARC memory ~daily volume of the data (just count size of prev day history* tables partitions). Switching from Linux to Solaris on the same HW may even double write speed and decrease as well few times latency of selects. Looks like MySQL caching is worse than ZFS ARC (Adaptive Reclaim Cache).
There are other benefits of using ZFS like simplification of create slave DB instances by ZFS snapshoting and "zfs send/receive" which is not trashing cached in memory data.

**abjornson** · 01-05-2017, 16:37

Thanks so much for your reply @kloczek - very helpful validation....and wow! 50k ZBX_MAX_HRECORDS - I'd planned just to go to 2.5k up from 1k as a start. Do you know a good way to tune if you've gone too high? I'd imagine if you increase too much you'd go beyond the capacity of your hardware?

I will try this soon and report back. I had seen mention that this was under consideration to be implemented as a configuration option. This would be awesome if they'd implement! I love the ease of upgrading from apt repo - hate to break that with a custom compiled version.

Thanks also for the backend db scale tips - these will be useful in the future. I'm pretty confident my current issue is proxy/server issues around ZBX_MAX_HRECORDS....but interested to check those out as well.

**abjornson** · 01-05-2017, 17:01

An additional question: in the scenario i outlined above (proxy cutoff from server for some amount of time)

Does anyone know if there's an easy way to tell the proxy to just dump the backlog and "fast forward" to the present? Obviously in a perfect world, I'd keep all the backlog data.....but failing that, until I get ZBX_MAX_HRECORD sorted....it would be preferable to dump the backlog data if it meant i'd get realtime data going again as soon as the outage was resolved.

**kloczek** · 02-05-2017, 09:59

Originally posted by abjornson

Thanks so much for your reply @kloczek - very helpful validation....and wow! 50k c - I'd planned just to go to 2.5k up from 1k as a start. Do you know a good way to tune if you've gone too high? I'd imagine if you increase too much you'd go beyond the capacity of your hardware?

I will try this soon and report back. I had seen mention that this was under consideration to be implemented as a configuration option. This would be awesome if they'd implement! I love the ease of upgrading from apt repo - hate to break that with a custom compiled version.

Thanks also for the backend db scale tips - these will be useful in the future. I'm pretty confident my current issue is proxy/server issues around ZBX_MAX_HRECORDS....but interested to check those out as well.

My understanding of ZBX_MAX_HRECORDS is that it is more or less way of throttling volume of the data going from the proxy when server processing power has not enough power to process new data against triggers definitions or DB backend is not enough strong. In all my past cases none of those issues have been limiting speed of the syncing data from the proxies after outage (planned or not).
Problem probably must be hitting server when there is enough big flow of the data from active proxies. If it was the original cause probably it would be better to solve this by sending on srv<>prx protocol signal like "I'm busy right now please send me next batch later or half of your batch". When proxy is more then N seconds behind own schedule of the sending data to srv it should have possibility to send let's say 10-50% more data than in prev cycle. With such logic bandwidth of those data could be self regulating.
Max size of the data batch from the proxy is limited only by max size of the write cashes size on server side. Similar algorithm could be used on suggesting passive proxy to send more data in next cycle if server has no congestion in processing data in write cache.

Your question about tuning. If bandwidth of the data from the proxies is an issue there are potentially two bottlenecks on server side. First one is related to processing speed new data against triggers definitions (ratio between number of items to triggers will be related to such bottleneck) and if here will be n o issue only other bottleneck will be related to max performance of the DB backend. In other words there is no straight/simple answer on the question about tuning.
Problem only is that if there are some issues here related to one of those briers probably many people will have problem with diagnosing where is bottleneck.
IMO to help diagnosing those issues probably it would be good to add few new srv internal metrics. I think that one f such metrics could be counter which will be increased after processing the trigger. If this speed of this counter will be reaching plateau it will be clear indicator some bottleneck has been hit.

**kloczek** · 02-05-2017, 10:04

Originally posted by abjornson

An additional question: in the scenario i outlined above (proxy cutoff from server for some amount of time)

Does anyone know if there's an easy way to tell the proxy to just dump the backlog and "fast forward" to the present? Obviously in a perfect world, I'd keep all the backlog data.....but failing that, until I get ZBX_MAX_HRECORD sorted....it would be preferable to dump the backlog data if it meant i'd get realtime data going again as soon as the outage was resolved.

Simplest known to me way of dropping those data is just reinitialize proxy DB backend. Proxy uses the same DB schema as server (because it simplifies DB schema upgrade on start server and proxy). However proxy is using only 4 or 5 tables from whole DB and delete whole DB, create it and import schema takes usually only few seconds.

Ad Widget

Problem recovering after disruption of proxy/server connectivity

Problem recovering after disruption of proxy/server connectivity

Comment

Comment

Comment

Comment

Comment