Has anyone used Amazon Aurora for the Zabbix DB?

  • nicolasg
    Member
    • Apr 2011
    • 50

    #1

    Has anyone used Amazon Aurora for the Zabbix DB?

    Amazon has this new product called Amazon Aurora, which is MySQL compatible:

    Amazon Aurora is a global-scale relational database service built for the cloud with full MySQL and PostgreSQL compatibility.


    I'm currently having performance issues with Zabbix on RDS MySQL, and Aurora is supposed to have up to five times the throughput of standard MySQL.

    Wondering if anyone has tried it and can share their experience.
  • fpalladoro
    Junior Member
    • Sep 2015
    • 1

    #2
    +1

    Aurora looks promising and could give a huge performance improvement to the Zabbix DB.

    It would also be great if someone could share their experience with Aurora in general.

    • kloczek
      Senior Member
      • Jun 2006
      • 1771

      #3
      Originally posted by fpalladoro
      +1

      Aurora looks promising and could give a huge performance improvement to the Zabbix DB.

      It would also be great if someone could share their experience with Aurora in general.
      For the typical yearly cost of running a mid-size Zabbix DB (100-200k items) you can buy your own hardware. If someone has only one such database, it may IMO be an attractive offer. If there are more databases, IMO the Aurora cost is too high: with a few such databases, Aurora's cost could be higher than buying the hardware plus power, cooling and other DC costs.

      Remember that with Aurora you don't have access to the system where the DB is running, so forget about doing binary hot backups (the bigger the database, the higher the probability that this will be the only viable way of doing backups).

      PS. IMO Aurora may be attractive for customers with a number of databases with relatively low traffic. If someone has to monitor something like 100k metrics, such an environment will likely have more databases, and Aurora will be too expensive.
      Last edited by kloczek; 07-09-2015, 14:29.
      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
      https://kloczek.wordpress.com/
      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
      My zabbix templates https://github.com/kloczek/zabbix-templates

      • mushero
        Senior Member
        • May 2010
        • 101

        #4
        Originally posted by kloczek
        For the typical yearly cost of running a mid-size Zabbix DB (100-200k items) you can buy your own hardware. If someone has only one such database, it may IMO be an attractive offer. If there are more databases, IMO the Aurora cost is too high: with a few such databases, Aurora's cost could be higher than buying the hardware plus power, cooling and other DC costs.

        Remember that with Aurora you don't have access to the system where the DB is running, so forget about doing binary hot backups (the bigger the database, the higher the probability that this will be the only viable way of doing backups).

        PS. IMO Aurora may be attractive for customers with a number of databases with relatively low traffic. If someone has to monitor something like 100k metrics, such an environment will likely have more databases, and Aurora will be too expensive.
        I might suggest you spend more time with AWS RDS and Aurora on this, as the latter is explicitly designed to handle much larger and faster workloads than MySQL: millions of items, I'm sure, and probably tens of thousands of values per second.

        And it does snapshots, replicas, etc. for backups, like RDS.

        I'd certainly consider it for very large-scale Zabbix, as it seems ideally designed for this.

        Steve

        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by mushero
          I might suggest you spend more time with AWS RDS and Aurora on this, as the latter is explicitly designed to handle much larger and faster workloads than MySQL: millions of items, I'm sure, and probably tens of thousands of values per second.

          And it does snapshots, replicas, etc. for backups, like RDS.

          I'd certainly consider it for very large-scale Zabbix, as it seems ideally designed for this.
          I have my MySQL on Solaris, so I already have everything you mention.
          I can only guess, but Amazon must be using similar technology underneath Aurora. With lzjb ZFS compression, all my MySQL files are compressed at a ~2.4x compression ratio.

          BTW, Solaris as a Zabbix DB platform: after migrating to Solaris 11.3 GA I see a dramatic decrease in read and write IOs.
          The system running an environment with 2.4k NVPS and 150k items does (averaged over 1 day) fewer than 5 read IO/s and about 400 write IO/s, all on a box with 48GB RAM.
          Additionally, Solaris 11.3 stores buffered/cached data compressed in the ARC, so memory effectively behaves like the same amount multiplied by the data compression ratio (on a system with 48GB, ZFS uses a 26-28GB ARC, which works like 26*2.4 ≈ 62GB, more memory than my system physically has).

          Really, what Aurora provides is no miracle as long as you know how to do a few things... and all of it for a fraction of Aurora's cost.

          With shared storage between physical boxes and the MySQL DB placed in a kernel zone, you can even have live migration.

          In Aurora, the cost of running the database does not depend on how well your data compresses (I'm currently testing gzip-1 on a slave, which, with slightly higher CPU usage, seems to achieve a 3.2x compression ratio on my data). And yes, forget about achieving such a compression ratio with MySQL row compression.
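          The compressed-ARC arithmetic above can be sketched as a quick back-of-the-envelope check; the numbers are the ones quoted in this post (26GB ARC, ~2.4x lzjb ratio), not measurements of any other system:

```python
# Sketch of the compressed-ARC "effective memory" arithmetic quoted above:
# with ZFS keeping cached blocks compressed in the ARC, an ARC of a given
# size behaves like (arc_size * compression_ratio) of uncompressed cache.

def effective_cache_gb(arc_size_gb, compress_ratio):
    """Approximate uncompressed data that fits in a compressed ARC."""
    return arc_size_gb * compress_ratio

# Figures from the post: 26GB ARC on a 48GB box, ~2.4x lzjb ratio.
print(effective_cache_gb(26, 2.4))  # 62.4 -- more than the 48GB of physical RAM
```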

          • mushero
            Senior Member
            • May 2010
            • 101

            #6
            Originally posted by kloczek
            I have my MySQL on Solaris, so I already have everything you mention.
            I can only guess, but Amazon must be using similar technology underneath Aurora. With lzjb ZFS compression, all my MySQL files are compressed at a ~2.4x compression ratio.

            BTW, Solaris as a Zabbix DB platform: after migrating to Solaris 11.3 GA I see a dramatic decrease in read and write IOs.
            The system running an environment with 2.4k NVPS and 150k items does (averaged over 1 day) fewer than 5 read IO/s and about 400 write IO/s, all on a box with 48GB RAM.
            Additionally, Solaris 11.3 stores buffered/cached data compressed in the ARC, so memory effectively behaves like the same amount multiplied by the data compression ratio (on a system with 48GB, ZFS uses a 26-28GB ARC, which works like 26*2.4 ≈ 62GB, more memory than my system physically has).

            Really, what Aurora provides is no miracle as long as you know how to do a few things... and all of it for a fraction of Aurora's cost.

            With shared storage between physical boxes and the MySQL DB placed in a kernel zone, you can even have live migration.

            In Aurora, the cost of running the database does not depend on how well your data compresses (I'm currently testing gzip-1 on a slave, which, with slightly higher CPU usage, seems to achieve a 3.2x compression ratio on my data). And yes, forget about achieving such a compression ratio with MySQL row compression.
            First, I'd suggest this is somewhat different: they have much faster IO than you do, up to 20,000+ IOPS or more, replicated across data centers, with tons of easy automation that can be built quickly. Yes, it may not be cheap, but most of our customers (and we) don't care.

            Second, they rewrote the InnoDB and MySQL IO paths to be much faster with less IO, like 1/10th in some cases, so it scales much higher on the same IO and the related RAM, buffers, in-RAM copies, NIC, etc. Their goal is a 10x increase in throughput over MySQL, though they may not be there yet.

            Third, 2.4k NVPS is good, but big systems are looking at 10x that. And then you have to add in all the graphs, history, look-backs, screens, etc., so you need to keep data in RAM. Our DB is not that large and is 10x bigger than your RAM (yes, compression may help, at CPU cost), so IO comes back to haunt us, even on SSD. We are looking at multi-TB data sets next year as we move to dynamic high-resolution rules.

            For big systems, people are moving to TSDBs, but not in Zabbix yet, and maybe for archive/trend only. So we'll see as we get to millions of items and maybe thousands or tens of thousands of NVPS across hundreds of locations globally; it gets worse when we have dynamic container monitoring where servers come and go by the hundreds each day or hour.

            Steve

            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Originally posted by mushero
              First, I'd suggest this is somewhat different: they have much faster IO than you do, up to 20,000+ IOPS or more.
              It seems you've completely lost touch with what is possible on current hardware. An Intel 750 NVMe card with 1.2TB of SSD costs about £800 and can do 460k 4KB read IO/s and 290k write IO/s.
              For example, in Oracle X4/X5 hardware you can scale up to 4 such cards with linear scaling of reads and writes (a few tests show that such scaling is so far possible only under Solaris; knowing what is inside the Linux kernel, I'm not surprised that Linux cannot do the same).
              http://www.intel.com/content/www/us/...50-series.html

              Replicated across data centers. With tons of easy automation that can be built quickly. Yes, it may not be cheap, but most of our customers (and we) don't care.
              No offence, but could you please point to this Aurora automation you are talking about?

              I have an almost 20-year career as an IT specialist, and I've been working with AWS for more than two years. My experience is that as long as it's a plain host it's fine; anything else is not easy. Most of the examples in the Amazon docs are outdated. The Amazon CLI tools are constantly evolving, and backward compatibility seems to be the last thing Amazon thinks about (it is far from what anyone who has worked with Sun, Oracle and a few other companies is used to). Even the forums are broken (two days ago, trying to post a new topic, I hit looping links).
              The Amazon API is inconsistent in many places (did you know that when transferring data to an S3 bucket you must pass two keys, one for your account and a second for the owner of the S3 bucket, despite the fact that S3 access roles exist among the account roles?).

              Second, they rewrote the InnoDB and MySQL IO paths to be much faster with less IO, like 1/10th in some cases, so it scales much higher on the same IO and the related RAM, buffers, in-RAM copies, NIC, etc. Their goal is a 10x increase in throughput over MySQL, though they may not be there yet.

              Third, 2.4k NVPS is good, but big systems are looking at 10x that. And then you have to add in all the graphs, history, look-backs, screens, etc.
              You know, more than 4 years ago I worked for the first time with hardware able to do more than 100k IO/s (in case you think 10x bigger performance is something I cannot handle).

              ...so you need to keep data in RAM. Our DB is not that large and is 10x bigger than your RAM (yes, compression may help, at CPU cost), so IO comes back to haunt us, even on SSD. We are looking at multi-TB data sets next year as we move to dynamic high-resolution rules.
              That is the beauty of using ZFS: with very good caching you don't need in-memory engines. Zabbix creates a typical warehouse DB workload. I just checked that yesterday's history* table partitions took about 18GB (uncompressed raw size). With a Zabbix environment storing 10x more data, the number of write IOs will grow by less than 10x, because ZFS uses COW semantics and converts random write workloads into sequential IO traffic.
              With (let's say) about 200GB of data stored daily, perfectly caching all of it in memory would need (simplifying the calculation) about 80GB of RAM for the ARC at roughly a 3x ZFS compression ratio.
              I'm currently handling my Zabbix traffic on a ~6-year-old gen6 HP blade (HP started selling gen6 in Jul 2009).
              Sorry to say, but any 1U pizza-box two-socket system purchased now with an additional Intel 750 card can handle a Zabbix environment 10x bigger than the one I have under control now. 1.2TB with 3x compression is about 3.6TB. I just checked that with 15 days of history* and 20 months of monthly trends* partitions, my database is about 380GB. It looks like with slightly more CPU-intensive compression it would be possible to store the whole database on the storage provided by the NVMe card alone.
              Solaris support on non-Oracle hardware is £600/year for a two-socket system.
              My estimate for new hardware able to handle 10x what I have now in Zabbix, with 3 years of support, is £6-8k. For redundancy you must multiply that by two. I trust you can calculate the cost of powering and cooling such hardware yourself. With those numbers, try to calculate the 3-year cost of running two Aurora instances with ~200GB RAM.
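              The hardware side of this estimate can be laid out as a small sketch; the box cost, support fee and redundancy factor are the figures quoted in this post, while the Aurora side is deliberately left out since its pricing is not given in this thread:

```python
# Sketch of the 3-year hardware-cost estimate above (GBP), using only the
# figures quoted in this post; power, cooling and other DC costs excluded.

def three_year_hw_cost(box_cost, support_per_year, boxes=2, years=3):
    """Purchase price plus yearly Solaris support for a redundant pair."""
    return boxes * (box_cost + support_per_year * years)

# Midpoint of the £6-8k per-box estimate, £600/year Solaris support.
print(three_year_hw_cost(7000, 600))  # 17600
```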

              If you think that Amazon is the Salvation Army, donating IT infrastructure, you may be right... you are only thinking that it is true.
              Yes, organizing something like Amazon requires skills and time. But if you have the skills, the time needed to build something on your own hardware is almost the same. However, at a large enough scale the picture is not so clear. Did you know that AWS offers discounts to sufficiently big clients? What does that mean? That most of the small clients subsidize the AWS infrastructure costs of the bigger clients.

              PS. If you think my estimates are wrong, please point out exactly where.

              PS2. My private opinion is that in a world where all IT technologies are still speeding up (they are still evolving rapidly), something like AWS actually slows innovation down. Don't get me wrong: I think AWS does a very good job in a gigantic number of cases, but at the same time, by making the whole IT landscape much more homogeneous in recent years, it slows down progress.
              Last edited by kloczek; 16-11-2015, 06:11.

              • kloczek
                Senior Member
                • Jun 2006
                • 1771

                #8
                Originally posted by mushero
                Second, they rewrote the InnoDB and MySQL IO paths to be much faster with less IO, like 1/10th in some cases, so it scales much higher on the same IO and the related RAM, buffers, in-RAM copies, NIC, etc. Their goal is a 10x increase in throughput over MySQL, though they may not be there yet.
                I've been watching posts about Aurora vs MySQL performance for more than a month. In none of them did I find anything like a 10x increase in throughput. Depending on the test and the MySQL version, sometimes Aurora is faster and sometimes MySQL is; all the tests I saw fell within a +/-10% band.
                I would be really happy to know where you saw a 10x throughput difference.
                Last edited by kloczek; 16-11-2015, 06:40.

                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Originally posted by mushero
                  For big systems, people are moving to TSDBs, but not in Zabbix yet, and maybe for archive/trend only. So we'll see as we get to millions of items and maybe thousands or tens of thousands of NVPS across hundreds of locations globally; it gets worse when we have dynamic container monitoring where servers come and go by the hundreds each day or hour.
                  NVPS mainly impacts evaluating the stream of data against triggers and propagating some data to inventory. This evaluation happens before the data is written to the DB backend. A TSDB (e.g. OpenTSDB) cannot help at such scale, because Zabbix already does an excellent job of forming bigger batches of data into large inserts (the improvement here was especially visible in the migration from 2.2.x to 2.4). Remember that Zabbix does many other useful things besides storing monitoring data; the simplified OpenTSDB model does not match what Zabbix provides.
                  When writing data, remember that if the data must be accessible via key-based queries, you need indexes. On update, those indexes generate not only write IOs: before modification, part of the index must first be read. This is why higher read latency, or those data not being cached, heavily impacts write speed as a consequence.
                  If your really-big-Zabbix stack is struggling to write data fast enough, I bet it is only because the data that the write queries need to update is not well cached.
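                  The batching the post credits Zabbix for (many collected values folded into one large insert) can be illustrated with a minimal sketch using SQLite; the table and column names here are invented for illustration and are not Zabbix's actual schema:

```python
import sqlite3

# Minimal illustration of batching history values into one multi-row
# INSERT statement, the pattern discussed above. Table/column names are
# invented for this sketch, not Zabbix's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL)")

# One batch of 1000 collected values instead of 1000 single-row inserts:
# one transaction commit and far fewer per-statement round-trips,
# so the index is updated in one pass rather than 1000 tiny ones.
values = [(1, 1447660800 + i, float(i)) for i in range(1000)]
with conn:
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", values)

count = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(count)  # 1000
```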

                  A few months ago, preparing my presentation for the London Solaris SIG meeting at the Oracle office, I found a very interesting publication proving what I knew intuitively, and what I've exploited in optimizing a Zabbix stack with a MySQL database on Solaris: caching is crucial for this Zabbix workload.


                  MySQL creates a relatively low number of IOs under high write workloads

                  I'll again use the example of my Zabbix DB, which sits on a 200GB SSD, so IO latency per se is relatively low. With my database, all write ops (inserts and updates) are fast as long as the physical read IO rate is no higher than (ONLY!!) 300/s. Past this threshold, the speed of all inserts slams into the wall of read IOs from storage.
                  Again: to gain high Zabbix write speed you must first harness your read IOs. Why? Because insert queries create a combination of read and write IOs. As soon as enough write IOs are held up by the latency of read IOs that must be fetched even from a very fast SSD, your Zabbix DB is already DEAD. Storage-array read caching or controller caching is usually too slow!!!
                  Only RAM is faster than SSD. Empirically I found that MySQL's own read caching is less effective than caching in the ZFS ARC; this is why my MySQL uses ONLY an 8GB InnoDB pool!!! Some time ago, doing the same tests on PostgreSQL, I found exactly the same effect.
                  Ergo: as long as someone uses Linux with MySQL or PostgreSQL, it is not possible to reach the same much higher speed as on Solaris, because Linux page caching is way less effective than the ZFS ARC (Adaptive Replacement Cache). The page cache is not used at all if someone sets innodb_flush_method=O_DIRECT. BTW: on Solaris, if someone starts MySQL with innodb_flush_method=O_DIRECT, the logs will state that this option has no effect on Solaris.

                  If someone doesn't believe that the ARC is the piece of technology missing on Linux, here is a graph of the last 12h of zfs::arcstat:{hits,misses}:



                  Look at how few read IOs were not served from memory: only ~1.4/s misses on average out of more than 6k/s.
                  And think again... on Linux there is no equivalent of the ARC.

                  I can only repeat what I wrote on the Zabbix forums: to architect a Zabbix stack correctly you must understand many aspects of the OS, databases, storage and a few other minor areas... simultaneously.
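                  The cache-hit rate implied by those arcstat figures (~1.4 misses/s out of more than 6k reads/s, as quoted above) works out as follows:

```python
# ARC hit-ratio arithmetic for the figures quoted above.
def hit_ratio(misses_per_s, reads_per_s):
    """Fraction of read requests served from cache."""
    return 1.0 - misses_per_s / reads_per_s

print(round(hit_ratio(1.4, 6000) * 100, 3))  # 99.977 (% of reads served from RAM)
```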
                  Attached Files
                  Last edited by kloczek; 16-11-2015, 08:34.

                  • SS*
                    Junior Member
                    • Jul 2015
                    • 5

                    #10
                    Well articulated, Tomasz. You shut those guys up with your proven knowledge and experience from both sides of the fence. I've re-read this post a few times now and am going to make sure others read it, as it is just so important.

                    -1 Aurora
                    +1 Hardware with Solaris

                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by SS*
                      Well articulated, Tomasz. You shut those guys up with your proven knowledge and experience from both sides of the fence. I've re-read this post a few times now and am going to make sure others read it, as it is just so important.

                      -1 Aurora
                      +1 Hardware with Solaris
                      +1 turned back from the road to the graveyard
