New Deployment - 30,000 hosts

keane
Junior Member · joined Feb 2016 · 2 posts

#1

New Deployment - 30,000 hosts

My team is starting our proof-of-concept work with Zabbix as a viable enterprise solution for a 30,000-host environment, to replace a number of commercial tools that we currently have in place. I am wondering if there is anyone in the forum who has experience with this type of deployment and would be willing to chat with me about your experience and things that I may want/need to consider as we set off.

Thanks in advance.
kloczek
Senior Member · joined Jun 2006 · 1771 posts

#2
Originally posted by keane
My team is starting our proof-of-concept work with Zabbix as a viable enterprise solution for a 30,000-host environment, to replace a number of commercial tools that we currently have in place. I am wondering if there is anyone in the forum who has experience with this type of deployment and would be willing to chat with me about your experience and things that I may want/need to consider as we set off.
With Zabbix, the number of monitored hosts is almost meaningless.
What counts is the number of monitored metrics and how frequently those metrics are collected; these are the main factors behind what is called NVPS (New Values Per Second). The next factor is the number of triggers (alarms) built on top of those items (metrics).

For example, in my case I have:
Number of hosts: 2125
Number of items: 256884
Number of triggers: 115599
Required server performance, new values per second: 3555.26
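As a rough sanity check on those figures (a sketch; the ~72 s average update interval is simply inferred from the numbers above as items / NVPS, not read out of Zabbix):
Code:
# NVPS ~= total items / average update interval in seconds
echo "256884 / 72" | bc -l    # ~3567, close to the 3555 NVPS reported above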

I have quite often seen Zabbix installations with 250k+ items, sometimes with a several times higher number of hosts.
The next factor is the number of users connected through the web interface; in some large-scale environments this can be a bottleneck as well.

My recommendation for the DB platform is the latest Solaris (11.3) with at least 128GB RAM, with compression enabled on the DB files (gzip-1 in my case gives a higher compression ratio than gzip-6). As the DB engine, MySQL (on Solaris 11.3 the MySQL 5.6 binaries come OOTB with USDT DTrace probes). On Linux it would be necessary to use hardware a few times bigger (I'm not kidding).
ZFS ARC really kicks ass at caching all resources in memory for read IOs.
If you are interested, I can give you a more detailed list of tunables that should be set when using MySQL on ZFS.
On Solaris you would need only one SSD, used as cache and log device. In my case the database with 15 days of raw monitoring data is about 390GB; after compression it is about 130GB, so you need RAID 1+0 (the best layout for reads).
In your case the minimum zpool size would probably be 1.5TB+ of total uncompressed capacity.
When using ZFS it is really important to add a log device to the zpool: it speeds up all write operations dramatically. On Linux, adding an external ext3/4 journal device may speed up write IOs as well (however, I never tested it; I would not be surprised if ext3/4 journaling on an external SSD device had some problems, as such a setup is probably very rarely used). Some NVMe 256GB+ SSD should be OK (you only need to carve about 4-8GB out of it as the ZFS log device); see the sketch below.
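The ZFS side of that recommendation boils down to a couple of commands (a sketch; the pool and device names match the zpool layout shown later in this thread):
Code:
# enable cheap transparent compression on the MySQL datasets
zfs set compression=gzip-1 data/mysql
# carve a small slice of the SSD out as a separate log device (ZIL) ...
zpool add data log c2t1d0s1
# ... and use the rest of it as a second-level read cache (L2ARC)
zpool add data cache c2t1d0s2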

In my case I'm using a pair of very old HP gen6 blades with two CPUs and 96GB RAM (one for the master and a second for the slave DB).
On the same hardware the DB backend was not able to deliver enough performance just over a year ago; now, on Solaris, after switching to a hybrid zpool (spindles + L2ARC and log device on SSD), I see 50-200% headroom in some key factors.
In my case the MySQL InnoDB buffer pool is only 12GB, holding just the indexes in memory, since ZFS ARC/L2ARC caching seems way more effective than what MySQL does itself. The result of this setup is quite funny: as long as most of the MFU/MRU data is cached in ZFS, restarting the DB engine is a low-impact operation, and warming all the indexes into the InnoDB pool takes only 15-30s.
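In my.cnf terms that setup is roughly the following (a sketch; the pool size is from this post, while the doublewrite setting is a hypothetical but common companion on ZFS):
Code:
[mysqld]
# small InnoDB pool: keep only indexes in memory, let ZFS ARC/L2ARC cache the data
innodb_buffer_pool_size = 12G
# hypothetical: ZFS never tears block writes, so the doublewrite buffer is redundant
innodb_doublewrite = 0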

The more CPU cores the better: you can then spend the spare CPU power on faster transparent compression/decompression.

If you need more web-frontend power, you can additionally use a pair of DB backends and organize DSR load balancing (Solaris supports LB with RRDP out of the box). The web frontend(s) can run on Linux.

Feel free to use the factors above in your estimations.
Last edited by kloczek; 25-02-2016, 15:30.
http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
https://kloczek.wordpress.com/
zapish - Zabbix API SHell binding: https://github.com/kloczek/zapish
My Zabbix templates: https://github.com/kloczek/zabbix-templates


keane
Junior Member · joined Feb 2016 · 2 posts

#3
That is great information for us to work with as we set out on the architecture for the environment. I will certainly review this with my team; I appreciate the response! Our initial thought is that we will not put customers on the Zabbix UI; rather, we will send the metrics both to Elasticsearch for visualization in Kibana and to Postgres for visualization through Grafana. The reason for both is that we have also built a streaming-analytics environment for logging using a number of projects (Heka, Kafka, Flink, Elasticsearch, etc.), and we have referential data in Elasticsearch as well. Splitting the Zabbix data stream will allow customers to view full application performance through Kibana, while core IT teams will use Grafana, which seems a little more user-friendly for playing with multiple ways to view the data. We will use the Zabbix UI more for configuring some threshold alerting, but may also build a Node.js front end for alerting that just interacts with Zabbix through the API. We have built something similar for setting alerts on Elasticsearch patterns using the Elastic API, and it has worked well for our users.

This is a great discussion. I will post updates to the thread as we continue to build our architecture, and we will share anything we build that interacts with the Zabbix API so that others can use it and contribute if there is interest. Maybe others have already built things that we can leverage too; I need to spend more time on GitHub to check that out.

Cheers!


kloczek
Senior Member · joined Jun 2006 · 1771 posts

#4
We just finished our ELK tests/POC, and for that part Solaris will be the base platform as well (ZFS compression was sometimes up to 6x).

One thing about my number of hosts.
In reality the number of hosts is more than two times lower than reported by Zabbix, because we use many dummy hosts for aggregations and other things. So in your estimations you should use not a ~x15 multiplier but more like x30, if the number of metrics per host is or will be similar.

To lower the probability of over- or under-estimating some factors, it is better to forget about the number of hosts and use only the number of items, triggers and NVPS in all calculations, as in the sketch below.
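Scaling the figures from post #2 by that x30 multiplier (a rough sketch, assuming a similar number of metrics per host):
Code:
echo "256884 * 30" | bc    # ~7.7M items
echo "115599 * 30" | bc    # ~3.5M triggers
echo "3555 * 30"   | bc    # ~107k NVPS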

Another very important thing: in your architecture everything should be monitored through proxies. In reality, only the monitoring of Zabbix itself cannot be done through a proxy. Proxies add an extra layer of HA: when the server is down, a proxy still collects all monitoring data, and after reconnecting, all the historic data is pushed to the server. You can definitely put a proxy on the same host as the web frontend.
At the moment the proxy has a hardcoded cap of 1k metric values per send to the server, but raising it is only a one-line modification in the source code. A minimal proxy configuration might look like the sketch below.
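A minimal zabbix_proxy.conf sketch (the hostnames are hypothetical; the parameters are standard proxy settings):
Code:
Server=zabbix-server.example.com   # Zabbix server this proxy reports to
Hostname=proxy-dc1                 # must match the proxy name configured on the server
ProxyOfflineBuffer=24              # hours of collected data kept while the server is unreachable
ConfigFrequency=300                # seconds between configuration syncs from the server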


Colttt
Senior Member · Zabbix Certified Specialist · joined Mar 2009 · 878 posts

#5
@kloczek: hi, I'm not a Solaris expert, but why is Solaris recommended in that setup (apart from ZFS, which can also be used on Linux with ZoL)?
Debian-User

Sorry for my bad English.


kloczek
Senior Member · joined Jun 2006 · 1771 posts

#6
ZFS on Linux is based on the OpenZFS code, which is based on the OpenSolaris code.
When Oracle took over Sun they stopped publishing new changes. That was almost 6 years ago.
Since that point in time, Oracle kernel developers have made a huge set of significant changes to the ZFS code, dramatically improving its speed.
All those improvements are IMO really worth buying Solaris support for, even on non-Oracle hardware (my Zabbix DB backend runs on HP hardware), because in a 30k-host environment the cost of such support will be 20+ times lower than the cost of the more powerful hardware needed to reach the same speed on OpenZFS.
Additionally, access to the Solaris SRUs (Support Repository Updates) released every month is priceless. Example: last month I migrated the Zabbix DB pool from SSDs to a hybrid zpool consisting of spindles plus L2ARC+ZIL on SSD, and over the next few days I twice had problems with mysqld processes frozen in IO (it was not even possible to kill them).
The solution? "reboot -d" (reboot with a crash dump) -> open an SR -> send the crash dump data to Oracle Support, and within a matter of hours we had a reply that the problem was known and had already been solved in the first SRU released after 11.3 GA.
Linux, after 25+ years of development, still does not have crash dumps working as well as they do on Solaris.

            "pkg update --be-name Solaris-11.3-SRU5.6; reboot" on new BE solved the problem.
            Oracle has on road map develop boot envs functionality like it is on Solaris with ZFS on top yum and btrfs but no one knows when it will be ready to use
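The boot environments are what make such an update transactional (a sketch; the BE name to roll back to is hypothetical):
Code:
beadm list                      # show existing boot environments
beadm activate Solaris-11.3-GA  # misbehaving update? activate the previous BE ...
reboot                          # ... and boot straight back into it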

Really... if you want to play seriously with ZFS, forget about anything except regular/latest Solaris.
A solo system with OpenZFS as a learning platform? That makes sense, but if you already have some Solarises around, why waste time on something which is... only a toy? :P

One fact: a few brilliant Solaris kernel developers left Oracle, not enjoying Oracle's internal culture. However, you must know that six years ago, when Oracle bought Sun, they started investing even bigger money in Solaris R&D. The result: today more people are working on some single Solaris kernel-space projects at Oracle than worked on the whole Solaris kernel in Sun's best times.
That may tell you something about the chance of getting quick help with an issue.

What might be the main reason to use regular Solaris instead of *BSD or Linux?
For example, max recordsize=1M:
Code:
# zfs get -r compress,compressratio,recordsize data/mysql
NAME                 PROPERTY       VALUE   SOURCE
data/mysql           compression    lzjb    received
data/mysql           compressratio  2.64x   -
data/mysql           recordsize     256K    received
data/mysql/bin-logs  compression    gzip-1  received
data/mysql/bin-logs  compressratio  3.64x   -
data/mysql/bin-logs  recordsize     1M      received
data/mysql/tmp       compression    lzjb    inherited from data/mysql
data/mysql/tmp       compressratio  3.18x   -
data/mysql/tmp       recordsize     256K    inherited from data/mysql
data/mysql/zabbix    compression    lzjb    inherited from data/mysql
data/mysql/zabbix    compressratio  2.54x   -
data/mysql/zabbix    recordsize     256K    inherited from data/mysql
As you can see, the volume used by MySQL to store binary logs uses the max recordsize of 1M, and the other volumes use 256K (the max recordsize on OpenZFS is still 128K). The bigger record increases the bandwidth of data written to and read from the spindles, but decreasing the recordsize on bin-logs from 1M to 128K would lower the compressratio by almost 1x. (I have still had no time to test stronger compression since I moved to the hybrid zpool, where writes to the spindles and compression are now done asynchronously.)

PS. The cost of Solaris support on 2-CPU non-Oracle hardware is something like £500/year, so it is IIRC lower than or the same as RH support on the same hardware.
BTW: if anyone is going to buy Solaris support, it is worth buying it for one year and renewing each year; when buying support for 3+ years up front you receive a 30% discount per year. A well-"nested" Solaris will be used for more than 3 years.


SS*
Junior Member · joined Jul 2015 · 5 posts

#7
All looks good; a few Qs I'm curious about:

keane: "30,000 host environment" - why so many machines?

kloczek: "gzip-1 in my case gives a higher compression ratio than gzip-6" - did I read that correctly?

"One fact: a few brilliant Solaris kernel developers left Oracle, not enjoying Oracle's internal culture." - any links for this? How are you aware of internal matters, and what kind of culture is it?

"max recordsize=1M" - why not an even larger max recordsize (if it were possible), e.g. 2M? Have you found any trade-offs with a larger recordsize vs the default or smaller ones, apart from the effects on compression? It looks like 1M enables a greater compressratio, so why not 1M everywhere?

Would ZoL even be using block-layer caching or MRU & MFU effectively? I have read about various performance issues using ZoL... I also see efforts to integrate ZFS into Linux, e.g. the latest Ubuntu news.


kloczek
Senior Member · joined Jun 2006 · 1771 posts

#8
Originally posted by SS*
kloczek: "gzip-1 in my case gives a higher compression ratio than gzip-6" - did I read that correctly?
Yep. As you can see, I'm using gzip-1 for compressing the MySQL binary logs.
Everything depends on what kind of data is being compressed. It is a coincidence that here gzip-1 gives a better compression ratio than gzip-6.

                "One fact: few brilliant Solaris kernel developers left Oracle not enjoying Oracle internal culture. " - any links to this? How are you aware of internal things and what kind of culture is it?
                Many thing in Oracle changed in meantime.
                Now in Oracle is hired much more people working on some kernel projects (IIRC in Oracle is more than 150 such projects only in kernel area). At the time when Bryan Cantrill and Brendan Greg left Oracle on Solaris kernel was working less people than now on some single kernel projects.

                "max recordsize=1M" - why not have an even larger max recordsize (if it was possible)? i.e. 2M. have you found any trade-offs with larger recordsize vs default or smaller apart from effects on compression? It would look like 1M enables greater compressratio so why not all 1M?
                1MB it is max block (in zfs terminology record) which can be used on allocate data on zfs.
                Binary logs are very specific files. MySQL only appends new content to the end of the file so rewriting tail of such files does not consumes to much IOs. Situation is diffent when it comes to data files.
                When I've been building zabbix DB backed zpool used by zabbix DB was only on single SSD and with recordsize=1MB in such condition I've been hitting bottleneck of max bandwidth to SSD (250MB/s) on writing all data using such big unit. By this I was forced to change recordsize=256KB on database files but still had enough bandwidth to use 1MB on binary logs.
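The resulting per-dataset settings (a sketch; the dataset names are from the zfs get output above; note that recordsize only affects newly written blocks):
Code:
zfs set recordsize=1M   data/mysql/bin-logs   # append-only binary logs
zfs set recordsize=256K data/mysql/zabbix     # InnoDB data files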

Now I'm using the same SSD, but only as ZIL and L2ARC. The zpool is assembled from 6 spindles working in RAID 1+0:
Code:
root@be1:~# zpool status data
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
        logs
          c2t1d0s1  ONLINE       0     0     0
        cache
          c2t1d0s2  ONLINE       0     0     0

errors: No known data errors
I still have not had time to test increasing the recordsize now that the maximum write bandwidth is spread over more disks; theoretically it should be possible now.
In recent weeks my main priority was upgrading MySQL to 5.6 to get better DTrace support (on Solaris, 5.6 comes with improved USDT probes).
With better DTrace support I will be able to start many other experiments to squeeze even more performance out of MySQL with the Zabbix DB on Solaris.
BTW: after the move from MySQL 5.5 to 5.6 I noticed a reduction in reads; on the ~7-day graph below you can see the drop in ARC hits after the upgrade.
In the meantime I also reorganized the volumes to move the tmp and bin-logs directories out of the MySQL datadir. I did this because every directory in the datadir is recognized as a database, and I had meanwhile added LLD monitoring of index and data sizes per database to my MySQL template; with those directories inside the datadir that LLD kept failing.

Would ZoL even be using block-layer caching or MRU & MFU effectively? I have read about various performance issues using ZoL... I also see efforts to integrate ZFS into Linux, e.g. the latest Ubuntu news.
Look at the attached graph of hits/misses. You can see that the average ratio of misses to reads is up to 1/1000 (the 2.5k misses/s spike was caused by replicating a snapshotted database from master to slave). In my case L2ARC usually has a misses/hits ratio of about 1/15-20, so the total misses/hits ratio to the spindles is only about 1/20,000. Effectively the spindles are doing at most a few read IOs/s.

Compared to what is now in the latest Solaris 11.3 SRU, the OpenZFS used on Linux is falling further and further behind what ZFS on regular Solaris can do. Really, if you have high storage needs, the cost of Solaris support on non-Oracle hardware is 100% worth the money.
ZFS on Linux can still do things which are unimaginable on vanilla Linux without ZFS, but ZFS on Solaris is moving forward very fast, and in the last 2-3 years most OpenZFS developer time has been spent on porting the code to different platforms.
For example, the max recordsize on OpenZFS is still 128K.
On moving from Solaris 11.2 GA to 11.3 GA I saw an incredible reduction in physical read IOs (almost 40%), as many things in the ARC code were rewritten (the "ZFS reARC" work, https://blogs.oracle.com/roch/entry/rearc).
None of those improvements are present in the OpenZFS code.
[Attachment: ARC hits/misses graph referenced above]
Last edited by kloczek; 09-04-2016, 03:59.
