Zabbix Scalability

    We have been using Nagios with around 3000 hosts. Because of some scalability issues with Nagios we wanted to switch to another monitoring tool, and came across Zabbix. I went through the scalability and availability options for Zabbix and found some solutions using Pacemaker and Zabbix proxies. But these don't meet my actual requirement, and I think they will cause a scalability issue. The requirement is that I am going to use the Zabbix API to give customers live monitoring on our console. As far as I understand, the data is served by the Zabbix server, which is a single unit, and this can be a bottleneck, because multiple users on my console will generate a large number of API requests to the Zabbix server. Is there any way to achieve high performance without affecting the normal working of Zabbix?

    For context, I have multiple Zabbix servers. I have about 2700 hosts being monitored by one Zabbix server. It should also be noted that I'm running MariaDB for the DB, the Zabbix server process, the Zabbix front end with Apache, Grafana, and other processes all on the same physical server. My biggest issue is not with the API or the processes on the server but with storage limitations, and those issues are relatively negligible right now thanks to tuning I've done. I tuned MariaDB, Apache, Linux kernel settings and many other areas, but it was not extensive, and most of it was the standard Linux and database tuning needed for any enterprise production application with a database. I do have good SSDs, so keep that in mind.

    If I were using the newer TimescaleDB with PostgreSQL, separate servers for the DB and the server processes, better disk partitioning and some other light tweaks, it could probably handle 2 or 3 times the number of monitored hosts, and more easily. My CPU utilization regularly runs at about 33% at its highest. Recent 5.x features now allow encrypted database connections for the server process, which finally makes it possible to have the DB and server on different systems without something like stunnel in the middle to do in-flight encryption.

    Supposedly you can run many Zabbix front ends, but I haven't. The Zabbix front end can run in Apache or Nginx; it could be others, but those are the most popular. The front end is really just a bunch of folders that have to be put on your web server. That being said, I've had something like 60 users connected to the web front end simultaneously without issue, but the front end UI only just got revamped. Most people use a Grafana front end which talks to the Zabbix API, as Zabbix dashboarding is meh; that's its main weakness. I don't think any monitoring tool is going to have better built-in graphs than Grafana.

    They also changed the Zabbix proxies so that all the preprocessing and a lot of the computational load can be done by the proxies, which then basically just push to the server for the DB writes. Zabbix used to have a concept of Zabbix nodes, which I liked more than proxies for large scale, but once they added the ability for proxies to do some of the computational work it helped considerably. I'd still like Zabbix nodes to be supported again though, if anyone's listening. There are certain use cases, for compliance reasons, where it's helpful, nay required, to segment monitoring into something akin to a DNS SOA record, what I would call Scope of Monitoring. For now I use different servers to do that, with automation to help manage the individual servers. Supposedly a new feature that's coming is YAML-formatted templates, which would make monitoring rule changes across multiple server installs much easier. Importing and exporting templates to carry changes to templated rules from server to server is ugly right now, but it does work with some effort.
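
    A rough sketch of how the server-to-server template copy could be scripted through the API rather than the UI. It only builds the JSON-RPC request bodies for `configuration.export` and `configuration.import`; the template ID, the token and the helper names are made up for illustration, and the import rules shown are a minimal subset (a real import would likely need rules for items, triggers and so on as well).

```python
import json


def export_templates_request(template_ids, auth, fmt="xml", request_id=1):
    """Body for configuration.export: dump the given templates as one string."""
    return {
        "jsonrpc": "2.0",
        "method": "configuration.export",
        "params": {"format": fmt, "options": {"templates": list(template_ids)}},
        "auth": auth,  # Zabbix 5.x style: session token in the request body
        "id": request_id,
    }


def import_templates_request(source, auth, fmt="xml", request_id=1):
    """Body for configuration.import: load an exported string on another server."""
    return {
        "jsonrpc": "2.0",
        "method": "configuration.import",
        "params": {
            "format": fmt,
            # Minimal rule set (assumption): create templates that are missing
            # and update ones that already exist. Extend for items, triggers, etc.
            "rules": {"templates": {"createMissing": True, "updateExisting": True}},
            "source": source,
        },
        "auth": auth,
        "id": request_id,
    }


if __name__ == "__main__":
    # Hypothetical template ID and token, printed only to show the shape.
    print(json.dumps(export_templates_request(["10001"], "TOKEN-A"), indent=2))
```

    The string returned by the export call on the first server would be passed as `source` to the import call on each of the others, each with its own token.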

    I also wish it had better and easier-to-use RBAC. After a few years I still can't figure out how to easily give people access to read the monitoring config and templates without giving them admin access to change those templates. It's possible to do it, but the RBAC is annoying and dumb.

    Use Grafana for your primary dashboarding and graphing, you'll like it better anyway, and use the actual Zabbix UI for admin activities if you don't want to use automation like Salt or Ansible for admin work. At least 90% of what you can do in the Zabbix UI you can also do with its API (JSON-RPC over HTTP).
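
    To make the API point concrete for the "many console users" worry in the question, here is a minimal sketch, not a tested production setup. It builds a Zabbix 5.x style JSON-RPC request and puts a small in-memory cache in front of it, so repeated identical calls within a few seconds hit the Zabbix server only once. The URL, the 30 second TTL and the names `build_request`, `send` and `CachedApi` are assumptions for illustration. The cache key ignores the token, so it assumes the console uses one service account for everyone.

```python
import json
import time
import urllib.request

# Assumption: placeholder URL, replace with your own front end address.
ZABBIX_URL = "http://zabbix.example.com/api_jsonrpc.php"


def build_request(method, params, auth=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    if auth is not None:
        # Zabbix 5.x style: the session token travels in the request body.
        body["auth"] = auth
    return body


def send(body, url=ZABBIX_URL):
    """POST the request and return the "result" member of the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


class CachedApi:
    """Serve repeated identical calls from memory for ttl seconds."""

    def __init__(self, sender=send, ttl=30.0, clock=time.monotonic):
        self._sender = sender
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # (method, params as JSON) -> (timestamp, result)

    def call(self, method, params, auth=None):
        key = (method, json.dumps(params, sort_keys=True))
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough, no request to the Zabbix server
        result = self._sender(build_request(method, params, auth))
        self._cache[key] = (now, result)
        return result
```

    With something like this in the console backend, `CachedApi().call("host.get", {"output": ["hostid", "host"]}, auth=token)` would go to the server at most once per TTL no matter how many users have the page open.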

    Zabbix is extremely powerful and the most versatile monitoring tool I've ever used, but with that power and versatility comes complexity: lots of variables, and lots of ways to implement monitoring templates and config. That complexity is needed to satisfy all the use cases it tries to fit, and it is achieved through various layers of indirection.

    I used to use Nagios. I would never go back to Nagios after using Zabbix for the last 5 years. Nagios, even with Icinga, felt like 12 people sticking, stuffing and restuffing a turkey. If you have more questions, I suggest spending some good time reading through the Zabbix for Large Environments area.
    Last edited by HellLordKB; 30-06-2020, 06:37.


