N Zabbix servers or 1 Zabbix server + N Zabbix proxies

    We have been running a single Zabbix server on a bare-metal box, without any Zabbix proxy, for two years. As our infrastructure grows, the Zabbix server is reaching its capacity. We plan to migrate Zabbix monitoring to multiple nodes.

    Choice 1:
    - N Zabbix servers, each monitoring its own set of devices.
    -- e.g. server 1 monitors network switches, server 2 monitors Linux servers, etc.

    Choice 2:
    - 1 Zabbix server + N Zabbix proxies
    -- judging from presentations at Zabbix conferences, there may be issues with using proxies.

    Could you please share your experience or offer your recommendations? Thanks in advance!

    #2
    Hi

    In the past, I have successfully run over 5k hosts with roughly 3k NVPS:

    - On a single Zabbix server (spec'ed correctly)
    - MySQL backend that is tuned and set up with table partitioning on fast SSDs for the history_* and trends_* tables
    - Multiple proxies at various locations

    No issues. I am not sure which presentations you are looking at. One server with multiple proxies would be my suggestion.



      #3
      Hi syndeysider,

      Thanks very much for your advice. It is really helpful.

      Someone on the forum mentioned that MySQL partitioning is already implemented in Zabbix 3.4. Have you heard anything about that? I could not find it in the Zabbix 3.4 documentation.

      Thanks very much again!



        #4
        Our current environment, a single server without any proxy:
        Number of hosts: ~200
        Number of items: > 40K
        Number of triggers: > 20K
        Required server performance, new values per second: > 1K

        If we go with the 1 server + N proxies architecture, could you please share your experience or recommendations on the CPU/memory/disk requirements for the server and the proxies? Thanks in advance!



          #5
          Originally posted by wyang:
          Our current environment, a single server without any proxy:
          Number of hosts: ~200
          Number of items: > 40K
          Number of triggers: > 20K
          Required server performance, new values per second: > 1K

          If we go with the 1 server + N proxies architecture, could you please share your experience or recommendations on the CPU/memory/disk requirements for the server and the proxies? Thanks in advance!

          I see a kind of chicken-and-egg problem here.
          You are asking for hardware sizing factors when you already know the number of items and the NVPS quite precisely.
          Also, 1k NVPS with 20k items means that, on average, every item is sampled every 20 s. Because that is an average, a fairly large number of items must have a quite short sampling interval (less than 5 s, say). You have probably underestimated the number of items.
          Calculating NVPS from the number of items and the distribution of sampling intervals is quite difficult.
          One explanation is that the values you mention are a kind of guess, and it is quite possible that you have somewhat overestimated the NVPS. If you are working on estimates, 200 items per host is usually enough for base OS monitoring, without any application-layer metrics. Depending on the types of applications that need to be monitored, you may have an additional few hundred to even a few thousand items per host.
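
The relationship between item count, sampling interval and NVPS discussed above can be sketched in a few lines of Python. This is only an illustration: the function names and the interval distribution in the last call are hypothetical, and the item counts are the ones quoted in this thread (more than 40K items in post #4, 20k items as used in this reply).

```python
from typing import Mapping


def average_interval_seconds(item_count: int, nvps: float) -> float:
    """Average number of seconds between samples of one item."""
    if nvps <= 0:
        raise ValueError("nvps must be positive")
    return item_count / nvps


def required_nvps(items_by_interval: Mapping[int, int]) -> float:
    """NVPS implied by a mapping of update interval (seconds) -> item count."""
    return sum(count / interval for interval, count in items_by_interval.items())


# Item count from post #4 with 1K NVPS.
print(average_interval_seconds(40_000, 1_000))
# Item count used in this reply with 1K NVPS.
print(average_interval_seconds(20_000, 1_000))
# Hypothetical split: 30K items every 60 s, 10K items every 300 s.
print(required_nvps({60: 30_000, 300: 10_000}))
```

Going from a known interval distribution to NVPS is simple addition; the hard part mentioned above is knowing that distribution in the first place.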

          Any estimate of the resources the Zabbix stack needs can only be based on the NVPS and the number of web clients viewing monitoring data through the web frontend.
          NVPS determines how big and powerful the Zabbix server's DB backend needs to be. For 1k NVPS, 8/16 CPU cores/threads, 48+ GB of RAM and 1 TB of SATA SSD storage should be enough (for 2 years of trends data plus 2 weeks of raw history data).
          Web frontend requirements scale linearly with the number of web clients; for 20 clients you will need 4/8 cores/threads and 16+ GB of RAM.
          Requirements for the Zabbix server itself are highly correlated with the number of triggers and the average number of trigger evaluations against the streams of new data. Sadly, Zabbix still does not provide internal metrics for trigger evaluation speed or the CPU time spent on those evaluations.
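
As a rough cross-check of the storage figure above, here is a minimal Python sketch that counts the rows implied by 1k NVPS with 2 weeks of history and 2 years of hourly trends. The bytes-per-row value is an assumed placeholder, not a measured number; real row sizes depend on the database engine, indexes and value types. The 40K item count is taken from post #4, and the sketch treats every item as numeric (only numeric items have trends).

```python
SECONDS_PER_DAY = 86_400
ASSUMED_BYTES_PER_ROW = 90  # assumption for illustration only; measure your own tables


def history_rows(nvps: float, retention_days: int) -> int:
    """Raw history rows kept for the retention period."""
    return int(nvps * SECONDS_PER_DAY * retention_days)


def trend_rows(item_count: int, retention_years: int) -> int:
    """Trend rows kept: one row per item per hour."""
    return item_count * 24 * 365 * retention_years


def size_gb(rows: int, bytes_per_row: int = ASSUMED_BYTES_PER_ROW) -> float:
    """Approximate on-disk size in GB, excluding index and engine overhead."""
    return rows * bytes_per_row / 1e9


history = history_rows(1_000, 14)
trends = trend_rows(40_000, 2)
print(f"history: {history} rows, about {size_gb(history):.0f} GB")
print(f"trends:  {trends} rows, about {size_gb(trends):.0f} GB")
```

Under these assumptions the data fits comfortably within 1 TB, leaving headroom for indexes, events and growth.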

          Hardware requirements for Zabbix proxies are at the bottom of the list.

          Nevertheless, the requirements for the Zabbix server and proxies are usually secondary. The most important component is the central DB backend.
          One more conclusion: a Zabbix stack with 1k NVPS is a relatively small one.
          Last edited by kloczek; 23-06-2017, 12:46.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates



            #6
            Thanks very much for the recommendations.



              #7
              N Zabbix server architecture in our shop

              We run an N Zabbix server architecture. With 1000+ nodes on a VM and SAN, it seems to be doing well. Average NVPS is around 300, but we have seen more during bursts. We have occasionally seen I/O complaints caused by the history syncer, but otherwise it works great. We have no users on the frontend; it all goes to ELK.

              Keeping configuration in sync and understanding the impact of an event become a challenge, even once you have sorted out what goes to which server.

              Hope that helps.



                #8
                Thanks very much for your help!



                  #9
                  So which part of Zabbix is causing the load? What is the bottleneck: CPU, I/O or network?

                  We have a Zabbix server with MySQL, 5 main proxies to distribute the load, a few other proxies that exist for network access and carry only minor load, and a separate web frontend server. All are VMs; disk is FC.

                  Our Zabbix server uses a lot of RAM. I allocated many GB to the InnoDB buffers, since every buffer hit is an I/O avoided.

                  Partitioning and disabling housekeeping for history and trends is essential.

                  8300 hosts, 1,272K items, 5350 NVPS.
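
To make the partitioning advice concrete, here is a minimal Python sketch that generates a MySQL `ALTER TABLE ... PARTITION BY RANGE` statement with one partition per day on the `clock` column. The partition naming scheme (`pYYYY_MM_DD`), the helper names and the UTC day boundaries are assumptions for illustration, not the poster's actual setup. The script only prints SQL; review it before running it against a real database.

```python
from datetime import date, datetime, timedelta, timezone


def day_partition_clause(day: date) -> str:
    """One partition holding rows whose clock falls on the given UTC day."""
    upper = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)
    return f"PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({int(upper.timestamp())})"


def partition_table_sql(table: str, first_day: date, days: int) -> str:
    """ALTER TABLE statement creating `days` daily partitions from first_day."""
    clauses = [day_partition_clause(first_day + timedelta(days=i)) for i in range(days)]
    body = ",\n  ".join(clauses)
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n  {body}\n);"


# Example: three daily partitions for the history table.
print(partition_table_sql("history", date(2017, 6, 23), 3))
```

The same statement shape would apply to the other history_* and trends_* tables, typically with larger (for example monthly) ranges for trends.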



                    #10
                    Thanks very much!

                    As mentioned, we have a single Zabbix server configuration.

                    On our Software Defined Networking (SDN) devices, each device turned out to have more than 10K items to monitor. At that time, the trigger 'zabbix agent is unreachable' fired on many hosts, even though zabbix_get for the hosts reporting the issue still worked. Zabbix poller processes and history syncer processes were also reported as busy. Reducing the monitored items to fewer than 2K per device resolved the issue.



                      #11
                      Heavy polling, whether SNMP or agent, can benefit greatly from a proxy. Other options are to adjust the number of pollers and the timeout and unreachable settings.
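
For a rough idea of how many pollers a given passive-check load needs, the following Python sketch applies Little's law: busy pollers are approximately checks per second multiplied by seconds per check. The function name, the 60 s interval, the 50 ms check latency and the 50% busy target are all hypothetical inputs; the result is only a starting point for the `StartPollers` setting on the server or proxy, to be validated against the poller busy graphs.

```python
import math


def pollers_needed(passive_items: int,
                   avg_interval_s: float,
                   avg_check_s: float,
                   target_busy: float = 0.5) -> int:
    """Estimate poller processes so that each is busy about target_busy of the time."""
    if not 0 < target_busy <= 1:
        raise ValueError("target_busy must be in (0, 1]")
    checks_per_second = passive_items / avg_interval_s
    return math.ceil(checks_per_second * avg_check_s / target_busy)


# One device with 10K passive items (as in post #10), assumed 60 s interval
# and an assumed 50 ms per check.
print(pollers_needed(10_000, 60, 0.05))
# The same device reduced to 2K items.
print(pollers_needed(2_000, 60, 0.05))
```

Slow or timing-out checks raise the per-check time sharply, which is why the timeout settings matter as much as the poller count.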

                      For Zabbix agent items, switching to "active" checks lessens the load on the server (or proxy) monitoring those hosts.

                      I have seen the housekeeping processes for history cause other problems such as hosts appearing unreachable, where incoming data does not reach the database in time. To scale, you really need to age out history and trends using partitioning and disable housekeeping for those two.
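
Ageing out data with partitions amounts to dropping whole partitions instead of deleting rows. Below is a minimal Python sketch of that step, assuming daily partitions named `pYYYY_MM_DD` (a hypothetical naming scheme) and a retention period in days. It only builds the `ALTER TABLE ... DROP PARTITION` statement; fetching the existing partition names and executing the SQL are left out.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partitions_to_drop(existing: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Names of daily partitions older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    cutoff_name = f"p{cutoff:%Y_%m_%d}"
    # Zero-padded pYYYY_MM_DD names sort chronologically as plain strings.
    return sorted(name for name in existing if name < cutoff_name)


def drop_sql(table: str, names: List[str]) -> str:
    """ALTER TABLE statement dropping the given partitions, or '' if none."""
    if not names:
        return ""
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(names)};"


existing = ["p2017_06_07", "p2017_06_08", "p2017_06_09", "p2017_06_23"]
old = partitions_to_drop(existing, date(2017, 6, 23), keep_days=14)
print(drop_sql("history", old))
```

Dropping a partition is a metadata operation, which is what avoids the heavy delete load that housekeeping puts on the history tables.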



                        #12
                        Thanks very much for the advice!
