Server requirements for large enterprise

  • Cenzoooo
    Member
    • Jul 2015
    • 37

    #1

    Server requirements for large enterprise

    Dear zabbix community,

    I need help from experienced users with estimating the server resources (CPU, RAM, and storage space to keep data for approximately one year) needed for the following infrastructure:
    • Approx. 200 Cisco routers
    • Approx. 250 Cisco switches
    • 4 Cisco ASA firewalls
    • 2 Palo Alto PA 5050
    • Cisco ISE
    • Bluecoat Proxy and Reporter
    • Cisco Call Manager with approx. 1400 IP phones
    • F5 LB x 2
    • Cisco Nexus 7K
    • Cisco UCS C servers x 2
    • AIX LPAR x 15
    • Brocade FC switch x 2
    • Cisco FC switch x 2
    • ESX host x 6
    • Flex server x 10
    • IBM Power server x 4
    • Lenovo FC switch x 2
    • SAN Volume Controller x 2
    • Storage (IBM DS8800, Storwize V5000, Storwize V7000) x 4
    • VMware virtual machine x 45
    I have been using Zabbix for a smaller environment, but now we intend to expand our NMS infrastructure, so we need to plan resources accordingly.

    Best Regards!
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    The number of items and the polling interval are the drivers. If you discover every port on every switch and collect all available data every 30 seconds, that is a lot of data, and some of it is more useful than the rest. Database performance is the key: you need to be able to store the data as fast as you gather it and have enough memory to keep the caches happy.

    We have 7,500 hosts, 1,160,000 items and 4,500 NVPS hosted in ESX with good I/O, but not SSD. We monitor a mix of servers and network devices.
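
To illustrate how item count and polling interval drive the load, here is a minimal back-of-the-envelope sketch. It assumes NVPS is roughly the item count divided by the average update interval; the function names and the per-device figures in the second example are made up for illustration.

```python
def estimate_nvps(item_count: int, avg_interval_s: float) -> float:
    """Approximate new values per second for items polled at an average interval."""
    if avg_interval_s <= 0:
        raise ValueError("interval must be positive")
    return item_count / avg_interval_s


def implied_interval_s(item_count: int, nvps: float) -> float:
    """Average update interval implied by an item count and an observed NVPS."""
    if nvps <= 0:
        raise ValueError("nvps must be positive")
    return item_count / nvps


# Figures quoted in the post: 1,160,000 items at 4,500 NVPS.
print(round(implied_interval_s(1_160_000, 4500)))  # -> 258 (seconds, on average)

# Hypothetical: 450 devices x 48 ports x 10 items per port, all polled every 30 s.
print(round(estimate_nvps(450 * 48 * 10, 30)))  # -> 7200
```

The second figure shows why discovering every port at a 30 second interval adds up quickly, even for a few hundred devices.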



    • Cenzoooo
      Cenzoooo commented
      Hello,

      I understand. My intention is to set the minimum default collection interval to 120 seconds and to always discover with regex filters (not discover everything), so this environment would be much smaller than yours. I don't see it exceeding 1k VPS.

      Can you tell me a little bit more about your setup on ESX?

      I was thinking of this hardware setup:

      4 x 512GB SSD in RAID 10
      8-core CPU (Intel/AMD)
      16GB RAM


      What would you say?
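
A rough way to sanity-check the storage part of such a plan is sketched below. It assumes about 90 bytes per stored history value and per hourly trend row, a ballpark of the kind usually quoted for Zabbix database sizing; the real figure depends on the database engine and value types, and trends exist only for numeric items. All function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600


def history_bytes(nvps: int, days: int, bytes_per_value: int = 90) -> int:
    """Raw history: every collected value is stored for `days` days."""
    return nvps * SECONDS_PER_DAY * days * bytes_per_value


def trends_bytes(items: int, days: int, bytes_per_trend: int = 90) -> int:
    """Trends: one hourly aggregate row per item, kept for `days` days."""
    return items * 24 * days * bytes_per_trend


def raid10_usable_gb(disks: int, disk_gb: int) -> int:
    """RAID 10 mirrors pairs of disks, so half of the raw capacity is usable."""
    return disks * disk_gb // 2


nvps = 1000          # upper bound mentioned in the post
items = nvps * 120   # item count implied by a 120 s interval (assumption)

print(f"usable RAID 10 capacity: {raid10_usable_gb(4, 512)} GB")
print(f"365 days of history:     {history_bytes(nvps, 365) / 1e9:.0f} GB")
print(f"90 days of history:      {history_bytes(nvps, 90) / 1e9:.0f} GB")
print(f"365 days of trends:      {trends_bytes(items, 365) / 1e9:.0f} GB")
```

Under these assumptions a full year of raw history at 1k values per second would not fit on the planned array, while 90 days of history plus a year of trends would.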
  • ingus.vilnis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Mar 2014
    • 908

    #3
    Hi,

    To add my few cents: I think the planned hardware is more than enough for this.

    It is not really clear how many devices you will have or what the total metric count would be, but I understood that you have about 100 devices.

    It depends on how long you keep the data, but 512GB on SSD RAID 10 will give you top performance and room for lots of historical data.

    8 cores and 16GB RAM are also enough; most of that will go to your database anyway.

    And yes, good system architecture and tuning are the key.


    • kloczek
      Senior Member
      • Jun 2006
      • 1771

      #4
      I think 16GB RAM will not be enough for the DB backend.
      More likely you will need at least twice as much.
      A good indicator that the DB backend has enough memory is a read-to-write IO ratio of at least 1:20.
      Simply put, most of the data that select queries constantly need must be served from data cached in memory.
      Low-latency reads are also crucial for low-latency writes (inserts and updates), because each insert or update has to read a few things before it causes a write IO. When those reads go to physical storage (even SSD), the Zabbix backend is usually too slow.
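
One possible way to check that ratio on Linux is sketched below. It assumes the standard /proc/diskstats layout, where the fourth field is reads completed and the eighth is writes completed; the device name `sdb` and the sample counters are invented for illustration.

```python
def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (reads completed, writes completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def writes_per_read(before: dict, after: dict, device: str) -> float:
    """Write IOs per read IO between two samples of the same device."""
    reads = after[device][0] - before[device][0]
    writes = after[device][1] - before[device][1]
    if reads == 0:
        return float("inf")  # everything was served from cache
    return writes / reads


# Two invented samples of the volume holding the database files.
sample_1 = "   8      16 sdb 1000 0 8000 500 20000 0 160000 900 0 700 1400"
sample_2 = "   8      16 sdb 1100 0 8800 550 22500 0 180000 990 0 770 1540"

ratio = writes_per_read(parse_diskstats(sample_1), parse_diskstats(sample_2), "sdb")
print(f"1:{ratio:.0f} read to write")  # -> 1:25 read to write
```

On a real host the two samples would be read from /proc/diskstats some minutes apart; a result below 1:20 would suggest, by the rule of thumb above, that reads are not being served from memory.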
      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
      https://kloczek.wordpress.com/
      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
      My zabbix templates https://github.com/kloczek/zabbix-templates


      • ingus.vilnis
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Mar 2014
        • 908

        #5
        I see exactly what you mean.

        But I have an instance with 370 hosts at 520 real VPS on MySQL with a 5G innodb_buffer_pool, and it shows no signs of weakness even when opening 2x30 screens with a month of data. Needless to say, pure data collection causes no stress on the system.

        And my average read-to-write ratio for the last week is 1:35, and that is not even on SSD disks.

        So with good internal monitoring and reasonable tuning, 16G for the described instance (if we have all the details) is more than doable.


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #6
          520 nvps is a relatively low flow ..
          1) did you partition at least the history* tables?
          2) shift all memory which is still not in use into innodb_buffer_pool,
          3) change in my.cnf: transaction-isolation=READ-COMMITTED, innodb_doublewrite=false, innodb_flush_method=O_DIRECT

          Another part relates to select queries whose output is big enough that the temporary table is physically written to disk. In my "Service MySQL" template you can find three items for monitoring tmp table activity. If you see too much in Created_tmp_disk_tables, just increase max_heap_table_size (all the documentation for this type of tuning is in the item description).
          People usually forget to check tmp tables, and because of this selects are slow even if innodb_buffer_pool and the other caches are big enough.
          To get good visibility of tmp table activity, it is usually good to create a separate volume mounted at the my.cnf tmpdir path and check from time to time how much is written to this volume using normal OS IO monitoring.


          • ingus.vilnis
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Mar 2014
            • 908

            #7
            500 nvps is by no means much, but my understanding is that the original post referred to an even smaller system, so the comparison can be relevant.

            As to your comment - I have checked your templates before and they are really good. I appreciate the thoughts and knowledge that are put into them.

            To reply to your questions:
            1) Partitioning. You've got to be kidding. That is a procedure to be done before the first byte of history data has flowed in, for any installation aiming at more than a few hundred nvps. And once done correctly, it never causes problems. Too bad that many setups realize the need for partitioning when there are already gigabytes of data and production is rolling.

            2) Yes, this is very useful for everyone, but got to be careful. The general advice for 50-75% of RAM to be allocated for buffer pool can be dangerous. Idle RHEL 7 needs about 1G RAM. MySQL when started uses additionally up to 1G above the allocated buffer pool! And this is a thing many tend to forget. (Not you, this is a general thought about this topic). So on the particular system I describe with 7.68G RAM the effective buffer pool size is 5G because a higher value causes guess what - swapping. And yes, before you ask, vm.swappiness=1.

            3) READ-COMMITTED - yes. Doublewrite is ON by default and it adds some additional safety in case of an unlikely MySQL crash. Have you really noticed much performance improvement with it set to OFF? And the flush method is O_DIRECT by default.
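
The buffer pool arithmetic from point 2 can be written down as a small sketch. The 1G reserve for the OS and the 1G of MySQL overhead above the buffer pool are the figures from this post, not general constants; on larger hosts both would need to be measured. The function name is illustrative.

```python
import math


def buffer_pool_gib(total_ram_gib: float,
                    os_reserve_gib: float = 1.0,
                    mysql_overhead_gib: float = 1.0) -> int:
    """Largest whole-GiB buffer pool that leaves room for the OS and
    for MySQL memory allocated outside the buffer pool."""
    usable = total_ram_gib - os_reserve_gib - mysql_overhead_gib
    if usable < 1:
        raise ValueError("not enough RAM for a dedicated MySQL host")
    return math.floor(usable)


print(buffer_pool_gib(7.68))  # -> 5, the value used on the system described above
print(buffer_pool_gib(16))    # -> 14, for the 16GB host discussed in this thread
```

A fixed 50-75% rule would give between 3.8G and 5.8G on the 7.68G host, which is why the upper end of that range leads to swapping there.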

            Regarding tmp tables, I get about 13 Created_tmp_tables / sec and 1.2 Created_tmp_disk_tables / sec. So far nothing to be much concerned about.
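
For reference, the share of temporary tables that spill to disk follows directly from those two counters. The sketch below assumes both rates come from the MySQL status variables Created_tmp_tables and Created_tmp_disk_tables sampled over the same period; the function name is illustrative.

```python
def disk_tmp_table_share(created_tmp_tables: float,
                         created_tmp_disk_tables: float) -> float:
    """Fraction of internal temporary tables that were created on disk."""
    if created_tmp_tables <= 0:
        return 0.0
    return created_tmp_disk_tables / created_tmp_tables


share = disk_tmp_table_share(13, 1.2)
print(f"{share:.1%} of tmp tables go to disk")  # -> 9.2% of tmp tables go to disk
```

What counts as too high is a judgment call; tracking the trend of this share over time is probably more useful than any single threshold.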

            By the way, are you coming to the conference this week?


            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #8
              Originally posted by ingus.vilnis
              To reply to your questions:
              1) Partitioning. You've got to be kidding. That is a procedure to be done before the first byte of history data has flowed in, for any installation aiming at more than a few hundred nvps. And once done correctly, it never causes problems. Too bad that many setups realize the need for partitioning when there are already gigabytes of data and production is rolling.
              Nope .. you can add partitioning at any time. Only the old data will not be partitioned.
              Applying partitioning across all data is especially easy when you have a slave DB instance.
              You can simply add partitions for the old data on the slave. When all table data is partitioned, you can just stop zabbix server -> wait until all data from the master has been pushed to the slave -> repoint the DB backend to the slave -> start zabbix server -> recreate a new slave at the previous master location.
              Usually such an operation takes no more than a few seconds. It is even easier to make such a change if you monitor all hosts through proxies and only the zabbix server internal metrics are monitored directly by the server. In such a scenario there will be no gaps in monitoring data.
              I always start by adding a slave DB when I'm rearchitecting the DB backend for a new client.
              Using a slave also allows you to make consistent DB backups without impacting the master.
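
As an illustration of what adding partitions could look like, the sketch below generates a MySQL range-partitioning statement with one partition per day on the integer `clock` column of a history table. The daily granularity, the partition naming and the helper names are assumptions, and the generated statement should be reviewed against the actual schema (for example its unique keys) before running it anywhere.

```python
from datetime import date, datetime, timedelta, timezone


def daily_partitions(first_day: date, days: int) -> list[tuple[str, int]]:
    """Return (partition name, exclusive upper bound as Unix time) per day."""
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        upper = midnight + timedelta(days=1)
        parts.append((f"p{day:%Y_%m_%d}", int(upper.timestamp())))
    return parts


def partition_ddl(table: str, first_day: date, days: int) -> str:
    """Build an ALTER TABLE statement that range-partitions `table` by clock."""
    clauses = ",\n".join(
        f"  PARTITION {name} VALUES LESS THAN ({upper})"
        for name, upper in daily_partitions(first_day, days)
    )
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n{clauses}\n);"


print(partition_ddl("history", date(2015, 9, 1), 3))
```

Rows older than the first partition's bound simply land in the first partition, which matches the remark that old data can stay unpartitioned until it is reorganized on the slave.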

              2) Yes, this is very useful for everyone, but got to be careful. The general advice for 50-75% of RAM to be allocated for buffer pool can be dangerous. Idle RHEL 7 needs about 1G RAM. MySQL when started uses additionally up to 1G above the allocated buffer pool! And this is a thing many tend to forget. (Not you, this is a general thought about this topic). So on the particular system I describe with 7.68G RAM the effective buffer pool size is 5G because a higher value causes guess what - swapping. And yes, before you ask, vm.swappiness=1.
              Using swap on a MySQL server is pointless. The moment your system starts using swap, the whole stack becomes useless.
              A system working only as a MySQL server will not have much fluctuation in allocated/deallocated memory -> you can use almost all available memory for caches.


              • Cenzoooo
                Member
                • Jul 2015
                • 37

                #9
                Hello guys and thank you for the great discussion! Our client has now told us that they would prefer to have Zabbix in a distributed, virtual environment, and the environment should now be even larger. We expect around 10k devices, of which 70% would be network devices (mostly Cisco routers and switches); the rest is the VM environment.

                We would split the frontend, Zabbix server and DB. What would you guys say about that? I know a lot about Zabbix, but I don't have much experience with planning resources for a large environment, so I would rather build a good environment from the start so that I don't have problems later...
                I will need to work out what resources the VMs need in order to run all 3 instances separately. Or would you rather suggest putting something on hardware?


                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #10
                  You will need more proxies to scale polling data over SNMP.
                  If you control the physical layer of the network, you may try not to use SNMPv3. Adding encryption dramatically slows down sampling the data (most network devices have a very weak BMC on which the SNMP agent runs).



                  • Cenzoooo
                    Cenzoooo commented
                    Thank you for the suggestions! We will definitely use SNMPv2 for polling. We have also decided to have 3 separate VMs for the frontend, Zabbix server and database, with backup VMs for all 3 components.
                    Our initial plan is to have our internal infrastructure monitored by the Zabbix server directly; every other larger customer will be monitored through a proxy.

                    This is the configuration:

                    Zabbix server 12 x vCPU, 16GB RAM, 5 GB storage
                    Zabbix database 24 x vCPU, 96GB RAM, 1.5 TB storage
                    Zabbix frontend 2 x vCPU, 2GB RAM, 5GB storage

                    The main open question is the proxy. Since we will have an environment of 5,000-10,000 hosts, proxies will be deployed for the larger customers, so one proxy will handle approximately 1,000 devices.
                    What should the proxy requirements be in terms of CPU, RAM and storage?

                    I am attending the Zabbix professional training next week and I hope to get some suggestions there as well... kloczek, I would be happy to hear your opinion about the setup and the proxy requirements.
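
A first rough estimate of proxy count and per-proxy load can be sketched as follows. It takes the figure of about 1,000 devices per proxy from this post; the 100 items per host and the 120 second interval are illustrative assumptions, and the function names are made up.

```python
import math


def proxies_needed(total_hosts: int, hosts_per_proxy: int = 1000) -> int:
    """Number of proxies if each one handles at most `hosts_per_proxy` hosts."""
    return math.ceil(total_hosts / hosts_per_proxy)


def proxy_nvps(hosts: int, items_per_host: int, avg_interval_s: float) -> float:
    """Approximate new values per second that one proxy has to collect."""
    return hosts * items_per_host / avg_interval_s


print(proxies_needed(5_000), proxies_needed(10_000))  # -> 5 10
print(round(proxy_nvps(1_000, 100, 120)))             # -> 833
```

The per-proxy NVPS figure is what CPU, RAM and the local buffer database of each proxy would need to be sized against.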