High IOPS throughput when Zabbix is running

This topic has been answered.
  • m_loichot
    Junior Member
    • Nov 2022
    • 8

    #1

    High IOPS throughput when Zabbix is running

    Hello,

    We have deployed Zabbix 6.2 on a VM running Debian 11. This Zabbix instance has 487 hosts, 93 templates, 30368 items and 13733 triggers. The VM has 4 CPUs, 8 GB of RAM and 200 GB of storage.



    We noticed that when Zabbix is running, our environment slows down, notably our RDS servers. We investigated our disk array, an MSA 2050 SAS. It has 2 volumes, each a RAID 5 of 9 SAS HDDs; the RDS servers and the Zabbix server are on the same volume. When Zabbix is running with all hosts enabled, the MSA shows a large increase in IOPS. See below: first the datastore with Zabbix started, then with it turned off:

    [Image: MSA datastore IOPS with Zabbix started]
    [Image: MSA datastore IOPS with Zabbix turned off]
    I ran a second test: I disabled all hosts in Zabbix and gradually re-enabled them over about 3 hours. The volume graph below shows that, as hosts are re-enabled, the load increases from approximately 50 IOPS to 300 IOPS.

    [Image: volume IOPS graph while hosts are gradually re-enabled]

    In conclusion, we see a causal link between the IOPS Zabbix generates and the slowness in our environment, but we don't understand how to concretely reduce those IOPS.



    Thanks in advance for your feedback. I am at your disposal for any questions.
  • Answer selected by m_loichot at 30-11-2022, 11:57: Markku's reply (#4 below).

    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #2
      You don't say anything about the database, which is a critical component in a functioning Zabbix server environment. What database and version are you using? Is it hosted on the same VM as the server (and presumably, the web front-end)? Has any database tuning been done?


      • m_loichot
        Junior Member
        • Nov 2022
        • 8

        #3
        We use MySQL 8.22. It's hosted on the same VM as the Zabbix server, as is the web front-end. MySQL was installed at the same time as the Zabbix server, so it's up to date.


        • Markku
          Senior Member
          Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
          • Sep 2018
          • 1781

          #4
          How many NVPS (new values per second) do you have? Your Zabbix server needs to write all those values to the database.

          RAID 5 is slow at writes (AFAIK) because parity has to be written across the disk set.

          "Has any database tuning been done?" above is a valid question.

          Markku
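
A rough sketch of why RAID 5 amplifies write load, assuming the commonly cited write penalty of 4 for small random writes (read old data, read old parity, write new data, write new parity). The function name and the read/write split are illustrative assumptions; the thread does not say what share of the observed IOPS are writes, and controller caching can lower the real figure.

```python
RAID5_WRITE_PENALTY = 4  # read data + read parity + write data + write parity


def backend_iops(frontend_iops: float, write_fraction: float,
                 write_penalty: int = RAID5_WRITE_PENALTY) -> float:
    """Estimate disk-level I/Os per second behind a RAID set.

    frontend_iops  -- IOPS as seen by the host (e.g. the ~300 reported in post #1)
    write_fraction -- share of those I/Os that are writes (0.0 to 1.0, an assumption)
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    writes = frontend_iops * write_fraction
    reads = frontend_iops - writes
    # Reads cost one back-end I/O each; each write costs `write_penalty`.
    return reads + writes * write_penalty


if __name__ == "__main__":
    # Hypothetical 50/50 split, purely for illustration.
    print(backend_iops(300, 0.5))
```

The more write-heavy the workload, the closer the disks get to four times the front-end figure, which is why reducing database writes to storage matters on this array.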


          • tim.mooney
            Senior Member
            • Dec 2012
            • 1427

            #5
            Originally posted by m_loichot
            We use mysql 8.22, it's hosted on the same VM as the server
            If you're collecting 30368 items, the Zabbix server has to write those items to the database periodically.

            You understand that modern operating systems and modern relational databases use RAM for various caches, to try to reduce how often an I/O operation is needed to read or write some value to storage.

            Your VM has 8 GB of RAM, but the operating system needs some of that, and the web front end needs some of that, and the Zabbix server definitely needs some of that, so you've left very, very little RAM for the database. With almost no RAM available and apparently no database tuning having been done, your database can't effectively do much caching, so it has to read or write from disk much more frequently than it should. Hence, high I/O operations.

            If you want to reduce I/O operations, give your database more RAM (a lot more), and do some database tuning.
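
To put a number on "very little RAM for the database": a minimal sketch, assuming MySQL is still running with the stock `innodb_buffer_pool_size` default of 128 MiB (the thread says no tuning was done, but the actual value on this server is not stated). The helper name `pool_share` is made up for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# MySQL's documented default for innodb_buffer_pool_size (134217728 bytes).
DEFAULT_INNODB_BUFFER_POOL = 128 * MIB


def pool_share(pool_bytes: int, ram_bytes: int) -> float:
    """Return the fraction of system RAM given to the InnoDB buffer pool."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return pool_bytes / ram_bytes


if __name__ == "__main__":
    share = pool_share(DEFAULT_INNODB_BUFFER_POOL, 8 * GIB)
    print(f"Default pool on an 8 GB VM: {share:.2%} of RAM")
```

Under that assumption the cache that absorbs InnoDB reads and buffers dirty pages is under 2% of the VM's memory, so most page accesses go to disk.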


            • m_loichot
              Junior Member
              • Nov 2022
              • 8

              #6
              Originally posted by Markku
              How many NVPS (new values per second) do you have? Your Zabbix server needs to write all those values to the database.

              RAID 5 is slow at writes (AFAIK) because parity has to be written across the disk set.

              "Has any database tuning been done?" above is a valid question.

              Markku
              The NVPS curve oscillates between 120 and 200 NVPS, with a period of 15 minutes.

              [Image: NVPS graph]
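
For scale, a small sketch that turns the reported NVPS range into values written per day, and into the average update interval implied by the item count from post #1. It assumes every item is enabled and stored, which the thread does not confirm; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def values_per_day(nvps: float) -> float:
    """New values the Zabbix server must store per day at a given NVPS."""
    return nvps * SECONDS_PER_DAY


def mean_update_interval(items: int, nvps: float) -> float:
    """Average seconds between updates per item, if all items contribute."""
    if nvps <= 0:
        raise ValueError("nvps must be positive")
    return items / nvps


if __name__ == "__main__":
    for nvps in (120, 200):  # range reported in the post above
        print(f"{nvps} NVPS -> {values_per_day(nvps):,.0f} values/day")
    # 160 NVPS is simply the midpoint of the reported range.
    print(f"mean interval: {mean_update_interval(30368, 160):.0f} s")
```

Every one of those values ends up as a row in the history tables, so even a modest NVPS figure means millions of inserts per day.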


              • m_loichot
                Junior Member
                • Nov 2022
                • 8

                #7
                Originally posted by tim.mooney

                If you're collecting 30368 items, the Zabbix server has to write those items to the database periodically.

                You understand that modern operating systems and modern relational databases use RAM for various caches, to try to reduce how often an I/O operation is needed to read or write some value to storage.

                Your VM has 8 GB of RAM, but the operating system needs some of that, and the web front end needs some of that, and the Zabbix server definitely needs some of that, so you've left very, very little RAM for the database. With almost no RAM available and apparently no database tuning having been done, your database can't effectively do much caching, so it has to read or write from disk much more frequently than it should. Hence, high I/O operations.

                If you want to reduce I/O operations, give your database more RAM (a lot more), and do some database tuning.
                Thank you for this advice. I will increase the RAM of my VM to 32 GB. I'm a novice with databases, so can you tell me what you mean by "database tuning"?


                • LenR
                  Senior Member
                  • Sep 2009
                  • 1005

                  #8
                  Some things I have found helpful in MySQL tuning, roughly in order from easy/high return to harder:
                  1. skip-log-bin - if you don't use these logs, don't write them.
                  2. Increase innodb_buffer_pool_size. I use about 50% of the memory on my zabbix-server with local MySQL. I gather data mostly on proxies and have another VM for the web frontend.
                  3. Allocate and use large-pages for the InnoDB buffer pool.
                  Most of these changes are /etc/my.cnf settings; some can be changed dynamically.
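
The three suggestions above could be rendered into a `[mysqld]` fragment along these lines. This is a sketch under stated assumptions: the helper `render_mysqld_fragment` is hypothetical, the 50% fraction is LenR's rule of thumb rather than a MySQL recommendation, and `large-pages` only takes effect if huge pages are also configured at the OS level.

```python
def render_mysqld_fragment(total_ram_gb: int, pool_fraction: float = 0.5,
                           large_pages: bool = False) -> str:
    """Build a my.cnf [mysqld] fragment from the tuning list above."""
    pool_gb = int(total_ram_gb * pool_fraction)
    if pool_gb < 1:
        raise ValueError("buffer pool would be smaller than 1G")
    lines = [
        "[mysqld]",
        "# 1. Do not write binary logs (only safe without replication",
        "#    or point-in-time recovery).",
        "skip-log-bin",
        "# 2. Give InnoDB a share of RAM for caching.",
        f"innodb_buffer_pool_size = {pool_gb}G",
    ]
    if large_pages:
        lines += [
            "# 3. Needs huge pages reserved in the OS first.",
            "large-pages",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 32 GB matches the VM size announced in post #7.
    print(render_mysqld_fragment(32))
```

The output is meant to be reviewed and merged by hand into the existing config, not written over it, and MySQL needs a restart for `skip-log-bin` and `large-pages` to apply.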


                  • tim.mooney
                    Senior Member
                    • Dec 2012
                    • 1427

                    #9
                    Originally posted by m_loichot
                    can you tell me what you mean by "database tuning"?
                    That's a very big topic. You'll probably want to do some web searches; some search terms you might try are "what is database tuning", "what is software performance tuning", etc. An IBM doc that was near the top of the results for the second search has a nice summary: https://www.ibm.com/docs/en/zos/2.4....ormance-tuning

                    Most enterprise-level databases have many, many configuration settings that affect how the database performs. Each one is very different, so tuning PostgreSQL is much different from tuning MS SQLServer which is much different from tuning MySQL or MariaDB or PerconaDB.

                    The database can't know if it's the only important application running on a system (in which case, it should probably try to use most of the system memory and CPU cycles for itself) or if it's just one of several applications running on a system, as is the case for your Zabbix all-in-one deployment. Since the database can't know how much of the system resources should be devoted to it, you have to make that decision and configure some set of database parameters for it, to give it access to as much of the system resources as is appropriate for your environment.

                    There are lots and lots of web articles and recommendations for tuning MySQL to improve performance. If you spend some time web searching, you're going to find good advice, bad advice, and conflicting advice, and at the start you won't know what's good advice and what's bad advice. Two resources I strongly recommend are the MySQL documentation and the Percona web site and blog. Both sites can be overwhelming when you're starting, but at least the information on those sites is generally accurate.

                    There's also a script you can find called "mysqltuner.pl" that you can run on your system and it will offer tuning suggestions. Whatever it tells you to do, I would use it as a list of settings to research further and then make a decision on whether you want to try any of them in your environment or not. It's not always going to make good recommendations, and it may make assumptions about your environment that aren't true, so always research anything it suggests.

                    No matter what your philosophy is in life, in database tuning (or really any software tuning), it's best to start conservatively: only change one or perhaps a small number of things at the start, run your environment with the changes in place for a while, and try to determine if the changes improved things, did nothing, or made things worse. If they improved things but "not enough", then you consider additional changes.

                    In this specific case, the goal is to reduce the I/O load on your storage backend, and we're going to attempt to accomplish that by giving your database access to a bunch more system memory, so that it can do much better caching. Note that there are often separate tuning parameters for read caching vs. write caching. Some database workloads are almost exclusively read operations, a few (but I think less common) database workloads are almost exclusively write operations, and some workloads are a mix. Zabbix in your environment is probably going to be a mixed workload, but the write operations are probably a bigger performance issue for your particular backend storage.

                    With all that said, to help you get started I would echo LenR's suggestion #2: for your all-in-one Zabbix install with server+web+database on a single VM, now that your VM has 32 GB of RAM, set 'innodb_buffer_pool_size=16G' in your MySQL server config. That should be somewhere under the '[mysqld]' section of a config file. That gives fully half of the VM's RAM to MySQL, to be used by the InnoDB storage engine's buffer pool. With 32 GB of RAM it's possible you could even make it slightly larger, but I wouldn't at the start; I would use the 16G setting.

                    If you do some reading on that setting, you may also see mention of a setting 'innodb_buffer_pool_instances', which with older versions of MySQL/MariaDB is used to divide the big buffer pool into chunks (which can help with locking). Although I am using multiple innodb_buffer_pool_instances in my environment (because I'm using a version of MariaDB that still benefits from it), I don't believe you need to set this. I think your version of MySQL has improved the locking with the buffer pool such that you can just run one big 16G pool, rather than something like a 16G pool divided into 8 instances.

                    Start with that, and then review your IOPS over the course of a week or two and see if you need to do more tuning or if that setting alone does what you need.
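
One way to judge whether the larger buffer pool is helping, alongside the IOPS graphs from the array: compare two snapshots of the `Innodb_buffer_pool_read_requests` and `Innodb_buffer_pool_reads` status counters (from `SHOW GLOBAL STATUS`) and compute how many logical reads had to go to disk. This is a sketch; the function name and the sample numbers are made up, and it only covers reads, not the write load that is probably the bigger problem here.

```python
def buffer_pool_miss_ratio(before: dict, after: dict) -> float:
    """Fraction of InnoDB logical reads that missed the buffer pool
    between two snapshots of SHOW GLOBAL STATUS counters."""
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    disk_reads = (after["Innodb_buffer_pool_reads"]
                  - before["Innodb_buffer_pool_reads"])
    if requests <= 0:
        raise ValueError("no read requests between the two snapshots")
    return disk_reads / requests


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    snap_a = {"Innodb_buffer_pool_read_requests": 1_000,
              "Innodb_buffer_pool_reads": 100}
    snap_b = {"Innodb_buffer_pool_read_requests": 11_000,
              "Innodb_buffer_pool_reads": 200}
    print(f"miss ratio: {buffer_pool_miss_ratio(snap_a, snap_b):.2%}")
```

Taking the difference between snapshots matters because the counters accumulate since server start; a ratio computed from the raw totals would blend the periods before and after the change.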


                    • m_loichot
                      Junior Member
                      • Nov 2022
                      • 8

                      #10
                      Ok, I'm going to do a test for 2 weeks, making the changes you recommended, to gain perspective.


                      • Hubert Kurowski
                        Junior Member
                        • Jan 2024
                        • 16

                        #11
                        Hi m_loichot, how are the tests going?

