Zabbix Two Hour Database Load

  • rsvancara
    Member
    • Jul 2012
    • 42

    #16
    Nvps

    Well, our Zabbix database has grown to over 600 GB in just three weeks, and we have tables with several hundred million records in them. So tell me how table partitioning will NOT help our situation. I can tell you that the database IS the bottleneck. I am not sure where you picked up our NVPS from, but it is pretty high in my opinion. We currently monitor 3733 hosts.
    Last edited by rsvancara; 20-08-2012, 17:35.

    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #17
      hmm, i guess i mixed up nvps with some other thread. somehow. *scratches head*.
      indeed, in your case nvps is high enough to consider partitioning.
      Zabbix 3.0 Network Monitoring book

      • rsvancara
        Member
        • Jul 2012
        • 42

        #18
        No worries

        I was scratching my head too. I am in the process of rewriting the partitioning scripts for PostgreSQL. The scripts provided seem to exclude some of our bigger history tables, like history_str.
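
For readers following along, here is a rough sketch of what generating daily partitions for all of the history tables could look like. It is not the script mentioned above. It assumes inheritance-style PostgreSQL partitioning keyed on the integer `clock` column, and the `partitions` schema name and the `_pYYYY_MM_DD` naming are hypothetical choices made for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Zabbix history tables that carry a `clock` (Unix time) column.
HISTORY_TABLES = ["history", "history_uint", "history_str", "history_text", "history_log"]


def daily_partition_ddl(table: str, day: date, schema: str = "partitions") -> str:
    """Build a CREATE TABLE statement for one daily child table.

    The schema name and child-table naming are illustrative assumptions.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    lower, upper = int(start.timestamp()), int(end.timestamp())
    child = f"{schema}.{table}_p{day:%Y_%m_%d}"
    return (
        f"CREATE TABLE {child} "
        f"(CHECK (clock >= {lower} AND clock < {upper})) "
        f"INHERITS (public.{table});"
    )


# One statement per history table for a single example day.
statements = [daily_partition_ddl(t, date(2012, 8, 20)) for t in HISTORY_TABLES]
for stmt in statements:
    print(stmt)
```

This only creates the child tables. With inheritance partitioning, rows still have to be routed into the right child (for example by a trigger on the parent), which the sketch leaves out.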

        • frankymryao
          Member
          • Oct 2011
          • 52

          #19
          Originally posted by rsvancara
          Yes, housekeeper runs every hour. I tried setting it to every four hours and still observed the two spikes of 100% poller busy.

          Could there be something, such as an item in a template, that runs against the Zabbix server every two hours? I looked but did not find anything; maybe I missed something.

          About a year ago we used the housekeeper, but we found it very inefficient. We now truncate the `history` table every week.

          I think you can try this.

          Our environment: 60w+ (600,000+) items

          • fazie
            Junior Member
            • Sep 2012
            • 5

            #20
            Hi,
            I think we were facing the same problem until last week. Our problem was discovery running every 3600 seconds on network devices, which produces lots of data.
            Unfortunately, we noticed that if we have 100 discoveries of one type on 100 hosts with a period of 3600 seconds, they are not spread over the 3600 seconds but fire almost at once, and this is the problem.
            We dealt with this by decreasing the number of DBSyncers so that they run at 50-60% busy under normal load. When the burst occurs, the DBSyncers go to 100% busy, and the Zabbix server is forced to use the history cache and spread the writes over time. This approach protects the database from the peaks of discovery data, which otherwise deadlock the items table.
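
To illustrate the "fire almost at once" issue, here is a minimal sketch of one common way to stagger periodic jobs: give each host/rule pair a stable offset inside the period. This is not how Zabbix schedules discovery; the function names, the host names, and the choice of a CRC32 hash are all assumptions made for the example.

```python
import zlib


def staggered_offset(host: str, rule_key: str, period: int = 3600) -> int:
    """Return a stable offset in [0, period) for this host/rule pair."""
    digest = zlib.crc32(f"{host}:{rule_key}".encode("utf-8"))
    return digest % period


def next_run(now: int, host: str, rule_key: str, period: int = 3600) -> int:
    """Return the next timestamp >= now at which this rule should fire."""
    offset = staggered_offset(host, rule_key, period)
    window_start = now - (now % period)
    candidate = window_start + offset
    return candidate if candidate >= now else candidate + period


# 100 hypothetical hosts sharing one discovery rule with a 3600 s period.
hosts = [f"switch-{i:03d}" for i in range(100)]
offsets = [staggered_offset(h, "net.if.discovery") for h in hosts]
print(f"{len(set(offsets))} distinct start offsets across {len(hosts)} hosts")
```

With offsets like these, the 100 runs land at different points of the hour instead of in one burst, which is the load shape the smaller DBSyncer pool was approximating after the fact.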

            We are currently able to run 7k+ NVPS on a single MySQL instance without any problems. We think that our DB will be able to handle 10k NVPS.

            Best regards,
            Dawid

            • fpaternot
              Member
              Zabbix Certified Specialist
              • Feb 2013
              • 52

              #21
              Originally posted by fazie
              Hi,
              We are currently able to run 7k+ NVPS on a single MySQL instance without any problems. We think that our DB will be able to handle 10k NVPS.

              Best regards,
              Dawid

              Dawid,

              could you please provide more details on your setup?

              I'm about to set up an environment close to yours (4k hosts [~100 items] and a few load balancers [thousands of items]) and would very much like as much input as possible from people who use Zabbix in very large environments.


              Thank you

              • rsvancara
                Member
                • Jul 2012
                • 42

                #22
                Zabbix for Large Environments

                1. Use custom templates and only monitor the items you need to monitor.
                2. Select the appropriate hardware, and separate your database server from your Zabbix server. NVPS is a great metric, but you really need to translate that into database performance. At around 2,000 NVPS and 412,654 items, we are seeing around 4-10K queries per second on our database server. Make sure you select the appropriate database server for your load. SSDs might be worth considering for some of the busier tables like the items table and the ids table. Also factor in overhead for housekeeper. Ultimately your database will be the biggest bottleneck in Zabbix, and most of your performance tuning effort will involve the database.
                3. Use proxy servers. I cannot advocate them enough, especially for larger environments.
                4. Understand up front what your performance goals are. You mention a large number of hosts, but also consider how frequently you are polling those hosts.
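
The figures in point 2 can be turned into a couple of rough ratios. The sketch below only uses the numbers quoted above; the function names are made up for the example, and the results are back-of-the-envelope averages, not measurements.

```python
def average_interval_s(items: int, nvps: float) -> float:
    """Average seconds between new values per item, implied by items and NVPS."""
    return items / nvps


def queries_per_value(qps: float, nvps: float) -> float:
    """Database queries observed per new value stored."""
    return qps / nvps


items, nvps = 412_654, 2_000
print(f"average item interval: {average_interval_s(items, nvps):.0f} s")
print(f"queries per new value: {queries_per_value(4_000, nvps):.0f}"
      f" to {queries_per_value(10_000, nvps):.0f}")
```

In other words, at these numbers the average item is refreshed roughly every three and a half minutes, and each new value costs the database somewhere between two and five queries.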

                • fpaternot
                  Member
                  Zabbix Certified Specialist
                  • Feb 2013
                  • 52

                  #23
                  Thank you for your reply rsvancara.

                  My intended goal is to monitor most items every 60 seconds. A few should have a longer interval, like disks and load balancers (they take too long to poll; the box usually dies when responding to many requests).
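
As a rough upper bound for this plan, the sketch below assumes all 4k hosts carry about 100 items each (the figures from the earlier post) and that every item is polled at 60 seconds. It ignores the load balancers and the slower items, so treat it as an estimate only; the function name is hypothetical.

```python
def estimated_nvps(hosts: int, items_per_host: int, interval_s: float) -> float:
    """New values per second if every item is collected once per interval."""
    return hosts * items_per_host / interval_s


planned = estimated_nvps(hosts=4_000, items_per_host=100, interval_s=60)
print(f"estimated load: about {planned:.0f} NVPS")
```

That puts the plan in the same range as the 7k+ NVPS setup described earlier in the thread.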

                  My intended hardware is:
                  - mysql: 2x six core cpu (24 with intel hyperthreading)
                  64GB ram
                  6x600GB 10k SAS in raid 10

                  - zabbix_server: 2x quad core cpu (16 with intel hyperthreading)
                  32GB ram
                  2x 300GB 10k SAS raid 1

                  - zabbix proxy: 2vCPU (virtual server)
                  1GB ram
                  40GB hdd

                  Everything will be doubled except for the proxies, of which there will probably be 6 or 8 servers.

                  The templates will be custom; I have been learning about them so as to collect as few items as I need, with as long an interval as possible for each item. But core items have to be collected as fast as possible (60 seconds).

                  I will run tests with partitioned MySQL tables and with Cassandra, and will post the results of my study here. From what I have read, housekeeper should be disabled in both scenarios.

                  Unfortunately, SSDs will probably not be used, at least at first.

                  • rsvancara
                    Member
                    • Jul 2012
                    • 42

                    #24
                    Sounds Good

                    There might be some challenges with Zabbix 2.0.3 and partitioned tables in MySQL. I understand that table partitioning worked in 1.8, but maybe something has changed?

                    I was not aware that Zabbix had support for Cassandra. At least in 2.0.x, Zabbix supports Oracle, PostgreSQL, and MySQL. For us, MySQL seemed to work the best. I wonder if something like Percona might work better, especially with its support for NUMA and memory interleaving. Turn hyperthreading off; it yields no benefit in my experience.

                    Cassandra could open up a lot of possibilities for scalability, but Zabbix would need to be seriously re-engineered to work with an "eventually consistent" data model.

                    • fpaternot
                      Member
                      Zabbix Certified Specialist
                      • Feb 2013
                      • 52

                      #25
                      From what I know, Zabbix doesn't have NoSQL support, but I've seen at least two patches that claim to make it work. I don't know about their reliability, though.

                      One good document I saw makes it run using another layer that interfaces between the Zabbix front end, the Zabbix server, and the NoSQL database.
                      I need to test it to make a good comparison.

                      I'll see what we can do to partition the tables under 2.0.4/2.0.5. Hope it works.
