Zabbix corruption, slow and very large DB

  • Curtis Wood
    Junior Member
    • May 2013
    • 3

    #1

    Heyas all,

    I'm hoping someone will be able to point us in the right direction here, or help shed any light on what may be going on.

    We have around 3400+ servers being monitored, at about 1125 new values per second. We have spent hours optimizing the database. We are using Zabbix 2.x.
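To put 1125 NVPS in perspective, it helps to multiply it out. This Python sketch uses only the two figures from this post (3400 servers, 1125 values per second) and assumes the rate stays constant around the clock; the function names are illustrative.

```python
# Back-of-the-envelope load estimate from the figures in this post.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def values_per_day(nvps: float) -> float:
    """History values written per day at a constant NVPS rate."""
    return nvps * SECONDS_PER_DAY


def nvps_per_host(nvps: float, hosts: int) -> float:
    """Average new values per second contributed by each monitored host."""
    return nvps / hosts


total = values_per_day(1125)
print(f"{total:,.0f} values/day")           # 97,200,000 values/day
print(f"{total * 30:,.0f} values/30 days")  # 2,916,000,000 values/30 days
print(f"{nvps_per_host(1125, 3400):.3f} NVPS per host")
```

Roughly 97 million new rows a day is what the housekeeper later has to delete again, which is why it matters so much at this scale.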

    We are about to start completely over for the 4th time. Once the servers are all loaded into the system, everything seems to work pretty well; the problems start once the system begins adding/removing/changing hosts on an hourly basis. There are not a lot of changes, although anywhere from zero to a dozen servers may need to be changed, removed or added every hour.

    Once this has been going on for a few days, everything starts slowing down, the housekeeper process runs at 100% for 95% of the day, and then the corruption eventually starts popping up.

    By corruption I mean hosts/templates referencing items that can't be removed, hosts using templates that don't exist (at least by name), and dozens of random hosts duplicated between 2 and 5 times. The latter concerns us, as there doesn't seem to be a way to do this through the interface, configuration import or API.

    We are running this beast on a server with a terabyte of space for MySQL alone, 24GB of RAM and dual quad-core 2.4GHz Xeons.

    The database after running a little over a month is over 480GB in size and continuously growing.
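Combining that size with the 1125 NVPS figure gives a rough all-in cost per stored value. This is a back-of-the-envelope sketch only: it assumes "a little over a month" means 31 days and a constant collection rate, and the result lumps history, trends, events, indexes and InnoDB overhead together.

```python
GB = 10**9  # decimal gigabytes; MySQL tools may report GiB instead
SECONDS_PER_DAY = 86_400


def growth_per_day(size_gb: float, days: float) -> float:
    """Average database growth per day, assuming steady growth."""
    return size_gb / days


def bytes_per_value(size_gb: float, days: float, nvps: float) -> float:
    """All-in bytes on disk per collected value (history, trends, indexes, overhead)."""
    values = nvps * SECONDS_PER_DAY * days
    return size_gb * GB / values


# Assumption: "a little over a month" is taken as 31 days.
print(f"{growth_per_day(480, 31):.1f} GB/day")                 # 15.5 GB/day
print(f"{bytes_per_value(480, 31, 1125):.0f} bytes per value")  # 159 bytes per value
```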

    Are we doing anything wrong here? Is it inadvisable to make as many changes as we do once everything is set up?
  • trikke76
    Member
    Zabbix Certified Trainer

    • Apr 2013
    • 42

    #2
    Have you tried disabling the housekeeper?

    Also, I see that you are adding/removing/changing hosts on an hourly basis. How do you do that: with scripts, or by removing them from proxies?

    The housekeeper will not clean up orphaned data; maybe this is where your problem comes from.
    Last edited by trikke76; 29-07-2013, 10:33.

    • Colttt
      Senior Member
      Zabbix Certified Specialist
      • Mar 2009
      • 878

      #3
      480GB DB size?! WTF...

      Please post your my.cnf. Do you use SSDs? If not, why not?
      And disable housekeeping and use partitioning for the tables!
      Debian-User

      Sorry for my bad english

      • Curtis Wood
        Junior Member
        • May 2013
        • 3

        #4
        It's at 524GB now

        I was under the impression that partitioning only works with Zabbix 1.8? We are on 2.0. Can you still use partitioning in 2.x, or would 1.8 be a better solution for this?

        As for the modifying/adding/removing: yes, we are using scripts executed via cron. We aren't using any proxies.

        my.cnf
        [mysqld]
        table_cache = 512
        tmp_table_size = 256M
        max_connections = 2000
        wait_timeout = 2800
        query_cache_size = 512M
        query_cache_limit = 1M

        innodb_log_file_size = 256M
        innodb_log_buffer_size = 8M
        innodb_file_per_table
        innodb_thread_concurrency = 16
        innodb_buffer_pool_size = 15GB
        innodb_additional_mem_pool_size=20M
        innodb_lock_wait_timeout = 240
        innodb_max_dirty_pages_pct = 90
        innodb_flush_method=O_DIRECT
        thread_concurrency=200
        thread_cache_size=4
        max_heap_table_size=256M

        innodb_flush_log_at_trx_commit=2
        key_buffer_size=32M

        sort_buffer_size=16M
        join_buffer_size=256k
        read_buffer_size=256k
        read_rnd_buffer_size=256k
        table_open_cache=1600

        net_read_timeout=2800
        net_write_timeout=2800
        We aren't using SSDs at this time; I'll be looking into that possibility, though.

        Thx for the suggestions!

        • trikke76
          Member
          Zabbix Certified Trainer

          • Apr 2013
          • 42

          #5
          There is a guide for PostgreSQL; I don't know if anyone has one for MySQL.

          I would recommend PostgreSQL over MySQL anyway.

          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #6
            Originally posted by Curtis Wood
            For the modifying/adding/removing, yes we are using scripts, executed via cron. We aren't using any proxies.
            Why do you use scripts? Have you tried autodiscovery in Zabbix?

            Originally posted by Curtis Wood
            It's at 524GB now
            Wow... maybe you should increase the update intervals?
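Each item produces roughly one value per update interval, so NVPS is approximately the sum of count/interval over all enabled items. The item mix below is hypothetical (chosen only so that it adds up to the 1125 NVPS from the first post); it illustrates that doubling every interval halves the write rate.

```python
from dataclasses import dataclass


@dataclass
class ItemGroup:
    count: int       # number of enabled items sharing this update interval
    interval_s: int  # update interval in seconds


def required_nvps(groups: list) -> float:
    """Approximate new values per second: each item contributes 1/interval."""
    return sum(g.count / g.interval_s for g in groups)


# Hypothetical item mix, NOT taken from the original poster's setup.
current = [ItemGroup(15_000, 30), ItemGroup(30_000, 60), ItemGroup(37_500, 300)]
doubled = [ItemGroup(g.count, g.interval_s * 2) for g in current]

print(required_nvps(current))  # 1125.0
print(required_nvps(doubled))  # 562.5
```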

            Originally posted by Curtis Wood
            I was under the impression that the partitioning only works with Zabbix 1.8? We are on 2.0, are you still able to use the partitioning in 2.x or would using 1.8 be a better solution for this?
            In PostgreSQL, partitioning works very well; in MySQL I don't know.

            For your my.cnf, please take a look at http://zabbixzone.com/zabbix/mysql-p...ps-for-zabbix/
            This is an example config for a server with 24GB of RAM.

            Here is a script that will check your MySQL server config.


            Originally posted by Curtis Wood
            We aren't using SSD at this time, I'll be checking into this possibility though.
            this is the best way!
            Can you please post your zabbix_server.conf and show the Zabbix server internals?

            • heaje
              Senior Member
              Zabbix Certified Specialist
              • Sep 2009
              • 325

              #7
              You can do partitioning in MySQL with Zabbix 2.0.x, but only on the history and trend tables. There is an article on zabbixzone.com about how to do it. There are plenty of comments on that article from people who have done partitioning with 2.0.x.
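For reference, here is a minimal Python sketch that generates daily RANGE partitions on the `clock` column of a history table. It is not the zabbixzone.com script: it assumes UTC day boundaries and precomputes the epoch values rather than calling UNIX_TIMESTAMP() in the DDL. MySQL also requires the partitioning column to be part of every unique key on the table, so check your schema first, and new partitions must be created ahead of time (and old ones dropped) by some scheduled job.

```python
from datetime import date, datetime, timedelta, timezone


def daily_partitions_ddl(table: str, start: date, days: int) -> str:
    """Build an ALTER TABLE that range-partitions `table` on `clock`, one partition per UTC day."""
    parts = []
    for i in range(days):
        day = start + timedelta(days=i)
        # Upper bound of each partition is the next UTC midnight, as a UNIX timestamp.
        next_midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)
        boundary = int(next_midnight.timestamp())
        parts.append(f"  PARTITION p{day:%Y%m%d} VALUES LESS THAN ({boundary})")
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n" + ",\n".join(parts) + "\n);"


print(daily_partitions_ddl("history", date(2013, 8, 5), 3))
```

Generating plain integer boundaries keeps the statement independent of the MySQL session time zone.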

              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified SpecialistZabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                There is an excellent blog post on scaling with tons of useful information published today, see http://blog.zabbix.com/scalable-zabb...ing-9400-nvps/
                Alexei Vladishev
                Creator of Zabbix, Product manager
                New York | Tokyo | Riga
                My Twitter

                • trikke76
                  Member
                  Zabbix Certified Trainer

                  • Apr 2013
                  • 42

                  #9
                  Interesting post about large environments.

                  For the HA setup: I recently did one on CentOS 6.4 and posted the how-to on zabbix.org for those interested. It should be the same for RHEL.



                  • Curtis Wood
                    Junior Member
                    • May 2013
                    • 3

                    #10
                    Thanks everyone for your suggestions.

                    We have upgraded MySQL to 5.6, and are also looking at setting up proxies for each datacenter and partitioning the database.

                    A few questions..

                    For the proxies, I can't find anything that says this explicitly, although it sounds as though they basically streamline the process? For example, the script that keeps our main backend database and Zabbix in sync took 15+ minutes to complete when it worked on each host individually. When it was changed to use the XML import functionality (putting all the modifications and new hosts into one configuration file and importing it), the operation completed in 1-2 minutes, if that. Do proxies have a similar effect, in that the server doesn't have to handle thousands of incoming connections, just the one connection from the proxy, and so is streamlined in much the same way as making all the changes in one XML file and importing it?
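The speed-up from the XML import comes from doing one round trip instead of one per host. As a sketch, assuming the Zabbix 2.0 API method `configuration.import` with its `format`, `rules` and `source` parameters, a single JSON-RPC request carrying the whole XML export could be built like this. The URL, auth token and XML content are placeholders, and the `rules` shown are only an example.

```python
import json
import urllib.request

# Hypothetical frontend URL; adjust to your installation.
API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"


def build_import_request(xml_source: str, auth_token: str, request_id: int = 1) -> str:
    """Wrap one XML export document in a single configuration.import JSON-RPC call."""
    payload = {
        "jsonrpc": "2.0",
        "method": "configuration.import",
        "params": {
            "format": "xml",
            "rules": {  # example rules: create missing groups/hosts, update existing hosts
                "groups": {"createMissing": True},
                "hosts": {"createMissing": True, "updateExisting": True},
            },
            "source": xml_source,
        },
        "auth": auth_token,
        "id": request_id,
    }
    return json.dumps(payload)


def send(body: str, url: str = API_URL) -> dict:
    """POST one JSON-RPC request and decode the JSON reply (not called in this sketch)."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json-rpc"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# One request for all changed/new hosts instead of one API call per host.
xml_doc = '<?xml version="1.0" encoding="UTF-8"?><zabbix_export><version>2.0</version></zabbix_export>'
body = build_import_request(xml_doc, auth_token="hypothetical-session-id")
print(body[:80])
```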

                    For the partitioning, I've read in a few places that removing the foreign key constraints on the various tables will allow everything to work with partitioning, and that there aren't any negative side effects?

                    To disable the housekeeper process as well, is the following the correct setting in the config?

                    Code:
                    DisableHousekeeping=1
                    When the above is added and Zabbix is restarted, we get the following error. Is this just a kernel setting that needs to be tweaked, or something specific to that setting?

                    Code:
                    17991:20130805:103717.036 cannot allocate shared memory of size 228170138: [28] No space left on device
                    17991:20130805:103717.037 cannot allocate shared memory for configuration cache
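Since the log reports errno 28 (ENOSPC), one thing worth checking is the kernel's System V shared memory limits. The shmget(2) man page documents ENOSPC for exceeding the system-wide SHMALL (or SHMMNI) limit, and EINVAL for a segment larger than SHMMAX. The sketch below assumes Zabbix allocates this cache with System V shared memory, assumes 4 KiB pages, and treats currently used pages as an input (for example the "pages allocated" line of `ipcs -u`); the function names are hypothetical.

```python
from pathlib import Path


def read_shm_limits(proc_dir: Path = Path("/proc/sys/kernel")) -> dict:
    """Read Linux SysV shm limits: shmmax (bytes), shmall (pages), shmmni (segments, for inspection)."""
    return {name: int((proc_dir / name).read_text().split()[0])
            for name in ("shmmax", "shmall", "shmmni")}


def violated_limits(request_bytes: int, shmmax: int, shmall_pages: int,
                    used_pages: int, page_size: int = 4096) -> list:
    """Names of the limits a new segment of request_bytes would exceed (empty list = it fits)."""
    problems = []
    if request_bytes > shmmax:
        problems.append("SHMMAX")  # per shmget(2): fails with EINVAL
    needed_pages = -(-request_bytes // page_size)  # ceiling division; assumes 4 KiB pages by default
    if used_pages + needed_pages > shmall_pages:
        problems.append("SHMALL")  # per shmget(2): fails with ENOSPC (errno 28)
    return problems


if Path("/proc/sys/kernel/shmall").exists():
    limits = read_shm_limits()
    # used_pages=0 is an assumption; take the real figure from `ipcs -u`.
    print(limits, violated_limits(228170138, limits["shmmax"], limits["shmall"], used_pages=0))
```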

                    • heaje
                      Senior Member
                      Zabbix Certified Specialist
                      • Sep 2009
                      • 325

                      #11
                      Assuming that all agents/snmp/whatever are monitored by proxies, then yes, the server will only manage a connection from each proxy.

                      As for the error you get, the problem is spelled out in the error message:

                      "cannot allocate shared memory for configuration cache"

                      Your configuration cache is not large enough. You need to increase the size of the "CacheSize" variable. See https://www.zabbix.com/documentation.../zabbix_server for more information.

                      • mushero
                        Senior Member
                        • May 2010
                        • 101

                        #12
                        I'm quite concerned about the corruption: it's okay to be slow, but losing data like this means something is wrong.

                        As mentioned, turn off housekeeping, or at least turn it off for a day or a week to see if the problems go away. At your NVPS the housekeeper will be doing a LOT of writing. Even innodb_flush_log_at_trx_commit = 2 can still create a lot of load at high volumes; try 0 to see if it helps, and monitor the InnoDB status for fsyncs. I assume you have a battery-backed RAID controller.

                        Not clear which MySQL version you are on; you mention going to 5.6. At least go to 5.5 and use the Percona version. Ah, I see you went to 5.6; again, I suggest the Percona version if possible.

                        In the end, though, you have a 480GB DB on 15GB of RAM for InnoDB; this will not perform well if you ever go look at that old data.
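A quick ratio shows why: with a 15GB buffer pool, only a small slice of the database can be cached at once. This sketch uses the 15GB `innodb_buffer_pool_size` and the 480GB/524GB sizes from the thread, and assumes uniform growth over 31 days; the function names are illustrative.

```python
def cache_coverage(buffer_pool_gb: float, db_size_gb: float) -> float:
    """Fraction of the on-disk database that fits in the InnoDB buffer pool at once."""
    return buffer_pool_gb / db_size_gb


def days_cached(buffer_pool_gb: float, growth_gb_per_day: float) -> float:
    """How many days of newly written data fit in the buffer pool, assuming uniform growth."""
    return buffer_pool_gb / growth_gb_per_day


print(f"{cache_coverage(15, 480):.1%} of 480GB")  # 3.1% of 480GB
print(f"{cache_coverage(15, 524):.1%} of 524GB")
print(f"{days_cached(15, 480 / 31):.2f} days of data")  # assumes 31 days to reach 480GB
```

Under these assumptions, less than one day of recently written data fits in memory, so almost any query over older data goes to disk.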

                        Your server is not a beast. For that, I suggest something like a Dell R420 with 128-256GB of RAM, faster disks, a PERC controller, etc. These days RAM is cheap enough to get it all in RAM if possible (though 512GB is expensive, so start with 64-128GB).

                        You also didn't post details of your I/O subsystem and current performance, which are of course critical here. SSDs would help, but you'll need 1TB for the data, logs, etc., so they would be expensive. I hope you at least have a good set of 600GB 15K SAS disks on a good RAID controller in RAID 1/10.

                        If you send me a private message, I'm happy to send you a best-practices MySQL config (for 5.5) that includes a lot more options.

                        I'm also happy to send you our server audit tool, which does a full analysis of the server, DB, I/O, etc. It might point out some things.
