How is this even possible?

  • cheezus
    Member
    • Nov 2011
    • 35

    #1

    How is this even possible?

    Ok... I came here to ask about zabbix 2.0 perf and found a forum of zabbix users with 1k devices monitored and thought WTF?

    We only have 350 centos servers, several routers/firewalls/switches to monitor and zabbix is choking. It's constantly saying unreachable pings to servers. We're on 1.8.x and have tried building a zabbix server from scratch, the suse appliance, tuning the mysql perf. Iowait is < 1% on ssd drives. Zabbix has 8G of memory and 4+ sandy bridge cores. The templates on switches, etc. are scaled back to just cpu, mem and basics with servers only monitoring cpu, network, etc... Polling is 5 minutes and history is 3 months.

    I don't get it, really I don't. How is anyone monitoring 500+ devices/servers with zabbix without paid support? Is anyone here, not using the secret sauce of paid support, monitoring 500+ devices?
  • mcmyst
    Member
    • Feb 2012
    • 72

    #2
    Dedicated Database ? Dedicated zabbix server ?
    Active or not active agent ?
    Proxy or not proxy ?
    Mysql ? Oracle ? PostgreSQL ? SQLite (hope not) ?
    Housekeeper enabled ?
    How many nvps ? triggers ?
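
For anyone unsure how to arrive at an nvps (new values per second) figure: it is roughly the number of enabled items divided by their update interval, summed over all intervals in use. Below is a minimal sketch of that arithmetic. The host and item counts are made-up illustration values, not numbers from this thread, and `estimate_nvps` is a hypothetical helper, not part of zabbix.

```python
def estimate_nvps(items_per_interval):
    """Estimate new values per second from a {interval_seconds: item_count} map."""
    return sum(count / interval for interval, count in items_per_interval.items())


# Hypothetical inventory: 350 hosts, 50 items each, all polled every 300 s.
hosts = 350
items_per_host = 50
nvps = estimate_nvps({300: hosts * items_per_host})
print(f"{nvps:.1f} nvps")
```

Items with mixed intervals just add more entries to the map, for example `{60: 2000, 300: 15000}`.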


    • cheezus
      Member
      • Nov 2011
      • 35

      #3
      Thank you for your reply. To answer your questions:

      - Db on same server
      - Zabbix on same server
      - Zabbix server pulls the data back
      - No proxies
      - mysql

      We're on a raid array of 6 x 120G intel ssd drives in r1+0. That's an r620 dell server with a perc h700 raid controller. Sandy bridge xeon cores.

      The cpu and memory are never maxed, and iowait at its highest is only about 2%, ever.

      So it really does not appear as if the zabbix server itself is starved for hardware. That's why we haven't split mysql out, nor added proxies.

      Am I missing something here? If so, do let me know because throwing more hardware at the problem doesn't seem like the solution.


      • mcmyst
        Member
        • Feb 2012
        • 72

        #4
        The first thing I think you should do is (if possible) switch the zabbix agents to active mode, so that your server doesn't have to do the pulling.
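
To see why pulling can hurt even when cpu and disk look idle, here is a back-of-envelope sketch of how many poller processes passive checks keep busy. It is a simplified model, not how zabbix actually schedules checks: it assumes each check blocks one poller for its full duration, and that a check against an unreachable host blocks for the whole timeout. The function name, the 0.04 s average check time, the 10% unreachable share and the 3 s timeout are all assumed illustration values.

```python
import math


def pollers_needed(items, interval_s, avg_check_s,
                   unreachable_fraction=0.0, timeout_s=3.0):
    """Rough number of pollers kept fully busy by passive (pulled) checks."""
    # Expected blocking time per check: healthy hosts answer quickly,
    # unreachable hosts hold the poller until the timeout expires.
    expected_s = ((1 - unreachable_fraction) * avg_check_s
                  + unreachable_fraction * timeout_s)
    checks_per_second = items / interval_s
    return math.ceil(checks_per_second * expected_s)


# Hypothetical load: 18000 items on a 300 s interval.
healthy = pollers_needed(18000, 300, avg_check_s=0.04)
degraded = pollers_needed(18000, 300, avg_check_s=0.04,
                          unreachable_fraction=0.1, timeout_s=3.0)
print(healthy, degraded)
```

Under these assumptions a small share of slow or unreachable hosts multiplies the poller count several times over, which is the load that active agents move off the server.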

        The second thing you should do is place a template on your zabbix server to monitor internal processes (see the attachment).

        About your mysql server, what engine are you using ? innodb ?
        There is a good topic that explains how to optimize your database server for zabbix:

        And a good website:


        The next thing is : how many nvps and triggers ?
        Attached Files


        • tchjts1
          Senior Member
          • May 2008
          • 1605

          #5
          Originally posted by cheezus
          We're on 1.8.x
          Exactly what version of 1.8 are you on?


          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #6
            How many server pollers (Ping, IPMI, ...) do you start?
            Debian-User

            Sorry for my bad english


            • cheezus
              Member
              • Nov 2011
              • 35

              #7
              Sorry for the delay. For some reason this forum stopped sending me email notifications of new posts. I was certain I was subscribed but now it says I'm not...

              To answer the questions being asked:

              What version are we on? 1.8.11

              Right now we're using the "not for production" vm of suse on xenserver. However, we have in the past, several times, attempted to build one from scratch on centos 5.6 with all kinds of issues and the same performance problems.

              We actually have a script that restarts zabbix each night now as it appears to leak memory badly. That has helped some but it's still sketchy.

              I'm attaching our zabbix_server.conf and my.cnf file as a zip file to this post.

              I'll follow up with the graphs...
              Attached Files


              • cheezus
                Member
                • Nov 2011
                • 35

                #8
                zabbix graphs
                Attached Files


                • cheezus
                  Member
                  • Nov 2011
                  • 35

                  #9
                  bump *

                  * bump
                  Last edited by richlv; 06-08-2012, 09:10. Reason: remove offtopic image


                  • c.mammoli
                    Member
                    Zabbix Certified Specialist
                    • Feb 2012
                    • 48

                    #10
                    You didn't tweak anything about InnoDB in your my.cnf... The defaults are very low imho.

                    This is my config (old IBM server, 8GB RAM 4x74GB 10k RPM, 70 nvps)


                    [mysqld]
                    # paths
                    datadir = /var/lib/mysql
                    tmpdir = /var/lib/mysql/tmp

                    # network
                    port = 3306
                    socket = /var/lib/mysql/mysql.sock
                    connect_timeout = 60

                    wait_timeout = 28800
                    # max_connections = 2048
                    max_allowed_packet = 64M
                    max_connect_errors = 1000

                    # limits
                    tmp_table_size = 512M
                    max_heap_table_size = 512M
                    table_cache = 1024

                    # logs
                    log_error = /var/lib/mysql/mysql-error.log
                    slow_query_log_file = /var/lib/mysql/mysql-slow.log
                    slow_query_log = 1
                    long_query_time = 10

                    #sort_buffer_size = 2M

                    #join_buffer_size =256k
                    thread_cache_size = 4

                    # innodb
                    innodb_data_home_dir = /var/lib/mysql
                    innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend

                    innodb_log_group_home_dir = /var/lib/mysql
                    innodb_buffer_pool_size = 5G
                    innodb_additional_mem_pool_size = 128M
                    innodb_log_file_size = 512M
                    innodb_log_buffer_size = 128M
                    innodb_log_files_in_group = 3
                    innodb_flush_log_at_trx_commit = 0
                    innodb_file_per_table = 1
                    innodb_flush_method = O_DIRECT
                    #innodb_io_capacity = 2000
                    innodb_status_file = 1

                    # experimental
                    #innodb_stats_update_need_lock = 0

                    # other stuff
                    event_scheduler = 1
                    query_cache_type = 1
                    query_cache_size = 128M


                    By the way, how many nvps do you have?
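
When copying a config like the one above to a different machine, it helps to check that the large global buffers still fit in RAM. The sketch below sums a few of the size settings from my.cnf-style text. It is only a rough check under stated assumptions: the helper names are hypothetical, per-connection buffers and the OS itself are not counted, and `tmp_table_size` is left out because it is a per-table limit rather than a fixed allocation.

```python
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a my.cnf size such as '512M' or '5G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if not match:
        raise ValueError(f"not a size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def global_buffer_bytes(cnf_text, keys):
    """Sum the selected size settings found in my.cnf-style text."""
    total = 0
    for line in cnf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys:
            total += parse_size(value)
    return total


# Values copied from the config in this post.
SAMPLE = """
[mysqld]
innodb_buffer_pool_size = 5G
innodb_additional_mem_pool_size = 128M
innodb_log_buffer_size = 128M
query_cache_size = 128M
"""
KEYS = {"innodb_buffer_pool_size", "innodb_additional_mem_pool_size",
        "innodb_log_buffer_size", "query_cache_size"}

total = global_buffer_bytes(SAMPLE, KEYS)
print(f"{total / 1024**3:.2f} GiB of global buffers on an 8 GiB box")
```

If the zabbix server shares the box with mysql, whatever is left over also has to cover the zabbix caches and processes.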


                    • cheezus
                      Member
                      • Nov 2011
                      • 35

                      #11
                      Originally posted by c.mammoli
                      You didn't tweak anything about InnoDB in your my.cnf... The defaults are very low imho.

                      This is my config (old IBM server, 8GB RAM 4x74GB 10k RPM, 70 nvps)

                      <<snip>>

                      By the way, how many nvps do you have?
                      Thank you for reviewing the my.cnf. We'll try your tweaks. We have 60 nvps.

                      We also just built a 2.0 zabbix server and are going to start testing that along with switching the agents from polling to pushing the data back.


                      • Jason
                        Senior Member
                        • Nov 2007
                        • 430

                        #12
                        That isn't a very high nvps at all...

                        Have you tried looking at iostat -x 5 3 to see what your IO utilisation is?

                        Should be able to handle a lot more vps on that spec... Have you disabled housekeeping and scheduled a DB maintenance script instead? (That is always good at reducing load on the server)
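
As a rough idea of what such a maintenance script could do, the sketch below only builds the purge statements and prints them, so they can be reviewed before being fed to mysql from cron. Treat it as an illustration under assumptions: the table list and the `clock` column are what I would expect from a stock zabbix schema and should be checked against your own database, the 90 day retention matches the "3 months" mentioned earlier in the thread, and the batch size is an arbitrary value meant to keep each delete short.

```python
import time

# Assumed history tables of a stock zabbix schema; verify before use.
HISTORY_TABLES = ("history", "history_uint", "history_str",
                  "history_text", "history_log")


def purge_statements(keep_days, now=None, batch=10000):
    """Build one batched DELETE per history table for rows older than keep_days."""
    now = int(time.time()) if now is None else int(now)
    cutoff = now - keep_days * 86400  # the clock column holds unix timestamps
    return [
        f"DELETE FROM {table} WHERE clock < {cutoff} LIMIT {batch};"
        for table in HISTORY_TABLES
    ]


for statement in purge_statements(keep_days=90):
    print(statement)
```

Each statement would need to be repeated until it affects zero rows. Running it off-peak in small batches avoids the long table-wide deletes that make the built-in housekeeper expensive.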


                        • cheezus
                          Member
                          • Nov 2011
                          • 35

                          #13
                          Originally posted by Jason
                          That isn't a very high nvps at all...

                          Have you tried looking at iostat -x 5 3 to see what your IO utilisation is?

                          Should be able to handle a lot more vps on that spec... Have you disabled housekeeping and scheduled a DB maintenance script instead? (That is always good at reducing load on the server)
                          IO utilization only ever peaked at 5%.


                          • cheezus
                            Member
                            • Nov 2011
                            • 35

                            #14
                            After a week of 2.0 with push enabled things are going really well. The zabbix server never falls behind and we're getting all our alerts just fine now.

                            Fingers crossed!
