Thanks for your Large Environment Tuning Advises

  • Alain Ganuchaud
    Member
    Zabbix Certified Trainer

    • Mar 2009
    • 49

    #1

    Thanks for your Large Environment Tuning Advises

    Hi,

    I am running a Zabbix 1.8.3 installation with the following load:

    Number of hosts (monitored/not monitored/templates): 1570 (1448 / 2 / 120)
    Number of items (monitored/disabled/not supported): 37720 (19212 / 18508 / 0)
    Number of triggers (enabled/disabled) [true/unknown/false]: 19459 (19116 / 343) [434 / 7342 / 11340]
    Required server performance, new values per second: 130.3763

    One master server running as a VM guest on ESX (4 CPUs, 26 GB RAM, OS on internal disks, MySQL on a separate SAN), plus 12 proxies with internal disks. I can tune it up to about 250 values/s; beyond that I cannot keep it alive: queues grow, free buffer space decreases, and we lose data.
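
A side note on the "required server performance" figure: as far as I understand it, Zabbix derives it by summing 1/(update interval) over all enabled items. The Python sketch below only illustrates that arithmetic. The mix of intervals in item_mix is hypothetical (only the total of 19212 monitored items comes from the report above), so the printed rate is illustrative, not a measurement.

```python
def required_nvps(intervals_seconds):
    """Sum 1/interval over all enabled items; intervals are in seconds."""
    return sum(1.0 / s for s in intervals_seconds if s > 0)

# Hypothetical mix: update interval in seconds -> number of enabled items.
# Only the total (19212 monitored items) is taken from the status report.
item_mix = {30: 1000, 60: 4000, 300: 9000, 3600: 5212}

# Expand the mix into one interval per item, then sum the per-item rates.
intervals = [s for s, count in item_mix.items() for _ in range(count)]
nvps = required_nvps(intervals)
print(f"{len(intervals)} items -> {nvps:.2f} new values per second")
```

Shortening the interval on a large group of items raises this figure directly, which is one way to see which items dominate the load.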

    Any advice, ideas or feedback from experience are welcome; I think I have read all the forum threads on this subject.

    Thanks for your help,
    Alain

    VMWARE ESX 3.5
    - VM tools are installed

    MASTER Debian 5.0.2 Lenny / kernel 2.6.26-2-amd64
    - Shared memory /dev/shm 13GB
    - All disks running with NOOP scheduler
    - Filesystems configured with noatime,nodiratime options
    - swappiness=10


    Zabbix 1.8.3 central server / main zabbix_server.conf parameters

    StartPollers=10 --> I am wondering about this one, how can I tune it?
    StartPollersUnreachable=1
    StartTrappers=30 --> I am wondering about this one, how can I tune it?
    HousekeepingFrequency=12
    CacheSize=512M
    CacheUpdateFrequency=60
    Timeout=20
    TrapperTimeout=300
    UnreachablePeriod=45
    UnavailableDelay=60
    UnreachableDelay=15
    TmpDir=/tmp --> Is there a benefit to use RAM? and which size?
    HistoryCacheSize=64M
    HistoryTextCacheSize=256M
    StartDBSyncers=20 --> I am wondering about this one, how can I tune it?
    TrendCacheSize=64M


    MYSQL 5.0.51a-24 central zabbix database / main mysqld parameters

    - Main parameters below, tuned with mysqltuner and tuning-primer.sh
    [mysqld]
    key_buffer = 512M
    back_log = 50
    max_connections = 100
    max_connect_errors = 10
    table_cache = 2048
    max_allowed_packet = 16M
    binlog_cache_size = 1M
    max_heap_table_size = 256M
    sort_buffer_size = 8M
    join_buffer_size = 64M
    thread_cache_size = 32
    thread_concurrency = 8
    query_cache_size = 128M
    query_cache_limit = 64M
    thread_stack = 192K
    tmp_table_size = 256M
    innodb_additional_mem_pool_size = 16M
    innodb_buffer_pool_size = 12G
    innodb_flush_log_at_trx_commit = 0
    innodb_log_buffer_size = 8M
    tmpdir=/mysqlram --> 4GB RAM
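
A rough way to sanity-check these values against the 26 GB of RAM is to add up the global buffers plus the per-connection buffers multiplied by max_connections, similar in spirit to what mysqltuner reports. The sketch below uses only the values listed above. The split into global and per-connection settings is an assumption, and per-connection buffers are only allocated when a query needs them, so the result is an upper bound rather than expected usage.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(value):
    """Convert a my.cnf size such as '512M' or '192K' to bytes."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)

# Values copied from the [mysqld] section above.
settings = {
    "key_buffer": "512M",
    "innodb_buffer_pool_size": "12G",
    "innodb_additional_mem_pool_size": "16M",
    "innodb_log_buffer_size": "8M",
    "query_cache_size": "128M",
    "sort_buffer_size": "8M",
    "join_buffer_size": "64M",
    "thread_stack": "192K",
    "binlog_cache_size": "1M",
    "max_connections": "100",
}

# Assumed grouping: allocated once per server vs. up to once per connection.
GLOBAL_BUFFERS = ("key_buffer", "innodb_buffer_pool_size",
                  "innodb_additional_mem_pool_size",
                  "innodb_log_buffer_size", "query_cache_size")
PER_CONNECTION = ("sort_buffer_size", "join_buffer_size",
                  "thread_stack", "binlog_cache_size")

def worst_case_bytes(cfg):
    """Global buffers plus max_connections times the per-connection buffers."""
    global_part = sum(to_bytes(cfg[k]) for k in GLOBAL_BUFFERS)
    per_conn = sum(to_bytes(cfg[k]) for k in PER_CONNECTION)
    return global_part + int(cfg["max_connections"]) * per_conn

total = worst_case_bytes(settings)
print(f"worst case: {total / 1024**3:.1f} GiB")
```

With the values above this comes to roughly 19.8 GiB, before counting the 13 GB /dev/shm and the 4 GB tmpdir RAM disk. About 6 GiB of that is join_buffer_size = 64M multiplied by 100 connections.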


    Twelve active Proxies running Debian 5.0.2 Lenny with only internal disks
    - I don't think they are the root cause of the problem
    Zabbix 1.8.3 Proxies / main zabbix_proxy.conf parameters
    HistoryCacheSize=64M
    HistoryTextCacheSize=256M
    StartDBSyncers=8
    TrendCacheSize=64M
    StartPollers=10
    StartTrappers=7
    HousekeepingFrequency=1
  • Jamest
    Junior Member
    • Sep 2009
    • 12

    #2
    Re: Thanks for your Large Environment Tuning Advises

    How big is the database, and what is its growth rate? Are you using mostly normal checks or active checks? Is everything on one server, or do you have any proxies/nodes? Do you have to use virtual machines, or is physical hardware possible?

    Are you collecting a lot of items that return string data?

    Many factors can affect Zabbix performance, and many of them are specific to your situation, so we need more detail about your configuration and perhaps some more specific questions to answer.

    I have 26K items on 1.2K hosts and it performs adequately, but I am running a DM configuration because the hosts are in 3 different data centers. I am also using the Percona patches for MySQL. All of my servers are physical, but I use virtual machines as replication slaves for database backups.

    • Alain Ganuchaud
      Member
      Zabbix Certified Trainer

      • Mar 2009
      • 49

      #3
      Hi James,

      The database is 12 GB and grows by about 1 GB every 45 days. We use mostly normal checks, with one central node and 12 proxies, and all 13 servers currently run on VMware ESX 3.5.

      Yes, a lot of items collect string data (Windows EventLog, but mostly data returned by external scripts for database monitoring).

      Please tell me a little more about the Percona patches. Also, could you share your master config files (zabbix_server.conf and the MySQL config file)?

      Thanks for help
      Alain

      • Jamest
        Junior Member
        • Sep 2009
        • 12

        #4
        Percona packages for MySQL

        look here for more information:

        Tested, secure enterprise-grade open source software and services that improve the scalability and performance of your MySQL deployments. No enterprise limits, licenses, or lock-in.


        The IUS community also maintains up-to-date packages of MySQL with the Percona patches for RHEL/CentOS, along with other useful packages such as PHP and Python.

        IUS is supported by Rackspace, and they are big on CentOS/Red Hat.

        • Alain Ganuchaud
          Member
          Zabbix Certified Trainer

          • Mar 2009
          • 49

          #5
          Within 9 months we intend to monitor up to 15,000 servers, so I will test Percona Server. Thanks for the help.

          Alain

          • tchjts1
            Senior Member
            • May 2008
            • 1605

            #6
            You can have a look at this thread: http://www.zabbix.com/forum/showthread.php?t=17762

            And just from a quick glance at your setup parameters, I would consider upgrading MySQL to 5.1.xx and exploring the file-per-table setting (innodb_file_per_table). Are you using InnoDB?

            I would also double-check your config settings. You say you are running 1.8.3, right? Did you upgrade from 1.8.2?

            I assume you utilize your proxies for all hosts that are reporting in, you will probably want to tune your startpollers and trappers so they are not struggling.

            You should also activate some basic resource monitoring of your Zabbix infrastructure servers so you know what is happening on them as far as CPU, Memory, IO, etc.

            I'll get over to my work computer in a bit and share my configs with you.
            Last edited by tchjts1; 30-08-2010, 17:34.

            • tchjts1
              Senior Member
              • May 2008
              • 1605

              #7
              Here is one of my proxy configs.

              Code:
              # This is config file for ZABBIX server process
              # To get more information about ZABBIX,
              # go http://www.zabbix.com
              
              ############ GENERAL PARAMETERS #################
              
              # IP address (or hostname) of ZABBIX servers.
              
              Server=xxx.xxx.xxx.xxx
              
              # Server port for sending active checks
              
              ServerPort=10051
              
              # Unique hostname.
              
              Hostname=proxy_xxxxxxxx
              
              # Zabbix 1.8 parameters
              CacheSize=256M
              TrendCacheSize=64M
              CacheUpdateFrequency=300
              LogSlowQueries=5000
              
              # Number of pre-forked instances of pollers
              # Default value is 5
              # This parameter must be between 0 and 255
              StartPollers=35
              
              # Number of pre-forked instances of IPMI pollers
              # Default value is 0
              # This parameter must be between 0 and 255
              StartIPMIPollers=2
              
              # Number of pre-forked instances of pollers for unreachable hosts
              # Default value is 1
              # This parameter must be between 0 and 255
              #StartPollersUnreachable=1
              
              # Number of pre-forked instances of trappers
              # Default value is 5
              # This parameter must be between 0 and 255
              StartTrappers=64
              
              # Number of pre-forked instances of ICMP pingers
              # Default value is 1
              # This parameter must be between 0 and 255
              StartPingers=4
              
              # Number of pre-forked instances of discoverers
              # Default value is 1
              # This parameter must be between 0 and 255
              #StartDiscoverers=1
              
              # Number of pre-forked instances of HTTP pollers
              # Default value is 1
              # This parameter must be between 0 and 255
              #StartHTTPPollers=1
              
              # Listen port for trapper. Default port number is 10051. This parameter must be between 1024 and 32767
              #ListenPort=10051
              
              # Source IP address for outgoing connections
              #SourceIP=
              
              # Listen interface for trapper. Trapper will listen all network interfaces
              # if this parameter is missing.
              #ListenIP=127.0.0.1
              
              # How often ZABBIX will send a heartbeat message
              # (in seconds)
              # Default value is 60 seconds
              # Set to 0 to disable heartbeat messages
              # This parameter must be between 0 and 3600
              HeartbeatFrequency=10
              
              # How often ZABBIX will perform sync configuration data
              # (in seconds)
              # Default value is 3600 seconds (1h)
              # This parameter must be between 1 and 604800 (1 week)
              ConfigFrequency=60
              
              # How often ZABBIX will perform housekeeping procedure
              # (in hours)
              # Default value is 1 hour
              # Housekeeping is removing unnecessary information from
              # tables history, alert, and alarms
              # This parameter must be between 1 and 24
              #HousekeepingFrequency=1
              
              # How often ZABBIX will try to send unsent alerts
              # (in seconds)
              # Default value is 30 seconds
              #SenderFrequency=30
              
              # Local buffer size in hours. Proxy will keep collected data N hours.
              # Default value is 0 hours
              #ProxyLocalBuffer=0
              
              # Offline buffer size in hours. It is used when server is not available.
              # Older data is removed.
              # Default value is 1 hours
              ProxyOfflineBuffer=1
              
              # Specifies debug level
              # 0 - debug is not created
              # 1 - critical information
              # 2 - error information
              # 3 - warnings (default)
              # 4 - for debugging (produces lots of information)
              DebugLevel=3
              
              # Specifies how long we wait for agent response (in sec)
              # Must be between 1 and 30
              Timeout=5
              
              # Specifies how many seconds trapper may spend processing new data
              # Must be between 1 and 300
              TrapperTimeout=10
              
              # After how many seconds of unreachability treat a host as unavailable
              #UnreachablePeriod=45
              
              # How often to check host for availability during the unreachability period
              #UnreachableDelay=15
              
              # How often to check host for availability during the unavailability period
              #UnavailableDelay=60
              
              # Name of PID file
              PidFile=/var/tmp/zabbix_proxy.pid
              
              # Name of log file
              # If not set, syslog is used
              LogFile=/var/log/zabbix/zabbix_proxy.log
              
              # Maximum size of log file in MB. Set to 0 to disable automatic log rotation.
              #LogFileSize=1
              
              # Location for custom alert scripts
              AlertScriptsPath=/home/zabbix/bin/
              
              # Location of external scripts
              ExternalScripts=/etc/zabbix/externalscripts
              
              # Location of fping. Default is /usr/sbin/fping
              # Make sure that fping binary has root permissions and SUID flag set
              FpingLocation=/usr/sbin/fping
              
              # Location of fping6. Default is /usr/sbin/fping6
              # Make sure that fping binary has root permissions and SUID flag set
              Fping6Location=/usr/sbin/fping6
              
              # Temporary directory. Default is /tmp
              #TmpDir=/tmp
              
              # Frequency of ICMP pings (item keys 'icmpping' and 'icmppingsec'). Default is 60 seconds.
              #PingerFrequency=60
              
              # Database host name
              # Default is localhost
              
              #DBHost=localhost
              
              # Database name
              # SQLite3 note: path to database file must be provided. DBUser and DBPassword are ignored.
              DBName=zabbix
              
              # Database user
              
              DBUser=xxxxxx
              
              # Database password
              # Comment this line if no password used
              
              DBPassword=xxxxxxx
              
              # Connect to MySQL using Unix socket?
              DBSocket=/var/lib/mysql/mysql.sock
              And here is my Zabbix App server config:
              Code:
              ############ GENERAL PARAMETERS #################
              
              # This defines unique NodeID in distributed setup,
              # Default value 0 (standalone server)
              # This parameter must be between 0 and 999
              #NodeID=0
              
              # Zabbix 1.8 specific configuration parameters
              
              CacheSize=256M
              TrendCacheSize=64M
              CacheUpdateFrequency=300
              LogSlowQueries=5000
              
              # Enable DB cache module
              #StartDBSyncers=1
              
              # Number of pre-forked instances of pollers
              # Default value is 5
              # This parameter must be between 0 and 255
              StartPollers=35
              
              # Number of pre-forked instances of IPMI pollers
              # Default value is 0
              # This parameter must be between 0 and 255
              StartIPMIPollers=2
              
              # Number of pre-forked instances of pollers for unreachable hosts
              # Default value is 1
              # This parameter must be between 0 and 255
              StartPollersUnreachable=2
              
              # Number of pre-forked instances of trappers
              # Default value is 5
              # This parameter must be between 0 and 255
              StartTrappers=128
              
              # Number of pre-forked instances of ICMP pingers
              # Default value is 1
              # This parameter must be between 0 and 255
              StartPingers=2
              
              # Number of pre-forked instances of discoverers
              # Default value is 1
              # This parameter must be between 0 and 255
              #StartDiscoverers=1
              
              # Number of pre-forked instances of HTTP pollers
              # Default value is 1
              # This parameter must be between 0 and 255
              #StartHTTPPollers=1
              
              # Listen port for trapper. Default port number is 10051. This parameter
              # must be between 1024 and 32767
              
              #ListenPort=10051
              
              # Source IP address for outgoing connections
              SourceIP=xxx.xxx.xxx.xxx
              
              # Listen interface for trapper. Trapper will listen on all network interfaces
              # if this parameter is missing.
              
              ListenIP=xxx.xxx.xxx.xxx
              
              # How often ZABBIX will perform housekeeping procedure
              # (in hours)
              # Default value is 1 hour
              # Housekeeping is removing unnecessary information from
              # tables history, alert, and alarms
              # This parameter must be between 1 and 24
              
              #HousekeepingFrequency=1
              
              # How often ZABBIX will try to send unsent alerts
              # (in seconds)
              # Default value is 30 seconds
              SenderFrequency=5
              
              # Uncomment this line to disable housekeeping procedure
              DisableHousekeeping=1
              
              # Specifies debug level
              # 0 - debug is not created
              # 1 - critical information
              # 2 - error information
              # 3 - warnings (default)
              # 4 - for debugging (produces lots of information)
              
              DebugLevel=3
              
              # Specifies how long we wait for agent response (in sec)
              # Must be between 1 and 30
              Timeout=5
              
              # Specifies how many seconds trapper may spend processing new data
              # Must be between 1 and 300
              #TrapperTimeout=5
              
              # After how many seconds of unreachability treat a host as unavailable
              #UnreachablePeriod=45
              
              # How often check host for availability during the unavailability period
              #UnavailableDelay=60
              
              # Name of PID file
              
              PidFile=/tmp/zabbix_server.pid
              
              # Name of log file
              # If not set, syslog is used
              
              LogFile=/var/log/zabbix/zabbix_server.log
              
              # Maximum size of log file in MB. Set to 0 to disable automatic log rotation.
              LogFileSize=32
              
              # Location for custom alert scripts
              AlertScriptsPath=/opt/zabbix/alertscripts/
              
              # Location of external scripts
              ExternalScripts=/opt/zabbix/externalscripts/
              
              # Location of fping. Default is /usr/sbin/fping
              # Make sure that fping binary has root permissions and SUID flag set
              #FpingLocation=/usr/sbin/fping
              
              # Location of fping6. Default is /usr/sbin/fping6
              # Make sure that fping binary has root permissions and SUID flag set
              #Fping6Location=/usr/sbin/fping6
              
              # Temporary directory. Default is /tmp
              #TmpDir=/tmp
              
              # Frequency of ICMP pings (item keys 'icmpping' and 'icmppingsec'). Default is 60 seconds.
              PingerFrequency=30
              
              # Database host name
              # Default is localhost
              
              DBHost=xxx.xxx.xxx.xxx
              
              # Database name
              # SQLite3 note: path to database file must be provided. DBUser and DBPassword are ignored.
              DBName=zabbix
              
              # Database user
              
              DBUser=xxxxxxx
              
              # Database password
              # Comment this line if no password used
              
              DBPassword=xxxxxxx
              
              # Connect to MySQL using Unix socket?
              #DBSocket=/tmp/mysql.sock

              • Alain Ganuchaud
                Member
                Zabbix Certified Trainer

                • Mar 2009
                • 49

                #8
                Thanks for your help,

                Are you using InnoDB?
                Yes, of course.

                I would also double-check your config settings. You say you are running 1.8.3, right? Did you upgrade from 1.8.2?
                Yes, I upgraded from 1.8.2. Do you mean there is an issue with that?

                I assume you utilize your proxies for all hosts that are reporting in, you will probably want to tune your startpollers and trappers so they are not struggling.
                Yes, I want that, but I'm really wondering how to tune those parameters. Do you have any tricks for that?

                You should also activate some basic resource monitoring of your Zabbix infrastructure servers so you know what is happening on them as far as CPU, Memory, IO, etc.
                This is in place, but because all servers are running on VMware, I have some doubts about I/O performance (SAN).

                I'll get over to my work computer in a bit and share my configs with you.
                Thanks for sharing those configs

                Alain

                • tchjts1
                  Senior Member
                  • May 2008
                  • 1605

                  #9
                  Originally posted by Alain Ganuchaud
                  Thanks for your help,

                  Are you using InnoDB?
                  Yes, of course.

                  I would also double-check your config settings. You say you are running 1.8.3, right? Did you upgrade from 1.8.2?
                  Yes, I upgraded from 1.8.2. Do you mean there is an issue with that?
                  No, not necessarily a problem, but if you did an upgrade from 1.6.x to 1.8.2 to 1.8.3 there are some config file changes between 1.6.x and the 1.8 version. While I think the 1.6.x config files will still work with 1.8.x, there are some added benefits to ensuring you always use the latest config files. It is easy to get complacent when doing upgrades and just upgrading binaries/frontends and ignoring any new config settings.
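
One way to spot settings that a newer sample config adds, or that differ between two files, is to parse both and compare the keys. The Python sketch below is only an illustration: parse_conf and diff_conf are hypothetical helper names, and the inline snippets reuse a few values from the configs posted in this thread rather than complete files.

```python
def parse_conf(text):
    """Parse Key=Value lines, skipping blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

def diff_conf(old, new):
    """Return (keys only in old, keys only in new, {key: (old, new)} changes)."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = {k: (old[k], new[k])
               for k in sorted(set(old) & set(new)) if old[k] != new[k]}
    return only_old, only_new, changed

old_conf = parse_conf("""
StartPollers=10
StartTrappers=30
Timeout=20
""")
new_conf = parse_conf("""
# Zabbix 1.8 parameters
CacheSize=256M
StartPollers=35
StartTrappers=128
Timeout=5
""")

only_old, only_new, changed = diff_conf(old_conf, new_conf)
print("missing from old file:", only_new)
print("changed:", changed)
```

In practice you would read the two files from disk with open(path).read() instead of using inline strings.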

                  Originally posted by Alain Ganuchaud
                  I assume you utilize your proxies for all hosts that are reporting in, you will probably want to tune your startpollers and trappers so they are not struggling.
                  Yes, I want that, but I'm really wondering how to tune those parameters. Do you have any tricks for that?
                  I have no tricks, but if you look at the configs I supplied, you can see my pollers and trappers are set higher than stock. Keep in mind that I am running on high-end stand-alone servers, whereas you are on VMs. Some config settings will be limited by your available resources (CPU, memory), but you could easily experiment with trappers/pollers by bumping them up in small increments to see if that relieves your queuing issue.
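
As a starting point for that kind of experiment, a back-of-envelope estimate can be made with Little's law: the number of busy pollers is roughly the rate of passive checks multiplied by the average time one check takes. This is a rule of thumb, not something from the Zabbix documentation. The function name, the headroom factor and the sample latencies below are assumptions; the 255 cap comes from the config comments above. It applies to whichever process does the polling, which here would mostly be the proxies.

```python
import math

MAX_INSTANCES = 255  # upper bound stated in the config file comments

def pollers_needed(passive_values_per_sec, avg_check_seconds, headroom=2.0):
    """Estimate pollers as arrival rate * time per check, scaled by headroom."""
    busy = passive_values_per_sec * avg_check_seconds
    return min(MAX_INSTANCES, max(1, math.ceil(busy * headroom)))

# Hypothetical average check latencies, in seconds, at about 130 values/s.
for avg in (0.05, 0.5, 2.0):
    print(f"avg check {avg}s -> StartPollers ~ {pollers_needed(130.4, avg)}")
```

The estimate is very sensitive to check latency, so slow external scripts or a long Timeout on unreachable hosts can tie up pollers far more than the raw item count suggests.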

                  Originally posted by Alain Ganuchaud
                  I'll get over to my work computer in a bit and share my configs with you.
                  Thanks for sharing those configs

                  Alain
                  Welcome. Hope you find the solution.
