Good afternoon, I need some advice.
I run Zabbix server 1.8.3 on two machines: one runs Zabbix itself, the other hosts the database. Both machines are fairly weak, but so far they have coped. So:
When I add a new Windows agent to monitoring, the server goes down. Specifically:
1) The dashboard shows "Zabbix server is running - No", although tail -f zabbix_server.log shows new entries appearing constantly, so the process itself has not crashed.
2) CPU load on the MySQL server jumps to 100%.
3) If I try to stop the Zabbix server service, a very long "syncing data" process begins, lasting up to several hours (see the zabbix_server.log excerpt further down).
When it finally finishes, Zabbix can be started again and usually everything works... until the next new host.
The hang is clearly caused by the agent trying to upload the host's event logs to the server: Application and System can each hold around 50 thousand messages accumulated over the years, and the eager agent tries to push all of them to the server. I confirmed this: if I clear the logs on the host before connecting the agent, it connects without problems.
Please advise what I can optimize to avoid such hard crashes. Overall the system works fine; average CPU load on both servers does not exceed 50%.
zabbix_server.log while stopping the server (point 3 above):
25163:20101103:130800.595 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25175:20101103:130800.596 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25156:20101103:130800.597 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25170:20101103:130800.598 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25208:20101103:130800.598 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25160:20101103:130800.573 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25180:20101103:130800.600 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25194:20101103:130800.620 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25210:20101103:130800.622 Got signal [signal:15(SIGTERM),sender_pid:25767,sender_uid:0,reason:0]. Exiting ...
25152:20101103:130802.578 Syncing history data...
25152:20101103:130818.281 Syncing history data... 0.088505%
25152:20101103:130828.858 Syncing history data... 0.134528%
25152:20101103:130839.007 Syncing history data... 0.177010%
25152:20101103:130850.629 Syncing history data... 0.230113%
25152:20101103:130902.019 Syncing history data... 0.283216%
25152:20101103:130918.058 Syncing history data... 0.318618%
25152:20101103:130928.076 Syncing history data... 0.348710%
25152:20101103:130938.029 Syncing history data... 0.377031%
25152:20101103:130949.119 Syncing history data... 0.412433%
25152:20101103:130959.077 Syncing history data... 0.440755%
25152:20101103:131010.237 Syncing history data... 0.476157%
25152:20101103:131021.356 Syncing history data... 0.511559%
25152:20101103:131031.944 Syncing history data... 0.546961%
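As a rough cross-check of the "several hours" figure, the progress lines in this log can be extrapolated linearly. This is only a sketch: it assumes the sync rate stays constant for the whole run, and `parse_progress` and `eta_hours` are helper names made up for this illustration, not part of Zabbix.

```python
import re

# First and last progress lines copied from the log excerpt.
LOG = """\
25152:20101103:130818.281 Syncing history data... 0.088505%
25152:20101103:131031.944 Syncing history data... 0.546961%
"""

# pid:YYYYMMDD:HHMMSS.mmm Syncing history data... NN.NNNNNN%
LINE_RE = re.compile(
    r"^\d+:\d{8}:(\d{2})(\d{2})(\d{2}\.\d+) Syncing history data\.\.\. ([\d.]+)%$"
)

def parse_progress(log_text):
    """Return (seconds since midnight, percent done) for each progress line."""
    points = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hh, mm, ss, pct = m.groups()
            points.append((int(hh) * 3600 + int(mm) * 60 + float(ss), float(pct)))
    return points

def eta_hours(points):
    """Linear extrapolation of the remaining sync time (assumes a constant rate)."""
    (t0, p0), (t1, p1) = points[0], points[-1]
    rate = (p1 - p0) / (t1 - t0)  # percent per second
    return (100.0 - p1) / rate / 3600.0

points = parse_progress(LOG)
print(f"estimated time remaining: {eta_hours(points):.1f} h")
```

At the pace seen in this excerpt (about 0.46% in a little over two minutes), the estimate comes out at roughly 8 hours, which matches the observation that stopping the server can take hours.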
Some more data:
Number of hosts (monitored/not monitored/templates): 351 (264 / 29 / 58)
Number of items (active/inactive/not supported): 23166 (11265 / 9703 / 2198)
Number of users: 12
Required server performance, new values per second: 123.19
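To put the event-log backlog in perspective against the "new values per second" figure, here is a back-of-the-envelope calculation. It uses the rough estimate of 50 thousand messages in each of the Application and System logs from the description and assumes that every message becomes exactly one history value; the variable names are made up for illustration.

```python
# Figures taken from the post; the one-message-one-value mapping is an assumption.
MESSAGES_PER_LOG = 50_000   # rough estimate per event log
LOGS_PER_HOST = 2           # Application and System
NORMAL_RATE = 123.19        # required server performance, new values per second

backlog = MESSAGES_PER_LOG * LOGS_PER_HOST
seconds_of_normal_load = backlog / NORMAL_RATE

print(f"backlog from one new host: {backlog} values")
print(f"equivalent to {seconds_of_normal_load / 60:.1f} minutes of normal intake")
```

Under these assumptions, one newly added host delivers about as many values as the whole installation normally receives in around 13.5 minutes.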
zabbix_server.conf
############ GENERAL PARAMETERS #################
ListenPort=10051
LogFile=/tmp/zabbix_server.log
### Option: PidFile
# Name of PID file.
#
# Mandatory: no
# Default:
# PidFile=/tmp/zabbix_server.pid
### Option: DBHost
# Database host name.
# If set to localhost, socket is used for MySQL.
#
# Mandatory: no
# Default:
DBHost=192.168.1.251
#mysql64
DBName=zabbix
DBUser=zabbix
DBPassword=*******
### Option: DBSocket
# Path to MySQL socket.
#
# Mandatory: no
# Default:
# DBSocket=/tmp/mysql.sock
############ ADVANCED PARAMETERS ################
### Option: StartPollers
# Number of pre-forked instances of pollers.
# You shouldn't run more than 30 pollers normally.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartPollers=5
### Option: StartIPMIPollers
# Number of pre-forked instances of IPMI pollers.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartIPMIPollers=0
### Option: StartPollersUnreachable
# Number of pre-forked instances of pollers for unreachable hosts.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartPollersUnreachable=1
### Option: StartTrappers
# Number of pre-forked instances of trappers
#
# Mandatory: no
# Range: 0-255
# Default:
StartTrappers=40
### Option: StartPingers
# Number of pre-forked instances of ICMP pingers.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartPingers=1
### Option: StartDiscoverers
# Number of pre-forked instances of discoverers.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartDiscoverers=1
### Option: StartHTTPPollers
# Number of pre-forked instances of HTTP pollers.
#
# Mandatory: no
# Range: 0-255
# Default:
# StartHTTPPollers=1
### Option: ListenIP
# Listen interface for trapper.
# Trapper will listen on all network interfaces if this parameter is missing.
#
# Mandatory: no
# Default:
# ListenIP=0.0.0.0
# ListenIP=127.0.0.1
### Option: HousekeepingFrequency
# How often Zabbix will perform housekeeping procedure (in hours).
# Housekeeping is removing unnecessary information from history, alert, and alarms tables.
# If PostgreSQL is used, suggested value is 24, as it performs VACUUM.
#
# Mandatory: no
# Range: 1-24
# Default:
# HousekeepingFrequency=1
### Option: DisableHousekeeping
# If set to 1, disables housekeeping.
#
# Mandatory: no
# Range: 0-1
# Default:
# DisableHousekeeping=0
### Option: SenderFrequency
# How often Zabbix will try to send unsent alerts (in seconds).
#
# Mandatory: no
# Range: 5-3600
# Default:
# SenderFrequency=30
### Option: CacheSize
# Size of configuration cache, in bytes.
# Shared memory size for storing hosts and items data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
CacheSize=16M
### Option: CacheUpdateFrequency
# How often Zabbix will perform update of configuration cache, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# CacheUpdateFrequency=60
### Option: HistoryCacheSize
# Size of history cache, in bytes.
# Shared memory size for storing history data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# HistoryCacheSize=16M
### Option: TrendCacheSize
# Size of trend cache, in bytes.
# Shared memory size for storing trends data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# TrendCacheSize=16M
### Option: HistoryTextCacheSize
# Size of text history cache, in bytes.
# Shared memory size for storing character, text or log history data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# HistoryTextCacheSize=16M
### Option: NodeNoEvents
# If set to '1' local events won't be sent to master node.
#
# Mandatory: no
# Range: 0-1
# Default:
# NodeNoEvents=0
### Option: Timeout
# Specifies how long we wait for agent, SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
Timeout=10
### Option: TrapperTimeout
# Specifies how many seconds trapper may spend processing new data.
#
# Mandatory: no
# Range: 1-300
# Default:
# TrapperTimeout=300
# TrapperTimeout=5
### Option: UnreachablePeriod
# After how many seconds of unreachability treat a host as unavailable
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnreachablePeriod=45
### Option: UnavailableDelay
# How often host is checked for availability during the unavailability period.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnavailableDelay=60
### Option: UnreachableDelay
# How often host is checked for availability during the unreachability period
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnreachableDelay=15
### Option: AlertScriptsPath
# Location of custom alert scripts
#
# Mandatory: no
# Default:
AlertScriptsPath=/home/zabbix/bin/
### Option: ExternalScripts
# Location of external scripts
#
# Mandatory: no
# Default:
#ExternalScripts=/etc/zabbix/externalscripts
### Option: FpingLocation
# Location of fping.
# Make sure that fping binary has root ownership and SUID flag set!
#
# Mandatory: no
# Default:
# FpingLocation=/usr/sbin/fping
### Option: Fping6Location
# Location of fping6.
# Make sure that fping binary has root ownership and SUID flag set
#
# Mandatory: no
# Default:
# Fping6Location=/usr/sbin/fping6
### Option: SSHKeyLocation
# Location of public keys for SSH checks
#
# Mandatory: no
# Default:
# SSHKeyLocation=
### Option: TmpDir
# Temporary directory.
#
# Mandatory: no
# Default:
# TmpDir=/tmp
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
#
# Mandatory: no
# Default:
# Include=
# Include=/etc/zabbix/zabbix_server.general.conf
# Include=/etc/zabbix/zabbix_server/
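Since most of this file is commented-out defaults, it can help to reduce it to the settings that are actually in effect. A minimal Python sketch follows; the function name `active_settings` is made up for illustration, and the sample lines are copied from the file.

```python
def active_settings(conf_text):
    """Return the uncommented key=value pairs of a zabbix_server.conf-style file."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

SAMPLE = """\
### Option: StartTrappers
# Default:
StartTrappers=40
# HistoryCacheSize=16M
# HistoryTextCacheSize=16M
CacheSize=16M
Timeout=10
# TrapperTimeout=5
"""

for key, value in active_settings(SAMPLE).items():
    print(f"{key}={value}")
```

Applied to the whole file, this leaves only ListenPort, LogFile, DBHost, DBName, DBUser, DBPassword, StartTrappers, CacheSize, Timeout and AlertScriptsPath; everything else, including HistoryCacheSize and HistoryTextCacheSize, runs on the built-in defaults.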
And here is what mysqltuner reports:
>> MySQLTuner 1.0.1 - Major Hayden <[email protected]>
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:
-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.51a-3ubuntu5.7-log
[OK] Operating on 64-bit architecture
-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 1M (Tables: 105)
[--] Data in InnoDB tables: 1G (Tables: 128)
[OK] Total fragmented tables: 0
-------- Performance Metrics -------------------------------------------------
[--] Up for: 1d 17h 25m 37s (31M q [210.122 qps], 25K conn, TX: 29B, RX: 6B)
[--] Reads / Writes: 86% / 14%
[--] Total buffers: 2.5G global + 2.6M per thread (250 max threads)
[OK] Maximum possible memory usage: 3.2G (84% of installed RAM)
[OK] Slow queries: 0% (21K/31M)
[OK] Highest usage of available connections: 28% (70/250)
[OK] Key buffer size / total MyISAM indexes: 300.0M/774.0K
[OK] Key buffer hit rate: 99.2% (2M cached / 16K reads)
[!!] Query cache efficiency: 4.9% (1M cached / 25M selects)
[!!] Query cache prunes per day: 6623
[!!] Sorts requiring temporary tables: 35% (198K temp sorts / 558K sorts)
[OK] Temporary tables created on disk: 1% (19K on disk / 1M total)
[OK] Thread cache hit rate: 99% (88 created / 25K connections)
[OK] Table cache hit rate: 61% (382 open / 623 opened)
[OK] Open file limit used: 23% (245/1K)
[OK] Table locks acquired immediately: 99% (49M immediate / 49M locks)
[OK] InnoDB data size / buffer pool: 1.5G/2.0G
-------- Recommendations -----------------------------------------------------
Variables to adjust:
query_cache_limit (> 8M, or use smaller result sets)
query_cache_size (> 32M)
sort_buffer_size (> 1M)
read_rnd_buffer_size (> 256K)