Hi all.
My Zabbix server has given up the ghost; it "fell on a knife five times", so to speak.
The Zabbix server went down after I added yet another Cisco router to monitoring. The last thing that caught my attention was the number of items and triggers on the most recently added router. It was off the charts, 1-3k, and then I got a message that the server service was unavailable. http://prntscr.com/nrspkl
/var/log/messages
May 22 12:41:43 zabbix systemd: Starting Zabbix Server...
May 22 12:41:43 zabbix systemd: zabbix-server.service: Supervising process 2737 which is not our child. We'll most likely not notice when it exits.
May 22 12:41:43 zabbix systemd: Started Zabbix Server.
May 22 12:41:44 zabbix mysqld: 2019-05-22 12:41:44 103311 [Warning] Aborted connection 103311 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
May 22 12:41:44 zabbix systemd: zabbix-server.service: main process exited, code=exited, status=1/FAILURE
May 22 12:41:44 zabbix kill: Usage:
May 22 12:41:44 zabbix kill: kill [options] <pid|name> [...]
May 22 12:41:44 zabbix systemd: zabbix-server.service: control process exited, code=exited status=1
May 22 12:41:44 zabbix kill: Options:
May 22 12:41:44 zabbix kill: -a, --all do not restrict the name-to-pid conversion to processes
May 22 12:41:44 zabbix kill: with the same uid as the present process
May 22 12:41:44 zabbix kill: -s, --signal <sig> send specified signal
May 22 12:41:44 zabbix kill: -q, --queue <sig> use sigqueue(2) rather than kill(2)
May 22 12:41:44 zabbix kill: -p, --pid print pids without signaling them
May 22 12:41:44 zabbix kill: -l, --list [=<signal>] list signal names, or convert one to a name
May 22 12:41:44 zabbix kill: -L, --table list signal names and numbers
May 22 12:41:44 zabbix kill: -h, --help display this help and exit
May 22 12:41:44 zabbix kill: -V, --version output version information and exit
May 22 12:41:44 zabbix kill: For more details see kill(1).
May 22 12:41:44 zabbix systemd: Unit zabbix-server.service entered failed state.
May 22 12:41:44 zabbix systemd: zabbix-server.service failed.
centos-release-7-6.1810.2.el7.centos.x86_64
sudo systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2019-05-22 12:16:57 MSK; 9s ago
Process: 1877 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=1/FAILURE)
Process: 1873 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
Main PID: 1875 (code=exited, status=1/FAILURE)
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -s, --signal <sig> send specified signal
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -q, --queue <sig> use sigqueue(2) rather than kill(2)
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -p, --pid print pids without signaling them
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -l, --list [=<signal>] list signal names, or convert one to a name
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -L, --table list signal names and numbers
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -h, --help display this help and exit
May 22 12:16:57 zabbix.aquatep.local kill[1877]: -V, --version output version information and exit
May 22 12:16:57 zabbix.aquatep.local kill[1877]: For more details see kill(1).
May 22 12:16:57 zabbix.aquatep.local systemd[1]: Unit zabbix-server.service entered failed state.
May 22 12:16:57 zabbix.aquatep.local systemd[1]: zabbix-server.service failed.
/var/log/zabbix/zabbix_server.log
2668:20190522:123952.576 Starting Zabbix Server. Zabbix 4.2.1 (revision 92832).
2668:20190522:123952.576 ****** Enabled features ******
2668:20190522:123952.576 SNMP monitoring: YES
2668:20190522:123952.576 IPMI monitoring: YES
2668:20190522:123952.576 Web monitoring: YES
2668:20190522:123952.576 VMware monitoring: YES
2668:20190522:123952.577 SMTP authentication: YES
2668:20190522:123952.577 Jabber notifications: NO
2668:20190522:123952.577 Ez Texting notifications: YES
2668:20190522:123952.577 ODBC: YES
2668:20190522:123952.577 SSH2 support: YES
2668:20190522:123952.577 IPv6 support: YES
2668:20190522:123952.577 TLS support: YES
2668:20190522:123952.577 ******************************
2668:20190522:123952.577 using configuration file: /etc/zabbix/zabbix_server.conf
2668:20190522:123952.585 current database version (mandatory/optional): 04020000/04020000
2668:20190522:123952.585 required mandatory version: 04020000
2668:20190522:123953.467 __mem_malloc: skipped 6 asked 32056 skip_min 608 skip_max 9344
2668:20190522:123953.467 [file:dbconfig.c,line:94] __zbx_mem_realloc(): out of memory (requested 32056 bytes)
2668:20190522:123953.467 [file:dbconfig.c,line:94] __zbx_mem_realloc(): please increase CacheSize configuration parameter
2668:20190522:123953.467 === memory statistics for configuration cache ===
2668:20190522:123953.467 free chunks of size 24 bytes: 174
2668:20190522:123953.467 free chunks of size >= 256 bytes: 6
2668:20190522:123953.467 min chunk size: 24 bytes
2668:20190522:123953.467 max chunk size: 9344 bytes
2668:20190522:123953.467 memory of total size 8388232 bytes fragmented into 71906 chunks
2668:20190522:123953.467 of those, 38224 bytes are in 180 free chunks
2668:20190522:123953.467 of those, 7199528 bytes are in 71726 used chunks
2668:20190522:123953.467 ================================
2668:20190522:123953.467 === Backtrace: ===
2668:20190522:123953.468 13: /usr/sbin/zabbix_server(zbx_backtrace+0x42) [0x5571907fda16]
2668:20190522:123953.468 12: /usr/sbin/zabbix_server(__zbx_mem_realloc+0x169) [0x5571907f91e8]
2668:20190522:123953.468 11: /usr/sbin/zabbix_server(+0x15e02c) [0x5571907c302c]
2668:20190522:123953.468 10: /usr/sbin/zabbix_server(zbx_hashset_reserve+0xc1) [0x557190802bf8]
2668:20190522:123953.468 9: /usr/sbin/zabbix_server(zbx_hashset_insert_ext+0xee) [0x557190802e6f]
2668:20190522:123953.468 8: /usr/sbin/zabbix_server(zbx_hashset_insert+0x2d) [0x557190802d7f]
2668:20190522:123953.468 7: /usr/sbin/zabbix_server(DCfind_id+0x86) [0x5571907c399a]
2668:20190522:123953.468 6: /usr/sbin/zabbix_server(+0x165a43) [0x5571907caa43]
2668:20190522:123953.468 5: /usr/sbin/zabbix_server(DCsync_configuration+0xfcd) [0x5571907cf2c2]
2668:20190522:123953.468 4: /usr/sbin/zabbix_server(MAIN_ZABBIX_ENTRY+0x746) [0x5571906a498a]
2668:20190522:123953.468 3: /usr/sbin/zabbix_server(daemon_start+0x2f6) [0x5571907fd25a]
2668:20190522:123953.468 2: /usr/sbin/zabbix_server(main+0x312) [0x5571906a4242]
2668:20190522:123953.468 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fbd54ca83d5]
2668:20190522:123953.468 0: /usr/sbin/zabbix_server(+0x3e309) [0x5571906a3309]
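Reading the memory statistics above: the configuration cache is 8388232 bytes in total, and the failed request of 32056 bytes is larger than "max chunk size: 9344 bytes", which I read as the largest free chunk. Below is a minimal sketch to check that arithmetic from the log lines, assuming the log format shown above. The names `PATTERNS` and `parse_cache_stats` are hypothetical helpers for illustration, not part of Zabbix.

```python
import re

# Lines copied from the zabbix_server.log excerpt above.
LOG = """\
2668:20190522:123953.467 [file:dbconfig.c,line:94] __zbx_mem_realloc(): out of memory (requested 32056 bytes)
2668:20190522:123953.467 max chunk size: 9344 bytes
2668:20190522:123953.467 memory of total size 8388232 bytes fragmented into 71906 chunks
2668:20190522:123953.467 of those, 38224 bytes are in 180 free chunks
2668:20190522:123953.467 of those, 7199528 bytes are in 71726 used chunks
"""

# Hypothetical helper table: one regex per value of interest.
PATTERNS = {
    "requested": r"out of memory \(requested (\d+) bytes\)",
    "max_chunk": r"max chunk size: (\d+) bytes",
    "total": r"memory of total size (\d+) bytes",
    "free": r"of those, (\d+) bytes are in \d+ free chunks",
    "used": r"of those, (\d+) bytes are in \d+ used chunks",
}


def parse_cache_stats(text):
    """Return the numbers found in the log text as a dict of ints."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[name] = int(match.group(1))
    return stats


stats = parse_cache_stats(LOG)

# Share of the cache occupied by used chunks (chunk overhead not counted).
used_pct = 100.0 * stats["used"] / stats["total"]

# The allocation can only succeed if some free chunk is large enough.
fits = stats["requested"] <= stats["max_chunk"]

print(f"cache used: {used_pct:.1f}%")
print(f"largest free chunk: {stats['max_chunk']} bytes, "
      f"requested: {stats['requested']} bytes")
print("request fits" if fits else "request does not fit in any free chunk")
```

If the numbers check out, the log's own hint applies: raise the CacheSize parameter in /etc/zabbix/zabbix_server.conf (for example to 32M, an arbitrary illustrative value) and restart the service.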
Code:
ATOP - zabbix 2019/05/22 12:52:00 -------------- 10s elapsed
PRC | sys 0.04s | user 0.82s | | | #proc 125 | #trun 1 | | #tslpi 200 | #tslpu 0 | | #zombie 0 | clones 5 | | | no procacct |
CPU | sys 2% | user 9% | irq 0% | | | idle 389% | wait 0% | steal 0% | guest 0% | | ipc notavail | | cycl unknown | curf 2.67GHz | curscal ?% |
cpu | sys 1% | user 5% | irq 0% | | | idle 94% | cpu000 w 0% | steal 0% | guest 0% | | ipc notavail | | cycl unknown | curf 2.67GHz | curscal ?% |
cpu | sys 0% | user 3% | irq 0% | | | idle 97% | cpu001 w 0% | steal 0% | guest 0% | | ipc notavail | | cycl unknown | curf 2.67GHz | curscal ?% |
cpu | sys 0% | user 2% | irq 0% | | | idle 98% | cpu002 w 0% | steal 0% | guest 0% | | ipc notavail | | cycl unknown | curf 2.67GHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | | | idle 100% | cpu003 w 0% | steal 0% | guest 0% | | ipc notavail | | cycl unknown | curf 2.67GHz | curscal ?% |
CPL | avg1 0.05 | | avg5 0.06 | | avg15 0.05 | | | csw 2989 | | intr 3333 | | | | numcpu 4 | |
MEM | tot 3.7G | free 290.5M | cache 1.1G | dirty 0.1M | buff 151.6M | slab 202.5M | slrec 166.7M | shmem 15.8M | shrss 0.4M | shswp 0.0M | | vmbal 0.0M | | hptot 0.0M | hpuse 0.0M |
SWP | tot 1.0G | free 996.6M | | | | | | | | | | | vmcom 3.2G | vmlim 2.9G | |
DSK | sda | busy 0% | | read 0 | write 6 | | KiB/r 0 | KiB/w 18 | | MBr/s 0.0 | MBw/s 0.0 | | avq 1.00 | avio 0.33 ms | |
NET | transport | tcpi 23 | tcpo 23 | | udpi 0 | udpo 0 | tcpao 1 | tcppo 0 | | tcprs 0 | tcpie 0 | tcpor 13 | udpnp 0 | | udpie 0 |
NET | network | ipi 28 | | ipo 23 | ipfrw 0 | | deliv 23 | | | | | | icmpi 0 | icmpo 0 | |
NET | ens160 0% | | pcki 43 | pcko 27 | | sp 10 Gbps | si 4 Kbps | so 3 Kbps | | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | | pcki 2 | pcko 2 | | sp 0 Mbps | si 0 Kbps | so 0 Kbps | | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PID SYSCPU USRCPU VGROW RGROW RUID EUID ST EXC THR S CPUNR CPU CMD 1/1
4989 0.00s 0.56s 0K 0K mysql mysql -- - 61 S 2 6% mysqld
7733 0.00s 0.08s 0K 0K nginx nginx -- - 1 S 2 1% php-fpm
9759 0.01s 0.05s 0K 0K nginx nginx -- - 1 S 2 1% php-fpm
5078 0.00s 0.05s 0K 0K nginx nginx -- - 1 S 2 1% php-fpm
5430 0.00s 0.04s 0K 0K nginx nginx -- - 1 S 2 0% php-fpm
3057 0.02s 0.01s 0K 0K kmm kmm -- - 1 R 2 0% atop
2321 0.01s 0.00s 0K 16K root root -- - 1 S 0 0% systemd-journa
4513 0.00s 0.01s 0K 0K root root -- - 3 S 3 0% NetworkManager
4517 0.00s 0.01s 0K 0K root root -- - 1 S 2 0% vmtoolsd
3050 0.00s 0.01s 0K 0K root root -- - 1 S 1 0% kworker/1:1
4829 0.00s 0.00s 0K 12K root root -- - 3 S 1 0% rsyslogd
4823 0.00s 0.00s 0K 0K root root -- - 1 S 3 0% php-fpm
1 0.00s 0.00s 0K 0K root root -- - 1 S 1 0% systemd
5531 0.00s 0.00s 0K 0K kmm kmm -- - 1 S 3 0% sshd
4478 0.00s 0.00s 0K 0K dbus dbus -- - 2 S 2 0% dbus-daemon
4515 0.00s 0.00s 0K 0K root root -- - 1 S 2 0% irqbalance
4856 0.00s 0.00s 0K 0K zabbix zabbix -- - 1 S 1 0% zabbix_agentd
4449 0.00s 0.00s 0K 0K root root -- - 2 S 1 0% auditd
9 0.00s 0.00s 0K 0K root root -- - 1 S 3 0% rcu_sched
109 0.00s 0.00s 0K 0K root root -- - 1 S 3 0% kauditd
2249 0.00s 0.00s 0K 0K root root -- - 1 S 1 0% jbd2/sda2-8