"Black Hawk Down", or help me find the cause of the Zabbix crash....

  • zar
    Senior Member
    • Mar 2018
    • 148

    #1

    Please help me understand what I need to do to keep the server alive...

    Given:
    zabbix_server (Zabbix) 3.4.15
    Revision 86739 12 November 2018, compilation time: Nov 12 2018 11:04:06

    Compiled with OpenSSL 1.0.2g-fips 1 Mar 2016
    Running with OpenSSL 1.0.2g 1 Mar 2016
    Zabbix server is running Yes localhost:10051
    Number of hosts (enabled/disabled/templates) 173 68 / 11 / 94
    Number of items (enabled/disabled/not supported) 5674 4742 / 446 / 486
    Number of triggers (enabled/disabled [problem/ok]) 2413 2125 / 288 [10 / 2115]
    Number of users (online) 8 2
    Required server performance, new values per second 77.56
    MySQL database: 26 GB

    Recently, these warnings started appearing at night:
    Disk I/O is overloaded on Zabbix server
    Lack of free swap space on Zabbix server
    Zabbix housekeeper processes more than 75% busy
    and last night the server crashed, after first reporting
    Less than 25% free in the vmware cache
    After looking at the log (screenshot) from just before the crash, I can see that it is running out of memory, so I increased the RAM to 6 GB and added 12 cores to the CPU.

    I rebooted the server, but nothing changed.. the server keeps "hanging": browser windows freeze and the dashboard hangs while trying to load.



    The server log is a mess like this:
    id=e.eventid where alerttype=0 and a.status=3 order by a.alertid"
    1869:20191211:144925.533 slow query: 23.970063 sec, "select itemid from items where type=7 and flags<>2 and hostid=10310"
    1826:20191211:144928.532 slow query: 9.344179 sec, "select gitemid,graphid,itemid,drawtype,sortorder,color,yaxisside,calc_fnc,type from graphs_items where (graphid between 3131 and 3160 or graphid in (2067,2069,2071,2152,2155,2173,2178,2187,2192,2209,2371,2382,2384,2582,2628,2685,2690,2709,2718,2733,2756,2758,2766,2768,2790,2817,2823,2824,2839,2846,2854,2872,2878,2911,2922,2924,2950,2958,2959,2960,2961,2969,2988,2992,2994,2997,3004,3024,3032,3036,3037,3044,3050,3065,3069,3072,3077,3080,3084,3085,3086,3090,3093,3094,3101))"
    1876:20191211:144929.839 cannot send list of active checks to "10.202.10.36": host [Windows host] not found
    1801:20191211:144930.780 slow query: 5.100850 sec, "select taskid,type,clock,ttl from task where status in (1,2) order by taskid"
    1801:20191211:144933.874 slow query: 3.076520 sec, "select taskid,type,clock,ttl from task where status in (1,2) order by taskid"
    1785:20191211:144936.883 slow query: 15.124773 sec, "select eventid,source,object,objectid,clock,value,acknowledged,ns from events where eventid in (2164212,2468196,2473101,2473138,2474073) order by eventid"
    1826:20191211:144937.085 slow query: 4.097891 sec, "select h.hostid,h.host,h.name,h.status,hi.inventory_mode from hosts h,host_discovery hd left join host_inventory hi on hd.hostid=hi.hostid where h.hostid=hd.hostid and hd.parent_itemid=36289"
    1754:20191211:144938.870 slow query: 161.178092 sec, "select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step from item_preproc pp,items i,hosts h where pp.itemid=i.itemid and i.hostid=h.hostid and h.status in (0,1) and i.flags<>2 order by pp.itemid"
    1760:20191211:144938.988 slow query: 14.703253 sec, "update httptest set nextcheck=1576047024 where httptestid=1"
    1857:20191211:144940.174 slow query: 17.133018 sec, "update hosts set disable_until=1576047022 where hostid=10271"
    1864:20191211:144941.754 slow query: 15.997810 sec, "update hosts set disable_until=1576047025 where hostid=10273"
    1786:20191211:144943.209 unmatched trap received from "10.202.55.5": 14:49:38 2019/12/11 .1.3.6.1.4.1.14179.2.6.3.41 Normal "General event" 10.202.55.5 - 10.202.55.5
    1786:20191211:144946.042 unmatched trap received from "10.202.55.5": 14:49:39 2019/12/11 .1.3.6.1.4.1.14179.2.6.3.41 Normal "General event" 10.202.55.5 - 10.202.55.5
    1812:20191211:144948.054 slow query: 6.337703 sec, "select id.itemid,id.key_,id.lastcheck,id.ts_delete,i.name,i.key_,i.type,i.value_type,i.delay,i.history,i.trends,i.trapper_hosts,i.units,i.formula,i.logtimefmt,i.valuemapid,i.params,i.ipmi_sensor,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authprotocol,i.snmpv3_authpassphrase,i.snmpv3_privprotocol,i.snmpv3_privpassphrase,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.description,i.interfaceid,i.snmpv3_contextname,i.jmx_endpoint,i.master_itemid,id.parent_itemid from item_discovery id join items i on id.itemid=i.itemid where id.parent_itemid between 36301 and 36305"
    1801:20191211:144949.847 slow query: 3.992923 sec, "select taskid,type,clock,ttl from task where status in (1,2) order by taskid"
    1872:20191211:144950.705 slow query: 3.087852 sec, "select hostid,status,tls_accept,tls_issuer,tls_subject,tls_psk_identity from hosts where host='f02-srv-ais01.vas.arbitr.ru' and status in (0,1) and flags<>2 and proxy_hostid is null"
    1760:20191211:144952.076 slow query: 3.067970 sec, "select min(t.nextcheck) from httptest t,hosts h where t.hostid=h.hostid and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)"
    1867:20191211:144952.795 slow query: 6.749950 sec, "select u.userid,u.type from sessions s,users u where s.userid=u.userid and s.sessionid='d650300154a3a70d5422ace458f17306' and s.status=0"
    1883:20191211:144955.892 slow query: 3.056405 sec, "select hostid,status,tls_accept,tls_issuer,tls_subject,tls_psk_identity from hosts where host='f02-srv-sql01.vas.arbitr.ru' and status in (0,1) and flags<>2 and proxy_hostid is null"
    1785:20191211:144956.361 slow query: 4.184484 sec, "select actionid from operations where recovery=1 and actionid=8"
    1850:20191211:144957.771 slow query: 5.698820 sec, "update hosts set disable_until=1576047051 where hostid=10274"
    1872:20191211:144958.915 slow query: 8.209498 sec, "select itemid from items where type=7 and flags<>2 and hostid=10309"
    1880:20191211:144959.003 slow query: 4.763322 sec, "select hostid,status,tls_accept,tls_issuer,tls_subject,tls_psk_identity from hosts where host='f02-srv-video01.vas.arbitr.ru' and status in (0,1) and flags<>2 and proxy_hostid is null"


    ........


    1809:20191211:150344.279 slow query: 13.993236 sec, "select a.applicationid,a.name,ap.application_prototypeid,ad.lastcheck,ad.ts_delete,ad.name,ad.application_discoveryid from applications a,application_discovery ad,application_prototype ap where ap.itemid=28941 and ad.application_prototypeid=ap.application_prototypeid and a.applicationid=ad.applicationid"
    1786:20191211:150344.285 slow query: 11.812831 sec, "update globalvars set snmp_lastsize=1398878"
    1781:20191211:150344.329 slow query: 61.548029 sec, "select distinct itemid from trends where clock>=1576044000 and itemid in (28539,33819,35786,35787,36320,37194,37239,38237,38292,39544,40191,40643,40899,41001,41247,41713,41825,41865,41885,41890,41933,41938,42565,42584,42585,42620)"
    1869:20191211:150345.532 slow query: 12.386959 sec, "select itemid from items where type=7 and flags<>2 and hostid=10308"
    1801:20191211:150346.246 slow query: 10.461095 sec, "select taskid,type,clock,ttl from task where status in (1,2) order by taskid"
    1786:20191211:150346.827 unmatched trap received from "10.202.55.5": 15:03:31 2019/12/11 .1.3.6.1.4.1.14179.2.6.3.41 Normal "General event" 10.202.55.5 - 10.202.55.5
    1846:20191211:150346.897 slow query: 6.091234 sec, "update hosts set disable_until=1576047879 where hostid=10284"
    1786:20191211:150348.487 unmatched trap received from "10.202.55.5": 15:03:36 2019/12/11 .1.3.6.1.4.1.9.9.615.0.1 Normal "General event" 10.202.55.5 - 10.202.55.5
    1827:20191211:150350.107 slow query: 5.311869 sec, "select applicationid,itemid from items_applications where itemid in (28951,28952,28953,28954)"
    1760:20191211:150351.738 slow query: 12.296477 sec, "select h.hostid,h.host,h.name,t.httptestid,t.name,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host,t.delay from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1576047819 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)"
    1818:20191211:150352.436 slow query: 4.458497 sec, "select applicationid,itemid from items_applications where itemid in (28962,28963)"
    1814:20191211:150353.180 slow query: 9.431887 sec, "select ia.itemappid,ia.itemid,ia.applicationid from items_applications ia,item_discovery id1,item_discovery id2 where id1.itemid=ia.itemid and id1.parent_itemid=id2.itemid and id2.parent_itemid=28932"
    1786:20191211:150354.223 unmatched trap received from "10.202.55.5": 15:03:47 2019/12/11 .1.3.6.1.4.1.9.9.513.0.19 Normal "General event" 10.202.55.5 - 10.202.55.5
    1760:20191211:150354.803 slow query: 3.064961 sec, "select min(t.nextcheck) from httptest t,hosts h where t.hostid=h.hostid and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)"
    1876:20191211:150355.454 slow query: 4.420471 sec, "select itemid from items where type=7 and flags<>2 and hostid=10337"
    1890:20191211:150357.411 slow query: 3.114695 sec, "select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,a.retries,e.source,e.object,e.objectid from alerts a left join events e on a.eventid=e.eventid where alerttype=0 and a.status=3 order by a.alertid"



    Let's see how many Zabbix processes are running:
    srv-mon01:/var/log/zabbix# ps ax | grep zabbix_
    1320 ? S 0:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
    1338 ? S 0:18 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
    1339 ? S 0:01 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
    1340 ? S 0:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
    1341 ? S 0:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
    1342 ? S 0:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
    1359 ? S 0:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
    1754 ? S 0:03 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 2.169119 sec, idle 60 sec]
    1755 ? S 0:00 /usr/sbin/zabbix_server: alerter #1 started
    1756 ? S 0:00 /usr/sbin/zabbix_server: alerter #2 started
    1757 ? S 0:00 /usr/sbin/zabbix_server: alerter #3 started
    1758 ? S 0:00 /usr/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
    1759 ? S 0:03 /usr/sbin/zabbix_server: timer #1 [processed 234 triggers, 0 events in 0.049315 sec, 0 maintenances in 0.000000 sec, idle 30 sec]
    1760 ? S 0:00 /usr/sbin/zabbix_server: http poller #1 [got 0 values in 0.053712 sec, idle 5 sec]
    1761 ? S 0:01 /usr/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.026164 sec, idle 60 sec]
    1762 ? S 0:00 /usr/sbin/zabbix_server: discoverer #2 [processed 0 rules in 0.035724 sec, idle 60 sec]
    1763 ? S 0:03 /usr/sbin/zabbix_server: discoverer #3 [processed 0 rules in 0.004833 sec, idle 60 sec]
    1764 ? S 0:01 /usr/sbin/zabbix_server: discoverer #4 [processed 0 rules in 0.077338 sec, idle 60 sec]
    1765 ? S 0:05 /usr/sbin/zabbix_server: discoverer #5 [processed 0 rules in 0.023151 sec, idle 60 sec]
    1766 ? S 0:00 /usr/sbin/zabbix_server: discoverer #6 [processed 0 rules in 0.034309 sec, idle 60 sec]
    1767 ? S 0:01 /usr/sbin/zabbix_server: discoverer #7 [processed 0 rules in 0.024143 sec, idle 60 sec]
    1768 ? S 0:00 /usr/sbin/zabbix_server: discoverer #8 [processed 0 rules in 0.080169 sec, idle 60 sec]
    1769 ? S 0:01 /usr/sbin/zabbix_server: discoverer #9 [processed 0 rules in 0.041297 sec, idle 60 sec]
    1770 ? S 0:00 /usr/sbin/zabbix_server: discoverer #10 [processed 0 rules in 0.024875 sec, idle 60 sec]
    1771 ? S 0:00 /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.009645 sec, idle 60 sec]
    1772 ? S 0:00 /usr/sbin/zabbix_server: discoverer #12 [processed 0 rules in 0.057128 sec, idle 60 sec]
    1773 ? S 0:03 /usr/sbin/zabbix_server: discoverer #13 [processed 0 rules in 0.109798 sec, idle 60 sec]
    1774 ? S 0:01 /usr/sbin/zabbix_server: discoverer #14 [processed 0 rules in 0.091707 sec, idle 60 sec]
    1775 ? S 0:04 /usr/sbin/zabbix_server: discoverer #15 [processed 0 rules in 0.001232 sec, idle 60 sec]
    1776 ? S 0:07 /usr/sbin/zabbix_server: discoverer #16 [processed 0 rules in 0.145285 sec, idle 60 sec]
    1777 ? S 0:04 /usr/sbin/zabbix_server: discoverer #17 [processed 0 rules in 0.048327 sec, idle 60 sec]
    1778 ? S 0:02 /usr/sbin/zabbix_server: discoverer #18 [processed 0 rules in 0.022965 sec, idle 60 sec]
    1779 ? S 0:00 /usr/sbin/zabbix_server: discoverer #19 [processed 0 rules in 0.029167 sec, idle 60 sec]
    1780 ? S 0:00 /usr/sbin/zabbix_server: discoverer #20 [processed 0 rules in 0.050546 sec, idle 60 sec]
    1781 ? S 0:09 /usr/sbin/zabbix_server: history syncer #1 [synced 0 items in 0.000122 sec, idle 1 sec]
    1782 ? S 0:06 /usr/sbin/zabbix_server: history syncer #2 [synced 0 items in 0.000293 sec, idle 1 sec]
    1783 ? S 0:37 /usr/sbin/zabbix_server: history syncer #3 [synced 0 items in 0.010853 sec, idle 1 sec]
    1784 ? S 0:07 /usr/sbin/zabbix_server: history syncer #4 [synced 27 items in 0.546966 sec, idle 1 sec]
    1785 ? S 0:00 /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.212996 sec, idle 3 sec]
    1786 ? S 0:00 /usr/sbin/zabbix_server: snmp trapper [processed data in 0.004269 sec, idle 1 sec]
    1787 ? S 0:00 /usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000056 sec, idle 5 sec]
    1788 ? S 0:00 /usr/sbin/zabbix_server: self-monitoring [processed data in 0.000038 sec, idle 1 sec]
    1789 ? S 0:09 /usr/sbin/zabbix_server: vmware collector #1 [updated 0, removed 0 VMware services in 0.005780 sec, idle 5 sec]
    1790 ? S 0:01 /usr/sbin/zabbix_server: vmware collector #2 [updated 0, removed 0 VMware services in 0.000041 sec, idle 5 sec]
    1791 ? S 0:01 /usr/sbin/zabbix_server: vmware collector #3 [updated 0, removed 0 VMware services in 0.000045 sec, idle 5 sec]
    1792 ? S 2:21 /usr/sbin/zabbix_server: vmware collector #4 [updated 0, removed 0 VMware services in 0.012167 sec, querying VMware services]
    1794 ? S 0:05 /usr/sbin/zabbix_server: vmware collector #5 [updated 0, removed 0 VMware services in 0.000103 sec, idle 5 sec]
    1795 ? S 0:00 /usr/sbin/zabbix_server: vmware collector #6 [updated 0, removed 0 VMware services in 0.000053 sec, idle 5 sec]
    1797 ? S 2:59 /usr/sbin/zabbix_server: vmware collector #7 [updated 0, removed 0 VMware services in 0.001252 sec, querying VMware services]
    1798 ? S 0:02 /usr/sbin/zabbix_server: vmware collector #8 [updated 0, removed 0 VMware services in 0.000036 sec, idle 5 sec]
    1799 ? S 0:00 /usr/sbin/zabbix_server: vmware collector #9 [updated 0, removed 0 VMware services in 0.000058 sec, idle 5 sec]
    1800 ? S 0:00 /usr/sbin/zabbix_server: vmware collector #10 [updated 0, removed 0 VMware services in 0.000051 sec, idle 5 sec]
    1801 ? S 0:01 /usr/sbin/zabbix_server: task manager [processed 0 task(s) in 0.035452 sec, idle 5 sec]
    1802 ? S 0:07 /usr/sbin/zabbix_server: poller #1 [got 0 values in 0.012101 sec, idle 1 sec]
    1804 ? S 0:03 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000043 sec, idle 1 sec]
    1806 ? S 0:03 /usr/sbin/zabbix_server: poller #3 [got 2 values in 0.031264 sec, idle 1 sec]
    1807 ? S 0:14 /usr/sbin/zabbix_server: poller #4 [got 1 values in 0.018328 sec, idle 1 sec]
    1808 ? S 0:11 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.005801 sec, idle 1 sec]
    1809 ? S 0:02 /usr/sbin/zabbix_server: poller #6 [got 2 values in 0.027768 sec, idle 1 sec]
    1810 ? S 0:02 /usr/sbin/zabbix_server: poller #7 [got 0 values in 0.000036 sec, idle 1 sec]
    1811 ? S 0:01 /usr/sbin/zabbix_server: poller #8 [got 0 values in 0.000029 sec, getting values]
    1812 ? S 0:01 /usr/sbin/zabbix_server: poller #9 [got 1 values in 0.020741 sec, idle 1 sec]
    1813 ? S 0:01 /usr/sbin/zabbix_server: poller #10 [got 3 values in 0.019037 sec, idle 1 sec]
    1814 ? S 0:22 /usr/sbin/zabbix_server: poller #11 [got 0 values in 0.008401 sec, idle 1 sec]
    1815 ? S 0:03 /usr/sbin/zabbix_server: poller #12 [got 2 values in 0.029754 sec, idle 1 sec]
    1816 ? S 0:04 /usr/sbin/zabbix_server: poller #13 [got 2 values in 0.025827 sec, idle 1 sec]
    1817 ? S 0:02 /usr/sbin/zabbix_server: poller #14 [got 1 values in 0.029920 sec, idle 1 sec]
    1818 ? S 0:02 /usr/sbin/zabbix_server: poller #15 [got 3 values in 0.032083 sec, idle 1 sec]
    1819 ? S 0:03 /usr/sbin/zabbix_server: poller #16 [got 4 values in 0.032440 sec, idle 1 sec]
    1820 ? S 0:07 /usr/sbin/zabbix_server: poller #17 [got 1 values in 0.030870 sec, idle 1 sec]
    1821 ? S 0:12 /usr/sbin/zabbix_server: poller #18 [got 0 values in 0.010127 sec, idle 1 sec]
    1822 ? S 0:03 /usr/sbin/zabbix_server: poller #19 [got 1 values in 0.024519 sec, idle 1 sec]
    1823 ? S 0:02 /usr/sbin/zabbix_server: poller #20 [got 1 values in 0.027564 sec, idle 1 sec]
    1824 ? S 0:21 /usr/sbin/zabbix_server: poller #21 [got 0 values in 0.002932 sec, idle 1 sec]
    1826 ? S 0:05 /usr/sbin/zabbix_server: poller #22 [got 1 values in 0.026027 sec, idle 1 sec]
    1827 ? S 0:01 /usr/sbin/zabbix_server: poller #23 [got 2 values in 0.030075 sec, idle 1 sec]
    1828 ? S 0:15 /usr/sbin/zabbix_server: poller #24 [got 0 values in 0.006184 sec, idle 1 sec]
    1829 ? S 0:02 /usr/sbin/zabbix_server: poller #25 [got 4 values in 0.026354 sec, idle 1 sec]
    1830 ? S 0:02 /usr/sbin/zabbix_server: poller #26 [got 1 values in 0.025162 sec, idle 1 sec]
    1831 ? S 0:35 /usr/sbin/zabbix_server: poller #27 [got 0 values in 0.020854 sec, idle 1 sec]
    1832 ? S 0:25 /usr/sbin/zabbix_server: poller #28 [got 0 values in 0.000056 sec, idle 1 sec]
    1833 ? S 0:29 /usr/sbin/zabbix_server: poller #29 [got 1 values in 0.033535 sec, idle 1 sec]
    1834 ? S 0:02 /usr/sbin/zabbix_server: poller #30 [got 2 values in 0.024044 sec, idle 1 sec]
    1835 ? S 0:05 /usr/sbin/zabbix_server: poller #31 [got 0 values in 0.000032 sec, idle 1 sec]
    1836 ? S 0:13 /usr/sbin/zabbix_server: poller #32 [got 12 values in 0.091709 sec, idle 1 sec]
    1837 ? S 0:14 /usr/sbin/zabbix_server: poller #33 [got 0 values in 0.001615 sec, idle 1 sec]
    1838 ? S 0:06 /usr/sbin/zabbix_server: poller #34 [got 2 values in 0.025320 sec, idle 1 sec]
    1839 ? S 0:18 /usr/sbin/zabbix_server: poller #35 [got 0 values in 0.000033 sec, idle 1 sec]
    1840 ? S 0:18 /usr/sbin/zabbix_server: poller #36 [got 1 values in 0.032859 sec, idle 1 sec]
    1841 ? S 0:43 /usr/sbin/zabbix_server: poller #37 [got 0 values in 0.001787 sec, idle 1 sec]
    1842 ? S 0:10 /usr/sbin/zabbix_server: poller #38 [got 0 values in 0.017529 sec, idle 1 sec]
    1843 ? S 0:13 /usr/sbin/zabbix_server: poller #39 [got 0 values in 0.007154 sec, idle 1 sec]
    1844 ? S 0:22 /usr/sbin/zabbix_server: poller #40 [got 1 values in 0.033286 sec, idle 1 sec]
    1845 ? S 0:10 /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.004466 sec, idle 1 sec]
    1846 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #2 [got 0 values in 0.000088 sec, idle 1 sec]
    1847 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #3 [got 0 values in 0.000045 sec, idle 1 sec]
    1848 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #4 [got 0 values in 0.000030 sec, idle 1 sec]
    1849 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #5 [got 0 values in 0.000036 sec, idle 1 sec]
    1850 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #6 [got 0 values in 0.000055 sec, idle 1 sec]
    1851 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #7 [got 0 values in 0.000065 sec, idle 1 sec]
    1852 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #8 [got 0 values in 0.000034 sec, idle 1 sec]
    1853 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #9 [got 0 values in 0.000027 sec, idle 1 sec]
    1854 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #10 [got 0 values in 0.000051 sec, idle 1 sec]
    1855 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #11 [got 0 values in 0.000027 sec, idle 1 sec]
    1856 ? S 0:03 /usr/sbin/zabbix_server: unreachable poller #12 [got 0 values in 0.000105 sec, getting values]
    1857 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #13 [got 0 values in 0.000027 sec, idle 1 sec]
    1858 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #14 [got 0 values in 0.000062 sec, idle 1 sec]
    1859 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #15 [got 0 values in 0.000028 sec, idle 1 sec]
    1860 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #16 [got 0 values in 0.000110 sec, idle 1 sec]
    1861 ? S 0:01 /usr/sbin/zabbix_server: unreachable poller #17 [got 0 values in 0.000039 sec, idle 1 sec]
    1862 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #18 [got 0 values in 0.000251 sec, idle 1 sec]
    1863 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #19 [got 0 values in 0.000046 sec, idle 1 sec]
    1864 ? S 0:00 /usr/sbin/zabbix_server: unreachable poller #20 [got 0 values in 0.000065 sec, idle 1 sec]
    1865 ? S 0:00 /usr/sbin/zabbix_server: trapper #1 [processed data in 0.000444 sec, waiting for connection]
    1866 ? S 0:00 /usr/sbin/zabbix_server: trapper #2 [processed data in 0.017671 sec, waiting for connection]
    1867 ? S 0:00 /usr/sbin/zabbix_server: trapper #3 [processed data in 0.053383 sec, waiting for connection]
    1868 ? S 0:00 /usr/sbin/zabbix_server: trapper #4 [processed data in 0.128641 sec, waiting for connection]
    1869 ? S 0:00 /usr/sbin/zabbix_server: trapper #5 [processed data in 0.017973 sec, waiting for connection]
    1870 ? S 0:00 /usr/sbin/zabbix_server: trapper #6 [processed data in 0.027827 sec, waiting for connection]
    1871 ? S 0:00 /usr/sbin/zabbix_server: trapper #7 [processed data in 0.003525 sec, waiting for connection]
    1872 ? S 0:00 /usr/sbin/zabbix_server: trapper #8 [processed data in 0.000667 sec, waiting for connection]
    1873 ? S 0:00 /usr/sbin/zabbix_server: trapper #9 [processed data in 0.000182 sec, waiting for connection]
    1874 ? S 0:00 /usr/sbin/zabbix_server: trapper #10 [processed data in 0.080631 sec, waiting for connection]
    1875 ? S 0:00 /usr/sbin/zabbix_server: trapper #11 [processed data in 0.012570 sec, waiting for connection]
    1876 ? S 0:00 /usr/sbin/zabbix_server: trapper #12 [processed data in 0.001642 sec, waiting for connection]
    1877 ? S 0:00 /usr/sbin/zabbix_server: trapper #13 [processed data in 0.058541 sec, waiting for connection]
    1878 ? S 0:00 /usr/sbin/zabbix_server: trapper #14 [processed data in 0.000647 sec, waiting for connection]
    1879 ? S 0:01 /usr/sbin/zabbix_server: trapper #15 [processed data in 0.025092 sec, waiting for connection]
    1880 ? S 0:00 /usr/sbin/zabbix_server: trapper #16 [processed data in 0.052791 sec, waiting for connection]
    1881 ? S 0:00 /usr/sbin/zabbix_server: trapper #17 [processed data in 0.006843 sec, waiting for connection]
    1882 ? S 0:01 /usr/sbin/zabbix_server: trapper #18 [processed data in 0.072001 sec, waiting for connection]
    1883 ? S 0:00 /usr/sbin/zabbix_server: trapper #19 [processed data in 0.056112 sec, waiting for connection]
    1884 ? S 0:00 /usr/sbin/zabbix_server: trapper #20 [processed data in 0.029972 sec, waiting for connection]
    1885 ? S 0:05 /usr/sbin/zabbix_server: icmp pinger #1 [pinging hosts]
    1886 ? S 0:01 /usr/sbin/zabbix_server: icmp pinger #2 [got 0 values in 0.000050 sec, idle 1 sec]
    1887 ? S 0:01 /usr/sbin/zabbix_server: icmp pinger #3 [got 0 values in 0.000041 sec, idle 1 sec]
    1888 ? S 0:06 /usr/sbin/zabbix_server: icmp pinger #4 [pinging hosts]
    1889 ? S 0:03 /usr/sbin/zabbix_server: icmp pinger #5 [got 0 values in 0.001101 sec, idle 1 sec]
    1890 ? S 0:04 /usr/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.128618 sec during 5.141979 sec]
    1891 ? S 0:20 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 286 values, idle 5.323122 sec during 5.392197 sec]
    1892 ? S 0:01 /usr/sbin/zabbix_server: preprocessing worker #1 started
    1893 ? S 0:00 /usr/sbin/zabbix_server: preprocessing worker #2 started
    1894 ? S 0:00 /usr/sbin/zabbix_server: preprocessing worker #3 started
    3803 ? S 0:00 sh -c /usr/bin/fping -C3 -S10.202.100.150 2>&1 </tmp/zabbix_server_1885.pinger
    3808 ? S 0:00 sh -c /usr/bin/fping -C3 -S10.202.100.150 2>&1 </tmp/zabbix_server_1888.pinger
    3811 pts/1 S+ 0:00 grep --color=auto zabbix_
    Please advise which direction to dig in and where to look... I understand that it can't cope with the amount of data coming into it, but what should I do so that it can work?
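
A listing this long is easier to judge once it is reduced to a count per process type. The following is a minimal Python sketch, not part of the original post: it assumes the `ps ax` output has been saved as text, and the function name `count_process_types` and the regular expression are illustrative choices based only on the process titles visible in the listing.

```python
import re
from collections import Counter

# Matches process titles such as
#   "/usr/sbin/zabbix_server: poller #12 [got 2 values ...]"
#   "/usr/sbin/zabbix_server: alerter #1 started"
# and captures the type name without the "#N" instance number.
PROC_RE = re.compile(r"zabbix_server: ([a-z -]+?)(?: #\d+)?(?: \[| started|$)")

def count_process_types(ps_output: str) -> Counter:
    """Count zabbix_server worker processes per type in `ps ax` output."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = PROC_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts

# A few lines copied from the listing above.
sample = """\
1755 ? S 0:00 /usr/sbin/zabbix_server: alerter #1 started
1758 ? S 0:00 /usr/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
1802 ? S 0:07 /usr/sbin/zabbix_server: poller #1 [got 0 values in 0.012101 sec, idle 1 sec]
1804 ? S 0:03 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000043 sec, idle 1 sec]
1845 ? S 0:10 /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.004466 sec, idle 1 sec]
"""

for name, n in count_process_types(sample).most_common():
    print(f"{n:3d}  {name}")
```

Read by hand, the full listing contains 40 pollers, 20 unreachable pollers, 20 trappers, 20 discoverers and 10 vmware collectors, most of them idle at the moment the snapshot was taken.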
  • Kos
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Aug 2015
    • 3404

    #2
    I see several problems here, all growing from a single root. The root: all components were installed with their default configuration, and nobody is tracking actual resource usage.
    The consequences (in decreasing order of importance):
    • the server crashes because it runs out of the memory allocated by the VMwareCacheSize config parameter, which it says in plain text when it crashes. Increase this parameter before restarting the zabbix_server process;
    • the many slow queries point to poor DBMS performance. Most likely the MySQL server needs tuning. I'm no expert there, though, so I can't give details. At the very least, check its settings: how much memory it has been given for its various buffers.
    • a large number of server processes of various kinds, which apparently were taken from the default configuration but are actually barely used. Look at the graphs "Zabbix data gathering process busy" and "Zabbix internal process busy" (from the standard Zabbix server template), estimate how many server processes of each type are really needed, adjust them in the Zabbix server config file, and restart the server with the new settings.
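
To see where the database time is actually going, the slow-query lines can be grouped by table. The sketch below is only an illustration under stated assumptions: it relies on the log line layout shown in the first post (`PID:date:time slow query: N sec, "SQL"`), the helper name `summarize_slow_queries` is made up for this example, and the table is guessed from the first `from` or `update` keyword rather than by a real SQL parser.

```python
import re
from collections import defaultdict

# Zabbix server log line: PID:YYYYMMDD:HHMMSS.mmm slow query: N sec, "SQL"
SLOW_RE = re.compile(
    r'^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) slow query: ([\d.]+) sec, "(.*)"\s*$'
)
# First table named after FROM / UPDATE; good enough for grouping only.
TABLE_RE = re.compile(r"\b(?:from|update)\s+([a-z_]+)", re.IGNORECASE)

def summarize_slow_queries(log_text):
    """Return {table: (count, total_seconds, max_seconds)} for slow-query lines."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in log_text.splitlines():
        m = SLOW_RE.match(line)
        if not m:
            continue  # SNMP traps, "cannot send list of active checks", etc.
        seconds = float(m.group(4))
        table_match = TABLE_RE.search(m.group(5))
        table = table_match.group(1).lower() if table_match else "?"
        entry = stats[table]
        entry[0] += 1
        entry[1] += seconds
        entry[2] = max(entry[2], seconds)
    return {table: tuple(values) for table, values in stats.items()}

# Lines copied from the log excerpt in the first post.
sample = """\
1869:20191211:144925.533 slow query: 23.970063 sec, "select itemid from items where type=7 and flags<>2 and hostid=10310"
1801:20191211:144930.780 slow query: 5.100850 sec, "select taskid,type,clock,ttl from task where status in (1,2) order by taskid"
1857:20191211:144940.174 slow query: 17.133018 sec, "update hosts set disable_until=1576047022 where hostid=10271"
1876:20191211:144929.839 cannot send list of active checks to "10.202.10.36": host [Windows host] not found
1872:20191211:144958.915 slow query: 8.209498 sec, "select itemid from items where type=7 and flags<>2 and hostid=10309"
"""

summary = summarize_slow_queries(sample)
for table, (count, total, worst) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{table:10s} count={count} total={total:.1f}s max={worst:.1f}s")
```

In the excerpt, even single-row lookups and updates by primary key (`update hosts ... where hostid=...`) take many seconds, which fits the diagnosis that the database as a whole is starved rather than one query being badly written.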


    • zar
      Senior Member
      • Mar 2018
      • 148

      #3
      As for the large number of items, I started on that first thing in the morning, before I even posted this thread on the forum.
      After a thorough cleanup, here is what is left.
      Before:
      Zabbix server is running Yes localhost:10051
      Number of hosts (enabled/disabled/templates) 173 68 / 11 / 94
      Number of items (enabled/disabled/not supported) 5674 4742 / 446 / 486
      Number of triggers (enabled/disabled [problem/ok]) 2413 2125 / 288 [10 / 2115]
      Number of users (online) 8 2
      Required server performance, new values per second 77.56
      After:
      Zabbix server is running Yes localhost:10051
      Number of hosts (enabled/disabled/templates) 164 59 / 11 / 94
      Number of items (enabled/disabled/not supported) 5520 4466 / 639 / 415
      Number of triggers (enabled/disabled [problem/ok]) 2492 2197 / 295 [12 / 2185]
      Number of users (online) 8 2
      Required server performance, new values per second 75.04
      I can't disable anything else. And that is after I also lightened the load by simply removing two heavy templates from the hypervisors.
      I'll deal with those next...

      Although the server has stopped outright hanging, it has not returned to the state it was in a week ago (when it worked normally).
      Right now it is slightly sluggish...

      "a large number of server processes of various kinds, which apparently were taken from the default configuration,"
      Actually, I added those processes myself, following the manuals, when the Zabbix server started reporting that it did not have enough of them and could not keep up.


      Apparently I have an old version of Zabbix... there are no such graphs ("Zabbix data gathering process busy" and "Zabbix internal process busy").
      Here are the values after cleaning up the server:
      Zabbix vmware cache, % free 12/11/2019 06:04:49 PM 100 % Graph
      Zabbix value cache operating mode 12/11/2019 06:04:48 PM Normal (0) Graph
      Zabbix value cache misses 12/11/2019 06:04:48 PM 0.0656 vps +0.03 vps Graph
      Zabbix value cache hits 12/11/2019 06:04:46 PM 92.75 vps -4.89 vps Graph
      Zabbix value cache, % free 12/11/2019 06:04:45 PM 93.21 % -0.02 % Graph
      Zabbix trend write cache, % free 12/11/2019 06:04:52 PM 91.71 % Graph
      Zabbix queue over 10m 12/11/2019 06:01:47 PM 0 Graph
      Zabbix queue 12/11/2019 06:01:47 PM 995 +950 Graph
      Zabbix preprocessing queue 12/11/2019 06:01:18 PM 0 Graph
      Zabbix history write cache, % free 12/11/2019 06:04:50 PM 99.99 % -0.01 % Graph
      Zabbix history index cache, % free 12/11/2019 06:04:51 PM 99.38 % Graph
      Zabbix configuration cache, % free 12/11/2019 06:04:44 PM 99.83 % Graph
      Zabbix busy vmware collector processes, in % 12/11/2019 06:04:41 PM 4.33 % -4.21 % Graph
      Zabbix busy unreachable poller processes, in % 12/11/2019 06:04:40 PM 60.87 % +1.49 % Graph
      Zabbix busy trapper processes, in % 12/11/2019 06:04:39 PM 27.41 % +11.31 % Graph
      Zabbix busy timer processes, in % 12/11/2019 06:04:38 PM 54.85 % +19.08 % Graph
      Zabbix busy task manager processes, in % 12/11/2019 06:04:37 PM 51.54 % +12.81 % Graph
      Zabbix busy snmp trapper processes, in % 12/11/2019 06:05:36 PM 0.15 % -18.73 % Graph
      Zabbix busy self-monitoring processes, in % 12/11/2019 06:05:35 PM 0.7 % -26.86 % Graph
      Zabbix busy proxy poller processes, in % 12/11/2019 06:05:34 PM 0.03 % -66.66 % Graph
      Zabbix busy preprocessing worker processes, in % 12/11/2019 06:05:34 PM 0.03 % -0.25 % Graph
      Zabbix busy preprocessing manager processes, in % 12/11/2019 06:05:32 PM 1.85 % -13.5 % Graph
      Zabbix busy poller processes, in % 12/11/2019 06:05:31 PM 16 % -62.28 % Graph
      Zabbix busy java poller processes, in % Graph
      Zabbix busy ipmi poller processes, in % Graph
      Zabbix busy ipmi manager processes, in % Graph
      Zabbix busy icmp pinger processes, in % 12/11/2019 06:05:29 PM 26.76 % -57.6 % Graph
      Zabbix busy http poller processes, in % 12/11/2019 06:05:26 PM 5.28 % -62.17 % Graph
      Zabbix busy housekeeper processes, in % 12/11/2019 06:05:25 PM 0 % Graph
      Zabbix busy history syncer processes, in % 12/11/2019 06:05:24 PM 9.81 % -20.59 % Graph
      Zabbix busy escalator processes, in % 12/11/2019 06:05:23 PM 17.41 % -45.78 % Graph
      Zabbix busy discoverer processes, in % 12/11/2019 06:05:22 PM 0.09 % -0.51 % Graph
      Zabbix busy configuration syncer processes, in % 12/11/2019 06:05:21 PM 0.47 % +0.16 % Graph
      Zabbix busy alert manager processes, in % 12/11/2019 06:05:19 PM 21.54 % -39.01 % Graph
      Zabbix busy alerter processes, in % 12/11/2019 06:05:20 PM 0 % Graph
      Values processed by Zabbix server per second 12/11/2019 06:04:53 PM 144.92 +77.11 Graph
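
A list like this can be narrowed down to the process types that are actually busy. This is a hedged Python sketch, not something from the thread: it assumes the "Latest data" rows are pasted as plain text in the layout shown above, the function name `busiest` is illustrative, the default threshold of 75 mirrors the "more than 75% busy" triggers mentioned in the thread, and the 50 used in the call is an arbitrary lower cut-off chosen for the example.

```python
import re

# "Zabbix busy <type> processes, in % <date> <time> AM/PM <value> % ..."
BUSY_RE = re.compile(
    r"Zabbix busy (.+?) processes, in % "
    r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M ([\d.]+) %"
)

def busiest(latest_data: str, threshold: float = 75.0):
    """Return (process type, busy %) pairs at or above threshold, busiest first."""
    rows = []
    for line in latest_data.splitlines():
        m = BUSY_RE.search(line)
        if m and float(m.group(2)) >= threshold:
            rows.append((m.group(1), float(m.group(2))))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Rows copied from the list above; the java poller row has no value and is skipped.
sample = """\
Zabbix busy unreachable poller processes, in % 12/11/2019 06:04:40 PM 60.87 % +1.49 % Graph
Zabbix busy timer processes, in % 12/11/2019 06:04:38 PM 54.85 % +19.08 % Graph
Zabbix busy poller processes, in % 12/11/2019 06:05:31 PM 16 % -62.28 % Graph
Zabbix busy java poller processes, in % Graph
"""

print(busiest(sample, threshold=50.0))
```

In the full list only three types are above 50%: unreachable poller (60.87%), timer (54.85%) and task manager (51.54%).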

      Slow queries are still showing up in the logs... but fewer of them and less often...
      I adjusted MySQL:
      query_cache_size = set to 0, because I read that when there are a lot of writes there is no point in having the query cache
      and added
      innodb_buffer_pool_size = 1G
      innodb_log_file_size = 256M


      The most puzzling part is that the server itself is sluggish... even when I log in over SSH, everything is jerky there too...
      The VMware graphs show this:
      [Image: Screenshot from 2019-12-11 18-13-04.png]
      [Image: Screenshot from 2019-12-11 18-13-51.png]

      • zar
        Senior Member
        • Mar 2018
        • 148

        #4
        Now errors like these have started to appear... the strangest thing is that I have essentially rolled all settings back to what they were before Friday's changes (when I added monitoring of two hypervisors and the VMs on them):
        Zabbix icmp pinger processes more than 75% busy 2m 52s No
        06:15:33 PM
        PROBLEM Zabbix server Zabbix poller processes more than 75% busy 2m 54s No


        • zar
          Senior Member
          • Mar 2018
          • 148

          #5
          [Image: Screenshot from 2019-12-11 18-24-29.png]
          [Image: image_15005.png]

          • Kos
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Aug 2015
            • 3404

            #6
            "Apparently I have an old version of Zabbix... there are no such graphs ("Zabbix data gathering process busy" and "Zabbix internal process busy")"
            Version 3.4.x is indeed old (and has long been unsupported). But these two graphs have been part of the standard Zabbix server template since at least version 2.0.
            Try looking for them (via Monitoring -> Graphs).


            • zar
              Senior Member
              • Mar 2018
              • 148

              #7
              Indeed, they are there...
              [Image: Screenshot from 2019-12-11 22-24-31.png]
              [Image: Screenshot from 2019-12-11 22-25-03.png]

              It is strange, though: I made my last changes on Friday, December 6, and I don't see the load going up after that...
              The load crept up a little on the 10th, and then again after the restart...

              By now it has almost leveled off.
              [Image: Screenshot from 2019-12-11 22-28-16.png]
              [Image: Screenshot from 2019-12-11 22-28-37.png]

              I can't figure out what that was... and I haven't re-enabled hypervisor monitoring yet.... although VM discovery was switched off there anyway; only cluster and datastore monitoring was enabled...
              I can't make any sense of what happened...
