Hi,
I am monitoring around 450 nodes. At the start everything was running fine,
but the Zabbix UI gradually (over about two weeks) became very slow.
I tried restarting the Zabbix server after changing some configuration parameters (increased trappers and pollers), but Zabbix took a long time to come up.
The Zabbix API also takes very long to respond.
I have load tested Zabbix at 2700 to 3000 NVPS, but I was generating that load from only 10 to 15 servers. Now NVPS is around 700 with 450 hosts, and Zabbix is not able to handle it.
Setup details:
Linux 2.6.32-358.2.1.el6
Database: PostgreSQL
Zabbix: Zabbix server v2.0.9 (revision 39085)
Both DB and server are on the same machine.
CPU cores: 24
RAM: 96 GB
**************************************
top output
top - 10:28:58 up 8 days, 23:03, 1 user, load average: 512.98, 607.09, 649.69
Tasks: 3440 total, 59 running, 3381 sleeping, 0 stopped, 0 zombie
Cpu(s): 93.8%us, 2.1%sy, 0.0%ni, 4.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99022656k total, 97327092k used, 1695564k free, 426468k buffers
Swap: 20972848k total, 78320k used, 20894528k free, 67705676k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14744 postgres 20 0 10.0g 537m 530m S 2.9 0.6 22:08.43 postmaster
14973 postgres 20 0 10.0g 541m 532m S 2.7 0.6 21:57.05 postmaster
54173 admin 20 0 17692 3988 1004 R 2.7 0.0 0:00.52 top
14059 postgres 20 0 10.0g 540m 532m S 2.5 0.6 21:59.06 postmaster
14175 postgres 20 0 10.0g 539m 531m S 2.5 0.6 21:57.19 postmaster
14248 postgres 20 0 10.0g 540m 533m S 2.5 0.6 22:06.66 postmaster
14251 postgres 20 0 10.0g 309m 304m S 2.5 0.3 21:51.41 postmaster
14296 postgres 20 0 10.0g 321m 315m S 2.5 0.3 21:50.79 postmaster
14316 postgres 20 0 10.0g 539m 530m S 2.5 0.6 22:00.16 postmaster
14510 postgres 20 0 10.0g 537m 530m S 2.5 0.6 22:19.80 postmaster
**************************************
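To put the load averages above in per-core terms, here is a minimal sketch that parses the `top` header line and divides by the 24 cores listed in the setup details. The function name `load_per_core` is made up for illustration; the only inputs are the header line and the core count from this post.

```python
import re

# Header line copied from the top output above.
TOP_HEADER = ("top - 10:28:58 up 8 days, 23:03, 1 user, "
              "load average: 512.98, 607.09, 649.69")
CPU_CORES = 24  # from the setup details in this post


def load_per_core(header, cores):
    """Return the 1, 5 and 15 minute load averages divided by the core count."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", header
    )
    if match is None:
        raise ValueError("no load average found in header line")
    return tuple(float(value) / cores for value in match.groups())


one, five, fifteen = load_per_core(TOP_HEADER, CPU_CORES)
print(f"load per core: {one:.1f} (1m), {five:.1f} (5m), {fifteen:.1f} (15m)")
```

A value near 1.0 per core would mean the run queue roughly matches the available CPUs; the figures here are more than twenty times that on all three averages.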
These are from zabbix_server.log on restart.
30302:20150120:083102.798 query [txnlev:0] [select alert_history,event_history,refresh_unsupported,discovery_groupid,snmptrap_logging,severity_name_0,severity_name_1,severity_name_2,severity_name_3,severity_name_4,severity_name_5 from config where 1=1 and configid between 0 and 99999999999999]
30302:20150120:083102.798 query [txnlev:0] [select i.itemid,i.hostid,h.proxy_hostid,i.type,i.data_type,i.value_type,i.key_,i.snmp_community,i.snmp_oid,i.port,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.ipmi_sensor,i.delay,i.delay_flex,i.trapper_hosts,i.logtimefmt,i.params,i.status,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,i.lastclock from items i,hosts h where i.hostid=h.hostid and h.status in (0) and i.status in (0,3) and i.itemid between 0 and 99999999999999]
30302:20150120:083114.923 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.error,t.priority,t.type,t.value,t.value_flags from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0) and i.status in (0,3) and t.status in (0) and t.flags not in (2) and h.hostid between 0 and 99999999999999]
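The timestamps on these log lines give a rough idea of how long each step took. The sketch below is only an illustration: it assumes the prefix format is `PID:YYYYMMDD:HHMMSS.mmm` and that each line is written when the query is issued, so the gap between two consecutive lines is an upper bound on the earlier query plus the server's handling of its results. The helper name `parse_stamp` is hypothetical, and only the line prefixes are used.

```python
from datetime import datetime

# Prefixes of the second and third log lines above (query text omitted).
ITEMS_QUERY_LINE = "30302:20150120:083102.798 query [txnlev:0]"
TRIGGERS_QUERY_LINE = "30302:20150120:083114.923 query [txnlev:0]"


def parse_stamp(line):
    """Parse the assumed 'PID:YYYYMMDD:HHMMSS.mmm' prefix into a datetime."""
    prefix = line.split(" ", 1)[0]
    _pid, date_part, clock_part = prefix.split(":")
    return datetime.strptime(f"{date_part} {clock_part}", "%Y%m%d %H%M%S.%f")


gap = parse_stamp(TRIGGERS_QUERY_LINE) - parse_stamp(ITEMS_QUERY_LINE)
gap_seconds = gap.total_seconds()
print(f"gap between items query and triggers query: {gap_seconds:.3f} s")
```

For the two lines shown, this comes to about 12 seconds between the items query being logged and the triggers query being logged.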
The Zabbix server runs these queries each time it restarts, and the following one takes very long to execute:
30302:20150120:083114.923 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.error,t.priority,t.type,t.value,t.value_flags from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0) and i.status in (0,3) and t.status in (0) and t.flags not in (2) and h.hostid between 0 and 99999999999999]
Why have the DB queries become so slow, and how can I avoid such a situation?