Hello,
I have a problem with our Zabbix server. We first ran Zabbix 1.6 on a machine with a dying disk controller, so we decided to move to another machine and upgrade to 1.8.1.
The new machine runs CentOS 5.3 on a P4 3 GHz with 2 GB of RAM. Since it only has to monitor around 15 hosts, I'd say this machine should be fast enough. Also, the item intervals are around 5 minutes.
But the machine is awfully slow! The load varies from 1 to 10, caused by Postgres, and the web frontend's dashboard doesn't even load. I've searched this forum and tried a few things, but nothing helps.

I've tried to optimize some Postgres parameters (and also turned on autovacuum).
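For reference, this is the kind of postgresql.conf tuning I mean. These values are only illustrative guesses for a 2 GB machine, not measured settings; in 8.1, shared_buffers and effective_cache_size are given in 8 kB pages, and autovacuum does nothing unless the stats collector options are also enabled:

```
# postgresql.conf (PostgreSQL 8.1) -- illustrative starting points, not tuned values
shared_buffers = 32000            # ~250 MB in 8 kB pages; 8.1 default is tiny
effective_cache_size = 131072     # ~1 GB; hint about how much the OS caches
autovacuum = on
stats_start_collector = on        # both of these are required
stats_row_level = on              # for autovacuum to work in 8.1
```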
Could some indexes be missing?
Or what else could it be?
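One way to sanity-check the missing-index theory (a diagnostic sketch, assuming the stats collector is running) is to see which tables are mostly sequentially scanned, and what indexes Zabbix actually created:

```
-- Tables read mostly by sequential scan are candidates for a missing index
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- List the indexes that exist on the items table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'items';
```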
Zabbix 1.8.1
Postgres 8.1
Software RAID capable of 80 MB/s
Top:
Code:
top - 14:24:15 up 7 days, 45 min, 2 users, load average: 7.94, 8.28, 6.32
Tasks: 156 total, 7 running, 149 sleeping, 0 stopped, 0 zombie
Cpu(s): 91.4%us, 8.1%sy, 0.0%ni, 0.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2075560k total, 1928992k used, 146568k free, 193984k buffers
Swap: 2096376k total, 144k used, 2096232k free, 1479164k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
18373 postgres 18   0 22176  11m   9m R 74.4  0.6  1:23.34 postmaster
18397 postgres 17   0 22752  11m  10m R 72.4  0.6  2:14.27 postmaster
19305 postgres 18   0 23252  11m   9m R 49.8  0.6  0:32.31 postmaster
18376 postgres 16   0 22520  11m  10m S  1.0  0.6  0:09.67 postmaster
19358 root     15   0  2324 1048  796 R  0.7  0.1  0:00.05 top
  404 root     10  -5     0    0    0 S  0.3  0.0  0:23.17 md1_raid1
After checking some queries, the problem seems to lie in the items table: counting it is, relative to the other tables, very slow for its size.
Code:
zabbix=> select count(*) from items;
2044
Time: 155.076 ms
zabbix=> select count(*) from history;
2310746
Time: 1632.498 ms
zabbix=> select count(*) from history_uint;
5012448
Time: 3421.886 ms
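155 ms to count 2,044 rows hints at table bloat: if the table was updated heavily but never vacuumed, a count has to scan many dead tuples. A rough check (a sketch; relpages/reltuples are only as fresh as the last ANALYZE) is to compare pages on disk against live rows:

```
-- Many pages for few rows suggests bloat from dead tuples
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('items', 'history', 'history_uint');

-- Mark dead space reusable and refresh planner statistics in one pass
VACUUM ANALYZE items;
```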
Explain plan:
Code:
zabbix=> explain analyze
SELECT DISTINCT g.* FROM groups g, hosts_groups hg, hosts h
WHERE ((g.groupid BETWEEN 000000000000000 AND 099999999999999))
  AND hg.groupid=g.groupid
  AND h.hostid=hg.hostid
  AND h.status=0
  AND EXISTS(
    SELECT t.triggerid FROM items i, functions f, triggers t
    WHERE i.hostid=hg.hostid AND i.status=0
      AND i.itemid=f.itemid AND f.triggerid=t.triggerid
      AND t.status=0);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2668.39..2668.41 rows=1 width=158) (actual time=19601.806..19601.852 rows=6 loops=1)
-> Sort (cost=2668.39..2668.40 rows=1 width=158) (actual time=19601.800..19601.820 rows=10 loops=1)
Sort Key: g.groupid, g.name, g.internal
-> Nested Loop (cost=0.00..2668.38 rows=1 width=158) (actual time=373.403..19601.703 rows=10 loops=1)
Join Filter: ("inner".hostid = "outer".hostid)
-> Nested Loop (cost=0.00..2665.88 rows=1 width=166) (actual time=373.320..19598.309 rows=48 loops=1)
-> Seq Scan on groups g (cost=0.00..1.18 rows=1 width=158) (actual time=0.013..0.053 rows=12 loops=1)
Filter: ((groupid >= 0) AND (groupid <= 99999999999999::bigint))
-> Index Scan using hosts_groups_2 on hosts_groups hg (cost=0.00..2664.69 rows=1 width=16) (actual time=317.854..1633.172 rows=4 loops=12)
Index Cond: (hg.groupid = "outer".groupid)
Filter: (subplan)
SubPlan
-> Nested Loop (cost=170.49..2660.01 rows=1 width=8) (actual time=369.762..369.762 rows=1 loops=53)
-> Hash Join (cost=170.49..2647.99 rows=2 width=8) (actual time=369.666..369.726 rows=1 loops=53)
Hash Cond: ("outer".itemid = "inner".itemid)
-> Seq Scan on functions f (cost=0.00..1994.32 rows=96632 width=16) (actual time=0.005..1.039 rows=453 loops=52)
-> Hash (cost=170.44..170.44 rows=23 width=8) (actual time=368.193..368.193 rows=40 loops=53)
-> Bitmap Heap Scan on items i (cost=79.94..170.44 rows=23 width=8) (actual time=367.894..368.040 rows=40 loops=53)
Recheck Cond: ((status = 0) AND (hostid = $0))
-> BitmapAnd (cost=79.94..79.94 rows=23 width=0) (actual time=367.618..367.618 rows=0 loops=53)
-> Bitmap Index Scan on items_status_index (cost=0.00..29.84 rows=4527 width=0) (actual time=359.337..359.337 rows=796458 loops=53)
Index Cond: (status = 0)
-> Bitmap Index Scan on items_1 (cost=0.00..49.84 rows=4527 width=0) (actual time=1.715..1.715 rows=122 loops=53)
Index Cond: (hostid = $0)
-> Index Scan using triggers_pkey on triggers t (cost=0.00..6.00 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=53)
Index Cond: ("outer".triggerid = t.triggerid)
Filter: (status = 0)
-> Seq Scan on hosts h (cost=0.00..2.49 rows=1 width=8) (actual time=0.017..0.046 rows=11 loops=48)
Filter: (status = 0)
Total runtime: 19602.220 ms
(30 rows)
After running ANALYZE on the database, the problem seems to be gone. Does anyone have an idea how that can happen?
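A plausible explanation, going by the plan above: the estimates are badly off (e.g. the seq scan on groups estimates rows=1 but returns 12, and functions is estimated at 96,632 rows), which points to stale planner statistics. ANALYZE rebuilds those statistics, so the planner can pick better plans afterwards. On 8.1, autovacuum is supposed to do this automatically, but it silently does nothing unless the stats collector is enabled, which can be checked with:

```
-- ANALYZE refreshes the per-column statistics behind the planner's row estimates
ANALYZE;

-- On 8.1, autovacuum requires both of these to be on
SHOW stats_start_collector;
SHOW stats_row_level;
```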