Zabbix 1.8.1 and PostgreSQL 8.1 slow with 10 hosts

  • eppie
    Junior Member
    Joined: Jun 2009
    Posts: 15

    #1


    Hello,

    I have a problem with our Zabbix server. We first used Zabbix 1.6 on a machine that had a dying HD controller, so we decided to use another machine and upgrade to 1.8.1.

    The machine is a CentOS 5.3 box with a P4 3 GHz CPU and 2 GB of RAM. Because it only has to monitor around 15 hosts, I would say this machine should be fast enough. Also, the interval for the items is about 5 minutes.

    But the machine is awfully slow! The load varies from 1 to 10, caused by Postgres. The web frontend's dashboard doesn't even load! I've searched this forum and tried a few things, but nothing helps.

    I've tried to optimize some Postgres parameters (and also turned on autovacuum).
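
    For reference, checking the relevant settings from inside psql could look roughly like this (a minimal sketch; on 8.1 autovacuum also depends on the row-level statistics collector, and the actual values come from postgresql.conf):

    Code:
    -- show the current values of the settings that matter most here
    SHOW autovacuum;             -- must be on for automatic vacuuming
    SHOW stats_row_level;        -- 8.1 needs this on for autovacuum to work
    SHOW shared_buffers;         -- main buffer cache size
    SHOW effective_cache_size;   -- planner hint for available OS cache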

    Is it possible that some indexes are missing?
    Or what else could it be?
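
    To see which indexes actually exist on the relevant tables, the standard pg_indexes view can be queried (the table names below are just the Zabbix ones used later in this post):

    Code:
    -- list every index and its definition for the tables in question
    SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('items', 'history', 'history_uint', 'functions', 'triggers')
    ORDER BY tablename, indexname;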

    Zabbix 1.8.1
    Postgres 8.1
    Software RAID capable of 80 MB/sec


    Top:
    Code:
    top - 14:24:15 up 7 days, 45 min,  2 users,  load average: 7.94, 8.28, 6.32
    Tasks: 156 total,   7 running, 149 sleeping,   0 stopped,   0 zombie
    Cpu(s): 91.4%us,  8.1%sy,  0.0%ni,  0.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   2075560k total,  1928992k used,   146568k free,   193984k buffers
    Swap:  2096376k total,      144k used,  2096232k free,  1479164k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
    18373 postgres  18   0 22176  11m   9m R 74.4  0.6   1:23.34 postmaster                                                                                      
    18397 postgres  17   0 22752  11m  10m R 72.4  0.6   2:14.27 postmaster                                                                                      
    19305 postgres  18   0 23252  11m   9m R 49.8  0.6   0:32.31 postmaster                                                                                      
    18376 postgres  16   0 22520  11m  10m S  1.0  0.6   0:09.67 postmaster                                                                                      
    19358 root      15   0  2324 1048  796 R  0.7  0.1   0:00.05 top                                                                                             
      404 root      10  -5     0    0    0 S  0.3  0.0   0:23.17 md1_raid1
    After checking some queries it looks like the problem lies with the items table. Counting the items table is, relative to its size, much slower than the other tables.

    Code:
    zabbix=> select count(*) from items;
      2044
    
    Time: 155.076 ms
    zabbix=> select count(*) from history;
     2310746
    
    Time: 1632.498 ms
    zabbix=> select count(*) from history_uint;
     5012448
    
    Time: 3421.886 ms
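
    Since counting only 2044 rows should not take 155 ms, one possible check (an assumption on my part, not something Zabbix prescribes) is whether the items table is physically bloated, by comparing its on-disk page count with the planner's row estimate:

    Code:
    -- relpages = 8 kB pages on disk, reltuples = planner's row estimate (refreshed by ANALYZE)
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('items', 'history', 'history_uint');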
    Explain plan:
    Code:
    zabbix=> explain analyze SELECT DISTINCT g.* FROM groups g,hosts_groups hg,hosts h WHERE ((g.groupid  BETWEEN 000000000000000 AND 099999999999999)) AND hg.groupid=g.groupid AND h.hostid=hg.hostid AND h.status=0 AND EXISTS( SELECT t.triggerid  FROM items i, functions f, triggers t WHERE i.hostid=hg.hostid  AND i.status=0 AND i.itemid=f.itemid  AND f.triggerid=t.triggerid  AND t.status=0);
                                                                                               QUERY PLAN                                                                                            
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=2668.39..2668.41 rows=1 width=158) (actual time=19601.806..19601.852 rows=6 loops=1)
       ->  Sort  (cost=2668.39..2668.40 rows=1 width=158) (actual time=19601.800..19601.820 rows=10 loops=1)
             Sort Key: g.groupid, g.name, g.internal
             ->  Nested Loop  (cost=0.00..2668.38 rows=1 width=158) (actual time=373.403..19601.703 rows=10 loops=1)
                   Join Filter: ("inner".hostid = "outer".hostid)
                   ->  Nested Loop  (cost=0.00..2665.88 rows=1 width=166) (actual time=373.320..19598.309 rows=48 loops=1)
                         ->  Seq Scan on groups g  (cost=0.00..1.18 rows=1 width=158) (actual time=0.013..0.053 rows=12 loops=1)
                               Filter: ((groupid >= 0) AND (groupid <= 99999999999999::bigint))
                         ->  Index Scan using hosts_groups_2 on hosts_groups hg  (cost=0.00..2664.69 rows=1 width=16) (actual time=317.854..1633.172 rows=4 loops=12)
                               Index Cond: (hg.groupid = "outer".groupid)
                               Filter: (subplan)
                               SubPlan
                                 ->  Nested Loop  (cost=170.49..2660.01 rows=1 width=8) (actual time=369.762..369.762 rows=1 loops=53)
                                       ->  Hash Join  (cost=170.49..2647.99 rows=2 width=8) (actual time=369.666..369.726 rows=1 loops=53)
                                             Hash Cond: ("outer".itemid = "inner".itemid)
                                             ->  Seq Scan on functions f  (cost=0.00..1994.32 rows=96632 width=16) (actual time=0.005..1.039 rows=453 loops=52)
                                             ->  Hash  (cost=170.44..170.44 rows=23 width=8) (actual time=368.193..368.193 rows=40 loops=53)
                                                   ->  Bitmap Heap Scan on items i  (cost=79.94..170.44 rows=23 width=8) (actual time=367.894..368.040 rows=40 loops=53)
                                                         Recheck Cond: ((status = 0) AND (hostid = $0))
                                                         ->  BitmapAnd  (cost=79.94..79.94 rows=23 width=0) (actual time=367.618..367.618 rows=0 loops=53)
                                                               ->  Bitmap Index Scan on items_status_index  (cost=0.00..29.84 rows=4527 width=0) (actual time=359.337..359.337 rows=796458 loops=53)
                                                                     Index Cond: (status = 0)
                                                               ->  Bitmap Index Scan on items_1  (cost=0.00..49.84 rows=4527 width=0) (actual time=1.715..1.715 rows=122 loops=53)
                                                                     Index Cond: (hostid = $0)
                                       ->  Index Scan using triggers_pkey on triggers t  (cost=0.00..6.00 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=53)
                                             Index Cond: ("outer".triggerid = t.triggerid)
                                             Filter: (status = 0)
                   ->  Seq Scan on hosts h  (cost=0.00..2.49 rows=1 width=8) (actual time=0.017..0.046 rows=11 loops=48)
                         Filter: (status = 0)
     Total runtime: 19602.220 ms
    (30 rows)

    After running ANALYZE on the database, the problem seems to be gone. Does anyone have any idea how that can happen?
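
    In case it helps anyone else, the commands involved are roughly the following (a minimal sketch; VACUUM ANALYZE additionally reclaims dead row versions, which matters if autovacuum was not running on the table before):

    Code:
    ANALYZE;                 -- refresh planner statistics for every table in the database
    VACUUM ANALYZE items;    -- also reclaim dead row versions in a single table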
    Last edited by eppie; 06-04-2010, 15:42.