Zabbix web interface extremely slow :(

  • jsosic
    Member
    • Apr 2008
    • 47

    #1

    Zabbix web interface extremely slow :(

    Hi! I installed Zabbix 1.6 from SVN (build 6204). The backend database is PostgreSQL. I have two dedicated machines for Zabbix in a Red Hat Cluster setup. The database is on external storage connected via Fibre Channel (4 Gbps). PostgreSQL runs on one server on top of XFS, and Zabbix/Apache/PHP on the other. The OS is CentOS 5.

    My problem is that performance is horrible. The machines have 16 GB of RAM and 2x quad-core Opteron processors, and the Zabbix server itself is working OK. Load on the machines is very low (< 1), and still every page I try to load takes forever. When I click and choose one host, the page loads for ~5.5 seconds. It's a complete disaster for such an expensive setup, and I just can't figure out where the delay comes from. Queries run directly on the database, like SELECT * FROM items, return very fast. I've tried ANALYZE and VACUUM FULL on the database, with no significant gains. I'm really puzzled. I even moved the database to local hard drives (10k rpm SCSI drives on an Adaptec controller with battery-backed cache memory), with no gains, so the storage, FC, and cables are not the problem. I've also tried GFS, and pages loaded slightly faster (around 3.5 seconds per page).

    What am I missing here?! I'm really desperate... Disk I/O is not the problem. RAM is not the problem. CPU is not the problem. Then W T F IS THE PROBLEM?!
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Originally posted by jsosic
    What am I missing here?! I'm really desperate... Disk I/O is not the problem. RAM is not the problem. CPU is not the problem. Then W T F IS THE PROBLEM?!
    I wouldn't expect any performance problems on this type of hardware. I doubt we or anyone else can help you without detailed analysis of your setup, configuration, etc etc.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    Comment

    • jsosic
      Member
      • Apr 2008
      • 47

      #3
      I'm wondering: is it possible that PostgreSQL is the main problem?! I know that PostgreSQL isn't really designed for web applications, but since Zabbix 1.6 uses transactions and does lots of writes, I thought PostgreSQL's scalability would come in very handy. But if the price of low hardware load is slow loading of web pages, then this is not a trade-off I'm prepared to make.

      The thing is, the Zabbix web users will be managers and CEOs, not just technicians, so this thing has to show results as fast as possible. I'll try migrating the database to ext3 and some more things I haven't tried yet, maybe even switch to MySQL for a tryout...

      Comment

      • Emir Imamagic
        Member
        • Mar 2008
        • 67

        #4
        Originally posted by Alexei
        I wouldn't expect any performance problems on this type of hardware. I doubt we or anyone else can help you without detailed analysis of your setup, configuration, etc etc.
        here are our stats:
        Number of hosts (monitored/not monitored/templates):692 (419 / 0 / 273)
        Number of items (monitored/disabled/not supported): 41147 (38509 / 1224 / 1414)
        Number of triggers (enabled/disabled)[true/unknown/false]: 4316 (4202 / 114 [121 / 608 / 3473])
        Number of users (online): 163
        Required server performance, new values per second: 230

        We define all hosts based on templates via generated XML and we have something like 5-7 templates per host.

        Furthermore, we have 23 user/host groups with different read/write permissions on different servers. However, we didn't see any difference in web interface response time between Admins and Super Admins.

        We also tested the web interface alone (Zabbix server not running) on versions 1.4 and 1.6. Preliminary tests show that version 1.4 is actually faster. Is that expected? Would we actually gain web interface performance by downgrading to 1.4?

        Comment

        • Emir Imamagic
          Member
          • Mar 2008
          • 67

          #5
          OK, here is more data. We switched on the xdebug trace and put the tar/gzipped output here.

          For the test we loaded the Latest data page for a single machine with the Group dropdown box set to "All". The trace shows that out of 5.36 s, ~3.7 s was spent on database queries (these are heavy joins with sorts) and the rest in the PHP script itself.

          Obviously the delay doesn't come from some external system. As my colleague jsosic said, these are pretty strong machines. Are we missing something here?

          Also, we're gonna try MySQL InnoDB backend and see if we can get some of these numbers down.

          Comment

          • Emir Imamagic
            Member
            • Mar 2008
            • 67

            #6
            The test with MySQL shows even worse figures (3 times slower in most cases). I put the trace of a similar test ("Latest data") here.

            Comment

            • Aly
              ZABBIX developer
              • May 2007
              • 1126

              #7
              In my opinion this is strange. As far as I know, MySQL works faster, at least in the long term. Version 1.4 overall can't be faster than 1.6: algorithm and SQL optimizations were made in 1.6.
              Zabbix | ex GUI developer

              Comment

              • Emir Imamagic
                Member
                • Mar 2008
                • 67

                #8
                So your recommendation would be to go with MySQL? Could you please give us a few reasons why? Our investigation has shown that PostgreSQL has superior performance in general.

                Could you confirm that the execution time of the PHP scripts (1.6 s) is expected on this hardware and Zabbix config?

                Comment

                • Emir Imamagic
                  Member
                  • Mar 2008
                  • 67

                  #9
                  Ok, one obvious drawback of PostgreSQL that I see right now is this bug:

                  Any idea when this might be solved?

                  Comment

                  • Emir Imamagic
                    Member
                    • Mar 2008
                    • 67

                    #10
                    OK, we did some further investigation and tried to optimize some of the heavy queries. We managed to lower the execution time of the latest.php page from ~6 s to ~1.5 s. Below is the patch for latest.php.

                    Code:
                    --- latest.php.orig     2008-10-23 16:12:09.000000000 +0200
                    +++ latest.php  2008-10-23 16:25:55.000000000 +0200
                    @@ -122,7 +122,7 @@
                            $available_groups= get_accessible_groups_by_user($USER_DETAILS,PERM_READ_LIST);
                            $available_hosts = get_accessible_hosts_by_user($USER_DETAILS,PERM_READ_LIST);
                    
                    -       $result=DBselect('SELECT DISTINCT g.groupid,g.name '.
                    +       /*$result=DBselect('SELECT DISTINCT g.groupid,g.name '.
                                                            ' FROM groups g, hosts_groups hg, hosts h, items i '.
                                                            ' WHERE '.DBcondition('g.groupid',$available_groups).
                                                                    ' AND hg.groupid=g.groupid '.
                    @@ -130,7 +130,14 @@
                                                                    ' AND h.hostid=i.hostid '.
                                                                    ' AND hg.hostid=h.hostid '.
                                                                    ' AND i.status='.ITEM_STATUS_ACTIVE.
                    -                                       ' ORDER BY g.name');
                    +                                       ' ORDER BY g.name');*/
                    +       $result=DBselect('SELECT DISTINCT g.groupid,g.name '.
                    +                                       ' FROM groups g INNER JOIN hosts_groups hg ON (hg.groupid=g.groupid)'.
                    +                                       ' LEFT JOIN hosts h ON (h.hostid = hg.hostid)'.
                    +                                       ' WHERE EXISTS (SELECT i.hostid FROM items i WHERE i.status='.ITEM_STATUS_ACTIVE.' AND i.hostid=h.hostid)'.
                    +                                               ' AND '.DBcondition('g.groupid',$available_groups).
                    +                                               ' AND h.status='.HOST_STATUS_MONITORED.
                    +                                        ' ORDER BY g.name');
                            while($row=DBfetch($result)){
                                    $cmbGroup->AddItem(
                                                    $row['groupid'],
                    @@ -145,13 +152,20 @@
                                    $sql_from .= ',hosts_groups hg ';
                                    $sql_where.= ' AND hg.hostid=h.hostid AND hg.groupid='.$_REQUEST['groupid'];
                            }
                    -       $sql='SELECT DISTINCT h.hostid,h.host '.
                    +       /*$sql='SELECT DISTINCT h.hostid,h.host '.
                                    ' FROM hosts h,items i '.$sql_from.
                                    ' WHERE h.status='.HOST_STATUS_MONITORED.
                                            ' AND h.hostid=i.hostid '.
                                            $sql_where.
                                            ' AND i.status='.ITEM_STATUS_ACTIVE.
                                            ' AND '.DBcondition('h.hostid',$available_hosts).
                    +               ' ORDER BY h.host';*/
                    +       $sql='SELECT DISTINCT h.hostid,h.host '.
                    +               ' FROM hosts h '.$sql_from.
                    +               ' WHERE h.status='.HOST_STATUS_MONITORED.
                    +                       $sql_where.
                    +                       ' AND '.DBcondition('h.hostid',$available_hosts).
                    +                       ' AND EXISTS (SELECT i.hostid FROM items i WHERE i.status='.ITEM_STATUS_ACTIVE.' AND i.hostid=h.hostid)'.
                                    ' ORDER BY h.host';
                            $result=DBselect($sql);
                            while($row=DBfetch($result)){
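
A minimal sketch of why the rewritten group query in the patch above can be so much cheaper. It uses an in-memory SQLite database driven from Python, not PostgreSQL or the Zabbix PHP code. The table and column names follow the queries in the patch; the toy data, the host and group names, and the helpers `ORIGINAL_BODY`, `MODIFIED_BODY`, and `run` are assumptions made up for illustration. Because the WHERE clause filters on `h.status`, the LEFT JOIN in the modified query behaves like an inner join here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (groupid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT, status INTEGER);
CREATE TABLE hosts_groups (hostid INTEGER, groupid INTEGER);
CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER, status INTEGER);
CREATE INDEX items_1 ON items (hostid);
""")

# Toy data (hypothetical): status 0 means monitored host / active item, as in the thread.
conn.executemany('INSERT INTO "groups" VALUES (?, ?)',
                 [(1, "Linux"), (2, "Windows"), (3, "Empty")])
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)",
                 [(1, "web1", 0), (2, "db1", 0), (3, "old1", 1), (4, "idle1", 0)])
conn.executemany("INSERT INTO hosts_groups VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2), (4, 3)])
# 50 active items on hosts 1 and 2, 5 active items on the unmonitored host 3,
# and 3 disabled items on host 4.
items = [(h, 0) for h in (1, 2) for _ in range(50)]
items += [(3, 0)] * 5 + [(4, 1)] * 3
conn.executemany("INSERT INTO items (hostid, status) VALUES (?, ?)", items)

# Original form: items is part of the join, so every active item adds a row.
ORIGINAL_BODY = """
FROM "groups" g, hosts_groups hg, hosts h, items i
WHERE g.groupid IN (1, 2, 3)
  AND hg.groupid = g.groupid
  AND h.status = 0
  AND h.hostid = i.hostid
  AND hg.hostid = h.hostid
  AND i.status = 0
"""

# Modified form: items is only probed through EXISTS, once per host row.
MODIFIED_BODY = """
FROM "groups" g
  INNER JOIN hosts_groups hg ON (hg.groupid = g.groupid)
  LEFT JOIN hosts h ON (h.hostid = hg.hostid)
WHERE EXISTS (SELECT i.hostid FROM items i WHERE i.status = 0 AND i.hostid = h.hostid)
  AND g.groupid IN (1, 2, 3)
  AND h.status = 0
"""

def run(body):
    """Return the DISTINCT result and the number of rows before DISTINCT."""
    rows = conn.execute(
        f"SELECT DISTINCT g.groupid, g.name {body} ORDER BY g.name").fetchall()
    joined = conn.execute(f"SELECT COUNT(*) {body}").fetchone()[0]
    return rows, joined

original_rows, original_joined = run(ORIGINAL_BODY)
modified_rows, modified_joined = run(MODIFIED_BODY)

print("same result:", original_rows == modified_rows)
print("rows before DISTINCT:", original_joined, "vs", modified_joined)
```

On this toy data both forms return the same two groups, but the original form builds 150 rows before DISTINCT collapses them, while the EXISTS form builds 3. The real tables are far larger, so the gap would be correspondingly bigger.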

                    Comment

                    • Emir Imamagic
                      Member
                      • Mar 2008
                      • 67

                      #11
                      Here are the EXPLAIN ANALYZE outputs of the original and modified versions of the heaviest query, just to show the difference.

                      Original query
                      Code:
                      > explain analyze SELECT DISTINCT g.groupid,g.name  FROM groups g, hosts_groups hg, hosts h, items i  WHERE  (g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31
                      ,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4))  AND hg.groupid=g.groupid  AND h.status=0 AND h.hostid=i.hostid  AND hg.hostid=h.hostid  AND i.status=0 ORDER BY g.name;
                                                                                                                      QUERY PLAN
                      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                       Unique  (cost=7505.11..7853.36 rows=28 width=16) (actual time=2115.244..2714.376 rows=39 loops=1)
                         ->  Sort  (cost=7505.11..7621.19 rows=46433 width=16) (actual time=2115.242..2668.727 rows=143229 loops=1)
                               Sort Key: g.name, g.groupid
                               Sort Method:  external merge  Disk: 4632kB
                               ->  Hash Join  (cost=105.66..3111.39 rows=46433 width=16) (actual time=3.724..262.536 rows=143229 loops=1)
                                     Hash Cond: (i.hostid = h.hostid)
                                     ->  Seq Scan on items i  (cost=0.00..2321.25 rows=46075 width=8) (actual time=0.015..174.678 rows=45760 loops=1)
                                           Filter: (status = 0)
                                     ->  Hash  (cost=97.13..97.13 rows=683 width=32) (actual time=3.700..3.700 rows=1527 loops=1)
                                           ->  Hash Join  (cost=37.11..97.13 rows=683 width=32) (actual time=0.625..3.024 rows=1527 loops=1)
                                                 Hash Cond: (hg.groupid = g.groupid)
                                                 ->  Hash Join  (cost=32.90..82.06 rows=1074 width=24) (actual time=0.552..2.052 rows=1527 loops=1)
                                                       Hash Cond: (hg.hostid = h.hostid)
                                                       ->  Seq Scan on hosts_groups hg  (cost=0.00..31.76 rows=1776 width=16) (actual time=0.012..0.499 rows=1785 loops=1)
                                                       ->  Hash  (cost=27.66..27.66 rows=419 width=8) (actual time=0.530..0.530 rows=421 loops=1)
                                                             ->  Seq Scan on hosts h  (cost=0.00..27.66 rows=419 width=8) (actual time=0.012..0.382 rows=421 loops=1)
                                                                   Filter: (status = 0)
                                                 ->  Hash  (cost=3.86..3.86 rows=28 width=16) (actual time=0.066..0.066 rows=44 loops=1)
                                                       ->  Seq Scan on groups g  (cost=0.00..3.86 rows=28 width=16) (actual time=0.025..0.051 rows=44 loops=1)
                                                             Filter: (groupid = ANY ('{14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4}'::bigint[]))
                       Total runtime: 2715.708 ms
                      Modified query
                      Code:
                      > explain analyze select DISTINCT g.groupid,g.name FROM groups g inner join hosts_groups hg on (hg.groupid=g.groupid) left join hosts h on (h.hostid = hg.hostid) where exists (select i.hostid from items i where i.status=0 and i.hostid=h.hostid) and g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4) AND h.status=0 ORDER BY g.name;
                                                                                                                QUERY PLAN
                      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                       Unique  (cost=1764.88..1767.44 rows=28 width=16) (actual time=12.781..13.526 rows=39 loops=1)
                         ->  Sort  (cost=1764.88..1765.73 rows=341 width=16) (actual time=12.779..13.055 rows=1527 loops=1)
                               Sort Key: g.name, g.groupid
                               Sort Method:  quicksort  Memory: 140kB
                               ->  Hash Join  (cost=81.62..1750.54 rows=341 width=16) (actual time=1.553..9.322 rows=1527 loops=1)
                                     Hash Cond: (hg.groupid = g.groupid)
                      
                                     ->  Hash Join  (cost=77.41..1740.91 rows=536 width=8) (actual time=1.475..8.353 rows=1527 loops=1)
                                           Hash Cond: (h.hostid = hg.hostid)
                                           ->  Bitmap Heap Scan on hosts h  (cost=23.45..1678.45 rows=209 width=8) (actual time=0.164..6.198 rows=421 loops=1)
                                                 Recheck Cond: (status = 0)
                                                 Filter: (subplan)
                                                 ->  Bitmap Index Scan on hosts_2  (cost=0.00..23.39 rows=419 width=0) (actual time=0.118..0.118 rows=423 loops=1)
                                                       Index Cond: (status = 0)
                                                 SubPlan
                                                   ->  Index Scan using items_1 on items i  (cost=0.00..315.26 rows=81 width=8) (actual time=0.013..0.013 rows=1 loops=421)
                                                         Index Cond: (hostid = $0)
                                                         Filter: (status = 0)
                                           ->  Hash  (cost=31.76..31.76 rows=1776 width=16) (actual time=1.304..1.304 rows=1785 loops=1)
                                                 ->  Seq Scan on hosts_groups hg  (cost=0.00..31.76 rows=1776 width=16) (actual time=0.012..0.659 rows=1785 loops=1)
                                     ->  Hash  (cost=3.86..3.86 rows=28 width=16) (actual time=0.070..0.070 rows=44 loops=1)
                                           ->  Seq Scan on groups g  (cost=0.00..3.86 rows=28 width=16) (actual time=0.029..0.055 rows=44 loops=1)
                                                 Filter: (groupid = ANY ('{14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4}'::bigint[]))
                       Total runtime: 13.606 ms
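
To compare the two plans above without reading them line by line, a small script can pull the actual row counts and timings out of the EXPLAIN ANALYZE text. This is a sketch only: the regular expression `PLAN_LINE` and the helpers `parse_plan` and `node_by_name` are hypothetical names, the pattern covers just the line shape seen in this thread, and only the top three nodes of each plan are pasted in.

```python
import re

# Matches lines such as:
#   ->  Sort  (cost=a..b rows=n width=w) (actual time=x..y rows=n loops=n)
PLAN_LINE = re.compile(
    r"^\s*(?:->\s+)?(?P<node>.+?)\s+\(cost=[^)]*\)\s+"
    r"\(actual time=(?P<start>\d+\.\d+)\.\.(?P<end>\d+\.\d+) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)

def parse_plan(text):
    """Return one dict per plan node that carries actual-time figures."""
    nodes = []
    for line in text.splitlines():
        m = PLAN_LINE.match(line)
        if m:
            nodes.append({
                "node": m.group("node"),
                "start_ms": float(m.group("start")),
                "end_ms": float(m.group("end")),
                "rows": int(m.group("rows")),
            })
    return nodes

def node_by_name(nodes, name):
    """Return the first node with the given name."""
    return next(n for n in nodes if n["node"] == name)

ORIGINAL_PLAN = """\
 Unique  (cost=7505.11..7853.36 rows=28 width=16) (actual time=2115.244..2714.376 rows=39 loops=1)
   ->  Sort  (cost=7505.11..7621.19 rows=46433 width=16) (actual time=2115.242..2668.727 rows=143229 loops=1)
         ->  Hash Join  (cost=105.66..3111.39 rows=46433 width=16) (actual time=3.724..262.536 rows=143229 loops=1)
"""

MODIFIED_PLAN = """\
 Unique  (cost=1764.88..1767.44 rows=28 width=16) (actual time=12.781..13.526 rows=39 loops=1)
   ->  Sort  (cost=1764.88..1765.73 rows=341 width=16) (actual time=12.779..13.055 rows=1527 loops=1)
         ->  Hash Join  (cost=81.62..1750.54 rows=341 width=16) (actual time=1.553..9.322 rows=1527 loops=1)
"""

original = parse_plan(ORIGINAL_PLAN)
modified = parse_plan(MODIFIED_PLAN)

orig_sort = node_by_name(original, "Sort")
orig_join = node_by_name(original, "Hash Join")
mod_sort = node_by_name(modified, "Sort")

# Rough indicator only: time between the join finishing and the sort finishing.
sort_phase_ms = orig_sort["end_ms"] - orig_join["end_ms"]

print("rows sorted:", orig_sort["rows"], "vs", mod_sort["rows"])
print(f"approx. time in sort (original): {sort_phase_ms:.1f} ms")
```

Read this way, the original plan sorts 143229 rows (spilling to disk, per its "external merge" line), while the modified plan sorts 1527 rows in memory, and most of the original runtime falls between the end of the join and the end of the sort.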

                      Comment

                      • Emir Imamagic
                        Member
                        • Mar 2008
                        • 67

                        #12
                        The query in the last post is used in multiple places (screens, tr_status, events, ...). I'm modifying the queries in all of them. I can upload patches once I finish.

                        Comment

                        • Emir Imamagic
                          Member
                          • Mar 2008
                          • 67

                          #13
                          Concerning compatibility, the modified query is standard SQL; there is nothing PostgreSQL-specific in it. As for SQLite, based on the documentation provided here, it seems to work fine with left joins and EXISTS expressions. For Oracle, I'm pretty sure it supports these expressions, but I don't have the infrastructure to test it.

                          In addition, I just tested against MySQL and got the following results:

                          Original query
                          Code:
                          > SELECT DISTINCT g.groupid,g.name  FROM groups g, hosts_groups hg, hosts h, items i  WHERE  (g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31
                              -> ,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4))  AND hg.groupid=g.groupid  AND h.status=0 AND h.hostid=i.hostid  AND hg.hostid=h.hostid  AND i.status=0 ORDER BY g.name;
                          ...
                          39 rows in set (1.28 sec)
                          Modified query
                          Code:
                          select DISTINCT g.groupid,g.name FROM groups g inner join hosts_groups hg on (hg.groupid=g.groupid) left join hosts h on (h.hostid = hg.hostid) where exists (select i.hostid from items i where i.status=0 and i.hostid=h.hostid) and g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4) AND h.status=0 ORDER BY g.name;
                          ...
                          39 rows in set (0.02 sec)

                          Comment

                          • Aly
                            ZABBIX developer
                            • May 2007
                            • 1126

                            #14
                            Originally posted by Emir Imamagic
                            Concerning compatibility, the modified query is standard SQL; there is nothing PostgreSQL-specific in it. As for SQLite, based on the documentation provided here, it seems to work fine with left joins and EXISTS expressions. For Oracle, I'm pretty sure it supports these expressions, but I don't have the infrastructure to test it.
                            Yes, I already checked this. All DBs support this statement.

                            Originally posted by Emir Imamagic
                            In addition, I just tested against MySQL and got the following results:

                            Original query
                            Code:
                            > SELECT DISTINCT g.groupid,g.name  FROM groups g, hosts_groups hg, hosts h, items i  WHERE  (g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31
                                -> ,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4))  AND hg.groupid=g.groupid  AND h.status=0 AND h.hostid=i.hostid  AND hg.hostid=h.hostid  AND i.status=0 ORDER BY g.name;
                            ...
                            39 rows in set (1.28 sec)
                            Modified query
                            Code:
                            select DISTINCT g.groupid,g.name FROM groups g inner join hosts_groups hg on (hg.groupid=g.groupid) left join hosts h on (h.hostid = hg.hostid) where exists (select i.hostid from items i where i.status=0 and i.hostid=h.hostid) and g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4) AND h.status=0 ORDER BY g.name;
                            ...
                            39 rows in set (0.02 sec)
                            In EXPLAIN there is no significant difference between the two queries, but in my tests I see a different result. More tests are needed.

                            P.S.
                            Code:
                            EXPLAIN
                            SELECT DISTINCT g.groupid,g.name
                            FROM groups g, hosts_groups hg, hosts h, items i
                            WHERE  (g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31
                            ,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4))
                              AND hg.groupid=g.groupid
                              AND h.status=0
                              AND h.hostid=i.hostid
                              AND hg.hostid=h.hostid
                              AND i.status=0
                            ORDER BY g.name;
                            Code:
                            1, 'SIMPLE', 'h', 'ref', 'PRIMARY,hosts_2', 'hosts_2', '4', 'const', 5126, 'Using index; Using temporary; Using filesort'
                            1, 'SIMPLE', 'hg', 'ref', 'hosts_groups_groups_1', 'hosts_groups_groups_1', '8', 'h.hostid', 2, 'Using where; Using index'
                            1, 'SIMPLE', 'g', 'eq_ref', 'PRIMARY', 'PRIMARY', '8', 'hg.groupid', 1, ''
                            1, 'SIMPLE', 'i', 'ref', 'items_1,items_3', 'items_1', '8', 'hg.hostid', 1, 'Using where; Distinct'
                            Code:
                            EXPLAIN
                            SELECT DISTINCT g.groupid,g.name
                            FROM groups g
                              inner join hosts_groups hg on (hg.groupid=g.groupid)
                              left join hosts h on (h.hostid = hg.hostid)
                            where
                              exists (select i.hostid from items i where i.status=0 and i.hostid=h.hostid)
                              and g.groupid IN (14,15,16,17,18,19,20,21,22,9,6,5,23,24,25,26,30,29,28,27,8,7,31,32,33,34,13,11,12,10,36,37,42,2,35,38,40,39,43,1,41,44,3,4)
                              AND h.status=0
                            ORDER BY g.name;
                            Code:
                            1, 'PRIMARY', 'h', 'ref', 'PRIMARY,hosts_2', 'hosts_2', '4', 'const', 5126, 'Using where; Using index; Using temporary; Using filesort'
                            1, 'PRIMARY', 'hg', 'ref', 'hosts_groups_groups_1', 'hosts_groups_groups_1', '8', 'h.hostid', 2, 'Using where; Using index'
                            1, 'PRIMARY', 'g', 'eq_ref', 'PRIMARY', 'PRIMARY', '8', 'hg.groupid', 1, ''
                            2, 'DEPENDENT SUBQUERY', 'i', 'ref', 'items_1,items_3', 'items_1', '8', 'h.hostid', 1, 'Using where'
                            Last edited by Aly; 24-10-2008, 10:21.
                            Zabbix | ex GUI developer

                            Comment

                            • Aly
                              ZABBIX developer
                              • May 2007
                              • 1126

                              #15
                              And another thing: the result has 20-50 rows, but the PostgreSQL sort step is
                              Code:
                               ->  Sort  (cost=7505.11..7621.19 rows=46433 width=16) (actual time=2115.242..2668.727 rows=143229 loops=1)
                              In my opinion this is either abnormal or PostgreSQL-specific.
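
One possible reading of that Sort line, based only on the figures in the EXPLAIN ANALYZE output posted earlier in the thread: DISTINCT is applied after the join, so the sort receives one row per (group, host, active item) combination and not one row per group. The arithmetic below is a sketch; the variable names are made up for illustration, and the numbers are copied from the original plan.

```python
# Figures copied from the original EXPLAIN ANALYZE output in the thread.
sorted_rows = 143229       # rows entering the Sort node
host_group_pairs = 1527    # rows from joining hosts_groups, hosts, and groups
result_rows = 39           # rows left after the Unique node

# Joining items multiplies every host/group pair by its number of active items.
items_per_pair = sorted_rows / host_group_pairs

# Rows that are sorted only to be thrown away by DISTINCT.
discarded_rows = sorted_rows - result_rows

print(f"average active items per host/group pair: {items_per_pair:.1f}")
print(f"rows sorted and then discarded: {discarded_rows}")
```

Under that reading, the 143229 sorted rows are what the original join produces by construction, which is consistent with the sort shrinking to 1527 rows once items is moved into an EXISTS subquery.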
                              Zabbix | ex GUI developer

                              Comment