Slow page graphs

  • perun.84
    Member
    • May 2016
    • 73

    #1

    Slow page graphs

    Hello, I have Zabbix with 3 proxies and about 3300 devices, mostly network devices. When I try to open the Monitoring->Graphs page it takes over 15 seconds. How can I make it faster?

    Thanks
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Originally posted by perun.84
    Hello, I have zabbix with 3 proxies and with about 3300 devices. It is network devices mostly. When I try to open Monitoring->Graphs page it takes over 15 seconds. How can I make it faster?
    First: the number of monitored devices is not relevant.
    On the dashboard there is a line "Required server performance, new values per second".
    You may have 3k devices with one metric on each, or one device with 3k metrics; as long as the sampling rate of those metrics is the same, the impact on the DB backend will be almost the same in the end.

    Generally, all you need to do is improve the speed of your DB backend.

    For me, large-scale Zabbix monitoring starts at about 1-1.5k nvps (new values per second) and upwards, with about 2k selects/s or more on the DB backend.
    Anything below that is more of a mid-size monitored environment.

    Try checking the IO stats on your DB backend. If you see more read IOs than write IOs, your DB backend is being slowed down because it cannot cache enough MFU/MRU data in RAM: instead of serving select results from data cached in RAM, the system is forced to go to storage (even NVMe read latency is ~100 times longer than reading from RAM).
    On my blog (link below) you can find some examples of what the IO stats should look like on a DB backend that is well tuned in terms of RAM available for caches.
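
One rough way to compare read and write IOs is to read the cumulative counters the kernel exposes. The sketch below assumes a Linux DB host and a data disk named sda; neither is stated in the post, and io_counts is a hypothetical helper name.

```shell
# Print cumulative completed reads and writes for one block device.
# Expects /proc/diskstats-formatted lines on stdin: field 3 is the device
# name, field 4 is reads completed, field 8 is writes completed.
io_counts() {
    awk -v dev="$1" '$3 == dev { print "reads=" $4, "writes=" $8 }'
}

# Example (the device name "sda" is an assumption):
#   io_counts sda < /proc/diskstats
```

The counters are totals since boot, so for a current ratio it is better to sample twice and compare the differences.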

    Again: the key factor in a fast enough DB backend is enough RAM. Everything else can improve overall speed somewhat, but not as much as having enough RAM.
    Of course, from a scale of ~200-300 nvps upwards, the DB backend should use partitioned history and trends tables.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • perun.84
      Member
      • May 2016
      • 73

      #3
      My Zabbix has about 4500 nvps. My problem isn't with the graphs themselves; it is with opening the graph page. I think filling the combo boxes is what takes so long... Everything else is fine for now.
      A few days ago I couldn't open the graph page at all. In the log I saw that PHP wanted more than 512M of RAM, and after I increased the max_mem value I could open the page, but it is very slow...

      Thanks for your time.
      Last edited by perun.84; 25-08-2016, 21:38.


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        Originally posted by perun.84
        My zabbix has about 4500 nvps. My problem isn't in graphs, it is related to opening graph page. I think that filling combo-boxes takes so long... Everything else is fine for now. Thanks for your time.
        When generating such content, what matters is the average latency of the selects. Those selects are handled with the lowest possible latency when all the data they return can be served from data cached in memory.
        Part of the overall latency may come from the web server side. You can improve that part by adding zendopcache (http://pecl.php.net/package/zendopcache/) or, even better, by switching to PHP 7.

        You simply need to take a closer look at the web server stats or the DB stats to identify where the bottleneck is in your case.
        If you have trouble interpreting the monitoring data to find the source of the problem, you need to present some of that data.

        • perun.84
          Member
          • May 2016
          • 73

          #5
          I've just upgraded PHP to version 7. The situation is similar. I'm going to try opcache... In the MySQL slow query log I cannot see any entries while I'm trying to open the graph page.


          • kloczek
            Senior Member
            • Jun 2006
            • 1771

            #6
            Originally posted by perun.84
            I've just upgraded php to version 7. It is similar situation. I gonna try with opcache... In mysql slow logfile I cannot see any log while I'm trying to open graph page.
            Current zendopcache does not compile with PHP 7.
            Again: did you look at the ratio of read to write IOs on the DB backend?
            If you have more read than write IOs, it means you need more memory.

            Another solution is to migrate the DB backend to Solaris and use gzip-1 compression.
            gzip-1 should give you a compression ratio of 4 or better, and since the ZFS ARC keeps cached records in memory in compressed form, your N GB of RAM effectively works like N multiplied by the compression ratio.

            • perun.84
              Member
              • May 2016
              • 73

              #7
              There are more writes than reads. Loading the graph page takes 30-32 seconds...

              One (maybe) important note: if I change the group combo box from a group with a few hosts to "all", loading also takes over 20 seconds...
              Last edited by perun.84; 26-08-2016, 00:07.


              • perun.84
                Member
                • May 2016
                • 73

                #8
                I have 3100 devices in one group. Maybe that is the problem...


                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Try:
                  Code:
                  while false; do mysqladmin processlist | grep -v Sleep; sleep 1; done
                  on the DB server side and observe which select query stays active for so long.

                  • perun.84
                    Member
                    • May 2016
                    • 73

                    #10
                    Nice command, man. Did you mean while true?
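
With the loop condition changed to true, the command from post #9 can be sketched as follows. It assumes mysqladmin can log in without prompting (for example through a client options file); filter_active and watch_active are hypothetical helper names, not part of the original command.

```shell
# Drop idle connections (rows containing "Sleep") from processlist
# output read on stdin.
filter_active() {
    grep -v 'Sleep'
}

# Poll the process list once per second until interrupted with Ctrl-C.
watch_active() {
    while true; do
        mysqladmin processlist | filter_active
        sleep 1
    done
}
```

Calling watch_active on the DB server while the slow page loads should show which statements stay active across several polls.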


                    • perun.84
                      Member
                      • May 2016
                      • 73

                      #11
                      Maybe deleting old trends and trends_uint partitions would help... What do you think?


                      • kloczek
                        Senior Member
                        • Jun 2006
                        • 1771

                        #12
                        In engineering there are many things you can check first; only after that can you start to think.
                        I have no idea what is going on on your DB backend (because so far you have not presented any diagnostics or monitoring data), so I'm really not going to suggest anything. However, I'm almost 100% sure that old trends data is not even touched when your slow page is rendered by the web frontend.

                        • perun.84
                          Member
                          • May 2016
                          • 73

                          #13
                          Here are some MySQL images:

                          Mysql operations:



                          Mysql bandwidth:



                          Disk I/O:



                          I'm not sure about the I/O graph. I imported the template for it only last night, and the result doesn't look logical to me...
                          Last edited by perun.84; 26-08-2016, 15:27.


                          • kloczek
                            Senior Member
                            • Jun 2006
                            • 1771

                            #14
                            Whatever graphs you are going to show, please show them on a 1d scale.
                            Did you check for queries that stay active for more than a few seconds while your slow graph page is refreshing?
                            How much memory does this system have? How much of it is dedicated to the InnoDB buffer pool?
                            Can you show the basic information from the dashboard, such as the number of hosts/items/triggers/nvps?
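
To narrow the process list down to statements that have been running for more than a few seconds, a filter along these lines may help. It is only a sketch: it assumes the usual mysqladmin processlist column order (Id, User, Host, db, Command, Time, State, Info), and long_queries is a hypothetical helper name.

```shell
# Keep only non-idle processlist rows whose Time column is at least
# the given number of seconds. Reads mysqladmin table output on stdin.
long_queries() {
    awk -F'|' -v min="$1" '$6 !~ /Sleep/ && $7 + 0 >= min + 0'
}

# Example, run on the DB server while the page is loading:
#   mysqladmin processlist | long_queries 5
```

Header and border rows drop out on their own, because their Time field does not evaluate to a number at or above the threshold (as long as the threshold is 1 or more).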

                            • perun.84
                              Member
                              • May 2016
                              • 73

                              #15
                              Here is the free -m output on the DB server:

                              Code:
                              [root@zabbixdb ~]# free -m
                                            total        used        free      shared  buff/cache   available
                              Mem:          11855        7175        2551          40        2128        4413
                              Swap:          2047           0        2047

                              my.cnf is here:

                              Code:
                              [mysqld]
                              max_connections = 600
                              event_scheduler=ON
                              symbolic-links=TRUE
                              
                              interactive_timeout=180
                              wait_timeout=180
                              
                              query_cache_type=0
                              query_cache_size=0
                              
                              
                              join_buffer_size=4M
                              innodb_buffer_pool_size=6G
                              innodb_buffer_pool_instances=8
                              innodb_flush_method=O_DIRECT
                              innodb_file_per_table=1
                              innodb_old_blocks_time=1000
                              thread_cache_size=8
                              innodb_flush_log_at_trx_commit=0
                              
                              sync_binlog = 0
                              
                              slow_query_log = 1
                              slow_query_log_file = /var/log/mariadb/mysql.slow.log
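
As a quick sanity check on the numbers above, the share of RAM given to the InnoDB buffer pool can be worked out with shell arithmetic. The values come from the free -m output and my.cnf in this post; pool_share_percent is a hypothetical helper name.

```shell
# Integer percentage of total RAM given to the InnoDB buffer pool.
# Both arguments are in MB.
pool_share_percent() {
    echo $(( $1 * 100 / $2 ))
}

# 6G buffer pool (6144 MB) on a host that reports 11855 MB in total.
pool_share_percent 6144 11855   # prints 51
```

So roughly half of the memory on this host is dedicated to the buffer pool.
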

                              Here are the DB operations over one day:



                              Dashboard data:

                              Code:
                              Status of Zabbix
                              
                              Parameter	Value	Details
                              Zabbix server is running	Yes	localhost:10051
                              Number of hosts (enabled/disabled/templates)	3271	3218 / 0 / 53
                              Number of items (enabled/disabled/not supported)	566382	563542 / 0 / 2840
                              Number of triggers (enabled/disabled [problem/ok])	13315	10191 / 3124 [77 / 10114]
                              Number of users (online)	18	2
                              Required server performance, new values per second	4654.61	
                              Updated: 20:57:15
