Opening Latest data is very slow

  • mdresden
    Junior Member
    • Jan 2014
    • 16

    #1

    Opening Latest data is very slow

    Here is my issue:
    Opening the Latest data tab under Monitoring takes 30 seconds or longer. It depends on where I last left it, but it is slowest when it was set to all groups and all hosts. Other tabs under Monitoring also take a long time. So far I am the only user, but in future somewhere under 100 people will be using it. When that time comes, a cluster of web frontends behind an F5 will go up, but I believe it should be faster than this even as I am testing it now.
    Here is my environment:
    • Zabbix Server / Zabbix Web Server

      8 cpu cores
      16 GB Ram
      Storage: EMC SAN, 10GbE connection to a 4-disk RAID 10 with 15k drives
      Running the Zabbix Server daemon here
      Running the zabbix apache web front end here
    • Zabbix Mysql Server

      6 cpu
      6GB ram
      mysql 5.6
      innodb
      5 GB innodb buffer space
      2TB on EMC on 14 disk RAID 10
    • Current monitored environment

      Around 500 hosts, all with the Zabbix agent


    Steps already taken to work on this issue:
    • Maxed out all but one of the available memory settings on the Zabbix server
    • Gave 2GB to the PHP memory setting for the Apache vhost
    • Set up slow query logging - not getting anything over 1 second
    • Had a DBA review MySQL performance; no bottlenecks or slow queries seen there so far
    • Disabled housekeeper
    • All Zabbix server monitored items look to be nearly idle
    • Zabbix server not running in swap


    Here is my zabbix configurations:
    Code:
    ListenPort=10051
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=0
    DebugLevel=3
    PidFile=/var/run/zabbix/zabbix_server.pid
    DBHost=zabbix-db01.blah.web
    DBName=zabbix
    DBUser=zabbix
    DBPassword=blah
    DBPort=3306
    StartPollers=200
    StartIPMIPollers=50
    StartPollersUnreachable=100
    StartTrappers=200
    StartPingers=200
    StartDiscoverers=125
    StartHTTPPollers=100
    StartTimers=100
    SNMPTrapperFile=/var/log/snmptt/snmptt.log
    StartSNMPTrapper=1
    CacheSize=2G
    StartDBSyncers=40
    HistoryCacheSize=2G
    TrendCacheSize=2G
    HistoryTextCacheSize=2G
    ValueCacheSize=4G
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts
    FpingLocation=/usr/sbin/fping
    LogSlowQueries=1000
    TmpDir=/tmp
    Here is the relevant excerpt from the apache conf
    Code:
    <Directory "/usr/share/zabbix">
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
        php_value max_execution_time 300
        php_value memory_limit 2048M
        php_value post_max_size 16M
        php_value upload_max_filesize 2M
        php_value max_input_time 300
        php_value date.timezone America/Detroit
    </Directory>
    At one time, on a less robust system, I was monitoring 3000 servers with a 1-second response on Latest data.

    So, while I have nearly unlimited resources to throw at this, and do plan to segment this out to a much beefier environment to monitor more servers and run many more resource-intensive checks beyond the default Linux templates, I don't feel that throwing more resources at it is going to help. Maybe I need a memory tweak somewhere, maybe indexing something will help, or setting up a RAM disk somewhere, but I just can't find it and it's driving me crazy :/

    I can provide more detailed Database performance info tomorrow if needed

    Any advice here will be appreciated. It seems someone else here must have run into this same issue, as I still have a fairly stock setup.

    Also, this was with only 1 day of data
    Last edited by mdresden; 17-03-2014, 16:05.
  • pc99096
    Senior Member
    • Oct 2011
    • 193

    #2
    we are facing the same issue (2.2.1 version), it is probably a bug:


    I am just wondering, how do I get these statistics:

    ******************** Script profiler ********************
    Total time: 38.640018
    Total SQL time: 26.696099
    SQL count: 782 (selects: 392 | executes: 390)
    Peak memory usage: 111.5M
    Memory limit: 2048M

    does anyone know?
    i can see there is something in CProfiler.php, but where do i display the numbers?
    do i have to enable debug in zabbix_server.conf? my php frontend is on a different server than the zabbix binaries.
    Last edited by pc99096; 17-03-2014, 13:34.


    • dakol
      Member
      • Jan 2008
      • 50

      #3
      When debug is activated for a user group, a "Debug" link is displayed at the top right:
      Help|Get support|Print|Profile|Debug|Logout


      • pc99096
        Senior Member
        • Oct 2011
        • 193

        #4
        so there is no way to have it without changing the zabbix_server.conf file and restarting zabbix_server


        • mdresden
          Junior Member
          • Jan 2014
          • 16

          #5
          Confirmed this is a known bug

          Per zabbix support:

          Sorry for the delay, this is definitely a bug (ZBX-7373). Unfortunately there is no estimated fix date for this just yet, but I will keep you posted should I hear any news from our developers.

          I have asked what the latest stable release was prior to this.


          • thooge
            Junior Member
            • Mar 2014
            • 10

            #6
            Also here with Version 2.2.2

            Hello

            I have the same problem. Latest data is almost unusable.

            Another thing is that the frontend by default tries to open the page with "group=all" and "host=all".
            I think it would be better to open the page with the first entries of the lists, or to add the default entries "none" and "none" to display an empty page.

            Thomas


            • pc99096
              Senior Member
              • Oct 2011
              • 193

              #7
              This can be done by setting Administration -> General -> GUI -> "Dropdown first entry" to "None".


              • thooge
                Junior Member
                • Mar 2014
                • 10

                #8
                Oh, thank you very much :-)

                But now i cannot select all hosts inside a group.
                Is there another switch i haven't found?


                • pc99096
                  Senior Member
                  • Oct 2011
                  • 193

                  #9
                  that's the catch, i don't think it's possible


                  • mdresden
                    Junior Member
                    • Jan 2014
                    • 16

                    #10
                    Until this bug is fixed, I think this workaround in the GUI is the best option I have found. It's definitely better than waiting 60 seconds for the page to load and suffering through the constant auto refreshes.

                    If you need to review multiple members of a group at the same time, you might consider setting up screens. For example, all Tomcat hosts for an application, with all the data you would want to see at a glance, could be configured there.

                    Any other ideas on improving this in the meantime will be welcome.


                    • mdresden
                      Junior Member
                      • Jan 2014
                      • 16

                      #11
                      I will have to change my tune and agree with thooge. At first glance this seems like a nice workaround for Latest data, but then you lose the ability to see all alerts, all triggers, or all of anything else where that is needed.

                      I will just live with this issue and hope the zabbix devs give this some priority.


                      • artsangel
                        Junior Member
                        • Jun 2012
                        • 5

                        #12
                        I think this is potentially caused by the removal of "lastvalue" from the items table.
                        Now the last value for all items needs to come from the much larger history tables, which can be tens of gigabytes in size for very large installations (like what we have at our organisation). It was much faster when lastvalue could come directly from items table.
                        I think the only real fix for this is going to be for the Zabbix devs to put those fields back.

                        We actually have a number of SQL queries here running directly against the database in order to generate complex reports, and removal of lastvalue and the necessity of writing much more complex nested and joined queries to use the history tables has caused these queries to run hundreds of times slower.
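For readers writing similar reports, one common way to express "newest row per item" in a single statement is to join the history table against a subquery that picks MAX(clock) per itemid. The sketch below only illustrates that pattern and is not the query the Zabbix frontend runs. It assumes an in-memory SQLite database, a cut-down `history` table with just `itemid`, `clock` and `value`, and made-up sample rows; the function name `latest_values` is hypothetical.

```python
import sqlite3

# Simplified stand-in for a Zabbix history table. The real tables have more
# columns; this in-memory SQLite database exists only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL);
    CREATE INDEX history_1 ON history (itemid, clock);
""")
conn.executemany(
    "INSERT INTO history (itemid, clock, value) VALUES (?, ?, ?)",
    [
        (1, 100, 0.5), (1, 160, 0.7), (1, 220, 0.6),  # made-up samples
        (2, 100, 42.0), (2, 160, 41.0),
    ],
)


def latest_values(conn):
    """Return {itemid: (clock, value)} holding the newest row for each item."""
    rows = conn.execute("""
        SELECT h.itemid, h.clock, h.value
        FROM history AS h
        JOIN (
            SELECT itemid, MAX(clock) AS clock
            FROM history
            GROUP BY itemid
        ) AS newest
          ON newest.itemid = h.itemid AND newest.clock = h.clock
    """).fetchall()
    return {itemid: (clock, value) for itemid, clock, value in rows}


print(latest_values(conn))
```

On a real installation the subquery still has to visit every item in the table, so this mainly saves round trips rather than work. If two rows share the same itemid and clock, the join returns both and the dict keeps only one of them.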


                        • steveboyson
                          Senior Member
                          • Jul 2013
                          • 582

                          #13
                          Originally posted by artsangel
                          I think this is potentially caused by the removal of "lastvalue" from the items table.
                          Now the last value for all items needs to come from the much larger history tables, which can be tens of gigabytes in size for very large installations (like what we have at our organisation). It was much faster when lastvalue could come directly from items table.
                          I think the only real fix for this is going to be for the Zabbix devs to put those fields back.

                          We actually have a number of SQL queries here running directly against the database in order to generate complex reports, and removal of lastvalue and the necessity of writing much more complex nested and joined queries to use the history tables has caused these queries to run hundreds of times slower.
                          Yes! This is exactly what we found out. The missing "lastvalue" is in my opinion a step in the wrong direction as it makes things more complicated - and less "speedy".
                          We developed a bunch of scripts and had to rewrite most of them. Biggest pain is if you want to filter the results by "lastvalue" - this causes a big number of SELECT statements to the history_* tables. Very slow and quite uncomfortable.


                          • dakol
                            Member
                            • Jan 2008
                            • 50

                            #14
                            Originally posted by steveboyson
                            Yes! This is exactly what we found out. The missing "lastvalue" is in my opinion a step in the wrong direction as it makes things more complicated - and less "speedy".
                            We developed a bunch of scripts and had to rewrite most of them. Biggest pain is if you want to filter the results by "lastvalue" - this causes a big number of SELECT statements to the history_* tables. Very slow and quite uncomfortable.
                            Removal of lastclock/lastvalue is a huge benefit for the database: tons of UPDATE/SELECT statements are saved, so there is less contention and locking.

                            I have a lot of internal tools which use lastclock/lastvalue.

                            My workaround is to get all items of all hosts and store them in another SQL database/file.
                            1) my tools are way faster and do not interfere with the Zabbix poller/dbsyncer
                            2) dumping a host with +/- 170 items takes 1.5s (with history2.patch applied)
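A rough sketch of what such a side store could look like follows. It is not dakol's actual tooling. It assumes the latest `(itemid, clock, value)` rows have already been read from Zabbix by some other query, and it copies them into a separate SQLite file so that report tools never touch the live database. The table `last_values`, its columns, and the function names are all hypothetical.

```python
import sqlite3
import time


def open_snapshot(path=":memory:"):
    """Create (or open) the side store. Table and column names are hypothetical."""
    snap = sqlite3.connect(path)
    snap.execute("""
        CREATE TABLE IF NOT EXISTS last_values (
            itemid    INTEGER PRIMARY KEY,
            clock     INTEGER NOT NULL,   -- sample time reported by Zabbix
            value     TEXT    NOT NULL,   -- stored as text to cover all value types
            copied_at INTEGER NOT NULL    -- when this row was copied over
        )
    """)
    return snap


def refresh_snapshot(snap, rows, now=None):
    """Upsert (itemid, clock, value) rows, keeping only samples newer than stored."""
    now = int(time.time()) if now is None else now
    for itemid, clock, value in rows:
        stored = snap.execute(
            "SELECT clock FROM last_values WHERE itemid = ?", (itemid,)
        ).fetchone()
        if stored is None or clock > stored[0]:
            snap.execute(
                "INSERT OR REPLACE INTO last_values VALUES (?, ?, ?, ?)",
                (itemid, clock, str(value), now),
            )
    snap.commit()


def staleness(snap, itemid, now=None):
    """Seconds between now and the stored sample time, or None if unknown."""
    now = int(time.time()) if now is None else now
    row = snap.execute(
        "SELECT clock FROM last_values WHERE itemid = ?", (itemid,)
    ).fetchone()
    return None if row is None else now - row[0]
```

The copy always lags the live data by up to one refresh interval, which is why the sketch records `copied_at` and exposes `staleness`: a report can then show or filter on how old each value is.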


                            • steveboyson
                              Senior Member
                              • Jul 2013
                              • 582

                              #15
                              Might be a useful approach, but you then have time gaps between Zabbix and your external database.

                              I do it this way (pseudo code):
                              - I have a translation map which maps each value_type to its history table name
                              - select itemid,value_type from items where key_ (or name) like <string>
                              - find correct history table for value_type from translation map
                              - select value,clock from <history_table> where itemid=<itemid> order by clock desc limit 1

                              This gives me the "last value" for the required item, but requires two SQL statements. I can live with that but it complicates things as written before when I want to filter by "last value".
                              Furthermore, IIRC, "value" has no index in the history_* tables so searching requires a full table scan in that case.
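The two-step lookup described above can be sketched roughly as follows. This is a minimal illustration, not steveboyson's script. It assumes an in-memory SQLite database with cut-down `items`, `history` and `history_uint` tables and made-up sample rows, and the function name `last_value` is hypothetical. The value_type numbers follow the Zabbix schema (0 float, 1 character, 2 log, 3 unsigned, 4 text). The newest sample is selected by ordering on `clock`.

```python
import sqlite3

# Translation map: items.value_type -> history table holding that item's data.
HISTORY_TABLES = {
    0: "history",       # numeric (float)
    1: "history_str",   # character
    2: "history_log",   # log
    3: "history_uint",  # numeric (unsigned)
    4: "history_text",  # text
}


def last_value(conn, key_pattern):
    """Return [(itemid, value, clock)] with the newest sample per matching item."""
    results = []
    # Step 1: find the matching items and their value types.
    items = conn.execute(
        "SELECT itemid, value_type FROM items WHERE key_ LIKE ?", (key_pattern,)
    ).fetchall()
    for itemid, value_type in items:
        # The table name comes from the fixed map above, never from user input.
        table = HISTORY_TABLES[value_type]
        # Step 2: newest row for this item from the right history table.
        row = conn.execute(
            f"SELECT value, clock FROM {table} "
            "WHERE itemid = ? ORDER BY clock DESC LIMIT 1",
            (itemid,),
        ).fetchone()
        if row is not None:
            results.append((itemid, row[0], row[1]))
    return results


# Made-up demo data in a simplified schema (assumption, not the full Zabbix DDL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, key_ TEXT, value_type INTEGER);
    CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL);
    CREATE TABLE history_uint (itemid INTEGER, clock INTEGER, value INTEGER);
    INSERT INTO items VALUES (1, 'system.cpu.load[percpu,avg1]', 0),
                             (2, 'vm.memory.size[available]', 3);
    INSERT INTO history VALUES (1, 100, 0.9), (1, 160, 0.2);
    INSERT INTO history_uint VALUES (2, 100, 2048), (2, 160, 1024);
""")
print(last_value(conn, "system.cpu.%"))
```

Filtering by the returned value still has to happen after the lookup, in the calling code, which is the inconvenience mentioned above. Each matching item costs one extra query, so the number of statements grows with the number of items matched.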
