'Latest data' broken (too slow) since upgrade to 5.0.2

  • frcre
    Junior Member
    • Apr 2020
    • 10

    #1

    'Latest data' broken (too slow) since upgrade to 5.0.2

    Hello,

    Today I upgraded our Zabbix environment from 4.4.7 to 5.0.2, mainly for some UI improvements and the SNMP consolidation on host level. We have:

    - 1 frontend server (zabbix-frontend + nginx)
    - 1 Zabbix server (zabbix-server-pgsql)
    - 1 dedicated PostgreSQL server (11.something) - 2 CPU, 4GB RAM

    Some metrics:

    - 150 hosts monitored
    - 15k items monitored
    - 200 NVPS

    On 4.4 everything was running smoothly on the database end. However, since we upgraded to 5.0.2 the 'latest data' view is not working anymore. When we click 'monitoring > latest data' or 'monitoring > hosts > latest data for specific host', the page starts loading, disk reads on the database server go up, and the load on the database server grinds it down. With the 'latest data' page loading, we have a load of over 6.00 on the database server, where it is e.g. 0.03 at this moment with just the server writing new metrics to it.

    The only things that work:

    - https://zabbix/zabbix.php?action=lat...w&filter_rst=1 - this loads within 2 seconds
    - Going to the above link and setting very strict search filters (hostgroup, 1 host, specific item names) - this loads data in a few to 10 seconds

    Two remarks on all of this:

    1. We have partitioning set up on the database using pg_partman, but as far as I can see nothing changed there
    2. Under 'administration > general > gui' we had 'Limit for search and filter results' set to 1000 - setting this to 100 makes the general 'latest data' view a lot faster, but the 'latest data' link on the 'monitoring > hosts' page still doesn't load

    On the database end slow query logging is enabled, and we do see very slow queries logged like this:

    Code:
    2020-07-14 16:50:01.465 CEST [11116] zabbix@zabbix LOG:  duration: 256987.192 ms  statement: SELECT itemid FROM history_uint WHERE itemid IN (114995,114997,114998,115001,115008,115009,115012,115014,115016,115018,115019,115020,115024,115182,115184,115185,115186,115187,115188,115189,115190,115191,115192,115193,115194,115195,115196,115197,115198,115199,115200,115201,115202,115608,115609,115610,115611,115612,115613,115614,115615,115616,115617,115618,115619,115620,115621,115622,115623,115624,115625,115626,115627,115628,115629,115630,115631,115632,115633,115634,115635,115636,115637,115638,115639,115640,115641,115642,115643,115644,115645,115646,115647,115648,115649,115650,115651,115652,115653,115654,115655,115656,115657,115658,115659,115660,115661,115662,115663,115664,115665,115666,115667,115668,115670,115671) GROUP BY itemid HAVING MAX(clock)>1594651478
    This logging was not enabled previously, so I can't compare the query format with 4.4 queries. But in 4.4 'latest data' always worked fine, with the limit set to 1000 search results as well.

    Has anything changed with the latest data view and the queries it generates? Any other ideas about where to look?

    We also have Grafana interacting with the Zabbix API, this is all still working very smoothly.

    Edit: found something else (GUI limit set to 100):

    - Go to https://zabbix/zabbix.php?action=lat...w&filter_rst=1 - loads almost instantly
    - Go to some other page (configuration > host groups)
    - Go to monitoring > latest data - loads almost instantly

    - Within this view, filter for 1 host (no other filters), click apply - loads almost instantly
    - Go to some other page (configuration > host groups)
    - Go back to monitoring > latest data - loads almost instantly with the last known filters applied

    - Go to monitoring > hosts
    - Click 'latest data' for a host - starts loading, database load goes up, page never loads (backend timeout)
    - Now go to monitoring > latest data - same problem, doesn't load (+ database load)

    - Go again to the filter_rst link, works fine again as long as you don't go through monitoring > hosts or increase the system limit for search results

    Doing some side-by-side comparison of the links for the same host:

    - monitoring > latest data:
    Code:
    https://zabbix/zabbix.php?action=latest.view&filter_hostids%5B%5D=10597&filter_application=&filter_select=&filter_show_without_data=1&filter_set=1
    - monitoring > hosts > host:
    Code:
    https://zabbix/zabbix.php?action=latest.view&filter_set=1&filter_hostids%5B0%5D=10597
    The filter_hostids part is very strange: the link from the hosts page adds an extra '0'. When I go to 'monitoring > latest data' after trying to go through 'monitoring > hosts > host', the '0' is there as well. That is probably the 'latest data' filter that got set and remembered by clicking the host-specific link, but this looks like a bug?
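
The side-by-side comparison can also be done mechanically. A minimal sketch using Python's urllib.parse on the two URLs quoted above; the helper names (query_params, diff_params) are made up for illustration and are not part of Zabbix:

```python
from urllib.parse import urlsplit, parse_qs

# The two links from the post, copied as-is.
LATEST_DATA_URL = (
    "https://zabbix/zabbix.php?action=latest.view"
    "&filter_hostids%5B%5D=10597&filter_application=&filter_select="
    "&filter_show_without_data=1&filter_set=1"
)
HOSTS_PAGE_URL = (
    "https://zabbix/zabbix.php?action=latest.view"
    "&filter_set=1&filter_hostids%5B0%5D=10597"
)


def query_params(url):
    """Return the query string as a dict, keeping parameters with empty values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def diff_params(url_a, url_b):
    """Return (only_in_a, only_in_b) as sorted lists of parameter names."""
    a, b = query_params(url_a), query_params(url_b)
    return sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


only_latest, only_hosts = diff_params(LATEST_DATA_URL, HOSTS_PAGE_URL)
print("only in monitoring > latest data:", only_latest)
print("only in monitoring > hosts > host:", only_hosts)
```

So besides the index in filter_hostids, the hosts-page link also lacks filter_application, filter_select and filter_show_without_data. As far as I know, PHP parses `filter_hostids[]=10597` and `filter_hostids[0]=10597` into the same one-element array, so the extra '0' on its own may be harmless and the missing parameters are worth a look too.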
    Last edited by frcre; 14-07-2020, 17:15.
  • frcre
    Junior Member
    • Apr 2020
    • 10

    #2
    Did some further testing, seems that if "&filter_show_without_data=1" is not present in the latest.view URL, the query is very slow. If I add this parameter to the URL generated on the monitoring > hosts page, this works fine.
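
As a stopgap for bookmarks or scripts, the parameter can be appended to a link programmatically. A small sketch in Python, assuming the workaround is just "make sure filter_show_without_data=1 is in the query string"; the function name is hypothetical and this does not fix the links generated by the frontend itself:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_show_without_data(url):
    """Return url with filter_show_without_data=1 appended if it is missing."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(name == "filter_show_without_data" for name, _ in params):
        params.append(("filter_show_without_data", "1"))
    # urlencode re-encodes the brackets in filter_hostids[0] as %5B0%5D.
    return urlunsplit(parts._replace(query=urlencode(params)))


# The slow link generated on the monitoring > hosts page (from post #1).
slow_url = "https://zabbix/zabbix.php?action=latest.view&filter_set=1&filter_hostids%5B0%5D=10597"
fixed_url = with_show_without_data(slow_url)
print(fixed_url)
```

Calling the function on a URL that already has the parameter leaves it unchanged.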

    I attempted an nginx rewrite rule, but have failed so far.

    Submitted ZBX-18071 as it looks like a bug.

    • dimir
      Zabbix developer
      • Apr 2011
      • 1080

      #3
      Let's add full link: https://support.zabbix.com/browse/ZBX-18071

      • csmall
        Member
        • Jun 2020
        • 70

        #4
        Replying to dimir:
        I can confirm. I just upgraded from 5.0.1 to 5.0.2 and it is broken for me as well.

        UPDATE: It seems intermittent. Sometimes it works and other times it doesn't.
        Last edited by csmall; 15-07-2020, 16:23.

        • Jimlad
          Junior Member
          • Jun 2017
          • 2

          #5
          I can confirm the same for me: I upgraded from 4.4 to 5.0.2 and now I can no longer load latest data.

          • Paxyon
            Junior Member
            • Jul 2020
            • 4

            #6
            Same for me. Here is my answer

            • Sebastian
              Member
              • Jul 2020
              • 33

              #7
              I have found out that when I access a host's latest data for the first time, it takes up to two hours to load the page.
              While loading, it produces many locks on the database and many slow queries.
              Finally the page opens and works fine for that host.

              The next time I access the same host it works much faster: 1-10 seconds.

              • gullevek
                Junior Member
                • Nov 2008
                • 22

                #8
                I have the same problem here too. The first 5.0.x release worked fine, but some change in 5.0.2 broke the data reading. The query is also strange, as it uses "GROUP BY ... HAVING MAX(..)>" and therefore cannot skip any of the partitions.
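
To illustrate the point about the query shape: the HAVING form from the slow query log asks "which of these items have any row newer than the cutoff", and the same question can be asked with the clock condition in WHERE. A rough sketch using Python's built-in sqlite3 with a simplified history_uint table and made-up rows; the WHERE variant is my own rewrite for comparison, not what Zabbix generates and not necessarily the fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Zabbix history_uint table (no indexes, no partitions).
conn.execute(
    "CREATE TABLE history_uint (itemid INTEGER, clock INTEGER, value INTEGER, ns INTEGER)"
)
rows = [
    (114995, 1594651000, 1, 0),  # only older than the cutoff
    (114997, 1594651000, 1, 0),
    (114997, 1594651500, 2, 0),  # newer than the cutoff
    (114998, 1594651479, 3, 0),  # one second after the cutoff
    (115001, 1594651478, 4, 0),  # exactly at the cutoff, so not strictly greater
    (999999, 1594652000, 5, 0),  # recent, but not in the IN list
]
conn.executemany("INSERT INTO history_uint VALUES (?, ?, ?, ?)", rows)

ITEMIDS = "(114995,114997,114998,115001)"
CUTOFF = 1594651478  # cutoff taken from the logged statement

# Shape of the logged query: the clock condition is applied after grouping.
having_form = (
    f"SELECT itemid FROM history_uint WHERE itemid IN {ITEMIDS} "
    f"GROUP BY itemid HAVING MAX(clock)>{CUTOFF}"
)
# Hypothetical rewrite: the clock condition is a plain row filter.
where_form = (
    f"SELECT DISTINCT itemid FROM history_uint WHERE itemid IN {ITEMIDS} "
    f"AND clock>{CUTOFF}"
)


def run(sql):
    """Run a query and return the itemids as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))


having_result = run(having_form)
where_result = run(where_form)
print(having_result, where_result)
```

Both forms return the same items here. SQLite has no partitioning, so this only shows the logical equivalence. On PostgreSQL with tables range-partitioned on clock (an assumption about the pg_partman setup), a clock condition in WHERE is the kind of predicate the planner can use to exclude old partitions, while a condition on MAX(clock) in HAVING is only evaluated after all matching rows have been read. Running EXPLAIN on both forms against the real database would confirm whether that applies.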

                The filter_rst flag works, and if I just select data (a host) from there, it loads as fast as before.
