Hello,
Today I upgraded our Zabbix environment from 4.4.7 to 5.0.2, mainly for some UI improvements and the SNMP consolidation on host level. We have:
- 1 frontend server (zabbix-frontend + nginx)
- 1 Zabbix server (zabbix-server-pgsql)
- 1 dedicated PostgreSQL server (11.something) - 2 CPU, 4GB RAM
Some metrics:
- 150 hosts monitored
- 15k items monitored
- 200 NVPS
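For a sense of scale, here is a rough sketch of what 200 NVPS means in rows per day. It assumes every new value ends up as exactly one row in one of the history tables, which is a simplification; the function name is made up for illustration.

```python
# Back-of-the-envelope estimate of history rows written per day.
# Assumption: one new value == one history row (ignores trends, discarded values, etc.).
SECONDS_PER_DAY = 24 * 60 * 60

def rows_per_day(values_per_second: int) -> int:
    """Return the approximate number of history rows written in 24 hours."""
    return values_per_second * SECONDS_PER_DAY

nvps = 200  # new values per second, taken from the metrics above
print(rows_per_day(nvps))  # 17280000
```

So any query that has to read history without a time bound is looking at something in the order of 17 million new rows for every day of retention.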
On 4.4 everything was running smoothly on the database end. However, since we upgraded to 5.0.2 the 'latest data' view is not working anymore. When we click 'monitoring > latest data' or 'monitoring > hosts > latest data for specific host', the page starts loading, disk reads on the database server go up, and the load on the database server grinds it down. With the 'latest data' page loading, we have a load of over 6.00 on the database server, where it is e.g. 0.03 at this moment with just the server writing new metrics to it.
The only things that work:
- https://zabbix/zabbix.php?action=lat...w&filter_rst=1 - this loads within 2 seconds
- Going to the above link and setting very strict search filters (hostgroup, 1 host, specific item names) - this loads data in a few to 10 seconds
Two remarks on all of this:
1. We have partitioning set up on the database using pg_partman, but as far as I can see nothing changed there
2. Under 'administration > general > gui' we had 'Limit for search and filter results' set to 1000 - setting this to 100 makes the general 'latest data' view a lot faster, but the 'latest data' link on the 'monitoring > hosts' page still doesn't load
On the database end slow query logging is enabled, and we do see very slow queries being logged (one such log entry is quoted in full further down in this post).
This logging was not enabled previously, so I can't compare the query format with 4.4 queries. But in 4.4 'latest data' always worked fine, with the limit set to 1000 search results as well.
Has anything changed with the latest data view and the queries it generates? Any other ideas about where to look?
We also have Grafana interacting with the Zabbix API, this is all still working very smoothly.
Edit: found something else (GUI limit set to 100):
- Go to https://zabbix/zabbix.php?action=lat...w&filter_rst=1 - loads almost instantly
- Go to some other page (configuration > host groups)
- Go to monitoring > latest data - loads almost instantly
- Within this view, filter for 1 host (no other filters), click apply - loads almost instantly
- Go to some other page (configuration > host groups)
- Go back to monitoring > latest data - loads almost instantly with the last known filters applied
- Go to monitoring > hosts
- Click 'latest data' for a host - starts loading, database load goes up, page never loads (backend timeout)
- Now go to monitoring > latest data - same problem, doesn't load (+ database load)
- Go again to the filter_rst link, works fine again as long as you don't go through monitoring > hosts or increase the system limit for search results
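Doing some side-by-side comparison of the links for the same host (both URLs are quoted in full further down in this post):
- monitoring > latest data: the link with filter_hostids%5B%5D
- monitoring > hosts > host: the link with filter_hostids%5B0%5D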
The filter_hostids part is very strange: the link from the hosts page adds a '0' inside the brackets (filter_hostids%5B0%5D, i.e. filter_hostids[0], instead of filter_hostids%5B%5D, i.e. filter_hostids[]). When I go to 'monitoring > latest data' after trying to go through 'monitoring > hosts > host', the '0' is there as well. That is probably the 'latest data' filter that got set and remembered by clicking the host-specific link, but this looks like a bug?
The slow query log entry mentioned above:
Code:
2020-07-14 16:50:01.465 CEST [11116] zabbix@zabbix LOG: duration: 256987.192 ms statement: SELECT itemid FROM history_uint WHERE itemid IN (114995,114997,114998,115001,115008,115009,115012,115014,115016,115018,115019,115020,115024,115182,115184,115185,115186,115187,115188,115189,115190,115191,115192,115193,115194,115195,115196,115197,115198,115199,115200,115201,115202,115608,115609,115610,115611,115612,115613,115614,115615,115616,115617,115618,115619,115620,115621,115622,115623,115624,115625,115626,115627,115628,115629,115630,115631,115632,115633,115634,115635,115636,115637,115638,115639,115640,115641,115642,115643,115644,115645,115646,115647,115648,115649,115650,115651,115652,115653,115654,115655,115656,115657,115658,115659,115660,115661,115662,115663,115664,115665,115666,115667,115668,115670,115671) GROUP BY itemid HAVING MAX(clock)>1594651478
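To make the log entry above easier to read, here is a minimal sketch that pulls out the duration, the number of item IDs and the clock threshold. The sample line is deliberately shortened to the first three item IDs; the pattern and the function name are my own and only fit log lines shaped like this one.

```python
import re
from datetime import datetime, timezone

# Shortened sample: same shape as the logged statement, but only the first three item IDs.
LOG_LINE = (
    "2020-07-14 16:50:01.465 CEST [11116] zabbix@zabbix LOG: "
    "duration: 256987.192 ms statement: SELECT itemid FROM history_uint "
    "WHERE itemid IN (114995,114997,114998) "
    "GROUP BY itemid HAVING MAX(clock)>1594651478"
)

# Hypothetical pattern, written only for this one statement shape.
PATTERN = re.compile(
    r"duration: (?P<ms>[\d.]+) ms\s+statement: .*?"
    r"IN \((?P<ids>[\d,]+)\).*?MAX\(clock\)>(?P<clock>\d+)"
)

def summarize(line: str) -> dict:
    """Extract duration (seconds), item count and clock threshold (UTC) from the line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like the slow query above")
    return {
        "seconds": float(match["ms"]) / 1000.0,
        "item_count": len(match["ids"].split(",")),
        "clock_utc": datetime.fromtimestamp(int(match["clock"]), tz=timezone.utc),
    }

summary = summarize(LOG_LINE)
print(summary)
```

For the real entry this gives a duration of roughly 257 seconds, and the clock value decodes to 2020-07-13 14:44:38 UTC, about a day before the entry was logged. As far as I can tell the time condition only sits in HAVING MAX(clock), so nothing in the WHERE clause limits the query to recent rows or recent partitions.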
The two links compared above, in full:
- monitoring > latest data:
Code:
https://zabbix/zabbix.php?action=latest.view&filter_hostids%5B%5D=10597&filter_application=&filter_select=&filter_show_without_data=1&filter_set=1
- monitoring > hosts > host:
Code:
https://zabbix/zabbix.php?action=latest.view&filter_set=1&filter_hostids%5B0%5D=10597
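To compare the two links without squinting at the percent-encoding, here is a small sketch using only the Python standard library. The two URLs are the ones quoted above; the helper names are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The two URLs quoted above, for the same host.
LATEST_DATA_URL = (
    "https://zabbix/zabbix.php?action=latest.view&filter_hostids%5B%5D=10597"
    "&filter_application=&filter_select=&filter_show_without_data=1&filter_set=1"
)
HOSTS_PAGE_URL = (
    "https://zabbix/zabbix.php?action=latest.view&filter_set=1"
    "&filter_hostids%5B0%5D=10597"
)

def query_params(url: str) -> dict:
    """Decode the query string; keep parameters that have an empty value."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

def diff_params(first: str, second: str) -> dict:
    """Report which parameter names differ between two URLs."""
    a, b = query_params(first), query_params(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }

result = diff_params(LATEST_DATA_URL, HOSTS_PAGE_URL)
for section, names in result.items():
    print(section, names)
```

Besides filter_hostids[] versus filter_hostids[0], this shows that the link from the hosts page leaves out filter_application, filter_select and filter_show_without_data entirely. The URLs alone don't tell which of these differences, if any, triggers the slow query.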
