Hey everybody,
I have been dealing with a maddening problem with our Zabbix installation. Every morning between 1AM and 4AM, many of our triggers that use the nodata() function fire.
I'm trying to determine whether something internal to Zabbix happens every morning between 1AM and 4AM, or whether something external to Zabbix is causing these issues.
The only sign of trouble in the Zabbix logs is an increase in both the number and the duration of "slow query" warnings in the server and proxy log files. During the day we occasionally get "slow query" warnings, with most query times between 3 and 5 seconds. During the 1AM - 4AM window, however, the server log shows many more of these warnings, and the query times are much longer. The proxies (each running a local SQLite DB) almost never log slow query warnings, except between 1AM and 4AM, when we get several with timings between 3 and 5 seconds.
For example, here are two warnings from the server log, one from 9:00 AM this morning and one from 4:02 AM this morning:
Code:
16215:20200716:090015.193 slow query: 3.478317 sec, "insert into trends_uint (itemid,clock…..
Code:
16217:20200716:040213.392 slow query: 72.393781 sec, "insert into trends_uint (itemid,clock...
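To compare the daytime and overnight warnings more systematically, a small parser can pull the PID, timestamp, duration, and SQL prefix out of each "slow query" line. This is only a sketch: the regex assumes every warning looks like the two lines above (`PID:YYYYMMDD:HHMMSS.mmm slow query: N sec, "SQL...`), and `parse_slow_query` is a hypothetical helper name, not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed line shape, based on the two sample warnings:
# PID:YYYYMMDD:HHMMSS.mmm slow query: N.NNNNNN sec, "<SQL, possibly truncated>
SLOW_QUERY_RE = re.compile(
    r'^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.(?P<ms>\d{3}) '
    r'slow query: (?P<secs>\d+(?:\.\d+)?) sec, "(?P<sql>.*)$'
)

def parse_slow_query(line):
    """Return (pid, timestamp, seconds, sql_prefix), or None if the line
    is not a slow-query warning."""
    m = SLOW_QUERY_RE.match(line.strip())
    if m is None:
        return None
    # Combine the date and time fields, then add the millisecond part.
    ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(m["ms"]) * 1000)
    return int(m["pid"]), ts, float(m["secs"]), m["sql"]

line = '16217:20200716:040213.392 slow query: 72.393781 sec, "insert into trends_uint (itemid,clock...'
print(parse_slow_query(line))
```

Running this over the whole server log and sorting by duration or by time of day should make it easier to see whether the long queries cluster on particular tables (here `trends_uint`) during the overnight window.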
Interestingly, these triggers usually show the "resolved time before the alert time" symptom. I guess that makes sense if things have slowed down and the server can't process item data from all of our proxy servers fast enough, even though everything was actually up and running.
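One way to see how delayed processing could produce "resolved before alert" events is a toy simulation. It rests on an assumption, not on Zabbix source code: a nodata() PROBLEM is stamped with the server's timer-check time, while the recovery is stamped with the clock of the late value that ends it. The `Value` class, `simulate_nodata`, the 180-second period, and all timings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Value:
    clock: int    # when the proxy collected the value (seconds)
    arrival: int  # when the server actually processed it (seconds)

def simulate_nodata(values, period, check_times):
    """Simplified model: PROBLEM is stamped with the timer-check time,
    OK is stamped with the clock of the first value that ends it."""
    events = []
    in_problem = False
    # Merge timer checks and value arrivals into one ordered timeline.
    timeline = sorted(
        [(t, "check", None) for t in check_times]
        + [(v.arrival, "value", v) for v in values],
        key=lambda e: e[0],
    )
    received = []
    for when, kind, v in timeline:
        if kind == "value":
            received.append(v)
            if in_problem:
                events.append(("OK", v.clock))
                in_problem = False
        elif not in_problem:
            # nodata(period): no already-received value is newer than now - period.
            if not any(r.clock > when - period for r in received):
                events.append(("PROBLEM", when))
                in_problem = True
    return events

# The value collected at clock 120 reaches the server only at 400.
vals = [Value(0, 5), Value(60, 65), Value(120, 400)]
print(simulate_nodata(vals, 180, range(30, 391, 30)))
```

Under this model the PROBLEM is raised at 240, but the recovery carries the late value's clock of 120, so the resolved time appears before the alert time even though the host never stopped sending data.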
We are currently running Zabbix 5.0.1 on Ubuntu 18.04 with PostgreSQL 10.12, on a VMware VM with 8 CPUs and 12 GB of RAM. I have turned off housekeeping for items/trends because I have a custom script that prunes these records every hour (we are not currently using partitioning). Our setup runs perfectly fine except between 1AM and 4AM, when everything slows down dramatically. It isn't slow for the entire hour, though: only for about the first 10 minutes of each hour, after which things calm down until the next hour.
So...
1:00 AM - 1:10 AM -> SLOW
1:11 AM - 1:59 AM -> Normal
2:00 AM - 2:10 AM -> SLOW
2:11 AM - 2:59 AM -> Normal
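To check this "first 10 minutes of each hour" pattern against the logs, slow-query timestamps can be counted per hour and split into the first part of the hour (minutes 0 to 10) and the rest. This is a sketch; `bucket_by_window` is a hypothetical helper, and the 10-minute cutoff simply mirrors the table above.

```python
from collections import Counter
from datetime import datetime

def bucket_by_window(timestamps, slow_minutes=10):
    """Count events per (hour, window): 'first' covers minutes
    0..slow_minutes of the hour, 'rest' covers the remainder."""
    counts = Counter()
    for ts in timestamps:
        window = "first" if ts.minute <= slow_minutes else "rest"
        counts[(ts.hour, window)] += 1
    return counts

# Timestamps of the two sample warnings from the server log.
stamps = [datetime(2020, 7, 16, 9, 0, 15), datetime(2020, 7, 16, 4, 2, 13)]
for (hour, window), n in sorted(bucket_by_window(stamps).items()):
    print(f"{hour:02d}:xx {window}: {n}")
```

Fed with every slow-query timestamp from a few nights of server logs, a sharp jump in the `first` bucket for hours 1 to 3 (and possibly 4) compared with daytime hours would confirm that the slowdown is tied to the top of the hour rather than spread across the night.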
Does anybody know whether Zabbix has something scheduled internally to run on the hour ONLY between 1AM and 4AM (or something that behaves very differently during those hours) that might explain these symptoms? We are still checking our environment for anything external that could be causing these issues.